PTR/INT Oracle Exadata Database Machine Install and Maintenance
Student Guide - Volume II
D80881GC10
Edition 1.0
April 2013
D81624
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

This document contains proprietary information and is protected by copyright and other intellectual property laws. You may copy and print this document solely for your own use in an Oracle training course. The document may not be modified or altered in any way. Except where your use constitutes "fair use" under copyright law, you may not use, share, download, upload, copy, print, display, perform, reproduce, publish, license, post, transmit, or distribute this document in whole or in part without the express authorization of Oracle.

Unauthorized reproduction or distribution prohibited. Copyright © 2017, Oracle and/or its affiliates.

The information contained in this document is subject to change without notice. If you find any problems in the document, please report them in writing to: Oracle University, 500 Oracle Parkway, Redwood Shores, California 94065 USA. This document is not warranted to be error-free.

Restricted Rights Notice
If this documentation is delivered to the United States Government or anyone using the documentation on behalf of the United States Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS
The U.S. Government’s rights to use, modify, reproduce, release, perform, display, or disclose these training materials are restricted by the terms of the applicable Oracle license agreement and/or the applicable U.S. Government contract.

Trademark Notice
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Authors
David Winter

Technical Contributors and Reviewers
Leslie Keller
Oliver Sharwood

Editors
Rashmi Rajagopal
Richard Wallis

Graphic Designer
Rajiv Chandrabhanu

Publishers
Pavithran Adka
Nita Brozowski
Jayanthy Keshavamurthy
Contents
Exadata Storage Expansion Full Rack
Exadata Storage Expansion Half Rack
Exadata Storage Expansion Quarter Rack
Exadata Storage Expansion Rack
Exadata Storage Expansion Rack Upgrades

One InfiniBand Spine Switch
36-Port Managed QDR InfiniBand Switch Differences
Exadata X3-2 Database Servers
Exadata X2-2 Database Servers Based on Sun Fire X4170 M2
Exadata Memory Expansion Kit
QPI Port Mapping 8-socket (4 CMOD)
CPU Module: CMOD Layout
Memory Population
Population Order
NEM: Overview
NEM Components
Service Processor and Universal Connector Port
Chassis Cooling Zones
Exadata X2-8 Database Server Power Information
Exadata Storage Server Comparison
New Exadata X3-2L Storage Servers
Exadata X2-2 Storage Servers Based on Sun Fire X4270 M2
Exadata V2 Legacy Storage Servers
Thermal Sensing and Fan Control
New Sun Fire Flash Accelerator F40 PCIe Card Aura 2
LSI Nytro WarpDrive2 / Aura2
WarpDrive Firmware
Sun Fire Flash Accelerator F20 PCIe Card
F20 PCIe Card Versions 1.0 and 1.1
IB Dual Port 4x QDR PCIe Low-Profile HCA M2
Sun IB Dual Port 4x QDR PCIe ExpressModule HCA M2
IB HCA Ports and LEDs
Keyboard, Video, and Monitor Hardware (Exadata V2 and X2-2 only)
Sun Rack II 1242
Exadata Database Machine PDUs
New CMAs
Exadata and Storage Expansion Racks Low Voltage Single Phase PDU
Exadata and Storage Expansion Racks Low Voltage Three Phase PDU
Exadata and Storage Expansion Racks High Voltage Single Phase PDU
Exadata and Storage Expansion Racks High Voltage Three Phase PDU
X2-8 Low Voltage Single Phase PDU
X2-8 Low Voltage Three Phase PDU
X2-8 High Voltage Single Phase PDU
X2-8 High Voltage Three Phase PDU
Exadata Database Machine Architecture
Disk Abstraction
Cell Disk
Grid Disk
Oracle Exadata Database Machine Architecture
Automatic Storage Management
ASM Scale-Out Data Distribution
ASM Data Redistribution
Protection from Hardware Failure
Protection from Brownout
Exercises and Discussion
Task 1: Quiz
Exercise Summary
Exercise Solutions
Summary
Additional Resources
X2 Exadata Database Machine Environmental
X2 Storage Expansion Rack Environmental

Network Overview
Information Requirements
InfiniBand Network Addresses
Ethernet Network Addresses
X2-2 KVM Connections
X2/3-8 Full Rack: Server to Leaf Switch
Storage Expansion Full Rack: Server to Leaf Switch
Storage Expansion Half Rack: Server to Leaf Switch
Storage Expansion Quarter Rack: Server to Leaf Switch
Scaling Out to Multiple Full Racks
Two Rack Case: Fat Tree Topology
Multiple Rack Case: Up to 8 Racks
Scaling Out to 9 to 36 Racks
Multiple Rack Case: 9 to 36 Racks
Interconnecting Quarter Racks
Case 1: Two Quarter Racks
Case 2: Quarter Rack with One Half or Full Rack
Case 3: Quarter Rack with Two or More Racks
InfiniBand Network: External Connectivity
System Access
Default Passwords and Usernames
Unpacking and Staging the Oracle Exadata Database Machine
Installation
Post-installation
Check the Software and Patches at Deployment Time
Database Platform
Configure Rack Master Serial Number
Configuring the Avocent KVM MergePoint Unity Switch (<=X2-2 only)
KVM Configuration: Example
Configure Network Settings
Configure the InfiniBand Switches
InfiniBand Platform/Version
Sun Datacenter 36-Port Managed QDR InfiniBand Switch Settings

Exercise: Verify System Components
Task 4-1: System Monitoring
Task 4-2: Navigating System Components
Task 4-3: Storage Cell Setup
Exercise Summary
Summary

Component Replacement Procedures
Storage Cell Disk Replacement
IB-HCA Card Installation
InfiniBand Switch Maintenance
Backing Up Settings on the IB Switch
Replacing an InfiniBand Switch
Restoring Settings on the IB Switch
Verifying the InfiniBand Network Operation
New LSI RAID Battery Maintenance Procedure
LSI HBA Batteries
Write-Back Versus Write-Through Mode
LSI HBA Batteries
Battery Monitoring via Learn Cycles
New Cable Management Arm
Exadata X2-8 Database Server
X2-8 DB Server Service Processor Cabling
X2-8 DB Server Subassembly Module Removal and Replacement
CPU Module Orientation
How to Remove a CPU Module
CPU Module Components
Prepare the Server for Operation
PCIe EM Designations and Population Rules
IB HCA Ports and LEDs

7 Advanced Tasks
Objectives
Relevance of Advanced Tasks
Additional Resources for Advanced Tasks
Oracle Database Machine Advanced Tasks
Changing the InfiniBand Network Information
Understanding the InfiniBand Network Master Subnet Manager
Changing IP Addresses on an Exadata Storage Server
Nonemergency Power Cycle Procedure
Emergency Power-Off Considerations
Installing and Configuring Auto Service Request: Solaris Server
Installing and Configuring Auto Service Request: Enterprise Linux Server
Registering ASR Manager
Configuring ASR Trap Destinations
Activating ASR Destinations
Validating Auto Service Request
ASR Support Process
Checking MOS Hardware Serial Numbers
New LSI RAID Battery Maintenance Procedure
New OneCommand Utility Oracle Exadata Deployment Assistant
Summary
Objectives
All Parties Informed and Engaged: EST coordinates with ACS, CSM, and CIMs.
Escalation Management: CIMs and GCA help with escalated accounts.
Detailed features
Features: Platinum
Remote Fault Monitoring: 15-minute fault notification; 15-minute service request generation for validated faults; total 30 minutes maximum from fault to service request generation
Severity 1 Remote Response: 30-minute remote response from service request generation, 24/7
Severity 2 Remote Response: 2-hour remote response from service request generation, 24/7
Severity 1 Onsite Hardware Response (< 25 miles): 2 hours, 24/7 from completion of remote diagnosis
Severity 2 Onsite Hardware Response (< 25 miles): 4 hours, 8/7 from completion of remote diagnosis
Senior Support Engineers: Remote response team
Escalation Hotline and Escalation Managers: Escalation hotline and escalation managers dedicated to Exadata
Patching: Oracle will remotely patch systems 4 times per year
managers (CIM)
Patching: Oracle will remotely patch systems quarterly (ACS) / Customer responsibility
alert customers (ACS)
Diagnostic Data Collection: Oracle will remotely collect diagnostic information related to faults (ACS) / Customer responsibility
Support Portal: My Oracle Support with advanced monitoring portal / My Oracle Support
Monitoring Gateway
Monitoring Gateway
Manufacturing Repair
FRU Description
Part Number Category
Two types of replacement units are available: field replaceable units (FRUs) and customer replaceable units (CRUs). FRUs are installed by trained Oracle field technicians. CRUs are installed by the customer.
For a complete list of FRUs and CRUs, see the product documentation.
• Server’s Service Manual: http://docs.sun.com/source/820-5830-11/index.html
• InfiniBand Switch Service Manual: http://docs.sun.com/source/835-0784-04/toc.html
• Cisco Switch: http://www.cisco.com/en/US/docs/switches/lan/catalyst4900/4948E/installation/guide/4948E_ins.html
• KVM and KVM Components: http://pcs.mktg.avocent.com/@@content/manual/590883501c.pdf
• CRUs
– Air Baffle
– Battery
– DIMMs
– DVD Drive
– Storage Drive
– Fan Module
– PCIe Card
– Power Supply Unit
– SAS Expander Mod
• FRUs
units in the Sun Server X3-2 and Sun Server X3-2L.
“Oracle Exadata Database Machine X3 Services Training Update”
field personnel should execute, debug, or follow replacement procedures.
Identify the storage cell with the failed hard drive. Be sure to
note the number of the drive that has failed. Perform the
following steps to map an affected LUN to a physical disk:
1. From CellCLI, list the current LUNs and identify the
affected LUN:
CellCLI> list lun
0_0 0_0 normal
<snip>
0_5 0_5 normal
0_6 0_6 critical
0_7 0_7 normal
0_8 0_8 normal
0_9 0_9 normal
0_10 0_10 normal
0_11 0_11 normal
Note: When performing a disk replacement, ensure that only new, unused disks are used.
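With twelve or more LUNs per cell, picking out the affected one by eye is error-prone. The following is a minimal sketch, assuming the three-column layout (name, ID, status) shown in the listing above; the function name non_normal_luns is hypothetical and not part of CellCLI.

```shell
#!/bin/sh
# Sketch: print the name of every LUN whose status is not "normal".
# Reads text in the three-column "list lun" layout (name, id, status).
# Lines that do not have exactly three fields (such as <snip>) are skipped.
non_normal_luns() {
    awk 'NF == 3 && $3 != "normal" { print $1 }'
}

# Sample input copied from the listing above. On a storage cell, the input
# could instead be piped from: cellcli -e list lun
non_normal_luns <<'EOF'
0_5 0_5 normal
0_6 0_6 critical
0_7 0_7 normal
EOF
```

For the sample input, the function prints 0_6, the LUN to investigate in the following steps.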
When a drive completely fails (spins down, red-lights), an Exadata alert is generated with
specific instructions for replacement. If you have configured for alert notifications, you will be
alerted of this via email. You can also see this alert with the CellCLI list alerthistory
command. Steps for post disk replacement are fully automated.
Some things to keep in mind with respect to disk replacement:
If the failed disk had more than one grid disk on it, you might encounter bug 9237258, which
causes the celldisk to get created as interleaved. If this occurs, perform the following steps:
1. Ensure that the disk is dropped from the ASM disk group.
2. Drop the celldisk.
3. Re-create the celldisk.
4. Re-create the grid disks with proper order and size.
errorCount: 0
id: 0_6
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 558.4059999994934G
lunUID: 0_6
physicalDrives: 20:6
raidLevel: 0
status: critical
List the affected physical disk in detail:
CellCLI> list physicaldisk 20:6 detail
name: 20:6
deviceId: 14
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
errorCount: 0
foreignState: false
id: 0932E01C22
luns: 0_6
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0605
physicalInsertTime: 2009-10-10T09:18:30-06:00
physicalInterface: sas
physicalSerial: 0932E01C22
physicalSize: 558.9109999993816G
slotNumber: 6
status: critical
Physical Disk nomenclature: 20:6
• 20 stands for HBA Enclosure ID
• 6 stands for slot
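The nomenclature above can be applied mechanically when scripting around disk names. This is a small sketch, assuming names always have the ENCLOSURE:SLOT form shown; the function name split_disk_name is hypothetical.

```shell
#!/bin/sh
# Sketch: split a physical disk name of the form ENCLOSURE:SLOT
# into its two parts using POSIX parameter expansion.
split_disk_name() {
    enclosure=${1%%:*}   # text before the first colon (HBA enclosure ID)
    slot=${1#*:}         # text after the first colon (slot number)
    printf 'enclosure=%s slot=%s\n' "$enclosure" "$slot"
}

split_disk_name 20:6
```

For the disk in this example, the slot value matches the slotNumber field and the enclosure value matches enclosureDeviceId in the physical disk detail output.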
3. When the drive tray opens, remove the drive from the
storage cell by pulling gently on the drive tray handle.
4. Insert the new drive tray in the empty drive slot in the
storage cell by sliding in the drive tray and pushing in the
drive tray handle until it locks.
5. Verify that the new drive is detected.
In rare cases, the automatic firmware update may not work and the LUN will not be
automatically rebuilt.
This can be confirmed by checking the ms-odl.trc file.
If the disk was a system disk (a disk with a copy of the operating system on it), consider the
following:
The md sync status can be obtained with mdadm.
An example command is: mdadm --detail --scan /dev/md7 /dev/md5
In rare cases, the grub boot loader may not get properly installed on the newly replaced
system disk. If this occurs, it can be manually installed by doing the following:
Note: This may have to be done when booted off the cell boot USB depending on the severity
of the problem.
Go to the cell that you suspect has the flash drive write performance problem.
Drop everything on the flash using:
cellcli -e drop flashcache all
If you have griddisks on flash, you need to drop them as well:
cellcli -e drop griddisk all flashdisk
cellcli -e drop celldisk all flashdisk
Then run the following command:
lsscsi | grep MARV | awk '{print $NF}' | \
awk '{printf "time dd if=/dev/zero of=%s bs=1048576 count=20\n", $1}' | sh -x
This writes 20 blocks of 1 MB each to each flash device. Based on the elapsed time for each dd
command, you can tell which flash card is bad. If you are physically near the machine, you
should also be able to see the amber light on the flash card.
Assume that /dev/sdy is bad. How should you identify the actual PCI slot that contains the
bad flash card?
cellcli -e list lun detail
<snip>
name: 5_3
cellDisk: FD_15_sgsas1
deviceName: /dev/sdy <= device you are interested in
diskType: FlashDisk
id: 5_3
isSystemLun: FALSE
lunAutoCreate: FALSE
4. Install the IB-HCA card into the slot, pushing the card’s
connected port. If the green LED is not on, check the cable
connections at the adapter and at the switch.
5. Check that the amber LED is illuminated for each
connected port.
6. Verify that the IB-HCA ports are functional and the driver is
attached: # dmesg | grep mlx4
The output shows system diagnostic messages that have
the string mlx4 in the message (the name
of the Linux driver). Included in the output
is a message that indicates whether the
port is up or down.
Replaceable components of an InfiniBand switch:
1. Battery
2. Fan
3. Power Supply
To remove and replace an entire InfiniBand switch:
1. Identify the failed switch.
2. Disconnect the power and network cables from the failed switch.
3. Remove the failed switch from the rack.
4. Install the replacement switch in the rack.
5. Connect the power to the replacement switch.
6. Check the software on the replacement switch and confirm that it is the correct version.
7. Configure the replacement switch, restore from backup if possible.
8. Connect the network cables to the replacement switch.
9. Verify the status of the replacement switch.
notes below describes how to back up and restore a switch with 1.1.3-2 or higher firmware. The backup needs to be done only once after the switch is initially configured with the right settings.
To back up and restore a switch with 1.1.3-2 or higher firmware:
1. Navigate to the switch ILOM URL or IP address in a browser (for example,
http://10.7.4.227).
2. Log in as the ilom-admin user. The password is welcome1.
Note: The default password is ilom-admin.
3. Click the Maintenance tab.
4. Click the Backup/Restore tab.
5. Select the Backup operation and Browser method.
6. Enter a pass phrase. This is used to encrypt sensitive information, such as user
passwords, in the backup.
7. Click Run and save the resulting XML file in a secure location.
8. Log in to the Sun Datacenter InfiniBand Switch 36 switch as the root user.
9. Use the scp command to copy the /etc/opensm/opensm.conf and
/conf/partitions.current files and save them with the backup XML file. This is
necessary because the backup does not save the Subnet Manager or IB partition
configuration respectively.
10. Save the output from the version command.
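Step 9 can be sketched as a pair of scp commands. The example below only prints the commands so that they can be reviewed before anything is copied. It assumes the copy is pulled from an administration host rather than pushed from the switch; the function name, the destination directory, and the use of the example address 10.7.4.227 as the switch host are all hypothetical.

```shell
#!/bin/sh
# Sketch: print (do not run) the scp commands that copy the two files
# named in step 9 from the switch to a local backup directory.
print_switch_backup_cmds() {
    switch_host=$1   # switch host name or IP address (hypothetical value below)
    dest_dir=$2      # local directory holding the backup XML file (hypothetical)
    for f in /etc/opensm/opensm.conf /conf/partitions.current; do
        printf 'scp root@%s:%s %s/\n' "$switch_host" "$f" "$dest_dir"
    done
}

print_switch_backup_cmds 10.7.4.227 /backup/ibswitch
```

Keeping both files next to the backup XML file means that the Subnet Manager and partition settings can be restored together with the ILOM configuration.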
firmware:
1. Run the version command and ensure that the switch is
at the right firmware level. If not, upgrade the switch to the
correct firmware level.
2. Navigate to the switch ILOM URL or IP in a browser, as in
this example: http://10.7.4.227.
3. Log in as the ilom-admin user, password welcome1.
Note: The default password is ilom-admin.
4. Click the Maintenance tab.
5. Click the Backup/Restore tab.
6. Select the Restore operation and Browser method.
7. Click Browse and then select the XML file that contains the switch configuration backup.
8. Enter the pass phrase that was used during the backup.
9. Click Run to restore the configuration.
10. Log in to the InfiniBand switch as the root user.
11. Use the scp command to copy the /etc/opensm/opensm.conf and /conf files to
the switch. These were the files that were created during the backup.
12. Restart openSM from the switch CLI by using the following commands:
- disablesm
- enablesm
An LSI SAS2 6Gbps host bus adapter (HBA) is used in all Exadata Database Machine
servers based on Sun-Oracle hardware to control and interface with the disk drives. It contains
512 MB of low-voltage DDR2 memory, which it uses to cache data writes in order to improve the
performance of disk write operations. It also contains a Battery Backup Unit (BBU), which is
designed to supply regulated battery power to the cache memory during a main system power
outage, long enough for main system power to be brought back online. For
Exadata, the specified hold-up time is 48 hours, which requires a usable charge capacity of
674 mAh for low-voltage DDR2 memory.
The BBU is a single-cell Li-ion battery pack. As with all rechargeable Li-ion batteries, charge is
supplied via a chemical reaction, and the battery pack's ability to hold charge degrades over
time. The BBU (also referred to as iBBU or Intelligent BBU) contains a small integrated circuit
board with a “smart” gas gauge, accessible through an I2C bus, which permits the RAID On a
Chip controller to monitor the actual battery capacity to ensure that caching is not permitted if
the capacity falls below the minimum necessary threshold. The BBU board also contains the
charge circuitry. It is designed to be removable and replaceable as a Field Replaceable Unit
(FRU), with a single mating connector that interfaces the BBU board to the HBA and three
screws mounted under the HBA that physically retain it to the HBA.
When the BBU is present and operating normally, the virtual disks are
compatibility.
* BBU08 use on the older B2 HBA requires a firmware update to at least FW Package Build
12.9.0-0037 or later. This is shipped as part of Exadata image 11.2.2.1.1 or later. Older
installed systems may require update prior to replacing. The connector and screw mounts are
identical so that they are physically compatible.
Learn cycles are performed periodically to fully discharge the battery and
recharge it. When discharge is complete, the BBU determines the new
capacity of charge the battery can hold. Failure to perform learn cycles at the
recommended intervals may reduce the usable life of the battery by reducing
the full charge capacity more rapidly leading to premature end of service life.
This is reported by the Full Charge Capacity field in MegaCli BBU output and
will be updated after a learn cycle. Refer to the next section for an example.
When a learn cycle is initiated, the charging circuit automatically places any
virtual drives that are in WB mode into WT mode for the duration of the cycle,
which will temporarily reduce write performance. When the learn cycle
completes, the virtual drives are automatically transitioned back to WB mode
if the battery is still capable of holding the required charge amount. Learn
cycle time will vary based on the BBU type. For BBU07, the complete learn
cycle process and the cache in WT mode are expected to be 6 to 8 hours. For
BBU08, the complete learn cycle process and the cache in WT mode are
expected to be 2 to 3 hours.
Note: When a new BBU is installed into a system, it will have a depleted charge state. Any
virtual drives attached will be forced into WT cache mode while a full learn cycle is performed.
Usually, a sufficient charge to maintain the cache is reached after this cycle is complete. This
may take 24 hours or longer. Status will show the following:
# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep "Charging"
Charging Status: Charging
By default, learn cycles on Exadata are configured as follows:
On storage cells with image 11.2.1.2.x, the learn cycle occurs monthly, based on when the
system was first powered on.
On storage cells with image 11.2.1.3.1 or later, the learn cycle is manually scheduled quarterly to
start at 2 AM on January 17, April 17, July 17, and October 17. The time is chosen to minimize
impact on daytime operations.
Database nodes are set for automatic scheduling, which occurs every 30 days from first power
on. This may lead to variability in the time of day based on when the node was powered on.
Storage cells will put the following in the cell alerthistory, which may generate a service call:
4 2011-04-17T05:01:06-04:00 info "BBU on disk contoller at adapter 0
is going into a learn cycle. All Logical Volumes on harddisks will
go into WriteThrough caching mode. Write Throughput will be lower."
5_1 2011-04-17T09:46:07-04:00 critical "All Logical drives are in
WriteThrough caching mode. Either battery is in a learn cycle or it
needs to be replaced. Please contact Oracle Support"
5_2 2011-04-17T12:09:28-04:00 clear "Battery is back to a good
state"
If the last message indicating that it is back in a good state does not occur, then this requires
investigation as to why the battery is not good after the learn cycle.
Database nodes currently do not log learn cycles except in the HBA events log.
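Whether the "back to a good state" message arrived can be checked by looking at the severity of the most recent critical or clear entry. The following is a rough sketch, assuming each alert occupies one line in the layout shown above (name, timestamp, severity, message); the function name last_alert_severity is hypothetical, and a real alert history may contain unrelated alerts that would need to be filtered out first.

```shell
#!/bin/sh
# Sketch: print the severity ("critical" or "clear") of the last alert line
# that carries one of those two severities. Field 3 is the severity.
last_alert_severity() {
    awk '$3 == "critical" || $3 == "clear" { last = $3 } END { print last }'
}

# Sample input: the three alerts quoted above, each joined onto one line.
last_alert_severity <<'EOF'
4 2011-04-17T05:01:06-04:00 info "BBU on disk contoller at adapter 0 is going into a learn cycle. All Logical Volumes on harddisks will go into WriteThrough caching mode. Write Throughput will be lower."
5_1 2011-04-17T09:46:07-04:00 critical "All Logical drives are in WriteThrough caching mode. Either battery is in a learn cycle or it needs to be replaced. Please contact Oracle Support"
5_2 2011-04-17T12:09:28-04:00 clear "Battery is back to a good state"
EOF
```

A result of critical after the expected learn cycle duration has passed corresponds to the case described above that requires investigation.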
Additional learn cycles may start occurring more frequently than 30 days if the full charge
capacity gets close to the replacement thresholds and the remaining capacity goes low, which
will initiate a new learn cycle to relearn the full charge capacity. This has been seen to occur
as frequently as daily on a failing BBU.
To check if a battery is in learn-cycle mode, do the following:
# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep Learn
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
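For scripted checks, the single field of interest can be pulled out of that output. This is a minimal sketch, assuming the "label : value" layout shown above; the function name learn_cycle_active is hypothetical.

```shell
#!/bin/sh
# Sketch: print the value of the "Learn Cycle Active" field (for example, No).
# Splits each line on the colon and strips blanks from the value.
learn_cycle_active() {
    awk -F: '/Learn Cycle Active/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Sample input copied from the output above. On a server, the input could be
# piped from: /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0
learn_cycle_active <<'EOF'
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
EOF
```

For the sample input, the function prints No, meaning that no learn cycle is running and the virtual drives are not in WT mode because of one.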
replacement procedures.
• Available in the eKit
[Figure callouts: 1 = Video console, 2 = Serial console, 3 = USB (2 connectors)]
Figure Legend
1. Net management ports 0 and 1
2. Serial management
3. Fault LED
4. Power/OK LED
5. Temperature LED
6. Multiport cable connector
7. Locate button/LED
How to Remove the Subassembly Module
WARNING! Do not attempt this procedure on the lab equipment. Refer to video captures for
details on the Subassembly replacement procedure.
CPU Module (CMOD) Designations
On the front of the server chassis and within the ILOM interfaces (web and command line),
the CMODs are designated as BL 0–BL 3.
Because of the length and weight of the CMOD, more than one
person should perform the removal of the CMOD at this point.
Caution: Potential overheat condition. Unoccupied module
slots disrupt air flow and temperature control within the server.
Replace the module with a filler module or another CMOD.
To disengage the CMOD, simultaneously rotate both levers outward away from the center of
the module. Do not attempt to remove the CMOD now.
Rotating the levers outward causes the pawls on the end of the levers to engage the sidewall
of the chassis and pull the CMOD out of its internal connector.
Use the handles to pull the CMOD partially out of its slot.
Pull the CMOD out so that approximately six inches extends from the front of the chassis.
Rotate the levers inward until they are closed and locked.
To remove the CMOD, have an assistant support the CMOD as you grab it with your hands
and slowly pull it out of the slot.
Install a CMOD filler in the slot.
Note the location of DIMMs, REM, FEM, system battery, heat sinks, and CPU sockets.
To latch and lock the CMOD, simultaneously rotate and push both levers inward toward the
center of the module until the locks on the handles click into place.
This action pushes the module into the chassis and engages the connector on the back of the
module with the connector on the interior midplane. When the handles are locked, you cannot
lift the levers without first releasing the locks on the handles.
Caution: Pinch point. Keep your fingers clear of the back of the lever, the lever hinges,
and the edges of the module.
Locate the module slot that you need to populate.
Open the CMOD levers to the fully open position by squeezing together the green locking
tabs on the lever handles and rotating both handles outward, away from the center of the
module.
The levers do not extend beyond 90 degrees.
Orient the CMOD so that the cover faces upward.
Carefully slide the module into the chassis until it stops.
Do not force the module into the chassis in an attempt to engage the connectors on the
chassis midplane.
Ensure that the pawl on the end of each lever is aligned with the rectangular slot in the
chassis sidewall.
To latch and lock the CMOD, simultaneously rotate and push both levers inward toward the
center of the module until the locks on the handles click into place.
Use a vacuum to remove dust and debris from the server vents and chassis.
How to Verify CPLD Versions
All CMODs must have identical CPLD levels. After installing a CMOD, you must verify the
CPLD levels for all CMODs in the chassis.
Before You Begin
All CMODs in the chassis must be installed, and the chassis must be in standby power mode.
The green LED on all CMODs must be steady ON.
Log on to the ILOM.
Enter the following command for each node in the chassis:
show /SYS/BLn/CPLD
In this command, n is the node number.
Verify that all nodes return the same value.
If all nodes do not return the same value, contact Oracle service.
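The comparison step in the CPLD check can be sketched in a few lines of Python. This is only an illustration: it assumes the operator has already copied the value returned by show /SYS/BLn/CPLD for each node into a dictionary, and the version strings below are made up.

```python
from collections import defaultdict


def group_by_cpld(versions):
    """Group node numbers by the CPLD value reported for each node."""
    groups = defaultdict(list)
    for node, value in sorted(versions.items()):
        groups[value.strip()].append(node)
    return dict(groups)


def cpld_levels_match(versions):
    """Return True only if every node reported the same CPLD value."""
    return len(group_by_cpld(versions)) == 1


# Hypothetical values copied by hand from `show /SYS/BLn/CPLD` for n = 0..3.
reported = {0: "1.4", 1: "1.4", 2: "1.4", 3: "1.3"}
print(cpld_levels_match(reported))
print(group_by_cpld(reported))
```

If the check fails, the grouped output shows which nodes differ, which is the information to pass to Oracle service.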
The PCIe EM slots are designated starting from the bottom as EM 0.0–EM 3.1.
Note: For proper airflow and cooling, slots not containing a PCIe EM must be populated
with a filler panel.
PCIe EM slots are paired and allocated to a single CMOD. The slots-to-CMOD pairing is as
follows:
• Slots EM 0.0 and 0.1 are paired to CMOD 0 (BL 0).
• Slots EM 1.0 and 1.1 are paired to CMOD 1 (BL 1).
• Slots EM 2.0 and 2.1 are paired to CMOD 2 (BL 2).
• Slots EM 3.0 and 3.1 are paired to CMOD 3 (BL 3).
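The pairing rule above (the first digit of the slot designation selects the CMOD) can be expressed as a small lookup. The function name and the error handling are illustrative assumptions, not part of any Oracle tool.

```python
def cmod_for_em_slot(slot):
    """Map a PCIe EM slot designation such as 'EM 2.1' to its CMOD label."""
    label = slot.upper().replace("EM", "").strip()
    cmod_text, port_text = label.split(".")
    cmod, port = int(cmod_text), int(port_text)
    # The chassis described here has CMODs 0-3 with two EM slots each.
    if cmod not in range(4) or port not in (0, 1):
        raise ValueError(f"unknown PCIe EM slot: {slot!r}")
    return f"BL {cmod}"


for name in ("EM 0.0", "EM 1.1", "EM 3.0"):
    print(name, "->", cmod_for_em_slot(name))
```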
1. InfiniBand Port 1
release handle and rotate the handle to the left to its fully
open position (1).
• To remove the PCIe EM, use the handle to pull the PCIe
EM from its slot (2).
Access the back of the server.
Disconnect cables from the PCIe EM.
To unlock the PCIe EM, pull out on the underside of the release handle and rotate the handle
to the left to its fully open position (1).
To remove the PCIe EM, use the handle to pull the PCIe EM from its slot (2).
The Sun Fire X4800 server has eight PCIe EM slots. The PCIe EMs have a lever mechanism
that is used for removal and installation. The lever is held in place by a release latch.
Note: The first digit of the designation refers to the CMOD. The
second digit refers to the port.
The Network Express modules (NEMs) are designated as NEM 0 and NEM 1. NEM 0 is on
the left, and NEM 1 is on the right:
The Network Express modules (NEMs) provide server network connectivity options. In
addition to the four 10-Gigabit Ethernet ports and the four 10/100/1000Base-T ports, the
NEMs have an indicator panel.
Note: Network Express modules are designated as customer replaceable units (CRU).
2. Connect a serial terminal, terminal server, or workstation with a TIP connection to the
USB-to-serial adapter. Configure the terminal or terminal emulator with these settings:
115200 baud
8 bits
No parity
1 Stop bit
No handshaking
3. Press the Enter key on the serial device several times to synchronize the connection.
You might see text similar to the following:
...
CentOS release 5.2 (Final)
Kernel 2.6.27.13-nm2 on an i686
nm2name login:
where nm2name is the host name of the management controller.
4. Enter ilom-admin as the login name and ilom-admin as the password.
nm2name login: ilom-admin
Password: password
->
Note: As shipped, the ilom-admin user password is welcome1. If this does not work,
try ilom-admin for the password.
The firmware is downloaded. The upgrade begins. A warning is displayed and you
Note: Firmware upgrade will upgrade the SUN DCS 36p firmware. ILOM will enter
a special mode to load new firmware. No other tasks should be performed in ILOM
until the firmware upgrade is complete.
Are you sure you want to load the specified file (y/n)? y
6. Answer y to the prompt to commit to the upgrade.
The upgrade begins.
Setting up environment for firmware upgrade. This will take approximately 2 minutes.
Starting SUN DCS 36p FW update
==========================
Performing operation: I4 A
==========================
I4 fw upgrade from 7.2.0(INI:2) to 7.2.300(INI:2):
Upgrade started...
Upgrade completed.
INFO: I4 fw upgrade from 7.2.0(INI:2) to 7.2.300(INI:2) succeeded
=========================================
Performing operation: SUN DCS 36p firmware update
=========================================
SUN DCS 36p: SUN DCS 36p is already at the given version.
Customers that need SSH server capability on their Cisco 4948 switch can obtain the updated
Cisco IOS version described in this procedure by opening a service request with Oracle
Support and requesting the update. Please mention this note number [1415044.1] in the
service request to streamline the process.
Assumptions and Prerequisites
The Cisco 4948 switch included in the Oracle Engineered System environment has been
configured to communicate over the management network.
Telnet access and the enable password are available.
A system with a telnet client is available and is able to connect to the Cisco 4948 switch.
At least 20 MB of free flash storage space is available on the Cisco 4948 switch bootflash. The
command to confirm available space is described below.
A tftp server is available on the network and can be reached by the Cisco 4948 switch.
How to verify free space available on Cisco 4948 flash
Log in to the Cisco 4948 via telnet with superuser privileges. After logging in, issue the “show file
systems” command to display the available space.
cisco4948-ip#show file systems
File Systems:
     Size(b)     Free(b)      Type  Flags  Prefixes
*   60817408    45204152     flash     rw   bootflash:
           -           -    opaque     rw   system:
           -           -    opaque     rw   tmpsys:
           -           -    opaque     ro   crashinfo:
      524280      523664     flash     rw   cat4000_flash:
…
The above sample output shows approximately 45MB free space in bootflash. As only 20MB
is required, this switch passes the prerequisite check for space available.
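The same prerequisite check can be scripted against saved command output. The sketch below assumes the text of "show file systems" has been captured to a string; the function name is hypothetical, and 20 MB is interpreted here as 20 × 1024 × 1024 bytes, which is an assumption.

```python
def bootflash_free_bytes(output):
    """Return the Free(b) value of the bootflash: row, or None if it is absent."""
    for line in output.splitlines():
        # Drop the leading '*' that marks the default file system.
        fields = line.replace("*", " ").split()
        if fields and fields[-1] == "bootflash:":
            return int(fields[1])
    return None


REQUIRED_BYTES = 20 * 1024 * 1024  # assumed meaning of "20MB"

sample = """File Systems:
     Size(b)     Free(b)      Type  Flags  Prefixes
*   60817408    45204152     flash     rw   bootflash:
           -           -    opaque     rw   system:
      524280      523664     flash     rw   cat4000_flash:
"""

free = bootflash_free_bytes(sample)
print(free, free >= REQUIRED_BYTES)
```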
53.SG2.bin
60817408 bytes total (45204152 bytes free)
Next, save the current configuration, write to nvram and also save it in boot flash with a
unique name.
Transfer the new Cisco IOS SSH-capable firmware to switch’s boot flash
Copy the new firmware file into Cisco 4948 flash filesystem and verify its integrity in boot
flash. In this example, your tftp server is named “tftp-server” and you have staged the updated
IOS firmware on the tftp server at cisco4948/cat4500-ipbasek9-mz.122-53.SG1.bin.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[OK - 16170184 bytes]
16170184 bytes copied in 3019.672 secs (5355 bytes/sec)
cisco4948-ip#
cisco4948-ip#dir bootflash:
Directory of bootflash:/
1 -rwx 15613000 Nov 4 2010 05:42:31 -04:00 cat4500-ipbase-mz.122-53.SG2.bin
2 -rwx 16170184 Jan 13 2012 15:37:15 -05:00 cat4500-ipbasek9-mz.122-53.SG1.bin
cisco4948-ip#verify bootflash:cat4500-ipbasek9-mz.122-53.SG1.bin
File system hash verification successful.
any breaks, sets baudrate to 9600 and boots into ROM if the main boot process fails for some
reason.
cisco4948-ip#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
cisco4948-ip(config)#config-register 0x2102
cisco4948-ip(config)#no boot system
cisco4948-ip(config)#boot system bootflash:cat4500-ipbasek9-mz.122-53.SG1.bin
cisco4948-ip(config)#
cisco4948-ip(config)# (type <control-z> here to end)
cisco4948-ip#show run | include boot
boot-start-marker
boot system bootflash:cat4500-ipbasek9-mz.122-53.SG1.bin
boot-end-marker
cisco4948-ip#
Save the configuration into nvram
cisco4948-ip#copy running-config startup-config all
cisco4948-ip#write memory
Building configuration...
Compressed configuration from 6725 bytes to 2261 bytes[OK]
cisco4948-ip# reload
You will be asked to confirm if you want to continue and reboot the Cisco switch.
configurations.
cisco4948-ip#conf terminal
Enter configuration commands, one per line. End with CNTL/Z.
cisco4948-ip(config)#crypto key generate rsa
% You already have RSA keys defined named cisco4948-ip.us.oracle.com.
% Do you really want to replace them? [yes/no]: yes
Choose the size of the key modulus in the range of 360 to 2048 for your
General Purpose Keys. Choosing a key modulus greater than 512 may take a few
minutes.
How many bits in the modulus [512]: 768
% Generating 768 bit RSA keys, keys will be non-exportable...[OK]
cisco4948-ip(config)#
cisco4948-ip(config)#username admin password 0 welcome1
cisco4948-ip(config)#line vty 0 4
cisco4948-ip(config-line)#transport input all
cisco4948-ip(config-line)# exit
cisco4948-ip(config)#aaa new-model
cisco4948-ip(config)#
cisco4948-ip(config)#ip ssh time-out 60
cisco4948-ip(config)#ip ssh authentication-retries 3
cisco4948-ip(config)#ip ssh version 2
cisco4948-ip(config)# (type <control-z> here to end)
Verify that the SSH configuration is working and configured properly using the “show ip ssh”
command:
cisco4948-ip#show ip ssh
SSH Enabled - version 2.0
Authentication timeout: 60 secs; Authentication retries: 3
cisco4948-ip#
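When the verification has to be repeated on several switches, the "show ip ssh" text can be checked programmatically. The following is a minimal sketch that assumes the output has the two-line shape shown above; the function name and the returned keys are illustrative.

```python
import re


def parse_show_ip_ssh(output):
    """Extract SSH state, version, timeout, and retries from `show ip ssh` text."""
    state = re.search(r"SSH (Enabled|Disabled)(?: - version ([\d.]+))?", output)
    timeout = re.search(r"Authentication timeout: (\d+) secs", output)
    retries = re.search(r"Authentication retries: (\d+)", output)
    return {
        "enabled": bool(state) and state.group(1) == "Enabled",
        "version": state.group(2) if state else None,
        "timeout_secs": int(timeout.group(1)) if timeout else None,
        "retries": int(retries.group(1)) if retries else None,
    }


sample = """SSH Enabled - version 2.0
Authentication timeout: 60 secs; Authentication retries: 3"""

print(parse_show_ip_ssh(sample))
```

The parsed values can then be compared with the settings entered earlier (time-out 60, authentication-retries 3, version 2).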
access via SSH and telnet simultaneously. To disable telnet access, connect to the switch
using SSH (since telnet will be disabled as part of this procedure) and enter these commands:
cisco4948-ip#copy running-config startup-config all
cisco4948-ip#copy running-config bootflash:cisco4948-ip-confg-with-ssh
cisco4948-ip#write memory
Building configuration...
Compressed configuration from 6725 bytes to 2261 bytes[OK]
The configuration is complete. The bootflash on the 4948 is large enough to hold both the
original IOS version and the updated SSH-capable IOS version, so no cleanup is required.
solutions.
• Task 1: Loading Switch Software and Configuring the Switch
– Review the solutions inline.
• Task 2: Loading the Cisco Switch Firmware
– Review the solutions inline.
• Task 3: Removing and Replacing X2-8 DB Server Components
– Review the solutions inline.
Troubleshooting
Objectives
CPU         System cannot operate.          System halts, and the CPU BIST
            or system will not start.
Memory      Correctable ECC is ignored.     Memory controller interrupts OS.
            Uncorrectable ECC error
            halts on interrupt.
Disk        Drive cannot be read or         Performance degrades because
Controller  written on.                     of SATA retries. This is a
                                            stand-alone diagnostic.
Machine Status and Verification
The following methods can be used to verify the status of the Oracle Exadata Database
Machine.
Machine Visual Inspection
Many issues can be resolved simply through a visual inspection. Check all external network
cabling, power, LEDs, and devices by using the following checklist:
• Check the InfiniBand network, switches, cables, and HCAs.
• All active Ethernet connections should be blinking green, indicating a successfully
negotiated gigabit connection.
• Check the power connections on switches, nodes, and KVM.
• Check the network connections on switches and nodes.
• Conduct a visual verification of all LEDs.
Power       There are various failures      IPMI sensors, BIOS queries IPMI
Supply      (see voltage section).          sensors (or handheld voltmeter).
Fan         Cooling lessens or fails, and   IPMI measures fan speeds,
            temperature rises.              BIOS queries IPMI sensors (or
                                            handheld voltmeter).
Voltage     There is loss of resources      IPMI measures voltage, BIOS
            including drives, fans, and     queries IPMI, resource might not
            CPU. Resource halts.            work.
Double-check BIOS power savings mode if a sudden drop in performance occurs. If the BIOS
has changed from Power Savings mode, it should be reset.
Certain abnormal situations, such as excessive process termination by a large number of
shutdown aborts, can expose cellsrv to memory leaks. In a worst case situation, with a
large amount of memory leaked, the cell can hang, causing other problems. Cellsrv
memory can be monitored in a variety of ways, such as /proc file system, the ps command,
or the pmap command.
If a leak is detected, a service request should be filed and cellsrv restarted with the
cellcli command, alter cell restart services cellsrv. If the leak is so bad
that it completely hangs the system, the cell can be reset by contacting the BMC with
ipmitool, for example, ipmitool -H <ILOM address> -U root -P <password>
chassis power cycle.
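As one way to monitor cellsrv memory through the /proc file system, the resident set size can be sampled from /proc/<pid>/status over time. This is a rough sketch: the function names, the sample text, and the idea of comparing the first and last sample are assumptions, and finding the cellsrv process ID is left to the operator.

```python
def vm_rss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def rss_growth_kb(samples):
    """Difference between the last and first RSS samples, in kB."""
    return samples[-1] - samples[0]


# Made-up excerpt of a /proc/<pid>/status file, for illustration only.
example_status = "Name:\tcellsrv\nVmPeak:\t 9000000 kB\nVmRSS:\t 4194304 kB\n"
print(vm_rss_kb(example_status))

# Hypothetical readings taken at intervals while the workload runs.
print(rss_growth_kb([4194304, 4200000, 4350000]))
```

On a live system the status text would come from reading the file, for example open(f"/proc/{pid}/status").read(). A steadily growing value is a hint to investigate, not proof of a leak.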
Native commands
• vmstat: Overall system utilization; also useful to
aggregate I/O (bi/bo columns) on cells
• iostat: I/O stats on cells; -x for extended statistics
Note that, in rare cases, the APIC timer might be disabled, causing iostat to report reduced
throughput, even though the cell is actually generating throughput at expected rates. To
confirm this problem, compare the vmstat bi and bo (read and write throughput aggregates)
to iostat, because vmstat is not affected by this problem.
Also, dmesg output can be investigated by looking for data after the “Using local APIC timer
interrupts” line. Compare the detected MHz with that from a cell where iostat is working
properly and you will find that the former MHz is about half. A reboot will usually fix this
problem. To work around it permanently, reboots can be performed without the APIC timer.
: User : MD5 PASSWORD
: Operator : MD5 PASSWORD
: Admin : MD5 PASSWORD
: OEM : MD5 PASSWORD
IP Address Source : Static Address
IP Address : 10.7.7.215
Subnet Mask : 255.255.248.0
MAC Address : 00:e0:81:34:3b:1d
...
Manufacturer ID : 42
Manufacturer Name : Sun Microsystems
Product ID : 18177 (0x4701)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
…
-> show /SYS/MB/BIOS fru_version
/SYS/MB/BIOS
Properties:
fru_version = 07030004
ILOM Version:
SP firmware build number: 52264
SP firmware date: Fri Jan 29 21:14:38 CST 2010
SP filesystem version: 0.1.22
/SYS
Properties:
product_serial_number = 0937XF5003
Determine the Version of System Components
If you have a machine that is up and running, you can obtain its serial number, universally
unique identifier (UUID), and other relevant information by using the dmidecode command.
The command returns a long list of information, including items, such as serial number, UUID,
and product name, which are required when submitting a Service Request.
This information applies to the database nodes and storage cells.
If the server is booted and you can log in to the system, the
Determine the Version of System Components
Another option is generating a file with the data generated by the dmidecode command and
uploading the file to the service request.
# dmidecode > `hostname`_dmidecode.txt
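Because the dmidecode listing is long, the few fields a Service Request needs can be pulled out of the saved file with a short script. This sketch assumes the usual dmidecode layout, in which a "System Information" heading is followed by tab-indented "Key: value" lines; the product name and UUID in the sample are made up.

```python
def system_info(dmidecode_text):
    """Collect Product Name, Serial Number, and UUID from the System Information block."""
    wanted = ("Product Name", "Serial Number", "UUID")
    info = {}
    in_system = False
    for line in dmidecode_text.splitlines():
        if not line.startswith(("\t", " ")):
            # Unindented lines are section headings, handle lines, or blanks.
            in_system = line.strip() == "System Information"
            continue
        if in_system and ":" in line:
            key, value = line.strip().split(":", 1)
            if key in wanted:
                info[key] = value.strip()
    return info


# Illustrative excerpt; only the serial number is taken from the guide.
sample = (
    "Handle 0x0001, DMI type 1, 27 bytes\n"
    "System Information\n"
    "\tManufacturer: Example Vendor\n"
    "\tProduct Name: EXAMPLE SERVER\n"
    "\tSerial Number: 0937XF5003\n"
    "\tUUID: 00000000-0000-0000-0000-000000000000\n"
    "\n"
    "Handle 0x0002, DMI type 2, 15 bytes\n"
    "Base Board Information\n"
    "\tSerial Number: BOARD-SERIAL\n"
)
print(system_info(sample))
```

Restricting the search to the System Information block avoids picking up the base board or chassis serial numbers, which appear under the same key name.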
4. Locate WarpDrive
5. Format WarpDrive
6. Display/Set Power Value
7. Flash WarpDrive Firmware
8. Flash WarpDrive BIOS
9. Update SSD Firmware
10. Dump Raw Data
11. Show Vital Product Data
12. Reset Adapter
13. Reset Target
14. Set SSD State
15. Extract/Erase/Query Panic Logs
16. Extract SMART Logs
****************************************************************************
LSI Corporation WarpDrive Management Utility
Version 01.250.41.04 (2012.06.04)
Copyright (c) 2011 LSI Corporation. All Rights Reserved.
****************************************************************************
-----------------------------------------------------------
VPD Information
-----------------------------------------------------------
Product Name : Sun Flash Accelerator F40 PCIe 2.0 Low Profile Adapter
PN : 7026993
EC : L3-25487-02B
SN : 464168P+1223000600
VA : Flash HBA
VB : 0000
V1 : LSI Corporation
V2 : 1000
V3 : 007E
V4 : 108E
V5 : 0581
V6 : 17.6W
V7 : 5.8W
V8 : 0.1W
MN : 10080
RV : 0x8e
V1 : SP22232067
V3 : 01
V4 : A3
V6 : V6
V7 : P
-----------------------------------------------------------
IB Switch Version:
/SYS/SERVICE
Targets:
Properties:
type = Indicator
ipmi_name = SERVICE
value = Off
Hardware Diagnostic Best Practices
Check all the systems to see if any of the Service LEDs are lit. The following command can
be run on the ILOMs of the individual systems or the dcli-based command may be run from
the DB01 system. The results should all show the LEDs to be off.
# dcli -l root -g half "ipmitool sunoem cli 'show /SYS/SERVICE' | grep value"
192.168.1.1: value = Off
192.168.1.2: value = Off
192.168.1.3: value = Off
192.168.1.4: value = Off
192.168.1.5: value = Off
192.168.1.6: value = Off
192.168.1.7: value = Off
192.168.1.8: value = Off
192.168.1.9: value = Off
…
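With many nodes it is easy to miss a single lit LED in the listing. A small filter over the dcli output, sketched below, reports only the hosts whose value is not Off. The function name is hypothetical and the second sample line is invented to show a failing case.

```python
def lit_service_leds(dcli_output):
    """Return hosts whose /SYS/SERVICE value is anything other than Off."""
    lit = []
    for line in dcli_output.splitlines():
        if "value =" not in line:
            continue
        # dcli prefixes each output line with "<host>: ".
        host, _, rest = line.partition(":")
        state = rest.split("=", 1)[1].strip()
        if state.lower() != "off":
            lit.append(host.strip())
    return lit


sample = "192.168.1.1: value = Off\n192.168.1.2: value = On\n192.168.1.3: value = Off\n"
print(lit_service_leds(sample))
```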
Compute Nodes:
# dcli -g dbs_group -l root "/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers -type asr"
Storage Nodes:
# dcli -g cell_group -l celladmin "cellcli -e list cell attributes snmpsubscriber"
On ILOM, you cannot check if ASR is enabled. You can check only if it has not been enabled.
To verify that the SNMP services are being monitored (not necessarily by ASR), you need to
verify if any of the SNMP rule sets have been configured with a destination address.
-> show /SP/alertmgmt/rules/[1-15]
/SYS/MB/RISER/PCIE2/F20CARD/UPTIME
Targets:
Properties:
type = Power Unit
ipmi_name = PCIE2/F20/UP
class = Threshold Sensor
value = 768.000 Hours
upper_nonrecov_threshold = 17500.000 Hours
upper_critical_threshold = 17200.000 Hours
upper_noncritical_threshold = 16800.000 Hours
<snip>
The four locations of the Flash20 cards on an Exadata storage cell are:
/SYS/MB/RISER1/PCIE1/
/SYS/MB/RISER1/PCIE4/
/SYS/MB/RISER2/PCIE2/
/SYS/MB/RISER2/PCIE5/
Check the system for messages indicating that an ESM has failed or one which is near its end
of useful life. The messages you see should be similar to one of the following:
/SYS/MB/RISER1/PCI4/F20CARD ESM is approaching its lifespan. Please
schedule a replacement as soon as possible.
Or
/SYS/MB/RISER2/PCI5/F20CARD ESM has exceeded its lifespan. Please
schedule a replacement as soon as possible.
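The UPTIME sensor output shown earlier lists three upper thresholds for ESM hours. A simple classification of a reading against those thresholds might look like the sketch below. The threshold values are taken from that sample output; treating a reading equal to a threshold as having crossed it, and the status names, are assumptions.

```python
# Upper thresholds in hours, from the sample UPTIME sensor output,
# ordered from most to least severe.
THRESHOLDS = [
    ("nonrecoverable", 17500.0),
    ("critical", 17200.0),
    ("noncritical", 16800.0),
]


def esm_uptime_status(hours):
    """Return the most severe threshold the ESM uptime has reached, or 'ok'."""
    for name, limit in THRESHOLDS:
        if hours >= limit:
            return name
    return "ok"


for reading in (768.0, 16900.0, 17600.0):
    print(reading, esm_uptime_status(reading))
```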
192.168.1.1: Temperature: 42 C
192.168.1.2: Temperature: 47 C
192.168.1.3: Temperature: 33 C
192.168.1.4: Temperature: 40 C
192.168.1.5: Temperature: 33 C
<snip>
Hardware Diagnostic Best Practices
Check for memory (ECC) errors with the command:
# ipmitool sel list | grep ECC | cut -f1 -d : | sort -u
This command enables you to identify any memory (ECC) errors. If such errors exist,
reseating the DIMM might fix the problem. In some cases, the entire motherboard may have
to be replaced.
Symptoms of memory (ECC) errors include the following:
• The amount of system time for certain operations is unpredictably large, causing
network transfers to be slow, which can also result in application timeouts.
• The IPMI Linux driver times out on operations with ILOM.
• A message such as IPMI message handler: BMC returned incorrect
response, expected netfn b cmd 22, got netfn 1 cmd 0 appears in the
/var/log/messages file.
# /opt/oracle.SupportTools/sundiag.sh
Understand that certain error messages are not causing an application problem. The initial
production release includes various benign error messages that might seem critical but, in
fact, do not cause a problem.
Error messages in this category include:
ib1: multicast join failed for
ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
or
RDS/IB: send completion on 192.168.50.24 had status 12,
disconnecting and reconnecting
These messages can occur in /var/log/messages during various failure scenarios. They
should not affect application performance, but all failures should still be investigated.
HW address shared conflict between ib0 and bond0 warning message:
(A) Warning: the permanent HWaddr of ib0 - 80:00:00:48:FE:80 - is still in use by bond0.
Set the HWaddr of ib0 to a different address to avoid conflicts.
This message happens on ifdown bond0 when ib0 happens to be the active interface. This is
not a problem.
(B) During bootup, messages regarding "missing /sys/class/net/bond0/bonding/slaves" file
come up.
These messages come up because the ib0/ib1 interfaces are configured by openibd before
the bond0 interface is configured. These messages are okay as well and do not pose a
problem.
Line 109 cannot find the file message from ifup-eth on up/down of IB/and shutdown/boot.
First boot warnings on compute node: These are harmless and appear only on first boot
because of the way first boot loads. It loads only certain modules in the lower run level.
# /opt/oracle.oswatcher/osw/archive
The following data is gathered:
Every 10 seconds
rds-info -c
Every 15 seconds
iostat -x 1 3
vmstat 1 2
top
ps -elf
mpstat 1 2
cat /proc/meminfo
cat /proc/slabinfo
Every 5 minutes
ibstatus
ib-bond --status
ibv_devinfo
ifconfig -a
Every hour
ipmitool sel list
Data is retained for 120 hours in bzip2-compressed files in the appropriate archive
subdirectory.
The osw directory has a read me file and utilities to report and graph some types of collected
data.
Linux Kernel Crash Core Files
The cells and database nodes of the Database Machine are configured to generate Linux
kernel crash core files in the /var/crash directory, when a Linux crash utility is located. The
crash utility can be used to analyze the crash files. The crash files are automatically removed
by the OSWatcher utility so that the files do not occupy more than 10 percent of the free disk
space on the file system. Older crash files are removed first.
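The retention rule for crash files (stay under a space budget, remove the oldest first) can be illustrated as follows. This is not the OSWatcher implementation; it is a hypothetical sketch in which the caller supplies the file list and the byte budget that corresponds to the 10 percent limit.

```python
def files_to_remove(files, budget_bytes):
    """Pick crash files to delete, oldest first, until the total fits the budget.

    files is a list of (name, mtime, size_bytes) tuples.
    """
    total = sum(size for _, _, size in files)
    doomed = []
    for name, _mtime, size in sorted(files, key=lambda f: f[1]):
        if total <= budget_bytes:
            break
        doomed.append(name)
        total -= size
    return doomed


# Hypothetical crash files: (name, modification time, size in bytes).
crash_files = [("core.3", 300, 40), ("core.1", 100, 50), ("core.2", 200, 30)]
print(files_to_remove(crash_files, budget_bytes=80))
```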
diag/asm/cell/sc01.us.oracle.com/trace/svtrc_8030_0.trc
diag/asm/cell/sc01.us.oracle.com/trace/rstrc_21240_1.trc
adrci>
The trace files are located at $ADR_BASE/diag/asm/cell/<hostname>.
This is the place for other important trace files—for example, alert.log.
The CellCLI calibrate command can be used to determine if disks are operating according to
expectations.
Note: Calibrate should be run when the application is not accessing the cell so that it does not
itself cause an application performance problem. Preferably, the cellsrv process should be
completely shut down when running calibrate. The force option is required if cellsrv is up.
Calibrate will report any disk that is suboptimal. If a disk is reported as suboptimal, perform
the following.
Look for evidence of unrecoverable I/O errors in the following locations:
• The CellCLI command list physicaldisk detail shows the hardReadErrors and
hardWriteErrors attributes, which will increase with each occurrence.
Note: It is better to monitor these values while the workload is running as opposed to
making a conclusion on the initial value seen.
• The cell side alert log reports unrecoverable I/O errors with the "io_getevents err"
string in them.
• The database and ASM alert logs will report messages with "IO Failed" when
encountering these unrecoverable errors.
- If the I/O is a write, the disk will be offlined.
- If the I/O is a read, just the message will be written.
• In both cases, the secondary extent is leveraged to acquire the necessary data. Disks
reporting these errors excessively should be dropped from the disk group with new disks
taking their place.
• Disk controller firmware logs can be checked with the following command:
- MegaCli64 -fwtermlog -dsply -a0
In some cases, where a disk is performing poorly, simply reseating the drive can resolve the
issue. Ensure that the disk is taken offline (can be done by inactivating grid disk) before
performing this operation.
Write performance can be affected if the drives have been set back to the writethrough mode
from the default writeback. This can occur if there is a problem with the controller battery or if
it is in a special mode, such as a learn cycle. The writethrough/writeback status of the disks
can be checked with the following command:
MegaCli64 -LDInfo -Lall -aALL | grep 'Current Cache Policy'
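The cache policy check above can be automated once the command output has been captured. The following is a minimal sketch, not part of the course material: the function name is hypothetical, and the two sample lines are an assumed example of what the grep typically returns rather than output taken from this guide.

```python
def find_writethrough(megacli_output):
    """Return positions (0-based, in order of appearance) of logical
    drives whose current cache policy is not WriteBack."""
    flagged = []
    index = 0
    for line in megacli_output.splitlines():
        line = line.strip()
        if line.startswith("Current Cache Policy"):
            policy = line.split(":", 1)[1].strip()
            if not policy.startswith("WriteBack"):
                flagged.append(index)
            index += 1
    return flagged


# Assumed sample output, for illustration only.
sample = (
    "Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU\n"
    "Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU\n"
)
print(find_writethrough(sample))
```

A non-empty result suggests checking the controller battery or whether a learn cycle is in progress, as described above.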
# tar -czvf IBLogs_`hostname`.gz \
    /tmp/IBLogs*.log /tmp/ibdiagnet.* \
    /var/log/boot.log /var/log/secure \
    /var/log/messages /var/log/opensm*
InfiniBand Layered Architecture: Troubleshoot based on layers.
[Figure: OpenFabrics (OF) stack. User space: applications and access methods, user-level verbs/API, uDAPL, SDP library, user-level MAD API, tools, SM. Kernel space: upper-layer protocols (EoIB, IPoIB, SDP, SRP, iSER, RDS, NFS-RDMA RPC, cluster file systems), mid-layer (Connection Manager Abstraction (CMA), connection managers, SA client, MAD, SMA), kernel-level verbs/API, and provider-specific hardware drivers. Hardware: InfiniBand HCA and iWARP R-NIC. Some paths use kernel bypass.]
Key:
SM: Subnet Manager
SMA: Subnet Manager Agent
PMA: Performance Manager Agent
IPoIB: IP over InfiniBand
SDP: Sockets Direct Protocol
SRP: SCSI RDMA Protocol (Initiator)
iSER: iSCSI RDMA Protocol (Initiator)
RDS: Reliable Datagram Service
UDAPL: User Direct Access Programming Lib
HCA: Host Channel Adapter
R-NIC: iWARP R-NIC
If the IB switch ILOM Web UI is not accessible, first check network settings. If the switch
network settings look okay, check the following setting through the spsh:
-> cd /SP/services/http
/SP/services/http
-> ls
Properties:
port = 80
secureredirect = enabled
servicestate = disabled
The System Info tab displays status information regarding the switch hardware, including
basic information about the management controller, firmware version and build date, FRU IDs
of the chassis and power supplies, and status of the power supplies and fans.
The Sensor Info tab displays the latest hardware sensor readings for the switch’s power
supplies and fans, including the current voltage and temperature values.
The IB Performance tab displays the status and available bandwidth of the switch ports by
using a table format. By clicking a column heading, the information in the table is sorted
according to that column heading, either in ascending or descending order.
The IB Port Map tab displays information about peer devices attached to the switch by using a
table format.
The Subnet Manager tab displays the current SM settings for this switch, including whether or
not it is the master SM, along with the priority of the SM.
The rear panel diagram displays four gateway receptacles, labeled 0A, 1A, 0B, and 1B. When
a connector is physically present in a gateway receptacle, the receptacle changes from a
black rectangle to a gray rectangle with four indicators. Each indicator represents one of the
four possible ports available at the connection. The rectangles left of the gateway connection
are the BX indicators, which display the status of the internal switch hardware connections.
Moving the cursor over a BX indicator opens a small window that provides information about
the BridgeX port. If the indicator is red, the window displays a reason for the respective state.
Clicking a gray gateway connector opens a window that displays connector FRU and port
information for that connection. At the top of the window is the connector name. There are two
parts of the window: the cable FRU ID information on the left, and a smaller status pane for
the ports on the right. Clicking a tab displays that port’s information, including the physical
state, logical state, protocol, and so on.
The rear panel diagram displays the presence of connectors and their status by using a
graphic. The diagram displays the management controller’s IP address, the connector
receptacles, and their respective connector names. When a cable is attached to a receptacle,
a connection is made. That connection is displayed in the diagram as a gray rectangle, with
three or four smaller indicators. Moving the cursor over an indicator, clicking an indicator, or
clicking a connection opens a window that provides additional information about that indicator
or connection.
In the rear panel diagram, there are 32 InfiniBand receptacles displayed, labeled 0A to 15A
and 0B to 15B. When a connector is physically present in an InfiniBand receptacle, the
receptacle changes from a black rectangle to a gray rectangle with three indicators. Moving
the cursor over an indicator that is orange or red opens a small window that provides the
reason for the respective state. A center indicator is orange when the link is at a speed slower
than QDR, such as SDR or DDR. A right indicator is red when there are significant errors
(symbol, recovery, and so on) on the link.
Clicking a gray InfiniBand connector opens a window that displays connector FRU, port state,
error, and statistical information for that connection.
ibstatus
Description:
ibstatus is a script that displays basic information obtained
from the local IB driver. Output includes LID, SMLID, port
state, link width active, and port physical state.
Syntax:
ibstatus [-h] [devname[:port]]...
Examples
# ibstatus
InfiniBand device 'is4_0' port 0 status:
default gid: fe80:0000:0000:0000:0021:283a:87cb:a0a0
base lid: 0x1
sm lid: 0x12
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
See also:
ibstat
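When many ports must be checked, the ibstatus output can be scanned for ports that are not ACTIVE/LinkUp. The following is a minimal sketch, not an Oracle-supplied tool: the function name is hypothetical, the first port in the sample is the example shown above, and the second port (mlx4_0 port 1) is an invented entry added only to show a failing case.

```python
import re

HEADER = re.compile(r"InfiniBand device '([^']+)' port (\d+) status:")


def unhealthy_ports(ibstatus_output):
    """Return (device, port) pairs whose state is not ACTIVE
    or whose physical state is not LinkUp."""
    ports = []
    current = None
    for line in ibstatus_output.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = {"device": match.group(1), "port": int(match.group(2))}
            ports.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return [
        (p["device"], p["port"])
        for p in ports
        if not p.get("state", "").endswith("ACTIVE")
        or not p.get("phys state", "").endswith("LinkUp")
    ]


sample = """InfiniBand device 'is4_0' port 0 status:
default gid: fe80:0000:0000:0000:0021:283a:87cb:a0a0
base lid: 0x1
sm lid: 0x12
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
InfiniBand device 'mlx4_0' port 1 status:
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
"""
print(unhealthy_ports(sample))
```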
ibroute
Description:
ibroute uses SMPs to display the forwarding tables, either
unicast (LinearForwardingTable or LFT) or multicast
(MulticastForwardingTable or MFT), for the specified switch
LID and the optional LID (mlid) range.
The default range is all valid entries in the range 1...FDBTop.
Syntax:
ibroute [options] <switch_addr> [<startlid> [<endlid>]]
Nonstandard flags:
-a Show all lids in range, even invalid entries.
-n Do not try to resolve destinations.
-M Show multicast forwarding tables. In this case, the range
parameters are specifying mlid range.
Examples
# ibroute 1
Unicast lids [0x0-0x19] of switch Lid 1 guid 0x0021283a87cba0a0 (Sun
DCS 36 QDR LC switch burxsw-ib2.east.sun.com):
Lid Out Destination
Port Info
0x0001 000 : (Switch portguid 0x0021283a87cba0a0: 'Sun DCS 36 QDR LC
switch burxsw-ib2.east.sun.com')
See also:
ibtracert
ibtracert
Description:
ibtracert uses SMPs to trace the path from a source
GID/LID to a destination GID/LID. Each hop along the path is
displayed until the destination is reached or a hop does not
respond. By using the -m option, multicast path tracing can
be performed between source and destination nodes.
Syntax:
ibtracert [options] <src-addr> <dest-addr>
Nonstandard flags:
-n Simple format; don't show additional information.
-m <mlid> Show the multicast trace of the specified mlid.
Examples
# ibtracert 1 2
From switch {0x0021283a87cba0a0} portnum 0 lid 1-1 "Sun DCS 36 QDR
LC switch burxsw-ib2.east.sun.com"
[3] -> ca port {0x00212800013e70bb}[1] lid 2-2 "burxcel04 C
192.168.20.114 HCA-1"
To ca {0x00212800013e70ba} portnum 1 lid 2-2 "burxcel04 C
192.168.20.114 HCA-1"
ibping
Description:
ibping uses vendor mads to validate connectivity between
IB nodes. On exit, (IP) ping-like output is shown. ibping is
run as client/server. The default is to run as client. Note also
that a default ping server is implemented within the kernel.
Syntax:
ibping [options] <dest lid|guid>
Nonstandard flags:
-c <count> Stop after count packets.
-f Flood destination: send packets back to back without delay.
-o <oui> Use specified OUI number to multiplex vendor mads.
-S Start in server mode (do not return).
ibnetdiscover
Description:
ibnetdiscover performs IB subnet discovery and outputs a
human-readable topology file. GUIDs, node types, and port
numbers are displayed, as well as port LIDs and
NodeDescriptions. All nodes (and links) are displayed (full
topology).
Syntax:
ibnetdiscover [options] [<topology-filename>]
Optionally, you can use this utility to list the current connected nodes. The output is printed to
the standard output unless a topology file is specified.
Nonstandard flags:
-l List of connected nodes
-H List of connected HCAs
-S List of connected switches
-g Grouping
ibhosts
Description:
ibhosts either walks the IB subnet topology or uses an
already saved topology file and extracts the Channel
Adapter (CA) nodes.
Syntax:
ibhosts [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format
ibswitches
Description:
ibswitches either walks the IB subnet topology or uses an
already saved topology file and extracts the IB switches.
Syntax:
ibswitches [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format
Examples
# ibswitches
Switch : 0x0021283a87cba0a0 ports 36 "Sun DCS 36 QDR LC switch sw-
ib2.east.sun.com" enhanced port 0 lid 1 lmc 0
Switch : 0x0021283a87b8a0a0 ports 36 "Sun DCS 36 QDR LC switch sw-
ib3.east.sun.com" enhanced port 0 lid 18 lmc 0
ibchecknet
Description:
ibchecknet uses a full topology file that was created by
ibnetdiscover, scans the network to validate the
connectivity, and reports errors (from port counters).
Syntax:
ibchecknet [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode,
ibcheckport, and ibcheckerrs
Examples
# ibchecknet
#warn: counter SymbolErrors = 65533 (threshold 10) lid 1 port 255
Error check on lid 1 (Sun DCS 36 QDR LC switch burxsw-
ib2.east.sun.com) port all: FAILED
#warn: counter SymbolErrors = 65535 (threshold 10) lid 18 port 255
#warn: counter RcvSwRelayErrors = 152 (threshold 100) lid 18
port 255
Error check on lid 18 (Sun DCS 36 QDR LC switch burxsw-
ib3.east.sun.com) port all: FAILED
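The #warn lines in this output follow a regular pattern, so the offending counters can be pulled out for reporting. The following is a minimal sketch under that assumption; the function name is hypothetical, and the pattern is inferred only from the example above, so other tool versions may format the lines differently.

```python
import re

WARN = re.compile(
    r"#warn: counter (\w+) = (\d+)\s+\(threshold (\d+)\)\s+lid (\d+)\s+port (\d+)"
)


def parse_warnings(text):
    """Return (counter, value, threshold, lid, port) tuples.
    Wrapped lines are rejoined before matching."""
    joined = re.sub(r"\s+", " ", text)
    return [
        (name, int(value), int(threshold), int(lid), int(port))
        for name, value, threshold, lid, port in WARN.findall(joined)
    ]


sample = """#warn: counter SymbolErrors = 65533 (threshold 10) lid 1 port 255
Error check on lid 1 (Sun DCS 36 QDR LC switch burxsw-
ib2.east.sun.com) port all: FAILED
#warn: counter RcvSwRelayErrors = 152 (threshold 100) lid 18
port 255
"""
for warning in parse_warnings(sample):
    print(warning)
```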
ibcheckerrs
Description:
ibcheckerrs checks the specified port (or node) and reports
errors that surpassed their predefined threshold. The port
address is LID unless the -G option is used to specify a GUID
address.
Syntax:
ibcheckerrs [-h] [-G] [-t <threshold_file>] [-s(how_thresholds)] <lid|guid> [<port>]
The predefined thresholds can be dumped by using the -s option, and a user-defined
threshold_file (using the same format as the dump) can be specified by using the -t <file>
option.
Examples
# ibcheckerrs 1
#warn: counter SymbolErrors = 65533 (threshold 10) lid 1 port
255
Error check on lid 1 (Sun DCS 36 QDR LC switch burxsw-
ib2.east.sun.com) port all: FAILED
Dependencies
perfquery, perfquery output format, ibaddr
ibportstate
Description:
ibportstate allows the port state and port physical state of
an IB port to be queried or a switch port to be disabled or
enabled.
Syntax:
ibportstate [-d(ebug) -e(rr_show) -v(erbose)
-D(irect) -G(uid) -s smlid -V(ersion) -C
ca_name -P ca_port -t timeout_ms] <dest
dr_path|lid|guid> <portnum> [<op>]
supported ops: enable, disable, query
Examples
# ibportstate 3 1 disable # by lid
# ibportstate -G 0x2C9000100D051 1 enable # by guid
# ibportstate -D 0 1 # by direct route
ibcheckstate
Description:
ibcheckstate uses a full topology file that was created by
ibnetdiscover, scans the network to validate the port state and
port physical state, and reports any ports that have a port
state other than Active or a port physical state other than
LinkUp.
Syntax:
ibcheckstate [-h] [<topology-file>]
Dependencies
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckportstate
Examples
# ibcheckstate
ibcheckportstate
Description:
ibcheckportstate checks the connectivity and the
specified port for proper port state (Active) and port physical
state (LinkUp). The port address is LID unless the -G option is
used to specify a GUID address.
Syntax:
ibcheckportstate [-h] [-G] <lid|guid> <port_number>
Examples
# ibcheckportstate 2 3
ibwarn: [28833] mad_rpc: _do_madrpc failed; dport (Lid 2)
smpquery: iberror: failed: operation portinfo: port info query
failed
Port check lid 2 port 3: FAILED
ibcheckerrors
Description:
ibcheckerrors uses a full topology file that was created by
ibnetdiscover, scans the network to validate the
connectivity, and reports errors (from port counters).
Syntax:
ibcheckerrors [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode,
ibcheckport, and ibcheckerrs
ibdiscover.pl
Description:
ibdiscover.pl uses a topology file that was created by
ibnetdiscover, an ibdiscover.map file that was created by the
network administrator (which indicates the nodes to be
expected), and an ibdiscover.topo file (which is the
expected connectivity). It produces a new connectivity file
(ibdiscover.topo.new) and outputs the changes to
stdout. The network administrator can choose to replace the
“old” topo file with the new one or incorporate certain changes
from the new file to the old file.
The syntax of the ibdiscover.map file is:
<nodeGUID>|port|"Text for node"|<NodeDescription from
ibnetdiscover format>
Examples
8f10400410015|8|"ISR 6000"|# SW-NM2 port 0 lid 5
8f10403960558|2|"HCA 1"|# MT23108 InfiniHost Mellanox Technologies
The syntax of the old and new topo files (ibdiscover.topo and
ibdiscover.topo.new) is:
<LocalPort>|<LocalNodeGUID>|<RemotePort>|<RemoteNodeGUID>
10|5442ba00003080|1|8f10400410015
These topo files are produced by the ibdiscover.pl tool.
Syntax:
ibnetdiscover | ibdiscover.pl
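Both file formats above are pipe-delimited, so they are easy to read with a few lines of code. The following is a minimal sketch for illustration; the function names and dictionary keys are hypothetical, ports are kept as strings because the guide does not state their numeric base, and GUIDs are zero-padded to 16 hex digits purely for easier comparison with ibnetdiscover output.

```python
def normalize_guid(guid):
    """Zero-pad a hexadecimal GUID to 16 digits."""
    return format(int(guid, 16), "016x")


def parse_map_line(line):
    """Parse one ibdiscover.map entry."""
    guid, port, text, description = line.split("|", 3)
    return {
        "node_guid": normalize_guid(guid),
        "port": port,
        "text": text.strip('"'),
        "description": description.lstrip("# ").strip(),
    }


def parse_topo_line(line):
    """Parse one ibdiscover.topo (or .topo.new) entry."""
    local_port, local_guid, remote_port, remote_guid = line.split("|")
    return {
        "local_port": local_port,
        "local_guid": normalize_guid(local_guid),
        "remote_port": remote_port,
        "remote_guid": normalize_guid(remote_guid),
    }


print(parse_map_line('8f10400410015|8|"ISR 6000"|# SW-NM2 port 0 lid 5'))
print(parse_topo_line("10|5442ba00003080|1|8f10400410015"))
```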
ibnodes
Description:
ibnodes either walks the IB subnet topology or uses an
already saved topology file and extracts the IB nodes (CAs
and switches).
Syntax:
ibnodes [<topology-file>]
Dependencies:
ibnetdiscover and ibnetdiscover format
Examples
# ibnodes
Ca : 0x00212800013e6c22 ports 2 "db04 S 192.168.20.124 HCA-1"
Ca : 0x00212800013e6dfe ports 2 "db03 S 192.168.20.123 HCA-1"
Ca : 0x00212800013e6c12 ports 2 "cel07 C 192.168.20.117 HCA-1"
...
Ca : 0x00212800013e6cc6 ports 2 "cel01 C 192.168.20.111 HCA-1"
Ca : 0x00212800013e6cce ports 2 "cel02 C 192.168.20.112 HCA-1"
Switch : 0x0021283a87cba0a0 ports 36 "Sun DCS 36 QDR LC switch
burxsw-ib2.east.sun.com" enhanced port 0 lid 1 lmc 0
Switch : 0x0021283a87b8a0a0 ports 36 "Sun DCS 36 QDR LC switch
burxsw-ib3.east.sun.com" enhanced port 0 lid 18 lmc 0
ibclearerrors
Description:
ibclearerrors clears the PMA error counters in port
counters by either walking the IB subnet topology or by using
an already saved topology file.
Syntax:
ibclearerrors [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, and perfquery
ibclearcounters
Description:
ibclearcounters clears the PMA port counters by either
walking the IB subnet topology or by using an already-saved
topology file.
Syntax:
ibclearcounters [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, and perfquery
ibsysstat
Description:
ibsysstat uses vendor mads to validate connectivity
between IB nodes and obtain other information about the IB
node. ibsysstat is run as client/server. The default is to run
as client.
Syntax:
ibsysstat [options] <dest lid|guid> [<op>]
Nonstandard flags:
Current supported operations:
ping - Verify connectivity to server (default).
host - Obtain host information from server.
cpu - Obtain cpu information from server.
-o <oui> Use specified OUI number to multiplex vendor mads.
-S Start in server mode (do not return).
version:
# version
SUN DCS 36p version: 1.3.3-2
Build time: Apr 4 2011 11:15:19
SP board info:
Manufacturing Date: 2009.06.22
Serial Number: "NCD3R0168"
Hardware Revision: 0x0006
Firmware Revision: 0x0102
BIOS version: NOW1R112
BIOS date: 04/24/2009
env_test
Show environment:
The env_test command can be used to read values for all the HW
sensors.
The output will indicate if there are any faulty HW:
NM2 Environment test started:
Starting PSU test:
PSU 0 not present
PSU 1 present status: OK
PSU test returned OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 3.3V Standby = 3.37 V…
showunhealthy
checkpower
The checkpower command will display the status of the PSUs.
PSU 0 not present
PSU 1 present status: OK
getfanspeed
The getfanspeed command will display the speed of the FANs.
Fan 0 not present
Fan 1 rpm 15314
Fan 2 rpm 15314
Fan 3 rpm 15130
Fan 4 not present
listlinkup
setfan 128
(Set fan duty-cycle to 128 of 255, around 1/2 speed 15000 RPM)
Note: The speed of the fans will normally be automatically controlled, so changing the value
here will normally have no effect.
Voltage check:
checkvoltages
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 3.3V Standby = 3.37 V
Measured 12V = 12.00 V
Measured 5V = 5.03 V
Measured VBAT = 3.28 V
Measured 2.5V = 2.51 V
Measured 1.8V = 1.80 V
Measured I4 1.2V = 1.18 V
All voltages OK
nm2port
enablecablelog / disablecablelog
Enable/disable logging of cable events in syslog (default: enabled)
enablelinklog / disablelinklog
Enable/disable logging of link events in syslog (default: disabled)
• getportstatus
• Usage:
• getportstatus i4num ibport
• Arguments:
• i4num (Values as for checkboot)
• ibport: IB port number (1-36)
Examples
# getportstatus A 1
Portstate 4
Portphystate 5
LinkWidthActive 2
LinkSpeedActice 4
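The numeric values in this output can be translated to readable names. The following is a minimal sketch, not an Oracle utility: the function name is hypothetical, and the lookup tables reflect the usual InfiniBand PortInfo encodings as commonly documented (they agree with the "4: ACTIVE" and "5: LinkUp" pairs shown in the ibstatus example), so treat them as an assumption to verify against your firmware documentation.

```python
PORT_STATE = {1: "Down", 2: "Initialize", 3: "Armed", 4: "Active"}
PHYS_STATE = {
    1: "Sleep", 2: "Polling", 3: "Disabled",
    4: "PortConfigurationTraining", 5: "LinkUp",
    6: "LinkErrorRecovery", 7: "PhyTest",
}
LINK_WIDTH = {1: "1X", 2: "4X", 4: "8X", 8: "12X"}
LINK_SPEED = {1: "2.5 Gb/s (SDR)", 2: "5 Gb/s (DDR)", 4: "10 Gb/s (QDR)"}


def decode_port_status(output):
    """Translate getportstatus numeric fields into names."""
    raw = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            raw[parts[0]] = int(parts[1])
    # Match on a prefix so the field is found however its name is spelled.
    speed_key = next(k for k in raw if k.startswith("LinkSpeedAct"))
    return {
        "state": PORT_STATE.get(raw["Portstate"], "unknown"),
        "phys_state": PHYS_STATE.get(raw["Portphystate"], "unknown"),
        "width": LINK_WIDTH.get(raw["LinkWidthActive"], "unknown"),
        "speed": LINK_SPEED.get(raw[speed_key], "unknown"),
    }


sample = """Portstate 4
Portphystate 5
LinkWidthActive 2
LinkSpeedActice 4
"""
print(decode_port_status(sample))
```

Under these tables, the example decodes to an Active, LinkUp port running 4X at 10 Gb/s per lane, which corresponds to the 40 Gb/sec (4X QDR) rate reported by ibstatus.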
i4reset
Reset I4-A
Note also after resetting the I4 that is connected to the
ComExpress, the I4 PCI-Express driver will no longer be working.
disableswitchport i4num ibport
(Arguments as getportstate)
enableswitchport i4num ibport
enablesm
setsmpriority
Usage:
setsmpriority priority
Set priority of the SM
priority should be in the range 0-15
setsubnetprefix
Usage:
setsubnetprefix subnetprefix
Set subnet prefix for the SM
subnetprefix should be a valid lower case hex number starting with
0x
setcontrolledhandover
Usage:
setcontrolledhandover value
Enable/disable controlled handover for the SM
value should be TRUE or FALSE
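The argument rules stated for these three commands can be checked before the values are sent to a switch. The following is a minimal sketch of such pre-validation; the function names are hypothetical and the checks encode only the constraints quoted in the usage text above.

```python
import re


def valid_sm_priority(value):
    """Priority should be in the range 0-15."""
    return isinstance(value, int) and 0 <= value <= 15


def valid_subnet_prefix(text):
    """Subnet prefix should be a lower case hex number starting with 0x."""
    return re.fullmatch(r"0x[0-9a-f]+", text) is not None


def valid_controlled_handover(text):
    """Value should be TRUE or FALSE."""
    return text in ("TRUE", "FALSE")


print(valid_sm_priority(5), valid_subnet_prefix("0xfe80000000000000"),
      valid_controlled_handover("TRUE"))
```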
setloghost
# setloghost
usage: setloghost <IP address or host name>
To turn off remote logging, use localhost as
parameter.
PSU testing
# a237test 1 read 0
Read psu 1 reg 0 value 0x8
For description of registers and values see specification of A237
PSU.
[root@o4nm2-36p-1 ~]# a237test 1 fruid
Read fruid psu 1
80 00 00 00 00 00 49 00 00 00 00 00 00 00 80 00
00 00 32 00 00 00 1f 71 11 91 00 00 00 00 00 00
<output truncated>
Error recovery
The CLI command managementreset can be used to recover from CPLD
FATAL (NM2-36p) and I4 therm shutdown. The CLI command will perform
the necessary actions to restore the system, and prompt the user to
reboot as necessary.
[Flowchart step: Is pre-boot version correct? (version) No: Upgrade pre-boot FW. Yes: continue to the next step.]
Run “boot”
....
init> boot
Previous application starts failed (3 times). Please run
check_app_partition. Will not start application image.
init>
Run “check_app_partition”
....
init> check_app_partition
Doing filesystem check ...
fsck (busybox 1.14.3, 2009-09-16 10:16:05 CEST)
e2fsck 1.39 (29-May-2006)
/: clean, 11982/104448 files, 268701/417656 blocks
Everything looks OK.
init> boot
[Flowchart: outcomes "pass" and "fail". Decision points: "Is it due to incorrect FW URL?" (action: Fix URL) and "Is it due to a power outage?". If nothing worked: switch is corrupted; file service ticket.]
FW URL (Switch):
Example:
ftp://root:******@10.20.30.40/xyz.abc.com/Releases/1.3.2/sundcs_gw_repository_1.3.2_1.pkg
Protocols supported & tested:
1) FTP
2) HTTP
(Make sure appropriate transport service is enabled on the file server)
“ibdiagnet” output:
....
-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-E- lid=0x0006 guid=0x00212856cd22c0a0 dev=48438 Port=24
Performance Monitor counter : Value
symbol_error_counter : 0x5
....
“ibcheckerrors” output:
[root@nm2gw-42 ~]# ibcheckerrors
## Summary: 11 nodes checked, 0 bad nodes found
## 36 ports checked, 0 ports have errors beyond threshold
port? (ibnetdiscover)
“ibnetdiscover” output:
....
vendid=0x2c9
devid=0xbd36
sysimgguid=0x212856cd22c0a3
switchguid=0x212856cd22c0a0(212856cd22c0a0)
Switch 36 "S-00212856cd22c0a0" # "SUN IB QDR GW switch nm2gw-41" enhanced port 0 lid 6 lmc 0
[30] "H-0002c90300089102"[1](2c90300089103) # " HCA-1" lid 51 4xQDR
[24] "H-0003ba000100e370"[1](3ba000100e371) # "nsn156-63 HCA-1" lid 2 4xQDR
[22] "H-00212800013ece9e"[1](212800013ece9f) # "nsn156-61 HCA-1" lid 19 4xQDR
[21] "H-0002c90300089102"[2](2c90300089104) # " HCA-1" lid 54 4xQDR
....
Callouts: Switch port GUID, HCA port GUID
[Flowchart step. Switch: Translate IB port number to switch connector name (dcsport/listlinkup). HCA: Trace IB port number to HCA connector.]
“dcsport” output:
[root@nm2gw-41 ~]# dcsport -port 24
DCS-GW Switch port 24 maps to connector 2A
Callout: Port 24 maps to Connector 2A
“listlinkup” output:
[root@nm2gw-41 ~]# listlinkup
Connector 0A Present <-> Switch Port 20 up (Enabled)
Connector 1A Present <-> Switch Port 22 up (Enabled)
Connector 2A Present <-> Switch Port 24 up (Enabled)
Connector 3A Present <-> Switch Port 26 down (Enabled)
...
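The listlinkup output can serve as a lookup table from switch port number to connector name. The following is a minimal sketch; the function name is hypothetical, and the pattern covers only "Present" lines in the form shown above, so lines in any other form are ignored.

```python
import re

LINK_LINE = re.compile(
    r"Connector (\S+) Present <-> Switch Port (\d+) (up|down) \((\w+)\)"
)


def parse_listlinkup(text):
    """Map switch port number -> (connector name, link state)."""
    return {
        int(port): (connector, state)
        for connector, port, state, _admin in LINK_LINE.findall(text)
    }


sample = """Connector 0A Present <-> Switch Port 20 up (Enabled)
Connector 1A Present <-> Switch Port 22 up (Enabled)
Connector 2A Present <-> Switch Port 24 up (Enabled)
Connector 3A Present <-> Switch Port 26 down (Enabled)
"""
links = parse_listlinkup(sample)
print(links[24])
print([conn for conn, state in links.values() if state == "down"])
```

Looking up port 24 gives the same connector (2A) that dcsport reports in the example above.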
“ibnetdiscover” output:
....
vendid=0x2c9
devid=0xbd36
sysimgguid=0x212856cd22c0a3
switchguid=0x212856cd22c0a0(212856cd22c0a0)
Switch 36 "S-00212856cd22c0a0" # "SUN IB QDR GW switch nm2gw-41" enhanced port 0 lid 6 lmc 0
[30] "H-0002c90300089102"[1](2c90300089103) # " HCA-1" lid 51 4xQDR
[24] "H-0003ba000100e370"[1](3ba000100e371) # "nsn156-63 HCA-1" lid 2 4xQDR
[22] "H-00212800013ece9e"[1](212800013ece9f) # "nsn156-61 HCA-1" lid 19 4xQDR
[21] "H-0002c90300089102"[2](2c90300089104) # " HCA-1" lid 54 4xQDR
....
Callouts: HCA port GUID, HCA port number, HCA port LID
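The callouts on this output (HCA port GUID, HCA port number, HCA port LID) correspond to fixed positions on each port line, so the peer behind a given switch port can be extracted mechanically. The following is a minimal sketch; the function name and dictionary keys are hypothetical, and the pattern is inferred from the lines shown above rather than from a format specification.

```python
import re

PORT_LINE = re.compile(
    r'\[(\d+)\]\s+"([HS])-([0-9a-f]+)"\[(\d+)\]\(([0-9a-f]+)\)'
    r'\s+#\s+"([^"]*)"\s+lid (\d+)\s+(\S+)'
)


def peers_by_switch_port(text):
    """Map local switch port number -> details of the attached peer port."""
    peers = {}
    for m in PORT_LINE.finditer(text):
        peers[int(m.group(1))] = {
            "peer_type": "HCA" if m.group(2) == "H" else "switch",
            "peer_node_guid": m.group(3),
            "peer_port": int(m.group(4)),
            "peer_port_guid": m.group(5),
            "peer_name": m.group(6).strip(),
            "peer_lid": int(m.group(7)),
            "link": m.group(8),
        }
    return peers


sample = '''Switch 36 "S-00212856cd22c0a0" # "SUN IB QDR GW switch nm2gw-41" enhanced port 0 lid 6 lmc 0
[30] "H-0002c90300089102"[1](2c90300089103) # " HCA-1" lid 51 4xQDR
[24] "H-0003ba000100e370"[1](3ba000100e371) # "nsn156-63 HCA-1" lid 2 4xQDR
[22] "H-00212800013ece9e"[1](212800013ece9f) # "nsn156-61 HCA-1" lid 19 4xQDR
[21] "H-0002c90300089102"[2](2c90300089104) # " HCA-1" lid 54 4xQDR
'''
print(peers_by_switch_port(sample)[24])
```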
[Flowchart step: Disable and enable the switch/HCA port (dis[en]ableswitchport).]
Switch:
“disableswitchport” & “enableswitchport” output:
Adminstate:......................Disabled
LinkState:.......................Down
PhysLinkState:...................Disabled
[root@nm2gw-41 ~]# enableswitchport 2A
Enable connector 2A Switch port 24
Adminstate:......................Enabled
LinkState:.......................Down
PhysLinkState:...................PortConfigurationTraining
HCA
“ibportstate enable/disable” output:
[root@nsn156-61 ~]# ibportstate 20 1 disable
Initial CA PortInfo:
# Port info: Lid 19 port 2
LinkState:.......................Down
PhysLinkState:...................Disabled
Lid:.............................20
.....
[root@nsn156-61 ~]# ibportstate 20 1 enable
InfiniBand link troubleshooting flow:
1. Switch: Translate IB port number to switch connector name (dcsport/listlinkup).
HCA: Trace IB port number to HCA connector.
2. Disable and enable the switch/HCA port (dis[en]ableswitchport). Still see link errors?
If no: Problem solved!
3. Connector firmly plugged in? Link LED on? If no: Replug cable and confirm a “click”.
4. Change cables. Does the problem follow the switch, the cable, or the HCA?
Possible findings: bad cable (replace cable), bad switch port, bad HCA port.
5. If nothing worked: restart SM, then power cycle the IB device.
6. Finally: File service ticket.
3. View the configuration on both switches. Verify that the values match the expected
configuration.
4. View the configuration on the nodes. Verify that the values match the expected
configuration.
5. If you cannot access data, verify that the authorized client property is set as desired.
6. Verify that each node has the proper IP, gateway, and netmask configured.
7. Verify that the switch has the admin IP configured on one of its interfaces. Verify that the
netmask and gateway are correct.
8. Verify that the InfiniBand switches are functioning properly.
ring.............................[SUCCESS]
You know that the issue is with trnacel07. Looking at a snippet of the ibnetdiscover output,
you see:
vendid=0x2c9
devid=0x673c
sysimgguid=0x212800013e6995
caguid=0x212800013e6992
Ca 2 "H-00212800013e6992" # "trnacel07 C 10.7.6.117 HCA-1"
[2](212800013e6994) "S-0021283a8371a0a0"[8] # lid 9 lmc 0 "Sun DCS
36 QDR LC switch localhost" lid 23 4xQDR
rate: 10 Gb/sec (4X)
InfiniBand device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:6994
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
This shows that port 1 on trnacel07 is down. Indeed, for this test case, the cable was removed
from port 1 on trnacel07.
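The same check can be automated by scanning port status text for any port whose state is not ACTIVE. The following is a minimal Python sketch under stated assumptions: the sample text follows the layout of the status block shown above, the port 1 values other than its rate are invented for illustration, and the function name is hypothetical.

```python
import re

# Header line that starts each per-port block in the status text.
HEADER = re.compile(r"device '(?P<dev>[^']+)' port (?P<port>\d+) status:")


def ports_not_active(status_text):
    """Return (device, port, state) for every port whose state is not ACTIVE."""
    problems = []
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = HEADER.search(line)
        if header:
            current = (header.group("dev"), int(header.group("port")))
            continue
        # "state:" is matched at the start so "phys state:" lines are ignored.
        if current and line.startswith("state:"):
            state = line.split(":", 2)[-1].strip()  # "4: ACTIVE" -> "ACTIVE"
            if state != "ACTIVE":
                problems.append((current[0], current[1], state))
    return problems


# Illustrative input: port 1 state values are assumed, port 2 matches the text.
sample = """\
Infiniband device 'mlx4_0' port 1 status:
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
Infiniband device 'mlx4_0' port 2 status:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
"""
print(ports_not_active(sample))
```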
Advanced Tasks
Objectives
The following procedure describes how to reclaim disk space for the Linux operating system
by freeing the Solaris-configured disks. This procedure should be performed on each database server.
1. Log in as the root user.
2. Change to the /opt/oracle.SupportTools directory.
3. Check the current disk configuration using the following command:
./reclaimdisks.sh -check
The command returns a detailed layout of the logical and physical disks. For Oracle Exadata
Database Machine X2-2, the last line of the output should be the following:
[INFO] Valid dual boot configuration found for Linux: RAID1 from 2
disks
For Oracle Exadata Database Machine X2-8 Full Rack, the last line of the output should be
the following:
[INFO] Valid dual boot configuration found for Linux: RAID5 from 3
disks and 1 global hot spare disk
4. Start the disk reclamation process using the following command:
./reclaimdisks.sh -free -reclaim
The command frees any Solaris-configured disks, and reclaims all free disks for Linux. The
process may take two hours for Oracle Exadata Database Machine X2-2, and five hours for
Oracle Exadata Database Machine X2-8 Full Rack. To check the progress of the reclamation
process, use the following command:
tail -f /var/log/cellos/reclaimdisks.bg.log
pools.
The following procedure describes how to reclaim the disk space used by the Linux operating
system. This procedure should be performed on each database server.
1. Log in as the root user on the database server.
2. Change to the /opt/SupportTools directory.
3. Run the reclaimdisks.pl script as follows:
reclaimdisks.pl [--unattended]
The script runs in interactive mode. To run the script in unattended mode, use the
--unattended option.
Respond to the prompts, as needed. The script prompts for confirmation to remove the Linux
virtual disks, and to create Solaris virtual disks.
– Respond with “no” if this is not the first time, and go to step 5.
5. Respond to the prompt about setting SSH for the root
user.
6. Respond with either “yes” or “no” to the prompt about
Oracle Management Server.
– Respond with “yes” if the server has gone down. This will
bring up Oracle Management Server. If Oracle Management
Server is not up, the hardening script will not complete.
– Respond with “no” if the server is up.
1. Determine the cell disks, grid disks, and LUNs on the physical disk
that you want to replace by using the alert log, as follows:
CellCLI> LIST ALERTHISTORY WHERE ALERTMESSAGE LIKE
"Logical drive lost.*" DETAIL
The action text for this alert lists the cell disk and grid disks that are
affected.
Review the alert log and select the disk by name, time, and so on.
2. Replace the physical disk on Exadata Storage Server.
Note: When you replace a physical disk, the disk must be
acknowledged by the RAID controller before you can use it. This
does not take long, but you should use the LIST PHYSICALDISK
command to ensure that the status is NORMAL.
An alert is generated when a disk fails. The alert includes specific instructions for replacing
the disk. If you have configured the system for alert notifications, the alert will be sent by email
to the designated address. Consider the following when replacing a failed disk:
• The disk could be dropped by ASM, and the rebalance operation may have been
successfully run. Check the ASM alert logs to confirm this.
• The disk could be dropped, and the rebalance operation may be currently running.
Check the GV$ASM_OPERATION view to determine if the rebalance operation is still
running.
• The disk could be dropped by ASM, and the rebalance operation may have failed.
Check the V$ASM_OPERATION.ERROR code to determine whether the rebalance
operation failed.
• Rebalance operations from multiple disk groups can be performed on different ASM
instances in the same cluster if the physical disk being replaced contains grid disks from
multiple disk groups. Multiple rebalance operations cannot be run in parallel on just one
ASM instance. The operations will be queued for the instance.
• If the repair timer has not expired, the disk could be offline.
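The rebalance checks in the list above amount to a small decision over the rows returned from GV$ASM_OPERATION. The following Python sketch shows that decision on rows that have already been fetched. It assumes each row is a dictionary with OPERATION, STATE, and ERROR_CODE keys mirroring the view's columns, and that a running rebalance reports OPERATION of REBAL with STATE of RUN; verify those names and values against your database release. The function name is hypothetical.

```python
def rebalance_status(rows):
    """Classify rebalance activity from rows fetched from GV$ASM_OPERATION.

    rows: list of dicts with OPERATION, STATE and ERROR_CODE keys (assumed).
    Returns one of "none", "failed", "running", "waiting".
    """
    rebalances = [row for row in rows if row.get("OPERATION") == "REBAL"]
    if not rebalances:
        # No row at all: the rebalance finished or never started.
        # The ASM alert log tells which.
        return "none"
    if any(row.get("ERROR_CODE") for row in rebalances):
        return "failed"
    if any(row.get("STATE") == "RUN" for row in rebalances):
        return "running"
    return "waiting"


print(rebalance_status([{"OPERATION": "REBAL", "STATE": "RUN", "ERROR_CODE": None}]))
```

A result of "none" is the case in which the alert log, not the view, confirms that the rebalance completed successfully.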
3. If ASM has not dropped the disk, drop the ASM disks from the ASM
disk group to which the grid disks belong by using the SQL ALTER
DISKGROUP DROP DISK command on the ASM instance. You must
drop the ASM disks from the ASM disk group before dropping the
corresponding grid disks from the cell.
4. From the Exadata Storage Server, run the following command by
using the LUN name. This command creates the cell disk and grid
disk on the LUN: ALTER LUN lun_name REENABLE FORCE.
The preceding command implicitly rebuilds the associated cell disks
and grid disks so that you do not have to perform additional
procedures on the cell disks and grid disks.
5. Add the grid disks to the ASM disk group. The addition of the grid
disks to the group is automatically followed by a rebalance
operation that populates the new disk with its share of ASM extents.
To replace a boot disk due to disk failure, see the notes below.
### Evidence, the LUN was fixed and cleared but never the missing BOOT CELL labels ###
CellCLI> list alerthistory
169 2011-01-25T03:19:58+00:00 warning "Physical
drive state changed on Adapter 0 deviceId 9, enclosureId 20, slotId
1 from online to failed."
170 2011-01-25T03:20:01+00:00 critical Logical
drive lost. Lun: 0_1. Status: critical. Physical hard disk: 20:1.
Slot Num: 1. Serial Number: E046LE.
Celldisk: CD_01_sdm1cel02.
Griddisks: RECO_CD_01_sdm1cel02, DATA_CD_01_sdm1cel02.
171_1 2011-01-25T10:52:51+00:00 critical
Cell configuration check discovered the following problems:
=CELLBOOT USB=
=disk partitions, MD devices, CELLBOOT usb
[ERROR] Cells must have 3 bootable devices. 2 with label BOOT and 1
with label CELLBOOT
/dev/sda BOOT
/dev/sdm CELLBOOT
/dev/sda7 CELLSW
/dev/sda5 CELLSYS
/dev/sda1 BOOT
/dev/sdm1 CELLBOOT
[ERROR] Exactly 1 CELLBOOT, 2 BOOT, 2 CELLSYS and 2 CELLSW labels
must exist. If uncorrected you risk data loss
[WARNING] Only one system disk has system disk layout. Either the second
system disk is unavailable or this is not an Exadata cell
[ERROR] The Cell has missing system disks or improperly configured
and partitioned disks
system."
177 2011-01-26T12:35:38+00:00 clear "Physical
drive inserted on Adapter 0 deviceId 21, enclosureId 20, slotId 1."
178 2011-01-26T12:36:31+00:00 info "Physical
drive state changed on Adapter 0 deviceId 21, enclosureId 20, slotId
1 from unconfigured-good to online."
179 2011-01-26T12:36:52+00:00 clear Logical
drive found. Lun: 0_1. Status: normal. Physical hard disk: 20:1.
Slot Num: 1. Serial Number: E1NQF3. It was empty. Will auto-create.
Celldisk: CD_01_sdm1cel02.
Griddisks: DATA_CD_01_sdm1cel02, RECO_CD_01_sdm1cel02.
## Checking CELL ##
CellCLI> alter cell validate configuration
/dev/md1
0 8 10 0 active sync /dev/sda10
/dev/md2
0 8 9 0 active sync /dev/sda9
/dev/md4
0 8 1 0 active sync /dev/sda1
1 65 209 1 active sync /dev/sdad1
<snip>
/dev/md8
0 8 8 0 active sync /dev/sda8
1 65 216 1 active sync /dev/sdad8
/dev/sdm1 CELLBOOT
[ERROR] Exactly 1 CELLBOOT, 2 BOOT, 2 CELLSYS and 2 CELLSW labels
must exist. If uncorrected you risk data loss
[ERROR] The Cell has missing system disks or improperly configured
and partitioned disks
CellCLI>
#####################################
## Checking IMAGEINFO for above devices ##
#####################################
[root@sdm1cel02 MegaCli]# /opt/oracle.cellos/imageinfo
Kernel version: 2.6.18-128.1.16.0.1.el5 #1 SMP Tue Jun 30 16:48:30
EDT 2009 x86_64
Cell version: OSS_11.2.1.2.6_LINUX.X64_100511
Cell rpm version:
cell-11.2.1.2.6_LINUX.X64_100511-1
Active image version: 11.2.1.2.6
Active image activated: 2010-07-28 11:22:16 +0100
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
TYPE="ext3"
/dev/md4: LABEL="BOOT" UUID="873922fa-afb8-4418-a917-c1bca7be2645"
TYPE="ext3" <---- BOOT is here.
/dev/sdad2: UUID=dd7ffcc-0c31-415c-9a48-39321b8471ed" TYPE="ext2"
/dev/sdad5: LABEL="CELLSYS" UUID="3bd269f4-c94c-4a5c-9892-
efcaaa6299e6" TYPE="ext3"
/dev/sdad6: UUID=d45e1b1-d879-47e7-955b-af53375c7132"
SEC_TYPE="ext2" TYPE="ext3"
/dev/sdad7: LABEL="CELLSW" UUID="3e84d917-3123-47f1-bde4-22a20e6f21c6"
TYPE="ext3"
/dev/sdad8: UUID=ef2ba4d-96d9-4121-b5bc-3b03ac6e27ee"
SEC_TYPE="ext2" TYPE="ext3"
#### FIX ####
Resync MD device to discover labels if missing.
# mdadm --query /dev/md4
# blkid <-----Run again, shows the missing BOOT label added and
CELL VALIDATION reports successful.
/dev/sdad1: LABEL="BOOT" UUID="873922fa-afb8-4418-a917-c1bca7be2645"
TYPE="ext3"
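The validation error in this example states a simple rule: exactly 1 CELLBOOT, 2 BOOT, 2 CELLSYS, and 2 CELLSW labels. A minimal Python sketch that counts labels in captured blkid output and reports any mismatch is shown here. The function names are hypothetical, and skipping /dev/md* entries (so that a mirror device is not counted alongside its member partitions) is an assumption of this sketch, not documented behavior of the cell validation.

```python
import re
from collections import Counter

# Expected label counts, taken from the validation error message.
EXPECTED = {"CELLBOOT": 1, "BOOT": 2, "CELLSYS": 2, "CELLSW": 2}

# Matches lines such as: /dev/sda1: LABEL="BOOT" UUID="..." TYPE="ext3"
LABEL = re.compile(r'^(?P<dev>/dev/\S+):.*?\bLABEL="(?P<label>[^"]+)"')


def label_counts(blkid_text):
    """Count file system labels per partition, ignoring MD devices (assumption)."""
    counts = Counter()
    for line in blkid_text.splitlines():
        match = LABEL.match(line.strip())
        if match and not match.group("dev").startswith("/dev/md"):
            counts[match.group("label")] += 1
    return counts


def label_problems(blkid_text):
    """Return {label: (found, expected)} for every label with a wrong count."""
    counts = label_counts(blkid_text)
    return {
        label: (counts.get(label, 0), wanted)
        for label, wanted in EXPECTED.items()
        if counts.get(label, 0) != wanted
    }


# Illustrative input modeled on this case: the second BOOT label is missing.
sample = """\
/dev/sda1: LABEL="BOOT" UUID="x" TYPE="ext3"
/dev/sda5: LABEL="CELLSYS" UUID="x" TYPE="ext3"
/dev/sda7: LABEL="CELLSW" UUID="x" TYPE="ext3"
/dev/sdm1: LABEL="CELLBOOT" UUID="x" TYPE="ext3"
/dev/sdad5: LABEL="CELLSYS" UUID="x" TYPE="ext3"
/dev/sdad7: LABEL="CELLSW" UUID="x" TYPE="ext3"
/dev/md4: LABEL="BOOT" UUID="x" TYPE="ext3"
"""
print(label_problems(sample))
```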
You may need to replace a physical disk when the disk is in predictive
diskType: HardDisk
enclosureDeviceId: 28
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_3
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0705
physicalInterface: sas
physicalSerial: E07L8E
physicalSize: 558.9109999993816G
slotNumber: 3
status: predictive failure
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status="poor
performance" DETAIL
name: 28:3
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 28
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_3
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0705
physicalInterface: sas
physicalSerial: E07L8E
physicalSize: 558.9109999993816G
slotNumber: 3
status: poor performance
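When the DETAIL listing covers many disks, it can help to turn the attribute lines into records and pick out the disks that need attention. The following Python sketch assumes the output has been captured as text in the "attribute: value" layout shown above, with each disk starting at its name attribute. The function names and the set of statuses treated as actionable are illustrative choices for this sketch.

```python
# Statuses discussed in this lesson as reasons to replace a disk.
ACTIONABLE = {"predictive failure", "poor performance", "critical"}


def parse_detail(text):
    """Split captured DETAIL output into one dict per disk."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line or line.startswith("CellCLI>"):
            continue
        key, value = line.split(":", 1)  # split once; values may contain ':'
        key, value = key.strip(), value.strip().strip('"')
        if key == "name":  # a name attribute starts a new disk record
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def disks_needing_attention(text):
    """Return (name, status) for disks whose status is in ACTIONABLE."""
    return [
        (disk["name"], disk["status"])
        for disk in parse_detail(text)
        if disk.get("status") in ACTIONABLE
    ]


# Illustrative input; the second disk's normal status is assumed.
sample = """\
name: 28:3
diskType: HardDisk
makeModel: "SEAGATE ST360057SSUN600G"
slotNumber: 3
status: predictive failure
name: [9:0:2:0]
diskType: FlashDisk
slotNumber: "PCI Slot: 1; FDOM: 2"
status: normal
"""
print(disks_needing_attention(sample))
```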
2. Wait until the Oracle ASM disks associated with the grid disks on the physical disk have
been successfully dropped. To determine if the grid disks have been dropped, query the
V$ASM_DISK_STAT view on the Oracle ASM instance.
Caution: The disks in the first two slots are system disks, which store the operating
system and Oracle Exadata Storage Server Software. One system disk must be in
working condition to keep the cell up and running. Wait until ALTER CELL VALIDATE
CONFIGURATION shows no mdadm errors, which indicates that the system disk resync
has completed, before replacing the other system disk.
Note: When you replace a physical disk, the disk must be acknowledged by the RAID
controller before you can use it. This does not take a long time, but you should use the
list physicaldisk command to ensure that the status is NORMAL.
Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the
rebalance, do the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm this.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation failed.
• Rebalance operations from multiple disk groups can be performed on different Oracle
ASM instances in the same cluster if the physical disk that is being replaced contains
ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance
operation at a time. If all Oracle ASM instances are busy, rebalance operations will be
queued.
other good disks. It is better to remove the bad disk from the
system than let it remain. To identify a bad physical disk, use
the CALIBRATE command and look for very low throughput
and input/output operations per second (IOPS) for each
physical disk.
Note: If a disk exhibits extremely poor performance, it is
marked as poor performance and its grid disks are
automatically dropped from the ASM disk group.
After the bad disk has been identified, perform the following procedure:
1. Find all the grid disks on the bad disk. Use the following command to direct Oracle ASM
to stop using the bad disk immediately:
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE
It is possible that the DROP command with the FORCE option could fail due to offline
partners. You can restore the Oracle ASM data redundancy by correcting other cell or
disk failures and retry DROP...FORCE, or use the following command to direct Oracle
ASM to rebalance the data out of the bad disk:
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name NOFORCE
2. Wait until the Oracle ASM disks associated with the grid disks on the bad disk have
been successfully dropped by querying the V$ASM_DISK_STAT view.
3. Remove the badly performing disk. When you remove the disk, you get an alert.
controller before you can use it. This does not take long, but you should use the LIST
PHYSICALDISK command to ensure that the status is NORMAL.
You may want to delete all data on a disk and then use the disk
You may need to move all the drives from one Exadata cell to
3. Move the disks from the original Exadata cell to the new
Exadata cell.
Caution: Ensure that the first two disks, which are the
system disks, are in the same first two slots. Failure to do
so will cause the Exadata cell to function improperly.
4. Start the cell.
5. Restart the cell services by using the following command:
CellCLI> ALTER CELL RESTART SERVICES ALL
6. Activate the grid disks by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
If the Oracle ASM disks on this cell have not been
dropped, they are changed to online automatically and
start getting used.
following steps:
1. Replace the physical disk immediately, and wait for the disk to
be recognized by the RAID controller.
2. Run the following command to obtain the LUN and accept the
cell disks and grid disks on it:
CELLCLI -e "ALTER LUN lun_name REENABLE FORCE"
3. From ASM, verify the status of the ASM disks on the physical
disk.
4. Run the following command from the ASM host:
SQL> ALTER DISKGROUP diskgroup_name ONLINE DISK
asm_disk_name
PCIe cards. Each card has four flash disks (FDOMs) for a total
of 16 flash disks. The four F20 PCIe cards are present on the
PCI slots numbered 1, 2, 4, and 5. The F20 PCIe cards are not
hot pluggable; therefore, the Exadata cell must be powered
down before replacing the flash disks or cards.
To identify a failed flash disk, use the following command:
CellCLI>
name: [9:0:2:0]
diskType: FlashDisk
id: 508002000092e70FMOD2
luns: 1_2
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20R
physicalInsertTime: 2009-10-27T13:11:16-07:00
physicalInterface: sas
physicalSerial: 508002000092e70FMOD2
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 1; FDOM: 2"
status: critical
should be replaced with a new flash disk at the earliest opportunity. If the flash disk is used for
flash cache, the effective cache size for the cell is reduced. If the flash disk is used for grid
disks, the Oracle ASM disks associated with these grid disks are automatically dropped with
the FORCE option from the Oracle ASM disk group and Oracle ASM rebalance will ensue to
restore the data redundancy.
To replace a flash disk due to disk failure, perform the following steps:
1. Shut down the cell.
2. Replace the failed flash disk based on the PCI number and FDOM number.
3. Power up the cell. The cell services will be started automatically.
4. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
5. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
Wait until asmmodestatus shows ONLINE for all grid disks.
The new flash disk will be automatically used by the system. If the flash disk is used for flash
cache, the effective cache size will increase. If the flash disk is used for grid disks, the grid
disks will be recreated on the new flash disk. If those grid disks were part of an Oracle ASM
disk group, they will be added back to the disk group and the data will be rebalanced on them
based on the disk group redundancy and the asm_power_limit parameter.
Oracle ASM rebalance occurs when dropping or adding a disk. Consider the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation failed.
Rebalance operations from multiple disk groups can be performed on different Oracle ASM
instances in the same cluster if the physical disk being replaced contains ASM disks from
multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If
all Oracle ASM instances are busy, the rebalance operations are queued.
See the Sun Flash Accelerator F20 PCIe Card User’s Guide for additional information about
the F20 PCIe cards at http://docs.sun.com/app/docs/prod/flash.pcie?l=en&a=view.
To identify a predictive failure flash disk, use the following command:
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND
STATUS='predictive failure' DETAIL
name: [9:0:2:0]
diskType: FlashDisk
id: 508002000092e70FMOD2
luns: 1_2
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20R
physicalInsertTime: 2009-10-27T13:11:16-07:00
physicalInterface: sas
physicalSerial: 508002000092e70FMOD2
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 1; FDOM: 2"
status: predictive failure
…<truncated>
slotNumber: "PCI Slot: 1; FDOM: 2"
status: poor performance
To replace a flash disk due to disk problems, perform the following steps:
1. Shut down the cell.
2. Replace the failed flash disk based on the PCI number and FDOM number.
3. Power up the cell. The cell services will be started automatically.
4. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
5. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
Wait until asmmodestatus shows ONLINE for all grid disks.
The new flash disk will be automatically used by the system. If the flash disk is used for flash
cache, the effective cache size will increase. If the flash disk is used for grid disks, the grid
disks will be re-created on the new flash disk. If those grid disks were part of an Oracle ASM
disk group, they will be added back to the disk group and the data is rebalanced on them
based on the disk group redundancy and the asm_power_limit parameter.
Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the
rebalance, do the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation has failed.
Rebalance operations from multiple disk groups can be performed on different Oracle ASM
instances in the same cluster if the physical disk being replaced contains ASM disks from
multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If
all Oracle ASM instances are busy, the rebalance operations are queued.
good flash disks. It is better to remove the bad flash disk from
the system than to let it remain. To identify a bad flash disk, use
the CALIBRATE command and look for very low throughput
and IOPS for each flash disk.
If a flash disk exhibits extremely poor performance, it is marked
as poor performance. The flash cache on that flash disk will be
automatically disabled, and the grid disks on that flash disk will
be automatically dropped from the Oracle ASM disk group.
a r lice
sflash
E lM
After the bad disk is identified, perform the following steps:
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation has failed.
Rebalance operations from multiple disk groups can be performed on different Oracle ASM
instances in the same cluster if the physical disk being replaced contains ASM disks from
multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If
all Oracle ASM instances are busy, the rebalance operations are queued.
5. After performing the maintenance, start the cell. The cell services will start automatically.
6. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
When the grid disks become active, Oracle ASM will automatically synchronize the grid
disks to bring them back into the disk group.
7. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
Wait until asmmodestatus is ONLINE for all grid disks. The following is an example of
the output:
DATA_CD_00_dm01cel01 ONLINE
DATA_CD_01_dm01cel01 SYNCING
DATA_CD_02_dm01cel01 OFFLINE
…<truncated>
DATA_CD_11_dm01cel01 OFFLINE
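The wait condition in this step can be expressed as a check over the two-column listing. The following Python sketch assumes the output of the LIST GRIDDISK command has been captured as text; the function name is hypothetical. It returns the grid disks that are not yet ONLINE, so an empty result means that the wait is over.

```python
def grid_disks_not_online(listing):
    """Return {grid_disk_name: status} for every disk that is not ONLINE."""
    pending = {}
    for raw in listing.splitlines():
        parts = raw.split()
        if len(parts) != 2:
            continue  # skips prompts, truncation markers and blank lines
        name, status = parts
        if status != "ONLINE":
            pending[name] = status
    return pending


sample = """\
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
DATA_CD_00_dm01cel01 ONLINE
DATA_CD_01_dm01cel01 SYNCING
DATA_CD_02_dm01cel01 OFFLINE
DATA_CD_11_dm01cel01 OFFLINE
"""
print(grid_disks_not_online(sample))
```

Rerunning the listing at intervals until the function returns an empty dictionary matches the instruction to wait for ONLINE on all grid disks.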
Oracle ASM synchronization is complete only when all grid disks show
asmmodestatus=ONLINE. Before taking another Storage Server offline, Oracle ASM
synchronization must complete on the restarted Oracle Exadata Storage Server. If
synchronization is not complete, the check performed on another Storage Server will fail. The
following is an example of the output:
CellCLI> list griddisk attributes name where
asmdeactivationoutcome != 'Yes'
DATA_CD_00_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_01_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_02_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_03_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_04_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_05_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
Oracle Clusterware, and Oracle Automatic Storage Management (Oracle ASM) are up
on all the Database Servers.
a. Log in as the oracle user.
b. Set ORACLE_SID = +ASM1. The base for the ORACLE_HOME environment
variable must be set to the Grid Infrastructure home.
c. List the available cluster interfaces by using the following command:
$ oifcfg iflist
The following is an example of the output:
eth0 10.204.78.0
bond1 10.204.76.0
bond0 192.168.120.0
d. List the assigned interfaces by using the following command:
$ oifcfg getif
The following is an example of the output:
bond1 10.204.76.0 global public
bond0 192.168.120.0 global cluster_interconnect
e. Delete the current global cluster_interconnect interface by using the
following command:
$ oifcfg delif -global assigned_interface
In the preceding command, assigned_interface is the interface to be deleted.
The following is an example of the command:
$ oifcfg delif -global bond0
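The interface to pass to the delif command is the one that getif reports with the cluster_interconnect role. The following Python sketch picks it out of captured getif output. It assumes the four-column layout shown in the example above (interface, subnet, scope, role); the function name is hypothetical.

```python
def interconnect_interfaces(getif_output):
    """Return (interface, subnet) pairs whose role is cluster_interconnect."""
    found = []
    for raw in getif_output.splitlines():
        parts = raw.split()
        # Expected columns: interface, subnet, scope, role
        if len(parts) == 4 and parts[3] == "cluster_interconnect":
            found.append((parts[0], parts[1]))
    return found


sample = """\
bond1 10.204.76.0 global public
bond0 192.168.120.0 global cluster_interconnect
"""
print(interconnect_interfaces(sample))
```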
Note: If there is an error, indicating that one or more cell services are not running,
restart the cell services by using the following command:
# cellcli -e alter cell restart services all
7. Change the InfiniBand IP addresses on each Database Server as follows:
a. Log in as the root user.
b. Change to the /etc/sysconfig/network-scripts directory.
c. Copy the ifcfg-bond0 file. The copied file name must not start with ifcfg.
The following is an example of the copy command:
# cp ifcfg-bond0 orig_ifcfg-bond0
d. Edit the ifcfg-bond0 file to update the IPADDR, NETMASK, NETWORK, and
BROADCAST fields. The following is an example of the original file and an
updated file:
Example of the original ifcfg-bond0 file:
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=bond0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.120.253
NETMASK=255.255.254.0
NETWORK=192.168.120.0
BROADCAST=192.168.121.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000
updelay=5000"
IPV6INIT=no
MTU=6552
Example of the updated ifcfg-bond0 file:
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=bond0
USERCTL=no
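The NETWORK and BROADCAST values in the ifcfg-bond0 file must agree with IPADDR and NETMASK, and they are easy to get wrong when renumbering by hand. The following Python sketch derives them with the standard ipaddress module. The function name is hypothetical. The second call uses the updated address from the cellinit.ora example in this procedure (192.168.3.253/22), with the netmask 255.255.252.0 derived here from the /22 prefix as an assumption.

```python
import ipaddress


def bond_fields(ipaddr, netmask):
    """Derive the ifcfg address fields that must agree with each other."""
    interface = ipaddress.ip_interface(f"{ipaddr}/{netmask}")
    network = interface.network
    return {
        "IPADDR": str(interface.ip),
        "NETMASK": str(network.netmask),
        "NETWORK": str(network.network_address),
        "BROADCAST": str(network.broadcast_address),
        "PREFIX": network.prefixlen,  # prefix length used in cellinit.ora
    }


original = bond_fields("192.168.120.253", "255.255.254.0")
updated = bond_fields("192.168.3.253", "255.255.252.0")
for name, value in updated.items():
    print(f"{name}={value}")
```

For the original file, the derived values match the NETWORK and BROADCAST lines shown in the example, which is a quick way to confirm that the sketch reads the fields the same way the file does.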
cell="192.168.121.12"
cell="192.168.121.13"
cell="192.168.121.14"
cell="192.168.121.15"
cell="192.168.121.16"
Example of the updated file:
cell="192.168.3.9"
cell="192.168.3.10"
cell="192.168.3.11"
cell="192.168.3.12"
cell="192.168.3.13"
cell="192.168.3.14"
cell="192.168.3.15"
cell="192.168.3.16"
Note: If using SSH user-equivalence, the dcli utility can be used to copy the
updated file from the first Database Server to the other Database Servers. The
following is an example of the dcli command:
# dcli -l root -g /root/dbs_group -f /etc/oracle/cell/network-
config/cellip.ora
# dcli -l root -g /root/dbs_group "mv /root/cellip.ora \
/etc/oracle/cell/network-config/"
f. Change the InfiniBand IP addresses in the cellinit.ora file. The following is an
example of the original file, and an updated file:
Example of the original file:
ipaddress="192.168.120.253/23"
Example of the updated file:
ipaddress="192.168.3.253/22"
Update the cellinit.ora file on each Database Server. The contents of the file
are specific to the Database Server. The dcli utility cannot be used for this step.
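The NETWORK and BROADCAST values in ifcfg-bond0 and the prefix length in cellinit.ora all follow from the IP address and netmask. The following is a minimal sketch of that arithmetic, assuming a shell with 64-bit integer arithmetic. The helper names ip_to_int, int_to_ip, and describe_subnet are hypothetical:

```shell
#!/bin/sh
# Sketch: derive NETWORK, BROADCAST, and the prefix length from an
# IPv4 address and netmask. Assumes 64-bit shell arithmetic.
# All function names here are hypothetical helpers.

# Convert dotted-quad notation to a single integer.
ip_to_int() (
    set -- $(echo "$1" | tr '.' ' ')
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

describe_subnet() (
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    net=$(( ip & mask ))
    # Host bits are the inverse of the mask, limited to 32 bits.
    bcast=$(( net | (~mask & 4294967295) ))
    # Count the set bits in the mask to get the prefix length.
    prefix=0
    bits=$mask
    while [ "$bits" -gt 0 ]; do
        prefix=$(( prefix + (bits & 1) ))
        bits=$(( bits >> 1 ))
    done
    echo "NETWORK=$(int_to_ip "$net")"
    echo "BROADCAST=$(int_to_ip "$bcast")"
    echo "PREFIX=$prefix"
)

# Values from the original ifcfg-bond0 example.
describe_subnet 192.168.120.253 255.255.254.0
```

For the original example values, the sketch reproduces the NETWORK and BROADCAST entries of the original ifcfg-bond0 file and the /23 suffix of the original cellinit.ora entry. Running it with the new address and netmask is a quick cross-check before editing the files.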
c. Change the InfiniBand IP addresses for the Database Servers and Oracle Exadata
Storage Servers.
10. Update the cluster binaries to use the UDP protocol on each Database Server as
follows:
a. Log in as the root user.
b. Unlock the cluster binaries by using the following commands:
# cd /u01/app/11.2.0/grid/crs/install
# perl rootcrs.pl -unlock -crshome /u01/app/11.2.0/grid
c. Log in as the oracle user.
d. Set ORACLE_SID=+ASM1. The base for ORACLE_HOME must be set to the Grid
Infrastructure home.
e. Change to the /u01/app/11.2.0/grid/rdbms/lib directory.
f. Run the following command:
$ make -f ins_rdbms.mk ipc_g ioracle
11. Start Oracle Clusterware and Oracle Clusterware CRS on each Database Server as
follows:
a. Log in as the root user.
b. Start Oracle Clusterware CRS by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl start crs
c. Start Oracle Clusterware by using the following command:
1. Set the disks in the ASM disk group that are on the
Exadata Storage Server to OFFLINE by using the following
command on every ASM node and instance that accesses
the cell:
ALTER DISKGROUP diskgroup_name OFFLINE DISK
asm_disk_name
You must do the following when changing the fundamental configuration of a cell, such as
changing the IP address, host name, and InfiniBand address:
• Before changing the cell configuration, ensure that all ASM, Oracle RAC, and database
instances that use the cell will not access the cell while you are changing the IP
address.
• After changing the cell configuration, ensure that consumers of cell services have their
devices correctly reconfigured to use the new connect information of the cell.
• When changing a cell configuration, change only one cell at a time to ensure that ASM
and Oracle RAC work properly during the changes.
by using the CellCLI utility after you verify your changes.
To verify changes, use the ipconf utility and review the values that are displayed.
You can also examine the /opt/oracle.cellos/ipconf.pl.log log.
The ipconf utility makes a backup of the files it modifies. When you rerun the utility, it
overwrites the existing backup file if you modify values. The log file maintains the complete
history of every ipconf operation you perform.
a. Start CRS.
b. Start ASM.
c. Start the database.
Repeat these startup steps for the next database node.
8. Set the disks in the ASM disk group to ONLINE by using
the following command:
SQL> ALTER DISKGROUP disk_group_name
ONLINE DISK asm_disk_name
Database Servers
Oracle Exadata Storage Servers and Database Servers are
powered on by either pressing the power button at the front of
the machine, or by logging in to the ILOM interface and
powering up the system.
When a Database Server is powered on and the operating
system boots, Oracle Clusterware is automatically started if it is
installed. Oracle Clusterware then starts all resources that are
configured to start automatically.
The power-on sequence is as follows:
1. Rack, including switches
Ensure that the switches have had power applied for a few minutes to complete
power-on configuration before starting Oracle Exadata Storage Servers.
2. Oracle Exadata Storage Servers
Ensure that all Oracle Exadata Storage Servers complete the boot process before
starting the Database Servers.
3. Database Servers
Powering On Servers Using ILOM
Servers can be powered on by using Integrated Lights Out Manager (ILOM). ILOM can be
accessed by using the web console, the command-line interface (CLI), IPMI, or SNMP.
For example, to apply power to the dm01cel01 server by using IPMI, where dm01cel01-ilom
is the host name of the ILOM for the server to be powered on, run the following
command from a server that has ipmitool installed:
# ipmitool -H dm01cel01-ilom -U root chassis power on
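The power-on order matters when several servers are started through ILOM. The following is a minimal sketch that prints, without running, the ipmitool commands in the documented order (storage servers first, then Database Servers). The helper name power_on_plan and all host names other than dm01cel01 are hypothetical, and the sketch assumes each ILOM host name is the server name with an -ilom suffix:

```shell
#!/bin/sh
# Sketch: print (do not run) ipmitool power-on commands in the order
# given by the power-on sequence. power_on_plan is a hypothetical
# helper; ILOM names are assumed to follow the <server>-ilom pattern.
power_on_plan() (
    cells=$1
    dbs=$2
    for h in $cells; do
        echo "ipmitool -H ${h}-ilom -U root chassis power on"
    done
    echo "# wait until every storage server has completed the boot process"
    for h in $dbs; do
        echo "ipmitool -H ${h}-ilom -U root chassis power on"
    done
)

# Example host lists; only dm01cel01 appears in this guide.
power_on_plan "dm01cel01 dm01cel02" "dm01db01 dm01db02"
```

The printed list is a checklist: the switches must already have had power for a few minutes, and the Database Server commands should be run only after the storage servers have finished booting.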
The following is the recommended shutdown procedure for Database Servers:
To perform an emergency power-off procedure for the Database Machine, turn off power at
the circuit breaker or pull the emergency power-off switch in the computer room. After the
emergency, contact Oracle Support Services to restore power to the machine.
Emergency Power-Off Switch
Emergency power-off (EPO) switches are required when computer equipment contains
batteries that are capable of supplying more than 750 volt-amperes for more than five
minutes. Systems that have these batteries include internal EPO hardware for connection to a
site EPO switch or relay. Use of the EPO switch will remove power from the Database
Machine.
Cautions and Warnings
The following cautions and warnings apply to the Database Machine:
• Do not touch the parts of this product that use high-voltage power. Touching them might
result in serious injury.
• Do not power off the Database Machine unless there is an emergency. In the case of an
emergency, follow the emergency power-off procedure.
b. Install the ASR package as the root user by using the following command:
# pkgadd -d SUNWswasr.version_number.pkg
c. Add the asr command to the PATH variable as follows:
# PATH=$PATH:/opt/SUNWswasr/bin
# export PATH
ASR will validate the login. After the login credentials are
validated, the registration is complete.
Your SOA email address receives output from ASR
reports, notifications of ASR problems, and service request
(SR) generation.
Note: Passwords are not stored.
5. Check the registration status by using the following
command:
# asr show_reg_status
The following is an example of the output:
# registered
Database Servers.
c. Shut down the Oracle stack, including the Grid
Infrastructure, Oracle Automatic Storage Management
(Oracle ASM), and Oracle Database.
d. Log in as the root user on each Database Server.
e. Unzip the db_11.2.1.3.1_patch.zip file to the
db_patch_11.2.1.3.1 directory.
f. Change to the db_patch_11.2.1.3.1 directory.
g. Run the install.sh script.
2. a. (continued)
command:
/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers \
-type asr
The following is an example of the output:
(host=dm01db04,port=162,community=public,type=asr)
Note: To use the dcli utility to verify the trap destinations
for Database Servers, use the following command:
dcli -g dbs_group -l root \
"/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers -type asr"
3. b. (continued)
Note: To use the dcli utility to add the trap destinations for
Exadata cells:
dcli -g cell_group -l celladmin "cellcli -e alter cell \
snmpSubscriber = \(\(host=hostname, port=162, \
community=public, type=ASR\)\)"
In the preceding command, cell_group is the file that
contains a list of all Exadata cells in the Database Machine.
c. Verify the trap destinations for Exadata cells by using the
following command:
CellCLI> LIST CELL ATTRIBUTES snmpSubscriber
The following is an example of the output:
((host=dm01db04,port=162,community=public,type=ASR))
3. c. (continued)
command:
asr list_asset -i asset_IP
In this command, asset_IP is the IP address of the Database
Server host or ILOM, or the Exadata cell host or ILOM. To list all
assets, use the following command:
asr list_asset
The following is an example of the output:
IP_ADDRESS    HOST_NAME    SERIAL_NUMBER  ASR      PRODUCT_NAME
----------------------------------------------------------------
10.204.79.22  dm01cel07    0123ABC021     Enabled  SUN FIRE X4275 SERVER
10.204.79.33  dm01cel07-c  0234ABC021     Enabled  SUN FIRE X4275 SERVER
10.204.79.26  dm01db04-c   0345ABC51E6    Enabled  SUN FIRE X4170 SERVER
10.204.79.15  dm01db04     0456ABC1E6     Enabled  SUN FIRE X4170 SERVER
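On a full rack the asset listing is long, so it can help to filter it for assets that are not enabled. The following is a minimal sketch, assuming the column order shown in the example output (IP address, host name, serial number, ASR state, product name). The helper name not_enabled_assets is hypothetical, and the "Disabled" row in the sample is made up for illustration:

```shell
#!/bin/sh
# Sketch: list host names whose ASR column is not "Enabled" in an
# "asr list_asset" style listing read from stdin.
# not_enabled_assets is a hypothetical helper name.
not_enabled_assets() {
    # Only rows that start with an IPv4 address are asset rows.
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $4 != "Enabled" { print $2 }'
}

# Sample listing: the first asset row is from the example output; the
# second row is altered to "Disabled" purely for illustration.
sample_assets='IP_ADDRESS HOST_NAME SERIAL_NUMBER ASR PRODUCT_NAME
10.204.79.22 dm01cel07 0123ABC021 Enabled SUN FIRE X4275 SERVER
10.204.79.15 dm01db04 0456ABC1E6 Disabled SUN FIRE X4170 SERVER'

printf '%s\n' "$sample_assets" | not_enabled_assets
```

An empty result means every listed asset reports Enabled.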
following command:
# asr report
The following is an example of the output:
Successfully submitted request for activation
status report. Activation status report will be
sent to email address associated with Sun Online
Account:netadmin@example.com
The report is an email message that lists all activated assets.
1. Request the Orion GCS Call Center Worker privilege in APS.
- "Access & Mailing List Request"
- What: Oracle Applications
- How: Add Privileges
- Next Step
- Select One Account: "Orion"
- Next Step
- Available Privileges: "Orion GCS Call Center Worker"
- Move
- Next Step
- Justification: "Needed to support the ACS ASR installation process"
2. Wait for approval; follow instructions if request is rejected.
3. Log in to the Internal Support Portal (ISP): https://support.us.oracle.com
- Click the "Customer" tab.
- Click the "Asset" menu.
- Click "Menu" on the right and select "Create View."
See the OneCommand section of the MOS article “Database Machine and Exadata Storage
Server 11g Release 2 (11.2) Supported Versions (Doc ID 888828.1).”
Optional Activity
Downloading and using the Oracle Exadata Deployment Assistant:
1. Download the Oracle Exadata Deployment Assistant (also known as OneCommand)
and unpack it to a local UNIX filesystem.
2. Run the onecmd.sh script located in the <PatchNumber>/onecmd/Exaconf directory.
3. Fill out all the details using the lab diagram as a guide and generate the configuration
files.
Utility Scripts and Tools
celllist.txt:
cellserver1
cellserver2
iv) Set up the SSH for the root user to the cells:
dcli -l root -k -g celllist.txt
celladmin@cellserver1’s password: ******
celladmin@cellserver2’s password: ******
cellserver1: ssh key added
cellserver2: ssh key added
The user might be prompted to acknowledge cell authenticity, and might be prompted for the
cell root password.
After this acknowledgment, you can execute commands on the same cells without being
prompted for a password for that user from the host.
Example
$ dcli -g celllist cellcli -e list cell
cellserver1: cellserver1 online
cellserver2: cellserver2 online
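Output in this form is easy to check in a script. The following is a minimal sketch, assuming each line has the shape shown in the example (dcli host prefix, cell name, status). The helper name offline_cells is hypothetical:

```shell
#!/bin/sh
# Sketch: report cells whose status is not "online" in the output of
# "dcli ... cellcli -e list cell", read from stdin.
# offline_cells is a hypothetical helper name.
offline_cells() {
    # The last field is the status; the first field is "<host>:".
    awk 'NF > 0 && $NF != "online" { sub(/:$/, "", $1); print $1 }'
}

# Sample output copied from the example in this section.
sample_out='cellserver1: cellserver1 online
cellserver2: cellserver2 online'

bad=$(printf '%s\n' "$sample_out" | offline_cells)
if [ -z "$bad" ]; then
    echo "all cells online"
else
    echo "not online: $bad"
fi
```

In practice the function would read the dcli output directly through a pipe instead of a saved sample.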
/tmp/sosreport-dwinter.10132345-817953-683b39.tar.bz2
The md5sum is: 5a249a63e062f723cfc8b23fc6683b39
Please send this file to your support representative.
At this point, the packaged zip file is in /tmp and ready for shipment to Oracle Support.
[-] is optional.
N - Number of lines per page
MegaCli64 -v
MegaCli64 -help|-h|?
...
Gather information
Controller information
MegaCli64 -AdpAllInfo -aALL
MegaCli64 -CfgDsply -aALL
MegaCli64 -AdpEventLog -GetEvents -f events.log -aALL && cat events.log
Enclosure information
MegaCli64 -EncInfo -aALL
Virtual drive information
MegaCli64 -LDInfo -Lall -aALL
MegaCli64 -LDPDInfo -aALL
(shows the mapping between VD/LD and physical disks)
Physical drive information
Rebuild drive:
MegaCli64 -PDRbld -Start -PhysDrv [E:S] -aN
MegaCli64 -PDRbld -Stop -PhysDrv [E:S] -aN
MegaCli64 -PDRbld -ShowProg -PhysDrv [E:S] -aN
Clear drive:
MegaCli64 -PDClear -Start -PhysDrv [E:S] -aN
MegaCli64 -PDClear -Stop -PhysDrv [E:S] -aN
MegaCli64 -PDClear -ShowProg -PhysDrv [E:S] -aN
Bad to good:
MegaCli64 -PDMakeGood -PhysDrv [E:S] -aN
Changes drive in state Unconfigured-Bad to Unconfigured-Good
Hot Spare Management
Set global hot spare:
MegaCli64 -PDHSP -Set -PhysDrv [E:S] -aN
Remove hot spare:
MegaCli64 -PDHSP -Rmv -PhysDrv [E:S] -aN
Set dedicated hot spare:
1. cp /var/log/messages* .
2. /bin/dmesg > `hostname -a`_${mdate}_dmesg.out
3. /sbin/lspci > `hostname -a`_${mdate}_lspci.out
for (i=1; i<=counter; i+=1) printf ( "Slot %02d Device %02d (%s) status is: %s <br/>\n",
slot[i], device[i], name_drive[i], state_drive[i]); }
7. /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f /tmp/logfile -aALL && \
cp /tmp/logfile ./`hostname -a`_${mdate}_megacli64-GetEvents-all.out
8. /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL > `hostname -a`_${mdate}_megacli64-LdPdInfo.out
9. /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL > `hostname -a`_${mdate}_megacli64-PdList_long.out
10. /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL > `hostname -a`_${mdate}_megacli64-LdInfo.out
11. cellcli -e list cell detail > `hostname -a`_${mdate}_cell-detail.out
12. cellcli -e list celldisk > `hostname -a`_${mdate}_celldisk.out
13. cellcli -e list lun detail > `hostname -a`_${mdate}_lundetail.out
14. cellcli -e list physicaldisk detail > `hostname -a`_${mdate}_physicaldisk-detail.out
15. cellcli -e list physicaldisk where status!=normal > `hostname -a`_${mdate}_physicaldisk-fail.out
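The output file names in these steps combine the host name with a date stamp held in the mdate variable. Because an underscore is a valid character in a shell variable name, the variable must be delimited with braces when it is followed by an underscore. The following is a minimal sketch of the naming scheme. The helper name out_name, the sample date, and the sample host name are hypothetical:

```shell
#!/bin/sh
# Sketch: build the output file names used by the collection steps.
# mdate is assumed to hold a date stamp; the value below is made up.
mdate=20130401
host=cel01   # stand-in for the result of `hostname -a`

# out_name is a hypothetical helper: prints <host>_<date>_<label>.out
out_name() {
    # ${mdate}_ keeps the underscore out of the variable name. Written as
    # $mdate_dmesg, the shell would look up a variable named mdate_dmesg.
    echo "${host}_${mdate}_$1.out"
}

out_name dmesg
out_name lspci
```

With a single helper, every collected file gets the same prefix, which makes the set easy to bundle for analysis.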
hostname
# Check every controller listed in $CONT
for a in $CONT
do
NAME=`$MEGACLI -AdpAllInfo -$a |grep "Product Name" | cut -d: -f2`
echo "Controller $a: $NAME"
noonline=`$MEGACLI -PDList -$a | grep Online | wc -l`
echo "No of Physical disks online : $noonline"
DEGRADED=`$MEGACLI -AdpAllInfo -$a |grep "Degrade"`
echo $DEGRADED
# The unquoted echo collapses whitespace, so the count is the third field
NUM_DEGRADED=`echo $DEGRADED |cut -d" " -f3`
[ "$NUM_DEGRADED" -ne 0 ] && STATUS=1
FAILED=`$MEGACLI -AdpAllInfo -$a |grep "Failed Disks"`
echo $FAILED
NUM_FAILED=`echo $FAILED |cut -d" " -f4`
[ "$NUM_FAILED" -ne 0 ] && STATUS=1
done
# A nonzero exit status means at least one degraded or failed disk was found
exit $STATUS
# /opt/MegaRAID/MegaCli/testraid.sh
Checking RAID status on trnadb03.sodm.com
Controller a0: LSI MegaRAID SAS 9261-8i
No of Physical disks online : 3
Degraded : 0
Failed Disks : 0
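When the report is saved to a file or collected from several servers, the counts can be totaled afterward. The following is a minimal sketch, assuming the "label : value" layout shown in the example output. The helper name raid_problem_count is hypothetical:

```shell
#!/bin/sh
# Sketch: add up the "Degraded" and "Failed Disks" counts from a saved
# testraid.sh report read from stdin.
# raid_problem_count is a hypothetical helper name.
raid_problem_count() {
    # Split each line on the colon, ignoring surrounding spaces.
    awk -F' *: *' '$1 == "Degraded" || $1 == "Failed Disks" { total += $2 }
                   END { print total + 0 }'
}

# Sample report copied from the example output.
sample_report='Checking RAID status on trnadb03.sodm.com
Controller a0: LSI MegaRAID SAS 9261-8i
No of Physical disks online : 3
Degraded : 0
Failed Disks : 0'

printf '%s\n' "$sample_report" | raid_problem_count
```

A total of 0 corresponds to the healthy case in the example; any other value calls for a closer look at the controller.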
GetSCConf.scl"
The preceding command will execute a list of cellcli commands specified in the
GetSCConf.scl script on Exadata Storage Servers cel01, cel02, cel03, and cel04.
GetSCConf.scl
A preview of the GetSCConf.scl script:
set echo on
list cell detail
list lun detail
list physicaldisk detail
list celldisk detail
list griddisk detail
You can keep the host names of the Exadata Storage Servers in a text file and use it with
dcli:
$ dcli -g cells.txt "cellcli -e start GetSCConf.scl"
where cells.txt is:
• cel01
• cel02
• cel03
• cel04
Getting the Information
To obtain the data, execute the GetSCConf.sh script:
#!/bin/sh
dcli -g cells.txt "cellcli -e start GetSCConf.scl"
dcli -g cells.txt ifconfig
1. Log in to one of the Exadata Storage Servers as the celladmin user.
2. Create the GetSCConf.sh script and the cells.txt file.
3. Modify the content of the cells.txt file, substituting the correct host names of the
Exadata Storage Servers.
4. Execute: $ sh GetSCConf.sh > GetSCConf.out
5. The GetSCConf.out file can be downloaded for further analysis.
• echo "####################"
• echo "Getting cellip.ora"
• echo "####################"
• dcli -l usupport -g dbsrv.txt cat /etc/oracle/cell/network-config/cellip.ora
• echo "#####################"
• echo "Getting cellinit.ora"
• echo "#####################"
• dcli -l usupport -g dbsrv.txt cat /etc/oracle/cell/network-config/cellinit.ora
• File dbsrv.txt: Contains the host names for the Database Servers
• db01
• db02
To execute GetDBConf.sh, perform the following steps:
1. Log in to one of the Exadata Storage Servers as the celladmin user.
2. Create the GetDBConf.sh script and the dbsrv.txt file.
3. Modify the content of the dbsrv.txt file, substituting the correct host names of the
Database Servers.
4. Execute: $ sh GetDBConf.sh > GetDBConf.out
5. The GetDBConf.out file can be downloaded for further analysis.
• The second option uses ILOM to monitor the F20 card. ILOM tracks the ESM lifespan
and notifies you when to replace the ESM. You must use this monitoring option for F20
cards with part numbers 511-1500-05 or greater, and with ILOM system firmware
version 7.2.7.d or later.
Sun Flash Accelerator F20 ESM Monitoring Utility
The Sun Flash Accelerator F20 ESM Monitoring Utility is a simple tool that you install on your
host server to track the life of the ESM. After it is installed, the ESM Monitoring Utility runs
weekly to track the age of your ESM. The utility sends messages to the console and the
/var/adm/messages file as the ESM approaches or exceeds the two-year replacement
interval. Optionally, you can use an external monitoring tool to configure an SNMP trap that
sends an email alert when these messages appear.
The utility can be run manually any time to display the current ESM replacement data on all
installed cards.
Note: Installation of this utility is required on cards with part number 511-1500-01 to maintain
optimal performance for the life of the card. This option will not work on cards with part
numbers 511-1500-05 or higher.
To install the utility, follow the directions in the readme file.
Purpose
The Sun Flash Accelerator F20 ESM Monitoring Utility checks the onboard Energy Storage
Module (ESM) daily, and indicates when to replace the ESM as it reaches its lifetime
threshold. Replacement of the ESM ensures continued optimal performance while accessing
the FMods on the Sun Flash Accelerator F20 PCIe Card.
2. The card does not have the latest LSI HBA firmware with the SSID fix. You can
find the download instructions and firmware at the following location:
/net/elis-ha2-nfs.east/export/ds02/d134/mongo/aura/teams/fw/LSI/1068E/phase15-1.27.3-bios-SSID
For later-generation F20 cards (part number 511-1500-05 or greater), the ESM lifespan is
automatically monitored by the ILOM system management firmware (system firmware version
7.2.7.d or greater) installed on your host.
ILOM monitors ESMs by recording the Total_Time_On for each installed F20 card, and then
issues warning messages (to the event log and to the host Solaris syslog) as an ESM
approaches the end of its two-year lifespan.
For example, one week before an ESM reaches its two-year threshold, ILOM issues this
warning message:
"/SYS/MB/RISER1/PCI4/F20CARD ESM is approaching its lifespan. Please schedule a
replacement as soon as possible."
When an ESM reaches its two-year threshold, ILOM issues this critical event message:
"/SYS/MB/RISER1/PCI4/F20CARD ESM has exceeded its lifespan. Please schedule a
replacement as soon as possible."
Note: You can configure ILOM to send these alerts by email or SNMP trap. See your ILOM
documentation for more information.
Service the ESM (F371-4650) as described in the Sun Flash Accelerator F20 PCIe User’s
Guide (820-7265).
After you have replaced your ESM, use ILOM’s standard fault clearing methods to remove the
fault warnings; this also resets the F20 card Total_Time_On counter to 0. For more
information about using ILOM, see http://docs.sun.com/app/docs/coll/ilom3.0?l=en.
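The two ILOM messages correspond to two thresholds on the Total_Time_On counter: a warning one week before the two-year mark, and a critical event at the two-year mark. The following is a minimal sketch of that classification. It assumes, for illustration only, that the counter is expressed in hours and that a year is 365 days; the helper name esm_status is hypothetical, and this is not how ILOM itself is implemented:

```shell
#!/bin/sh
# Sketch: classify an ESM by its power-on time.
# Assumptions (not from the guide): the counter is in hours and a year
# is 365 days. esm_status is a hypothetical helper name.
esm_status() (
    hours=$1
    limit=$(( 2 * 365 * 24 ))        # two-year lifespan in hours
    warn_at=$(( limit - 7 * 24 ))    # one week before the limit
    if [ "$hours" -ge "$limit" ]; then
        echo "exceeded"
    elif [ "$hours" -ge "$warn_at" ]; then
        echo "approaching"
    else
        echo "ok"
    fi
)

esm_status 1000
esm_status 17400
esm_status 17520
```

After an ESM is replaced and the fault is cleared, the counter returns to 0, so the same classification starts again from "ok".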