
How to Diagnose and Replace a Faulty FMOD on Exadata V2/X2-2/X2-8 and SPARC SuperCluster T4-4 [ID 1381209.1]


Applies to:

Exadata Database Machine X2-8 - Version All Versions and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

Service a faulty FMOD on Exadata V2, Exadata X2, and SPARC SuperCluster T4-4 systems

Fix

DISPATCH INSTRUCTIONS:
- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
- TIME ESTIMATE: 90 Minutes
- TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:


- PROBLEM OVERVIEW:
Service a faulty FMOD on Exadata.

- WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE
RESOLUTION ACTIVITY?:
The Storage Cell that contains the faulty FMOD must be powered off. A patch update may also
be required before the cell is powered off.

Important Part Number NOTE: FMOD 7061269 with D21Y firmware substitutes for and
replaces both the 371-5014 (D20Y) and 371-4415 (D20R) FMOD FRUs.

If 371-4415 or 371-5014 is no longer available locally, then the customer must install patch
14793859 which installs D21Y firmware into the image, prior to replacing the FMOD.

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

Main Steps:

A. If necessary, review these documents:

# 1285796.1 "Aura (F20) Hardware and Software Troubleshooting Document"
and/or
# 820-7265 "Sun Flash Accelerator F20 PCIe Card Users Guide"

B. Observe the LEDs on the Flash F20 Card and Determine which FMOD is Showing a Fault.

C. Identify and Shut Down the Storage Cell that contains the faulty FMOD

Refer to Document # 1188080.1

Applies to:
Oracle Exadata Storage Server Software - Version: 11.2.1.2.0 to 11.2.2.2.0 - Release: 11.2
to 11.2 Linux x86-64

Goal:
To provide steps to power down or reboot an Exadata Storage cell without affecting ASM.

Solution:

Steps to power down or reboot a cell without affecting ASM:

When performing maintenance on Exadata Cells, it may be necessary to power down or
reboot the cell. If a storage server is to be shut down when one or more databases are running,
then verify that taking the storage server offline will not impact Oracle ASM disk group and
database availability. The ability to take Oracle Exadata Storage Server offline without
affecting database availability depends on the level of Oracle ASM redundancy used on the
affected disk groups, and on the current status of disks in other Oracle Exadata Storage Servers
that hold mirror copies of the data on the Oracle Exadata Storage Server to be taken offline.

1) By default, ASM drops a disk shortly after it is taken offline; however, you can set the
DISK_REPAIR_TIME attribute to prevent this operation by specifying a time interval to
repair the disk and bring it back online. The default DISK_REPAIR_TIME attribute value of
3.6h should be adequate for most environments.

(a) To check repair times for all mounted disk groups - log into the ASM instance and
perform the following query:

SQL> select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a
     where dg.group_number=a.group_number and a.name='disk_repair_time';

(b) If you need to keep the ASM disks offline for longer than the default time of 3.6 hours,
adjust the attribute. For example:

SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';
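
As a rough illustration of step 1, the sketch below converts a DISK_REPAIR_TIME value to hours and compares it with a planned outage. The function names and the one-hour safety margin are assumptions made for this example and are not part of the Oracle procedure. The sketch also assumes that a value without a unit suffix is expressed in hours.

```python
import re


def repair_time_hours(value: str) -> float:
    """Convert a DISK_REPAIR_TIME string such as '3.6h' or '200m' to hours."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([hHmM]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized DISK_REPAIR_TIME value: {value!r}")
    amount = float(match.group(1))
    # Assumption: a value with no unit suffix is treated as hours.
    unit = match.group(2).lower() or "h"
    return amount / 60.0 if unit == "m" else amount


def needs_longer_repair_time(value: str, planned_outage_hours: float,
                             margin_hours: float = 1.0) -> bool:
    """Return True if the attribute leaves less than the margin of headroom.

    margin_hours is a hypothetical safety buffer chosen for this sketch.
    """
    return repair_time_hours(value) < planned_outage_hours + margin_hours


if __name__ == "__main__":
    # 90-minute task estimate from the dispatch instructions, default 3.6h attribute.
    print(needs_longer_repair_time("3.6h", planned_outage_hours=1.5))
```

With the 90-minute time estimate given in the dispatch instructions, the default of 3.6h leaves more than an hour of headroom, so no change would be suggested by this check.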

2) Next, check whether ASM can tolerate the grid disks going OFFLINE.
The following command should return asmdeactivationoutcome='Yes' for every grid disk listed:

cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome


3) If one or more disks return asmdeactivationoutcome='No', wait for some time
and repeat step #2. Once all disks return asmdeactivationoutcome='Yes', you can
proceed with taking the grid disks offline in step #4.

Note: Taking the storage server offline when one or more grid disks return
asmdeactivationoutcome='No' will cause Oracle ASM to dismount the affected disk group,
causing the databases to shut down abruptly.
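
The check in steps 2 and 3 can be sketched as a small parser over the text printed by the list griddisk command. This is a minimal sketch that assumes the output is three whitespace-separated columns (name, asmmodestatus, asmdeactivationoutcome); the grid disk names in the sample are hypothetical and the function names are invented for this example.

```python
def parse_griddisk_rows(output: str):
    """Split 'list griddisk attributes name,asmmodestatus,asmdeactivationoutcome'
    output into (name, asmmodestatus, asmdeactivationoutcome) tuples."""
    rows = []
    for line in output.splitlines():
        # Keep everything after the second column as the outcome, in case
        # the outcome text contains spaces.
        fields = line.strip().split(None, 2)
        if len(fields) == 3:
            rows.append((fields[0], fields[1], fields[2].strip()))
    return rows


def blocking_griddisks(output: str):
    """Return the names of grid disks whose asmdeactivationoutcome is not 'Yes'."""
    return [name for name, _mode, outcome in parse_griddisk_rows(output)
            if outcome.lower() != "yes"]


if __name__ == "__main__":
    # Hypothetical sample output; real disk names will differ.
    sample = """
         DATA_CD_00_cell01   ONLINE   Yes
         DATA_CD_01_cell01   ONLINE   No
         RECO_CD_00_cell01   ONLINE   Yes
    """
    print(blocking_griddisks(sample))
```

An empty result means every listed grid disk reported 'Yes'. Any name in the result is a reason to wait and repeat step #2 before inactivating the grid disks.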

4) Run the following CellCLI command to inactivate all grid disks on the cell you wish to
power down or reboot:

CellCLI> ALTER GRIDDISK ALL INACTIVE

* Please note - This action could take 10 minutes or longer depending on activity. It is very
important to make sure that all the disks were taken offline successfully before shutting
down the cell services. Inactivating the grid disks will automatically OFFLINE the disks in
the ASM instance.

5) Confirm that the griddisks are now offline by performing the following actions:

(a) Execute the command below and the output should show asmmodestatus=UNUSED and
asmdeactivationoutcome=Yes for all griddisks once the disks are offline in ASM. Only then
is it safe to proceed with shutting down or restarting the cell:

cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

(asmmodestatus=OFFLINE has also been reported. It means that Oracle ASM has taken this
grid disk offline. This status is also acceptable, and you can proceed with the remaining
instructions.)

(b) List the griddisk to confirm that all show offline:

CellCLI> LIST GRIDDISK
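
The confirmation in step 5(a) can be expressed as a single pass/fail check. The sketch below is illustrative only: it assumes the same three-column text output as the command in 5(a), accepts both UNUSED and OFFLINE as described above, and treats empty or malformed output as not safe. The function name and sample disk names are hypothetical.

```python
# asmmodestatus values that this document describes as acceptable before shutdown.
ACCEPTABLE_MODES = {"UNUSED", "OFFLINE"}


def cell_safe_to_power_down(output: str) -> bool:
    """Return True only if every grid disk row shows an acceptable
    asmmodestatus and asmdeactivationoutcome=Yes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return False  # no rows at all: do not assume the cell is safe
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) != 3:
            return False  # unexpected row shape: fail closed
        _name, mode, outcome = fields
        if mode.upper() not in ACCEPTABLE_MODES:
            return False
        if outcome.strip().lower() != "yes":
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sample output after ALTER GRIDDISK ALL INACTIVE.
    sample = """
         DATA_CD_00_cell01   UNUSED    Yes
         RECO_CD_00_cell01   OFFLINE   Yes
    """
    print(cell_safe_to_power_down(sample))
```

The check fails closed on purpose, so that a truncated or unexpected listing is treated as a reason to stop and look at the cell by hand.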

6) If Patch 14793859 (which installs D21Y firmware) was not already installed during
troubleshooting and the FRU received is 7061269, then the patch MUST be installed before
installing the replacement FMOD. If the patch is already installed, the firmware for the
FMODs in the system reports as D21Y. If it is not installed, the firmware reports as D20R or
D20Y. The firmware version appears in column 3 of the output of the following command:

# cellcli -e list physicaldisk attributes name, id, physicalFirmware where diskType = 'FlashDisk'

Follow the steps in Note 1504776.1 to apply the patch. The patch will also update
ILOM/BIOS on Storage Servers in X2-2/X2-8 racks and will force one or two reboots. It may
take 15 minutes to complete. Failure to install the patch will result in the firmware on the
FRU being downgraded to D20R or D20Y, depending on the current image installed. This
may cause the replacement FRU to fail, or the original problem to return.
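
The decision in step 6 can be sketched as follows. This is a minimal sketch that assumes the physicaldisk listing is three whitespace-separated columns with the firmware in column 3, as stated above. The flash disk names and IDs in the sample are hypothetical, and the function names are invented for this example.

```python
REPLACEMENT_FRU = "7061269"   # FMOD part number shipped with D21Y firmware
PATCHED_FIRMWARE = "D21Y"     # firmware reported once Patch 14793859 is installed


def flash_firmware_versions(output: str) -> dict:
    """Map each flash disk name to the firmware string in column 3."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            name, _disk_id, firmware = fields
            versions[name] = firmware
    return versions


def patch_required(output: str, fru_part_number: str) -> bool:
    """Return True if the received FRU is 7061269 and any FMOD in the
    cell still reports firmware other than D21Y."""
    if fru_part_number != REPLACEMENT_FRU:
        return False
    versions = flash_firmware_versions(output)
    return any(fw != PATCHED_FIRMWARE for fw in versions.values())


if __name__ == "__main__":
    # Hypothetical sample output; names and IDs are placeholders.
    sample = """
         FLASH_1_0   ID0001   D20Y
         FLASH_1_1   ID0002   D20Y
    """
    print(patch_required(sample, "7061269"))
```

If the received FRU is 371-5014 or 371-4415, the function returns False because the patch requirement in this step applies only to FRU 7061269.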

7) You can now shut down or reboot the cell. Oracle Exadata Storage Servers are powered off
and rebooted using the Linux shutdown command.

(a) The following command, run as root, will shut down Oracle Exadata Storage Server
immediately:

# shutdown -h -y now

(When powering off Oracle Exadata Storage Servers, all storage services are automatically
stopped.)

(b) The following command will reboot Oracle Exadata Storage Server immediately:

# shutdown -r -y now

(end of Main Step C)

==================================================================

Continuation of Main Steps:

Note: The slot numbers for the PCI cards, as well as the locations of
the FMODs, are clearly shown on the Service Label on the chassis cover.

D. Remove the PCI Riser that contains the associated Flash F20 Card.

To Remove PCIe Riser:

1. Prepare the server for service.

a. Power off the server and disconnect the power cord (or cords) from the power supply (or
supplies).

b. Extend the server to the maintenance position.

c. Attach an antistatic wrist strap.

d. Remove the top cover.

2. If you are servicing a PCIe card, locate its position in the system.

3. Disconnect any data cables connected to the cards on the PCIe riser being removed.

Label the cables to ensure proper connection later.


4. Remove the back panel PCI crossbar.

a. Loosen the two captive Phillips screws on the end of the PCI crossbar.

b. Lift the PCI crossbar up and back to remove it from the chassis.

5. Remove the PCIe riser from the system.

a. Loosen the captive screw holding the riser to the motherboard.

b. Lift up the riser and any PCIe cards that are attached to it as a unit.

The server can have up to three PCIe risers. Each riser can support two PCIe cards.

E. Remove the Flash F20 Card that contains the Faulty FMOD(s).

The PCI numbers and FMOD locations for F20 Version 1.0 are plainly seen on
the side of the ESM plastic housing (once the PCI card is removed from the system).

For F20 Version 1.1, the FMOD labels are a bit trickier to see. Once the
PCI card is removed from the system, look on the ESM plastic housing.
They are located on a small white label.

Remove the PCIe card:

1. Locate the PCIe card that you want to remove, and note its corresponding riser board.

2. If necessary, make a note of where the PCIe cards are installed.

3. Unplug all data cables from the PCIe card.

To disconnect the cables from the PCIe card, press the latch, push in toward the connector,
and then pull out the cable to remove it.

Note the location of all cables for reinstallation later.

4. Remove the PCIe riser.

5. Carefully remove the PCIe card from the riser board connector.

6. Place the PCIe card on an antistatic mat.

F. Remove the Faulty FMOD(s) and Install Replacements from Stock.

1. Carefully remove the faulty FMOD.

2. Install a replacement in the same slot.


G. Re-install the Components (that were removed).

1. Reinstall the PCIe card into the riser.

2. Reinstall the PCIe riser into the system.

3. Ensure that all cables are reconnected.

4. Put the cover back on the system.

H. Power On and Bring Up the Storage Cell

I. If the flash cache was dropped before the FMOD replacement (i.e., if the command
CellCLI> drop celldisk all flashdisk force was run), then run the following commands to
recreate the flash cache:

CellCLI> create celldisk all flashdisk
CellCLI> create flashlog all
CellCLI> create flashcache all

Note: If you are running an image version prior to 11.2.2.4, DO NOT run the 'create flashlog
all' operation as this feature was introduced in the 11.2.2.4 release.
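
The version rule in the note above can be sketched as a helper that builds the command list for a given image version. This is an illustrative sketch: the function names are hypothetical, and it assumes the image version is a dot-separated string of integers whose first four components can be compared against 11.2.2.4.

```python
FLASHLOG_MIN_VERSION = (11, 2, 2, 4)  # release that introduced flash log, per the note


def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as '11.2.2.4.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def flash_recreate_commands(image_version: str) -> list:
    """Return the CellCLI commands to recreate flash objects, in order.

    'create flashlog all' is included only for image versions 11.2.2.4 or later.
    """
    commands = ["create celldisk all flashdisk"]
    if version_tuple(image_version)[:4] >= FLASHLOG_MIN_VERSION:
        commands.append("create flashlog all")
    commands.append("create flashcache all")
    return commands


if __name__ == "__main__":
    for cmd in flash_recreate_commands("11.2.2.3.5"):
        print("CellCLI>", cmd)
```

The flash log command is placed before the flash cache command to match the order given in step I.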

OBTAIN CUSTOMER ACCEPTANCE


- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO
TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

- Verify that HW Components (associated with that Storage Cell)
are functioning properly.

PARTS NOTE:

7061269 24GB Solid State Flash Memory Module, FW D21Y

371-5014 24GB Solid State Flash Memory Module, FW D20Y

371-4415 24GB Solid State Flash Memory Module, FW D20R
