9/22/2016 9:16 AM
Last saved by Frank Marchant
.2
upndu050_R011
Path A - If any FLARE ATA LUNs span more than one enclosure, follow the special instructions
in this section.
Path A Procedure:
Reason: When a LUN spans more than one enclosure, one of its disks can be marked for rebuild
if one of the chassis loses power or is temporarily out of service because of a glitch
during the NDU. An ATA chassis being updated to FLARE code that includes FRUMON code 1.53
experiences this type of glitch 1-2% of the time. If a chassis has problems during or immediately
after an NDU update, refer to EMC Knowledgebase article emc88535.
Procedure when there are ATA LUNs that span more than one chassis:
a. Identify which ATA RAID groups span more than 1 chassis.
b. Identify one LUN on one of those RAID groups.
c. Check the sniff rate on that LUN. It will generally be indicative of all the LUNs on the array.
If this array ever ran Release 11 FLARE, the Release 11 default sniff rate of 30 is most
likely still set, which means it takes longer to sniff an entire disk. The sniff rate
determines how far back you need to look for ATA disk errors in the next step. Use navicli
to determine the sniff rate on the LUN as follows:
navicli -h <SP_ip_address> getsniffer <LUN#>
or
naviseccli -h <SP_ip_address> getsniffer <LUN#>
The output will include the latest sniffer results for that LUN and will give you the sniffer
settings for that LUN at the top of the report.
VERIFY RESULTS FOR UNIT 1
Sniffing state:ENABLED
Sniffing rate(100 ms/IO):5
Background verify priority:ASAP
Historical Total of all Non-Volatile Recovery Verifies(0 passes)
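Steps a through c come down to two small checks on information you have already collected. The Python below is a minimal sketch, not an EMC tool: the (raid_group, bus, enclosure, slot) tuples are a hypothetical input format, and the regular expression assumes the settings line looks exactly like the "Sniffing rate(100 ms/IO):5" line in the sample output above.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical input shape: one tuple per disk, (raid_group, bus, enclosure, slot).
# How you gather this (e.g. from saved Navisphere output) is outside this sketch.
Disk = tuple[int, int, int, int]


def spanning_raid_groups(disks: list[Disk]) -> list[int]:
    """Steps a-b: return RAID group IDs whose disks sit in more than one enclosure."""
    enclosures: dict[int, set[tuple[int, int]]] = defaultdict(set)
    for rg, bus, enclosure, _slot in disks:
        enclosures[rg].add((bus, enclosure))
    return sorted(rg for rg, encl in enclosures.items() if len(encl) > 1)


# Assumes the settings line matches the getsniffer sample output shown above.
_RATE_RE = re.compile(r"Sniffing rate\(100 ms/IO\):\s*(\d+)")


def parse_sniff_rate(getsniffer_output: str) -> Optional[int]:
    """Step c: pull the sniff rate out of saved getsniffer text, or None if absent."""
    match = _RATE_RE.search(getsniffer_output)
    return int(match.group(1)) if match else None
```

Pick any LUN from the groups returned by `spanning_raid_groups`, run getsniffer against it, and pass the saved text to `parse_sniff_rate`.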
d. Search the SP event logs for the following types of error on any ATA drives:
Data Sector Reconstructed
Stripe Reconstructed
Sector Reconstructed
Done
0x683
0x687
0x689
Check the past 30 days if the sniffing rate from above is 5 (the default for Release 12 and higher).
Check the past 60 days if the sniffing rate from above is 6-20.
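The look-back rule and the event search in step d can be sketched in Python as follows. This is a hedged illustration: it covers only the two sniff-rate ranges stated above (other rates return None rather than a guessed value), it leaves out "Done" because that word alone would match many unrelated log lines, and it does not filter by date or drive type, because the SP event log line format is not shown in this module.

```python
from typing import Optional

# Event descriptions and codes listed in step d ("Done" deliberately omitted).
EVENT_PATTERNS = (
    "Data Sector Reconstructed",
    "Stripe Reconstructed",
    "Sector Reconstructed",
    "0x683",
    "0x687",
    "0x689",
)


def lookback_days(sniff_rate: int) -> Optional[int]:
    """Days of SP event log to search for a given sniff rate; None if not covered here."""
    if sniff_rate == 5:
        return 30
    if 6 <= sniff_rate <= 20:
        return 60
    return None


def matching_events(log_lines: list[str]) -> list[str]:
    """Return the log lines that mention any of the listed events or event codes."""
    return [line for line in log_lines if any(p in line for p in EVENT_PATTERNS)]
```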
or
naviseccli -h <SP_IP_address> setsniffer <lun_number> 1 -bv -bvtime ASAP
where SP_IP_address is the IP address or network name of the SP that owns the
target LUN, and lun_number is the logical unit number of the LUN.
NOTE: Using the -cr command line option when starting a background verify
creates a new sniffer (or verify) report and resets all counters to 0.
The progress of the background verify can be monitored by entering the following
command:
navicli -h <SP_IP_address> getsniffer <lun_number>
or
naviseccli -h <SP_IP_address> getsniffer <lun_number>
NOTE: You cannot check information from the non-owning SP. The above command
Path B Procedure:
When the cache status has changed from disabling to disabled, set the memory assigned
to write cache to zero on the Memory tab of Array Properties. The memory cannot be set
to 0 MB until the cache is disabled.
Ensure that the write cache has been completely disabled. Confirm that the cache is
disabled and zeroed by using Navisphere CLI to check cache status. Use the CLI
command:
navicli -h <SP_IP_address> getcache
or
naviseccli -h <SP_IP_address> getcache
Ensure that the CLI reports every cache state as DISABLED and every write cache
size as 0 MB before continuing.
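The "all DISABLED, all 0 MB" check can be automated against saved getcache text. This is a minimal sketch built on an assumption, since the getcache output is not reproduced in this module: it assumes one setting per line with the value as the last field (for example "SP Write Cache State Disabled" or "Write Cache Size (MB): 0"). Compare it against your own saved output before relying on it.

```python
import re

_FIELD_SPLIT = re.compile(r"[\s:]+")
_SIZE_RE = re.compile(r"(\d+)\s*(?:MB)?$", re.IGNORECASE)


def cache_ready_for_ndu(getcache_output: str) -> bool:
    """True only if every 'Cache State' line ends in DISABLED and every
    'Write Cache Size' line ends in 0 (line layout is an assumption)."""
    saw_state = False
    for raw in getcache_output.splitlines():
        line = raw.strip()
        if "Cache State" in line:
            saw_state = True
            # The last field after whitespace or a colon is taken as the state value.
            if _FIELD_SPLIT.split(line)[-1].upper() != "DISABLED":
                return False
        elif "Write Cache Size" in line:
            size = _SIZE_RE.search(line)
            if size is None or int(size.group(1)) != 0:
                return False
    # No cache-state lines at all usually means the command failed: not ready.
    return saw_state
```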
c. If the array is not a CDL (CLARiiON Disk Library), unbind all ATA hot spares (the RAID
group that held the hot spare can remain).
d. Confirm that there are no ATA drives with a current status of "stuck in power-up", or of
"slot empty" when a drive is actually installed. If there are, remove them by backing them out of
their fully inserted position, leaving them in the enclosure slot to be reinserted later.
e. Return to the module that brought you here to complete the NDU procedure.
Return here after the NDU to confirm that the ATA FRUMON code has been updated
successfully before re-enabling write cache and allowing I/O.
.4
WARNING: Write cache must not be re-enabled, hot spares must not be rebound, and host I/O
must not be allowed until the FRUMON code update has been confirmed.
a. Run the following Navisphere CLI commands to confirm that the BCCs now contain the
new FRUMON code:
navicli -h <SPA IP address> getcrus -lccreva -lccrevb
or
naviseccli -h <SPA IP address> getcrus -lccreva -lccrevb
navicli -h <SPB IP address> getcrus -lccreva -lccrevb
or
naviseccli -h <SPB IP address> getcrus -lccreva -lccrevb
This reports the status of all LCCs and BCCs, including the FRUMON revision, listed as
Revision. ATA enclosures updated to this version should report Revision 1.53.
b. Confirm that all ATA LCCs have been updated to the new FRUMON code before
continuing. The example below shows one chassis; the getcrus command displays
every DAE2 chassis, both FC and ATA.
DAE2-ATA Bus 1 Enclosure 1
Bus 1 Enclosure 1 Fan A State: Present
Bus 1 Enclosure 1 Fan B State: Present
Bus 1 Enclosure 1 Power A State: Present
Bus 1 Enclosure 1 Power B State: Present
Bus 1 Enclosure 1 LCC A State: Present
Bus 1 Enclosure 1 LCC B State: Present
Bus 1 Enclosure 1 LCC A Revision: 1.53
Bus 1 Enclosure 1 LCC B Revision: 1.53
Bus 1 Enclosure 1 LCC A Serial #: SCN00041900684
Bus 1 Enclosure 1 LCC B Serial #: SCN00042000067
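With many enclosures, scanning the getcrus output by eye is error-prone. The Python sketch below lists every LCC whose revision is below a target. It assumes revision lines of the form "Bus 1 Enclosure 1 LCC A Revision: 1.53", and it compares revisions part by part as integers (1.52 < 1.53); that comparison rule is an assumption about the revision format. FC DAE2 LCCs carry their own revisions, so pass the ATA enclosures through the hypothetical `only` parameter.

```python
import re
from typing import Optional

# Assumed line shape: "Bus <n> Enclosure <n> LCC <A|B> Revision: <x.y>".
_LCC_REV_RE = re.compile(
    r"Bus[ \t]+(\d+)[ \t]+Enclosure[ \t]+(\d+)[ \t]+LCC[ \t]+([AB])"
    r"[ \t]+Revision:[ \t]*(\d+(?:\.\d+)*)"
)


def _version(text: str) -> tuple[int, ...]:
    """Split "1.53" into (1, 53) so each dot-separated part compares as an integer."""
    return tuple(int(part) for part in text.split("."))


def lccs_below(
    getcrus_output: str,
    target: str = "1.53",
    only: Optional[set[tuple[int, int]]] = None,
) -> list[tuple[int, int, str, str]]:
    """Return (bus, enclosure, lcc, revision) for each LCC reporting less than target.

    If `only` is given, just those (bus, enclosure) pairs are checked.
    """
    wanted = _version(target)
    behind = []
    for bus, encl, lcc, rev in _LCC_REV_RE.findall(getcrus_output):
        key = (int(bus), int(encl))
        if only is not None and key not in only:
            continue
        if _version(rev) < wanted:
            behind.append((key[0], key[1], lcc, rev))
    return behind
```

An empty result for the ATA enclosures matches the condition in step b; any entry returned means that LCC has not yet reached the new FRUMON code.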
c. When the SP event logs report completion and you have confirmed that the LCCs have
the new FRUMON code as shown above, you must also check all LCCs and ATA drives for
fault LEDs before considering this update complete.
d. If drives were pulled because they were in a state of "stuck in power-up" or "slot empty",
reinsert them now.
e. Rebind any hot spares which were unbound for this procedure.
f. Reset and re-enable write cache to its original settings if you previously disabled it.
g. You can now return to the procedure in the previous module that directed you to
this ATA module.
See the event log examples below.
The following figure shows the completed upgrade on one ATA chassis. There will be an entry for each
chassis (ATA and FC):
The next figure, from an SPA event log, shows that the LCC firmware upgrades are
complete on all LCCs.
a. FLARE will read the FRUMON revision of each FRUMON image file (DAE-2 and ATA chassis).
b. FLARE will request an upgrade of every LCC that is at a lower revision.
c. FLARE will pause 8 minutes on SPB and 10 minutes on SPA to allow all parts of the NDU
to complete.
d. Starting with the highest-numbered enclosure on the highest-numbered bus, FLARE will
check the FRUMON revision on each LCC.
e. When an LCC is found with a revision lower than the one included in the NDU package,
FLARE will begin downloading the code to the LCC. This takes about 2-3 minutes for
a DAE-2 (depending on the distance from the SP to the LCC) and approximately 1 minute for
an ATA DAE.
f. FLARE will issue the update command and wait for the LCC to reboot and come back online.
This takes about 1 minute; the LCC is offline for 9 to 15 seconds.
g. The process will pause approximately 85 seconds between ATA chassis upgrades and
40 seconds between DAE-2 chassis upgrades to allow I/O operations to catch up.
h. Timing Example:
Example 2 - The ATA chassis is chassis 6 on Bus 0, and there are 5 DAE-2 chassis on Bus 1:
10 minutes - to begin
18 minutes - 3 minutes for each of the 5 chassis on Bus 1, with 40 seconds between
chassis updates
2 minutes - approximately 2 minutes to start and complete the ATA chassis, as it is the
highest chassis on Bus 0
30 minutes after the NDU completes, this ATA chassis could resume I/O.
Example 3 - The ATA chassis is chassis 1 on Bus 0, and there are 240 drives in total (8 chassis
on each back-end bus):
10 minutes - to begin
29 minutes - 3 minutes for each of the 8 DAE-2 chassis on Bus 1, with 40 seconds
between chassis updates
21 minutes - to do the 5 DAE-2 chassis on Bus 0, with 40 seconds between each
1 minute - to complete the update of the ATA chassis
61 minutes after the NDU completes, this ATA chassis could resume I/O.
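The upgrade order from step d and the arithmetic behind Example 2 can be reproduced with a short Python sketch. It is an estimate only: the `Chassis` type and function names are hypothetical, the 10-minute start, 3 minutes per DAE-2, 40-second gaps, and 2 minutes for the ATA chassis are taken from Example 2, and, as in that example, no separate gap is counted between the last DAE-2 and the ATA chassis.

```python
from typing import NamedTuple


class Chassis(NamedTuple):
    bus: int
    enclosure: int
    kind: str  # "DAE-2" or "ATA"


def upgrade_order(chassis: list[Chassis]) -> list[Chassis]:
    """Step d: the highest-numbered enclosure on the highest-numbered bus goes first."""
    return sorted(chassis, key=lambda c: (c.bus, c.enclosure), reverse=True)


def dae2_block_minutes(count: int, per_chassis_min: float = 3.0, gap_s: float = 40.0) -> float:
    """Minutes for a run of DAE-2 upgrades: 3 minutes each plus 40 s between them."""
    if count <= 0:
        return 0.0
    return count * per_chassis_min + (count - 1) * gap_s / 60.0


def example2_minutes(dae2_ahead: int, start_min: float = 10.0, ata_min: float = 2.0) -> float:
    """Rough minutes after the NDU completes until the ATA chassis can resume I/O,
    using Example 2's breakdown: start + preceding DAE-2 run + ATA chassis."""
    return start_min + dae2_block_minutes(dae2_ahead) + ata_min
```

For the Example 2 layout (5 DAE-2 chassis on Bus 1, the ATA chassis at Bus 0 Enclosure 6), the ATA chassis lands sixth in `upgrade_order`, and `example2_minutes(5)` gives about 29.7 minutes, which rounds to the 30 minutes quoted above.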