330584543.doc (184.00 KB)
9/22/2016 9:16 AM
Last saved by Frank Marchant
If software update is offline due to ATA chassis



upndu050_R011

To complete this procedure, select one of the following two paths:

Path A - If there are FLARE ATA LUNs that span more than one enclosure, follow the
special instructions in this section.

Path B - If there are no FLARE ATA LUNs that span more than one enclosure, follow the
Path B procedure later in this section.

Path A Procedure:

If there are ATA LUNs that span more than one enclosure, there are special instructions which follow in
this section for Path A.
Reason: When a LUN spans more than one enclosure, it is subject to having one of its disks marked
for rebuild if one of the chassis experiences a power fail or is temporarily out of service due to a glitch
during the NDU. An ATA chassis being updated to FLARE code that includes FRUMON code 1.53 is
subject to this type of glitch 1-2% of the time. If a chassis has problems during or immediately following
an NDU update, refer to EMC Knowledgebase article emc88535.
Procedure when there are ATA LUNs that span more than one chassis:
a. Identify which ATA RAID groups span more than 1 chassis.
b. Identify one LUN on one of those RAID groups.
c. Check the sniff rate on that LUN. It will generally be indicative of all the LUNs on the array.
If this array ever ran Release 11 FLARE, then the default sniff rate of 30 is most likely still
set, which means it takes longer to sniff an entire disk. This determines how far back
you need to look for ATA disk errors in the next step. Use navicli to determine the sniff
rate on the LUN as follows:
navicli -h <SP_ip_address> getsniffer <LUN#>
or
naviseccli -h <SP_ip_address> getsniffer <LUN#>
The output will include the latest sniffer results for that LUN and will give you the sniffer
settings for that LUN at the top of the report.
VERIFY RESULTS FOR UNIT 1
Sniffing state: ENABLED
Sniffing rate (100 ms/IO): 5
Background verify priority: ASAP
Historical Total of all Non-Volatile Recovery Verifies: 0 passes
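As a sketch only, the settings at the top of a captured getsniffer report can be pulled out with a short script. The parsing below is modeled on the sample report shown above and is not part of the official tooling; verify the field labels against your array's actual output.

```python
import re

# Sample getsniffer report header, as shown above.
report = """VERIFY RESULTS FOR UNIT 1
Sniffing state: ENABLED
Sniffing rate (100 ms/IO): 5
Background verify priority: ASAP"""

def parse_sniffer_settings(text):
    """Extract the sniffing state and rate from a getsniffer report header."""
    state = re.search(r"Sniffing state:\s*(\w+)", text).group(1)
    rate = int(re.search(r"Sniffing rate.*?:\s*(\d+)", text).group(1))
    return state, rate

state, rate = parse_sniffer_settings(report)
print(state, rate)  # ENABLED 5
```

The rate value feeds directly into the look-back rule in step d.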
d. Search the SP event logs for the following types of error on any ATA drives:
Data Sector Reconstructed   0x683
Stripe Reconstructed        0x687
Sector Reconstructed        0x689
Check the past 30 days if the sniffing rate from above = 5 (default for Release 12 and higher)
Check the past 60 days if the sniffing rate from above = 6-20
Check the past 120 days if the sniffing rate from above = 20-30
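The look-back rule above reduces to a small helper, sketched here for clarity. The day thresholds are taken directly from the list above; note the source ranges overlap at a rate of 20, and this sketch assigns 20 to the 60-day window.

```python
# Event codes to search for in the SP logs (from the list above).
RECONSTRUCTION_EVENTS = {0x683, 0x687, 0x689}

def lookback_days(sniff_rate):
    """Days of SP event logs to check, per the thresholds listed above."""
    if sniff_rate <= 5:        # default for Release 12 and higher
        return 30
    if sniff_rate <= 20:       # rates 6-20
        return 60
    if sniff_rate <= 30:       # rates up to 30 (the Release 11 default)
        return 120
    raise ValueError("sniff rate above 30 is outside the table above")

print(lookback_days(5), lookback_days(30))  # 30 120
```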
e. If any of the above event codes are found within the designated time period, perform a
background verify of all LUNs on any ATA drive reporting one of the above event codes, as
follows:
Start a background verify for the LUN by entering the applicable Navisphere CLI command
as follows:

FLARE versions Release 19 and later:


navicli -h <SP_IP_address> setsniffer <lun_number> | -rg
<raid_group_number> | -all -bv -bvtime ASAP
or
naviseccli -h <SP_IP_address> setsniffer <lun_number> | -rg
<raid_group_number> | -all -bv -bvtime ASAP
where
SP_IP_address is the IP address of SP
lun_number is the LUN number in decimal to start background verify on
raid_group_number is the RAID group number in decimal to start background
verify on
-all is an option to apply sniffer parameters to all LUNs in the storage system.
The target SP must own one LUN at minimum.
The progress of the background verify can be monitored by entering the following
command:
navicli -h <SP_IP_address> getsniffer <lun_number> | -rg
<raid_group_number> | -all
or
naviseccli -h <SP_IP_address> getsniffer <lun_number> | -rg
<raid_group_number> | -all

FLARE versions prior to Release 19:


navicli -h <SP_IP_address> setsniffer <lun_number> 1 -bv -bvtime ASAP

or
naviseccli -h <SP_IP_address> setsniffer <lun_number> 1 -bv -bvtime ASAP
where SP_IP_address specifies the IP address or network name of the SP that owns
the target LUN and lun_number specifies the logical unit number of the LUN.

NOTE: Using the command line option -cr when starting a background verify will
create a new sniffer (or verify) report and reset all counters to 0.
The progress of the background verify can be monitored by entering the following
command:
navicli -h <SP_IP_address> getsniffer <lun_number>
or
naviseccli -h <SP_IP_address> getsniffer <lun_number>
NOTE: You cannot check information from the non-owning SP. The above command
has to be run from both SPs. If the command is only run with SPA's IP address, the
output will contain only the report for the LUNs owned by SPA. The same must be
done for SPB.
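Because each SP reports only the LUNs it owns, the check must be issued once per SP. A trivial sketch of building both command lines follows; the IP addresses are placeholders, and navicli/naviseccli are not invoked here.

```python
def getsniffer_commands(spa_ip, spb_ip, lun, secure=False):
    """Build the getsniffer command line for each SP; both must be run,
    since each SP reports only the LUNs it owns."""
    cli = "naviseccli" if secure else "navicli"
    return [f"{cli} -h {ip} getsniffer {lun}" for ip in (spa_ip, spb_ip)]

# Placeholder IP addresses; substitute your SPs' real addresses.
for cmd in getsniffer_commands("10.0.0.1", "10.0.0.2", 42):
    print(cmd)
```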
f. Confirm that no new corrected or uncorrectable errors are encountered when the
background verify completes. If the same types of errors occur as noted above, run a
second background verify on just the LUN reporting the reconstruction. If the background
verify on that LUN reports another reconstruction, do not continue with the NDU; call
CLARiiON Tech Support. Otherwise, continue to the next step.
g. Stop and prevent ALL I/O to the array.
h. Disable and zero write cache on the array for NDUs designated as offline in the table
in the prior section.
View and note all write cache settings under Array Properties. You will need to reset
them later to the same settings. Disable the write cache on the Array Properties Cache tab.
When the cache status has changed from disabling to disabled, zero the memory assigned
to write cache on the Array Properties Memory tab. The memory cannot be changed to 0 MB
until the cache has been disabled.
Ensure that the write cache has been completely disabled. Confirm that the cache is
disabled and zeroed by using Navisphere CLI to check the cache status. Use the CLI
command:
navicli -h <SP_IP_address> getcache
or
naviseccli -h <SP_IP_address> getcache
Ensure that all caching statuses from the CLI are DISABLED and that all write
cache sizes are 0 MB before continuing.
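As a sketch of that final check, a captured getcache report can be scanned programmatically. The field names below are assumptions modeled on typical getcache output, so adapt the matching to the report your array actually produces.

```python
# Hypothetical getcache excerpt; real field names may differ slightly.
output = """SP Read Cache State: Enabled
SP Write Cache State: Disabled
Write Cache Size: 0"""

def write_cache_safe(text):
    """True only if write cache reads Disabled and its size is 0."""
    state_ok = "Write Cache State: Disabled" in text
    size_ok = any(
        line.split(":")[1].strip() == "0"
        for line in text.splitlines()
        if "Write Cache Size" in line
    )
    return state_ok and size_ok

print(write_cache_safe(output))  # True
```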
i. If the array is not a CDL (CLARiiON Disk Library), unbind all ATA hot spares (the RAID
group that held the hot spare can remain).
j. Confirm that there are no ATA drives with a current Status of stuck in power-up or slot
empty when a drive is installed. If there are, remove them by backing them out of their
fully inserted position, leaving them in the enclosure slot to be reinserted later.
k. Return to the previous module that brought you here, to complete the NDU procedure.
Return here after the NDU to determine if ATA FRUMON code has been updated
successfully prior to re-enabling write cache and allowing I/O.

Path B Procedure:

a. Stop and prevent ALL I/O to the array.


b. Disable and zero write cache on the array for NDUs designated as offline in the table
in the prior section.
View and note all write cache settings under Array Properties. You will need to reset
them later to the same settings. Disable the write cache on the Array Properties Cache tab.
When the cache status has changed from disabling to disabled, zero the memory assigned
to write cache on the Array Properties Memory tab. The memory cannot be changed to 0 MB
until the cache has been disabled.
Ensure that the write cache has been completely disabled. Confirm that the cache is
disabled and zeroed by using Navisphere CLI to check the cache status. Use the CLI
command:
navicli -h <SP_IP_address> getcache
or
naviseccli -h <SP_IP_address> getcache
Ensure that all caching statuses from the CLI are DISABLED and that all write
cache sizes are 0 MB before continuing.
c. If the array is not a CDL (CLARiiON Disk Library), unbind all ATA hot spares (the RAID
group that held the hot spare can remain).
d. Confirm that there are no ATA drives with a current Status of stuck in power-up or slot
empty when a drive is installed. If there are, remove them by backing them out of their
fully inserted position, leaving them in the enclosure slot to be reinserted later.
e. Return to the previous module that brought you here, to complete the NDU procedure.
Return here after the NDU to determine if ATA FRUMON code has been updated
successfully prior to re-enabling write cache and allowing I/O.

Following the NDU:

WARNING: Write cache must not be re-enabled, hot spares must not be rebound, and host I/O
must not be allowed until the FRUMON code update has been confirmed.
a. Run the following Navisphere CLI commands to confirm that the BCCs now contain the
new FRUMON code:
navicli -h <SPA IP address> getcrus -lccreva -lccrevb
or
naviseccli -h <SPA IP address> getcrus -lccreva -lccrevb
navicli -h <SPB IP address> getcrus -lccreva -lccrevb
or
naviseccli -h <SPB IP address> getcrus -lccreva -lccrevb
This will report status of all LCCs and BCCs including the FRUMON revision listed as
Revision. ATA enclosures going to this new version should be at Revision 1.53.
b. Confirm that all ATA LCCs have been updated to the new FRUMON code before
continuing. Below is an example of one chassis. A navicli getcrus command will
display all the DAE2 chassis, FC and ATA.
DAE2-ATA Bus 1 Enclosure 1
Bus 1 Enclosure 1 Fan A State: Present
Bus 1 Enclosure 1 Fan B State: Present
Bus 1 Enclosure 1 Power A State: Present
Bus 1 Enclosure 1 Power B State: Present
Bus 1 Enclosure 1 LCC A State: Present
Bus 1 Enclosure 1 LCC B State: Present
Bus 1 Enclosure 1 LCC A Revision: 1.53
Bus 1 Enclosure 1 LCC B Revision: 1.53
Bus 1 Enclosure 1 LCC A Serial #: SCN00041900684
Bus 1 Enclosure 1 LCC B Serial #: SCN00042000067
c. When the SP event logs report completion and you have confirmed that the LCCs have
the new FRUMON code as seen above, you must also check all LCCs and ATA drives for
fault LEDs before considering this update complete.
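A minimal sketch of the revision check follows, run against a captured getcrus report like the sample above. The line format matched here is taken from that sample; adjust it if your report's layout differs.

```python
import re

# Excerpt from a getcrus report for one ATA enclosure (as shown above).
report = """Bus 1 Enclosure 1 LCC A Revision: 1.53
Bus 1 Enclosure 1 LCC B Revision: 1.53"""

def all_lccs_at(text, expected="1.53"):
    """True if every LCC Revision line in the report matches the expected rev."""
    revs = re.findall(r"LCC [AB] Revision:\s*([\d.]+)", text)
    return bool(revs) and all(r == expected for r in revs)

print(all_lccs_at(report))  # True
```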

d. If drives were pulled because they were in a state of stuck in power up or slot empty,
reinsert them now.
e. Rebind any hot spares which were unbound for this procedure.
f. Reset and re-enable write cache to its original settings if you previously disabled it.
g. You can now return to the procedure in the previous module that instructed you to
come to this ATA module.
See the event log examples below.
The following figure shows the completed upgrade on one ATA chassis. There will be an entry for each
chassis (ATA and FC):

The next figure, from an SPA event log, clearly shows that LCC Firmware Upgrades are
complete on all enclosures.


Detail of a FRUMON update:


Sequence of events:
a. FLARE will read the revision of the FRUMON code of each (DAE-2 and ATA chassis)
FRUMON image file.
b. FLARE will initiate a request to do an upgrade of all LCCs that are at a lower revision.
c. FLARE will pause 8 minutes on SPB and 10 minutes on SPA to allow all parts of the NDU
to complete.
d. Starting with the highest numbered enclosure on the highest numbered bus, FLARE will
check the revision of FRUMON code on each LCC.
e. When an LCC is found with a revision lower than that included with the NDU package,
FLARE will begin downloading the code to the LCC. This requires about 2-3 minutes for
a DAE-2 (depending on distance from SP to LCC), and approximately a minute for an ATA
DAE.
f. FLARE will issue the update command and wait for the LCC to reboot and come back
online. This requires about one minute. The LCC is offline 9 to 15 seconds.
g. The process will pause approximately 85 seconds between upgrading ATA chassis and
40 seconds between DAE-2 chassis to allow I/O operations to catch up.
h. Continue to the next enclosure.

Timing Example:

Example 1 - If the ATA chassis is chassis 3 of 5 on Bus 1


All chassis higher on the loop than chassis 3 on Bus 1 will be done first.
10 minutes - to start
3 minutes - to update Chassis 5
40 seconds - wait until the next chassis starts
3 minutes - to update Chassis 4
40 seconds - to start the ATA chassis 3
1 minute - to update the ATA chassis
17 minutes total after the NDU completes, this ATA chassis could resume I/O.

Example 2 - ATA chassis is chassis 6 on Bus 0 and there are 5 DAE-2 chassis on Bus 1
10 minutes - to begin
18 minutes - 3 minutes to do each of the 5 chassis on Bus 1 with 40 seconds between
chassis updates
2 minutes - approximately 2 minutes to start and do the ATA chassis as it is the highest
chassis on Bus 0
30 minutes after the NDU completes, this ATA chassis could resume I/O.

Example 3 - ATA chassis is chassis 1 on Bus 0 and there are 240 drives total (8 chassis on
each backend bus)
10 minutes - to begin
29 minutes - 3 minutes to do each of the 8 DAE-2 chassis on Bus 1 with 40 seconds
between
21 minutes to do the 7 DAE-2 chassis on Bus 0 with 40 seconds between each
1 minute to complete the update of the ATA chassis
61 minutes after the NDU completes, this ATA could resume I/O.
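The examples above follow a simple arithmetic pattern, sketched below for rough planning. The per-step durations come from the sequence-of-events section (10 minutes to start, about 3 minutes per DAE-2, about 1 minute per ATA DAE, with 40-second and 85-second pauses respectively); all totals are approximate, and the worked examples in the text round them.

```python
def estimate_minutes(chassis_before_target, target_is_ata=True):
    """Rough time until the target chassis is updated, using the per-step
    durations from the sequence above: 10 min to start, ~3 min per DAE-2,
    ~1 min per ATA DAE, 40 s pause after a DAE-2, 85 s after an ATA DAE."""
    total = 10.0  # minutes for FLARE to start the upgrade pass
    for kind in chassis_before_target:
        total += 3.0 if kind == "dae2" else 1.0           # update time
        total += (40.0 if kind == "dae2" else 85.0) / 60  # pause afterward
    total += 1.0 if target_is_ata else 3.0  # the target chassis itself
    return total

# Example 1 pattern: DAE-2 chassis 5 and 4 first, then the ATA chassis.
print(round(estimate_minutes(["dae2", "dae2"]), 1))  # 18.3
```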
