by the TAC
BRKCOM-3010
BRKCOM-3010 © 2014 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
Session Goals
• Learn how to avoid the avoidable
– Leverage UCS best practices
A real UCS customer saga…
Release Notes
Release Notes
Unofficial Survey… “Do you read the Release Notes?”
[Charts: two survey results. First: Yes 7, No 93. Second: Yes 20, No 80.]
Release Notes
Pay particular attention to:
• Open Caveats
– Resolved Caveats are the typical reason for an upgrade
• New Features
• Internal Dependencies
Release Notes
Mixed Cisco UCS Release Support
• Supported in release 2.1(x) and above
• Allows for independent infrastructure upgrades
• Consult the release notes for details
Release Notes
Mixed Cisco UCS Release Support matrix example
Release Notes
Mixed Release
• Refer to the “Minimum B/C Bundle…Features” section
Release Notes
Caveat details if running in a mixed firmware environment
UCS Firmware Upgrades
UCS Firmware Upgrades
Treat them like elective surgery
• Pre-op check-up: Pro-active TAC SR
• The operation: The upgrade
• Recovery Room: Verify functionality
• Released from surgical center: Resume production
UCS Firmware Upgrades
Pre-Upgrade Check list
• Consult Release Notes, work with your account team
• Back-up your system
• Review Compatibility Matrices
• Eliminate Critical/Major Faults
• Watch our Video Upgrade Guides
• Check Cisco’s online community and support forums
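The checklist above can be turned into a simple go/no-go gate. The following is a minimal Python sketch, not a UCS Manager API: the `Fault` record, its field names, and the two boolean flags are hypothetical stand-ins for whatever your own tooling collects.

```python
from dataclasses import dataclass

# Hypothetical fault record; field names are illustrative, not a UCS Manager schema.
@dataclass
class Fault:
    severity: str      # e.g. "critical", "major", "minor", "warning"
    description: str

# Severities the checklist says to eliminate before upgrading.
BLOCKING_SEVERITIES = {"critical", "major"}

def upgrade_blockers(faults, backup_taken, release_notes_read):
    """Return a list of human-readable reasons not to start the upgrade."""
    reasons = []
    if not release_notes_read:
        reasons.append("Release Notes not reviewed")
    if not backup_taken:
        reasons.append("No current system backup")
    for fault in faults:
        if fault.severity.lower() in BLOCKING_SEVERITIES:
            reasons.append(
                f"Unresolved {fault.severity.lower()} fault: {fault.description}"
            )
    return reasons

if __name__ == "__main__":
    # Made-up example data.
    faults = [Fault("major", "example fault"), Fault("warning", "example warning")]
    for reason in upgrade_blockers(faults, backup_taken=False, release_notes_read=True):
        print(reason)
```

An empty list means none of the modeled blockers were found. It does not replace the compatibility matrix review or a conversation with your account team.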
UCSM Firmware Upgrades
Frequently forgotten or missed items
• Updating OS drivers to meet the compatibility matrix
• Backing up the system prior to upgrade
• Upgrade of blade BIOS & Board Controller
UCSM Firmware Upgrade
UCS HW and SW Interoperability example
UCSM Firmware Upgrade (cont.)
Results (continued)
UCSM Upgrades
Host Firmware Package – Simple option
UCS Firmware Upgrade
New features in 2.2(2)
• During the Auto-Upgrade process:
– System back-up reminder
– Critical/Major fault presence alerts
Maintenance Windows
Maintenance Windows
TAC customer example
• Customer opens a case regarding a critical fault
• We explain how to resolve it and, because the fix could be service impacting,
strongly suggest a maintenance window
• One hour later, the same customer is back with a P1 SR!
Maintenance Windows
• Better safe than sorry
• An Industry standard best practice for Data Centers
• Especially critical for Fabric Interconnect changes
System Back Up
System Backups
TAC Case example
• During an attempted upgrade, the configuration database became corrupted
• UCSM was in a degraded state, allowing neither back-ups nor show tech
• An 11-month-old ‘show tech ucsm’ was found in our TAC SR database
• Painful, but successful, reconfiguration (4+ hour effort) of the entire UCS domain
UCSM Back-ups
• UCS Back-up Types
1. Full State
2. System Configuration
3. Logical Configuration
4. All Configuration
Backup Types
Full State
• Binary file
• Encrypted (passwords and sensitive data not in clear text)
• Intended for Disaster Recovery
• Ideal for pre-Upgrade
Backup Types
System Configuration
• XML file
• System configuration such as username/roles
• Exportable to external Fabric Interconnects
Backup Types
Logical Configuration
• XML file
• Service Profiles, VLANs, VSANs, pools & policies
• Exportable to external Fabric Interconnects
Backup Types
All Configuration
• XML file
• Includes System & Logical configuration settings
• Exportable to external Fabric Interconnects
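The four backup types can be summarized as a small lookup table. This Python sketch only restates the properties listed on the preceding slides. The dictionary layout, the keyword strings, and the `backups_with` helper are illustrative choices, not part of UCS Manager.

```python
# Properties of each backup type as stated on the slides above.
# The structure and keyword strings are illustrative only.
BACKUP_TYPES = {
    "Full State": {
        "format": "binary",
        "notes": ["encrypted", "disaster recovery", "pre-upgrade"],
    },
    "System Configuration": {
        "format": "xml",
        "notes": ["usernames/roles", "exportable"],
    },
    "Logical Configuration": {
        "format": "xml",
        "notes": ["service profiles, VLANs, VSANs, pools, policies", "exportable"],
    },
    "All Configuration": {
        "format": "xml",
        "notes": ["system + logical", "exportable"],
    },
}

def backups_with(keyword):
    """Return the names of backup types whose notes mention the keyword."""
    return [
        name
        for name, info in BACKUP_TYPES.items()
        if any(keyword in note for note in info["notes"])
    ]

if __name__ == "__main__":
    print("Before an upgrade:", backups_with("pre-upgrade"))
    print("Exportable to other Fabric Interconnects:", backups_with("exportable"))
```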
Fibre-Channel Port Channels
FC Port Channels
TAC Case Example
• Hosts were reporting high storage latency
• The customer added 3 additional FC uplinks (doubling bandwidth)
• No change in latency! What happened?
FC Port Channels
Individual FC Uplink behavior
[Diagram: FC Switch and Fabric Interconnect joined by individual FC uplinks]
FC Port Channels
The power of the bundle
[Diagram: Fabric Interconnect with a port-channel bundle]
FC Port Channels
Details
• Requires MDS or Nexus upstream switch
• Dynamically modify Port Channel link membership
• Load Balancing amongst member links is inherent
– No need to be concerned about multiple high-bandwidth hosts pinned to the same link
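A toy Python model shows why pinning can overload one uplink while others sit nearly idle. Everything here is an assumption made for illustration: the host loads are made-up numbers, pinning is modeled as round-robin host-to-link assignment, and the port channel is modeled as a perfectly even spread (real port channels balance per flow or exchange, so the actual distribution is only approximately even).

```python
def pinned_load(host_loads, n_links):
    """Each host is pinned to one uplink (round-robin); all of its traffic uses that link."""
    links = [0] * n_links
    for index, load in enumerate(host_loads):
        links[index % n_links] += load
    return links

def port_channel_load(host_loads, n_links):
    """Idealized port channel: total traffic is spread evenly over member links."""
    total = sum(host_loads)
    return [total / n_links] * n_links

if __name__ == "__main__":
    # Hypothetical per-host loads in arbitrary units: two heavy hosts, four light ones.
    hosts = [9000, 500, 500, 9000, 500, 500]
    print("pinned, 3 links:      ", pinned_load(hosts, 3))
    print("port channel, 3 links:", port_channel_load(hosts, 3))
```

In this made-up case both heavy hosts land on the same pinned uplink, while the idealized bundle shares the same total across all members.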
FC Port Channels
Back to our TAC case…
[Chart: Port 1 through Port 6, series “No PC”, y-axis 0 to 25000]
FC Port Channels
Port Channel vs. individual FC uplinks
[Chart: Port 1 through Port 6, series “PC” and “No PC”, y-axis 0 to 25000]
FC Port Channels
Conclusion
• Port Channels (if possible) are preferred
• Provides optimal traffic distribution
• Dynamic PC membership changes
• No known down-side
FC Topologies
FC Topologies
Best Practice Model
[Diagram: FC Switches and Fabric Interconnects, A-Side and B-Side]
FC Topologies
Common Mistake 1 (ISL)
[Diagram: FC Switches and Fabric Interconnects, A-Side and B-Side]
FC Topologies
Common Mistake 2 (crossing)
[Diagram: FC Switches and Fabric Interconnects, A-Side and B-Side]
FC Topologies
Rare Mistake (ISL + Cross)
[Diagram: FC Switches and Fabric Interconnects, A-Side and B-Side]
3rd Party Transceivers
3rd Party Transceivers
TAC Case example
• We recently encountered “Cisco Compatible” non-Cisco twinax cables installed in a
relatively large UCS B-Series deployment
• What we found:
– The Cisco PID was spoofed, which prevented the “unsupported transceiver” fault
– A low percentage of frames had CRC errors
– Fibre Channel performance was severely impacted due to the dropped frames
• Please be aware of our experience when choosing 3rd Party transceivers
DIMM Faults
Degraded DIMM alerts
Background
• Probability of ECC errors increases as DIMM geometries shrink
• ECC threshold monitoring can lead to DIMMs being marked “Degraded”
• Not to be confused with DIMMs marked “Inoperable”
Degraded DIMMs
Impact
• Per UCS Engineering studies:
– UCS servers handle ECC errors without impact to the server
– No performance impact with DIMMs in degraded state
– Our thresholds for marking a DIMM degraded were deemed too conservative
Degraded DIMMs
Resolution
• New Firmware will change thresholds to practical values
• Fixed in 2.2(1b) and 2.1(3c)
• Workaround:
– You can safely ignore the ‘degraded DIMM’ faults until you upgrade, or
– RMA the degraded DIMM
DIMM Blacklisting
New in 2.2(1b)
• With the DIMM Blacklisting feature enabled, if uncorrectable DIMM errors are
encountered:
– CIMC records location of faulty DIMM
– During next boot sequence, the faulty DIMM gets mapped out
• Benefits
– Allows server to safely remain in production
– Allows for RMA of faulty DIMM when convenient
• Please note that the feature is disabled by default
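The record-then-map-out behavior described above can be pictured with a toy model. This Python sketch is not CIMC code: the `DimmBlacklist` class, its method names, and the slot labels are hypothetical, and it reproduces only the sequence on the slide (record on an uncorrectable error, exclude at next boot, disabled by default).

```python
class DimmBlacklist:
    """Toy model of the behavior described on the slide; not actual CIMC logic."""

    def __init__(self, enabled=False):
        # The slide notes that the feature is disabled by default.
        self.enabled = enabled
        self.recorded = set()

    def report_uncorrectable_error(self, dimm_slot):
        """Record the location of a faulty DIMM, but only if the feature is enabled."""
        if self.enabled:
            self.recorded.add(dimm_slot)

    def usable_dimms_at_boot(self, installed):
        """At the next boot, recorded DIMMs are mapped out of the usable set."""
        return [slot for slot in installed if slot not in self.recorded]

if __name__ == "__main__":
    installed = ["A0", "A1", "B0", "B1"]   # hypothetical slot labels
    blacklist = DimmBlacklist(enabled=True)
    blacklist.report_uncorrectable_error("B0")
    print(blacklist.usable_dimms_at_boot(installed))
```

With `enabled=False`, the same error report leaves every DIMM in the usable list.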
C-Series
C-Series
CIMC configuration
• Please configure your CIMC!
• Troubleshooting is nearly impossible without it
C-Series
TAC Case example
• Large C-series standalone deployment
• Disk performance degraded on a number of servers
• TAC found failed BBUs
• Servers had fallen back from write-back to write-through caching
• Recommendation:
– Employ SNMP or IPMI monitoring, especially in large deployments
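Whatever collects the data (SNMP, IPMI, or something else), the check itself is simple. This Python sketch assumes a hypothetical inventory dictionary that a poller has already filled in. The keys `bbu_ok` and `write_policy` are invented for illustration and do not correspond to specific SNMP OIDs or IPMI sensors.

```python
def servers_needing_attention(inventory):
    """Return server names whose BBU has failed or whose cache is in write-through mode.

    `inventory` maps server name -> dict with hypothetical keys
    'bbu_ok' (bool) and 'write_policy' ("write-back" or "write-through").
    """
    flagged = []
    for name, status in inventory.items():
        bbu_failed = status.get("bbu_ok") is False
        write_through = status.get("write_policy") == "write-through"
        if bbu_failed or write_through:
            flagged.append(name)
    return sorted(flagged)

if __name__ == "__main__":
    # Made-up example data.
    inventory = {
        "rack-01": {"bbu_ok": True, "write_policy": "write-back"},
        "rack-02": {"bbu_ok": False, "write_policy": "write-through"},
        "rack-03": {"bbu_ok": True, "write_policy": "write-through"},
    }
    print(servers_needing_attention(inventory))
```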
C-Series
Monitoring feature
• We expect to have a UCSM-like feature to facilitate standalone C-Series
monitoring available in a future release.
C-Series
C-Series Integration
• Please be sure to update the rack servers
– Bundle C is used to upgrade integrated rack servers
• Caution: Re-ack of a FEX will require a re-ack of all associated rack servers
Working with the TAC
Working with the TAC
General Best Practices
• Generate appropriate ‘show tech’ dumps ASAP
Working with the TAC
Why a great Problem Description is worth the effort
• Speeds up the problem resolution process
• An engineer familiar with the symptoms is more likely to grab it
• Helps the SR reviewer ensure that the case is progressing as expected
Working with the TAC
What to include in your Problem Descriptions
• Firmware Version & type(s) of equipment
• Error messages
• Clear and concise explanation of the problem
• Pertinent details, such as:
– New installation vs. production environment
– Any changes that may have led to the problem
– Impact to your business
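One way to make sure nothing on this list is forgotten is to fill in a fixed template before opening the SR. The following Python sketch is only an illustration: the `ProblemDescription` class, its field names, and its default values are hypothetical and not a TAC-defined format.

```python
from dataclasses import dataclass, field

@dataclass
class ProblemDescription:
    """Hypothetical template covering the items listed on the slide."""
    firmware_version: str
    equipment: list
    summary: str
    error_messages: list = field(default_factory=list)
    environment: str = "production"        # or "new installation"
    recent_changes: str = "none known"
    business_impact: str = "not stated"

    def render(self):
        """Return the description as plain text, one item per line."""
        lines = [
            f"Firmware: {self.firmware_version}",
            f"Equipment: {', '.join(self.equipment)}",
            f"Environment: {self.environment}",
            f"Problem: {self.summary}",
            f"Recent changes: {self.recent_changes}",
            f"Business impact: {self.business_impact}",
        ]
        lines += [f"Error: {message}" for message in self.error_messages]
        return "\n".join(lines)

if __name__ == "__main__":
    # Made-up example values.
    description = ProblemDescription(
        firmware_version="2.2(1b)",
        equipment=["example blade", "example Fabric Interconnect"],
        summary="Hosts report high storage latency since adding FC uplinks",
        recent_changes="Added three FC uplinks",
        business_impact="Degraded application response time",
    )
    print(description.render())
```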
Working with the TAC
Upgrades
• Engage the TAC if you have any questions prior to an upgrade
Working with the TAC
Upgrades (continued)
• Cisco Services are available to assist with upgrades and bug scrubs
• Consult your Cisco account team for details
Working with the TAC
Hardware Failures & RMAs
• Expect requests for logs, and allow time for analysis
Working with the TAC
Hardware Failures & RMAs (continued)
• Fully assembled blades are an option
• Typically available Next Business Day (NBD)
• Subject to part(s) availability at the assembly depot
Working with the TAC
SR Ownership
• Only re-queue when immediate assistance is required
• Re-queuing for a status update can be counter-productive
• Behind every UCS TAC Engineer:
– Colleagues & Mentors
– Subject Matter Experts
– Team Leads
– Managers
– Technical Leaders
– Escalation engineers
Conclusion
Wrapping it up, the rest of the story…
Wrap Up
Release Notes – What was missed (from the 2.0 Release Notes)
After an upgrade from a prior release to 2.0(1), a critical fault may be raised about
an overlapping or matching FCoE VLAN ID used for a vSAN and an Ethernet
VLAN ID under the same fabric as the FCoE VLAN.
The fault can be avoided by changing either the FCoE VLAN ID or the Ethernet
VLAN ID so that they have two different IDs prior to the upgrade.
Resolving the problem after the upgrade may lead to down time for the system.
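The overlap described in this caveat is easy to check for ahead of time. This Python sketch assumes you have already exported, by whatever means, the FCoE VLAN IDs and Ethernet VLAN IDs in use on each fabric. The function name, the input layout, and the example IDs are all hypothetical.

```python
def find_vlan_overlaps(fcoe_vlans, ethernet_vlans):
    """Return {fabric: [overlapping IDs]} for fabrics where an FCoE VLAN ID
    matches an Ethernet VLAN ID.

    Both arguments map a fabric name to an iterable of VLAN IDs.
    """
    overlaps = {}
    for fabric, fcoe_ids in fcoe_vlans.items():
        shared = set(fcoe_ids) & set(ethernet_vlans.get(fabric, ()))
        if shared:
            overlaps[fabric] = sorted(shared)
    return overlaps

if __name__ == "__main__":
    # Made-up example IDs.
    fcoe = {"A": [100, 200], "B": [101]}
    ethernet = {"A": [1, 100, 300], "B": [1, 2]}
    print(find_vlan_overlaps(fcoe, ethernet))
```

Any fabric that appears in the result has an ID to change before the upgrade rather than after it.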
Wrap Up
The rest of the story
• His boss joins the conference bridge and asks: “Did we have a maintenance
window in place for the change?”
• Awkward silence after he admitted that there was no maintenance window in
place.
• This didn’t have to happen. Please don’t let it happen to you.
Key Takeaways
• Utilize Maintenance Windows
• Read and understand Release Notes
• Maintain good Compatibility Matrix hygiene
• Leverage the TAC efficiently & effectively
• When in doubt, call the TAC
I2C – A brief history
I2C
What is I2C?
• I2C = Inter-Integrated Circuit, around for many decades
• Master/Slave bus technology
• Employed by UCS to facilitate IO Module communication with chassis
components:
– Fan and PSU readings
– Chassis SEEPROM (a.k.a. Shared Storage) access
– Blade readings
I2C Bus in the UCS Chassis
I2C
Typical Defect Symptoms
• Fan faults
• Shared Storage faults
• Temperature warnings
• Fans spinning at 100%
I2C
Improving over time
• I2C issues improve as firmware progresses: