
Exadata Infrastructure Patching and Upgrades

Do It Yourself
Klaus Eckstein, Daniel Hillinger
Dr. Klaus Eckstein
• DBA @ V-TServices

• Focus:
• OEM
• RAC including Grid Infrastructure
• Exadata

@klausecks
Value Transformation Services
Joint venture founded in 2013 by IBM and UniCredit
1000 employees, 6 countries, 6 data centers (www.v-tservices.com)
• Exadatas since 2010
• 21 Exadatas running
• All versions: X3 – X8; HC, HP, EF
• All expansions: quarter/half/full rack, flex configuration
Daniel Hillinger
• Senior Consultant at Trivadis Germany GmbH, Munich
• Focus:
• Oracle (RAC, Grid Infrastructure, Exadata, Data Guard)
• Unix/Linux (OEL, Red Hat, Solaris)
• Azure (Automation, Design and Security)

@daniel8192 daniel8192.wordpress.com
Agenda
• Platinum service

• Architectural overview of Exadatas

• Components

• Time frames

• Lessons learned
Platinum service
Platinum service (patching)
Limits (as of 2019):
• Configuration must be certified (Config/Patch level)
• Patching at a minimum of 2x per year
• Max. 20/40 (full rack) DBs in max. 2 Oracle Homes (max. 4 virtual RACs per rack)

• Upgrades of GI and DB not included

• New: team in Europe, more instances/DBs


Architectural overview Exadata
Architectural overview Exadata

• Minimum rack capacity:
• 2 Database servers
• 3 Storage servers

• Maximum rack capacity:
• 22 Servers
• All possible Database and Storage server combinations (respecting the minimum rack rule)
Architectural overview Exadata X8M-2

• Minimum rack capacity:
• 2 Database servers
• 3 Storage servers

• Maximum rack capacity:
• 22 Servers
• All possible Database and Storage server combinations (respecting the minimum rack rule)

[Diagram: 1 x / 2 x RoCE* Network Fabric, Cisco 9336c RoCE switches]

* RDMA over Converged Ethernet (RoCE)
Connections to other Oracle systems

• Multi-rack
• Connect several Exadata racks to one logical Exadata (several clusters possible)
• ZFS storage appliance
• Storage for backup or storage tiering (HCC supported)
• Connected directly to leaf IB switches
• Exalogic / Private Cloud Appliance
• Engineered system for WebLogic (connected via InfiniBand)
• Listeners on InfiniBand interfaces
• Big Data Appliance
• Zero Data Loss Recovery Appliance
Patches
• MOS note 888828.1
• Two patch download options:
• Single patches: download a patch for each component
• Quarterly Full Stack Download Patch (QFSDP): the complete collection of current software patches
(DB node OS, Cell, InfiniBand Switch, Power Distribution Unit, Oracle Database and Grid
Infrastructure DBBP, Oracle JavaVM PSU, OPatch, OPlan, EM Agent, EM OMS, EM Plug-ins); > 20 GB
• Base releases not included

• Check dependencies between components (see the version check below)


• BE CAREFUL: the patchmgr for cells/InfiniBand switches is different from the one for DB nodes
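
• A minimal sketch for collecting the currently installed image versions before deciding what to patch, to be compared against MOS note 888828.1 (the group files db_group.txt and cell_group.txt are assumed to list the servers, as elsewhere in this deck):

# Exadata image version on all DB nodes and all cells
dcli -g db_group.txt -l root "imageinfo -ver"
dcli -g cell_group.txt -l root "imageinfo -ver"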
Patch order
• Infiniband
• Storage Cell
• DB nodes OS
• Grid Infrastructure
• Database
• OEM agent
• Cisco Switch (independent)
• PDU (independent)

• May vary depending on the patches or patch level (e.g. GI first)


Components
Infiniband switches
• Utility to patch: patchmgr
• Which parts will be patched: firmware/OS
• Type: rolling
• Time: 30 minutes (1 h if an intermediate release is required)

• Best practices
• Reboot before patching

• Problems
• Switch subnet master (SM); see the check sketch below
• CentOS → OEL major release switch …
• Upgrade fails → manual update
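
• A quick sketch for locating the subnet master before and after patching (sminfo is part of the standard InfiniBand diagnostics on the DB nodes; getmaster, disablesm and enablesm are run on the switches themselves):

# From a DB node: LID and state of the current master subnet manager
sminfo
# On each InfiniBand switch: is this switch the SM master?
getmaster
# If the SM has to be moved away from the switch being patched:
disablesm    # on the current master; the SM fails over to another switch
enablesm     # re-enable afterwards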
Infiniband switches
• Change to the patch directory
• Check prerequisites
• Patch

# Check prerequisites
./patchmgr -ibswitches infini_group.txt -upgrade -ibswitch_precheck

# Patch
./patchmgr -ibswitches infini_group.txt -upgrade
Infiniband switches
• Manual update
load -force -source sftp://root:welcome1@<db_node1>//<path_to_patch>/sundcs_36p_repository_x.pkg

• Status of the update


telnet <ib-switch> 1234
Storage cells
• Utility to patch: patchmgr
• Which parts will be patched: firmware, OS, cell software
• Type: offline / rolling possible
• Time: 45 min

• Best practices
• Reboot OS and ILOM before update

• Problems
• If the Exadata has been extended, different hardware generations may be present (see the check sketch below)
• (Next slide)
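
• A minimal sketch for spotting mixed hardware across the cells before patching (cell_group.txt assumed):

# Hardware model of every storage server
dcli -g cell_group.txt -l root "dmidecode -s system-product-name"
# Make/model as reported by the cell software
dcli -g cell_group.txt -l root "cellcli -e list cell attributes name,makeModel"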
Storage cells
• Change to the patch directory
• Cleanup the previous update utility runs
• Check prerequisites
• Shutdown the cell services and patch
• Cleanup

./patchmgr -cells cell_group.txt -reset_force
./patchmgr -cells cell_group.txt -cleanup
./patchmgr -cells cell_group.txt -patch_check_prereq
dcli -g cell_group.txt -l root "cellcli -e alter cell shutdown services all"
./patchmgr -cells cell_group.txt -patch [-rolling]
./patchmgr -cells cell_group.txt -cleanup
Storage cells
• A lot of ORA-700 in the cell alert.log
• Overload of the cell offload groups

• Install one-off patch:


# export TMPDIR=/var/log/exadatatmp/SAVE_patch_20830449
# /var/log/exadatatmp/SAVE_patch_20830449/cell-12.1.2.1.1.20830449V1_LINUX.X64_150521-1-rpm.bin --doall --force
# rpm -qa | grep ^cell-    # check for version

cell-12.1.2.1.1_LINUX.X64_150521-1.x86_64
DB server
• Utility to patch:
• dbnodeupdate.sh: patches the local node
• patchmgr: patches remote DB servers only
• Which parts will be patched: firmware, OS, Exadata software
• Type: offline / rolling
• Time: 1 h - 1.5 h

• Best Practices
• Reset ILOM → included in patchmgr and dbnodeupdate
• Use latest dbserver.patch (DOC ID 1553103.1, patchmgr/dbnodeupdate.sh)
• Remove all NFS mounts (umount and comment them out in /etc/fstab; see the sketch below)
• Clean up the root FS to decrease the backup time
• Install only packages from iso (example: exadata_ol7_base_repo_19.2.7.0.0.191012.iso)
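
• A minimal sketch for the NFS cleanup (db_group.txt assumed):

# Check for active NFS mounts on all DB nodes
dcli -g db_group.txt -l root "grep nfs /proc/mounts"
# On each node: unmount them and comment out the corresponding /etc/fstab entries
umount -a -t nfs,nfs4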
DB server
• Make sure that passwordless SSH login is possible
• Reset ILOM
# dcli -l root -g /root/db_group.txt ipmitool bmc reset cold

• Run precheck
• Start upgrade
• -log_dir auto automatically generates a log directory based on the current location.

./patchmgr -dbnodes db_group.txt -precheck -iso_repo /u01/p30346161_192000_Linux-x86-64.zip -target_version 19.2.7.0.0.191012 [-modify_at_prereq]

./patchmgr -dbnodes db_group.txt -upgrade -iso_repo /u01/p30346161_192000_Linux-x86-64.zip -target_version 19.2.7.0.0.191012 -log_dir auto [-rolling]

./patchmgr -dbnodes db_group.txt -log_dir auto -get log_dir
DB server
• Problems
• Resolve dependency issues: use -modify_at_prereq
• Sometimes even more: manually downgrade packages, remove custom packages

• If custom filesystem layout is used, link /<custom_fs> to /u01


• Check the HCC connection to the ZFSSA after the upgrade (to avoid problems opening DBs stored on the ZFSSA!)
snmpget -v1 -c public <zfssa> 1.3.6.1.4.1.42.2.225.1.4.2.0
SNMPv2-SMI::enterprises.42.2.225.1.4.2.0 = STRING: "Sun ZFS Storage 7420"

• Major release upgrades (OEL5 → OEL6) can behave strangely (RHEL does not support in-place major release upgrades at all)
Database

• Only Bundle Patches supported (PSU technically possible)

• Check dependencies to cell software version
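
• A minimal check sketch: list the patch level of the GI and DB homes and compare it with the cell software requirements in MOS note 888828.1:

# Run in each Grid Infrastructure and database home
$ORACLE_HOME/OPatch/opatch lspatches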


Time frames
Time frames
• Quarter rack - patching one component after the other without resyncing cells

Component             Amount   Time     Time Total
Infiniband               2     0.5 h      1 h
Cell                     3     0.75 h     2.25 h
DB Node                  2     1 h        2 h
Grid Infrastructure      2     0.75 h     1.5 h
Sum                                     ~ 7 h
Time frames
• Full rack - patching one component after the other without resyncing cells

Component             Amount   Time     Time Total
Infiniband               3     0.5 h      1.5 h
Cell                    14     0.75 h    10.5 h
DB Node                  8     1 h        8 h
Grid Infrastructure      8     0.75 h     6 h
Sum                                    ~ 26 h
With Resync                            ~ 40 h
Time frames Cells
Cells offline:
• Separate update of one Cell per hardware type (to be on the safe side)
• Remaining Cells in parallel

Cells rolling (High Redundancy):


• One after the other, without moving all data off the cell being patched
• Resync only the delta (see the check sketch below)

Cells rolling:
• One after the other, with rebalance
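
• Sketch of the check mentioned above, before taking a cell offline in rolling mode (cell_group.txt assumed):

# ASM must be able to tolerate the cell going offline
dcli -g cell_group.txt -l root "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome"
# Proceed only when asmdeactivationoutcome is 'Yes' for all grid disks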
Time frames DB nodes
DB nodes in parallel?
• How many instances have to run? (see the check below)

Grid Infrastructure
• First node separate; rest in parallel (since 19c)
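
• To answer that question, check the instance placement per database before taking nodes down (a sketch; <db_name> is a placeholder):

# Instances and their nodes for one database
srvctl status database -d <db_name>
# Cluster-wide resource overview
crsctl stat res -t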
Time frames
• Full Rack
• Patching cells offline

Component             Patching   Amount   Time     Time Total
Infiniband            rolling       3     0.5 h      1.5 h
Cell                  offline      14     0.75 h     1.5 h
DB Node               rolling       8     1 h        4 h
Grid Infrastructure   rolling       8     0.75 h     2.25 h
Sum                                                ~ 9 h
Downtime                                             1.5 h
Patching Timelines (MOS: 1915259.1)
Lessons learned
Lessons learned
• Always run the latest exachk before and after patching

• Always use latest dbnodeupdate

• Patch regularly

• Read the readme carefully

• Plan your patching strategy

• Don't forget Cloud Control agents and plug-ins (agents are part of the QFSDP)
Do it Yourself

Know-how should be present for operating the Exadata anyway

Oracle does not know your customisations


Any Questions?
Visit us at our booth on level 3
▪ Trivadis barista (good coffee from morning to night)
▪ Birthday cake (daily from 14:00)
▪ Speed-Sessions at the booth with raffle:
  Tue 14:45: Martin Berger "Oracle Cloud Free Tier – eben mal kurz ausprobiert…"
  Wed 10:45: Guido Schmutz "Warum ich Apache Kafka als IT-Spezialist kennen sollte!"
  Wed 14:45: Stefan Oehrli "PDB Sicherheit und Isolation, Herausforderungen von Oracle Container DB's"
  Thu 10:45: Chris Antognini "Welche Query-Optimizer Funktionalitäten sind nur in Autonomous Database verfügbar?"
▪ Participation in our DOAG raffle
▪ Networking and discussions with our speakers
▪ Party with drinks and snacks after the sessions on Tuesday 17:30
Exachk
Exachk
• Exachk is like orachk (formerly raccheck)

• Special switch for pre-upgrade checks: -preupgrade

• Released every 90 days; use the latest version

• Includes best practices and known issues
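
• A minimal usage sketch (download the latest exachk from My Oracle Support first; the -preupgrade switch is the one mentioned above):

# Full health check before and after patching
./exachk
# Pre-upgrade checks
./exachk -preupgrade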

