
AHF and TFA Management

Recently I posted about upgrading AHF/TFA from version 19 to 21 on Exadata and also on ODA. With version 21 of AHF, however,
some collections are made automatically, and this can affect your disk space usage. Here you can see how to check this and
how to disable or modify some of these collections.

Automatic collection is an AHF/TFA feature that generates diagnostic packages (to send to Oracle) when specific
errors appear in the database. The errors that trigger a collection follow known patterns such as ORA-00600, ORA-07445, and several others. The basic
idea can be seen in the official doc here and in the image below (taken directly from the official doc).

In my case, the automatic collection caused a problem with space usage. Look at the space consumption for AHF below:

[root@exdbsrv01 ~]# cd /u01/app/grid/


[root@exdbsrv01 grid]# du -chs oracle.ahf/data/*
90M oracle.ahf/data/exdbsrv01
9.0G oracle.ahf/data/repository
4.0K oracle.ahf/data/work
9.1G total
[root@exdbsrv01 grid]#
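
A check like the one above can be scripted, so that growth is noticed before the filesystem fills up. This is only a minimal sketch and not part of AHF: the function names ahf_repo_usage_mb and check_repo and the 5120 MB threshold are my assumptions for illustration, and the repository path is the one from my server.

```shell
# Sketch: warn when a directory (for example the AHF repository) grows
# past a limit. ahf_repo_usage_mb and check_repo are hypothetical names.

# Print the size of a directory in whole megabytes.
ahf_repo_usage_mb() {
    du -sm "$1" 2>/dev/null | awk '{print $1}'
}

# Compare the usage against a limit in MB.
# Returns 0 when within the limit, 1 when over it, 2 when unreadable.
check_repo() {
    local dir="$1" limit_mb="$2" used_mb
    used_mb="$(ahf_repo_usage_mb "$dir")"
    if [ -z "$used_mb" ]; then
        echo "cannot read $dir" >&2
        return 2
    fi
    if [ "$used_mb" -gt "$limit_mb" ]; then
        echo "WARNING: $dir uses ${used_mb} MB (limit ${limit_mb} MB)"
        return 1
    fi
    echo "OK: $dir uses ${used_mb} MB (limit ${limit_mb} MB)"
}

# Example call (path from the listing above, threshold is arbitrary):
# check_repo /u01/app/grid/oracle.ahf/data/repository 5120
```

Called from cron on each node, the non-zero return code can be used to trigger a mail or a monitoring alert.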

As you can see, more than 9 GB is used by data collection in AHF. This occurred because one database problem generated a lot of ORA-600 errors,
and AHF/TFA collected and generated traces for each one of them. This is how AHF/TFA is designed to work, but unfortunately it is not
what I want here. As the documentation says, automatic collections are ON by default (see my server):

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/tfactl get autodiagcollect


.-------------------------------------------------.
| exdbsrv01 |
+-----------------------------------------+-------+
| Configuration Parameter | Value |
+-----------------------------------------+-------+
| Auto Diagcollection ( autodiagcollect ) | ON |
'-----------------------------------------+-------'
[root@exdbsrv01 oracle.ahf]#
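
If you want to check this value from a script instead of reading the table, the output can be parsed. The sketch below assumes the table layout shown above (cells separated by "|"); tfa_param_value is a hypothetical helper name, not a tfactl feature.

```shell
# Sketch: extract the Value column for one parameter from the table
# printed by "tfactl get ...". The table is read from stdin.
# tfa_param_value is a hypothetical helper, not part of AHF/TFA.
tfa_param_value() {
    local name="$1"
    awk -F'|' -v name="$name" '
        # $2 is the "Configuration Parameter" cell, $3 the "Value" cell.
        # The match is a substring match, so pass the exact parameter name.
        index($2, name) {
            gsub(/^[ \t]+|[ \t]+$/, "", $3)   # trim spaces around the value
            print $3
            exit
        }'
}

# Possible usage on a node where AHF is installed:
# /opt/oracle.ahf/bin/tfactl get autodiagcollect | tfa_param_value autodiagcollect
```

Nothing is printed when the parameter is not found, so an empty result should be treated as "unknown" and not as OFF.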

Fortunately, it is easy to disable (“-c” propagates the change to all nodes of the cluster):


[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/ahfctl set autodiagcollect=OFF -c
Successfully set autodiagcollect=OFF
.-------------------------------------------------.
| exdbsrv01 |
+-----------------------------------------+-------+
| Configuration Parameter | Value |
+-----------------------------------------+-------+
| Auto Diagcollection ( autodiagcollect ) | OFF |
'-----------------------------------------+-------'
[root@exdbsrv01 grid]#

If you have already collected a lot of diagnostic packages (like me), you can easily delete them directly from AHF/TFA with the “purge”
command (but remember to purge on each node of your cluster; there is no option to call it from just one node and delete everywhere):

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl purge -h


Delete collections from TFA repository
Usage : /opt/oracle.ahf/tfa/bin/tfactl purge -older x[h|d] [-force]
Examples:
/opt/oracle.ahf/tfa/bin/tfactl purge -older 30d - To remove file(s) older than 30 days.
/opt/oracle.ahf/tfa/bin/tfactl purge -older 10h - To remove file(s) older than 10 hours.
[root@exdbsrv01 grid]#

And here is an example that deletes everything older than 5 hours (you can use “-force” to avoid the “Y/N” question):

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl purge -older 5h


List of files in the repository older than 5h:
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sun_Jul_25_10_13_17_CEST_2021_node_ex
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Tue_Jul_27_08_40_39_CEST_2021_node_ex
/u01/app/grid/oracle.ahf/data/repository/collection_Tue_Jul_27_20_00_22_CEST_2021_node_exdbsrv01
/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sat_Jul_24_10_30_56_CEST_2021_node_ex


/u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sat_Jul_24_15_11_23_CEST_2021_node_ex
Do you want to delete the above files. [Y|y|N|n] [Y]: Y
Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sun_Jul_25_10_13_17_CEST_202
Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Tue_Jul_27_08_40_39_CEST_202


Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sun_Jul_25_21_34_25_CEST_202
Deleting /u01/app/grid/oracle.ahf/data/repository/auto_srdcORA-00700_Sat_Jul_24_15_11_23_CEST_202
[root@exdbsrv01 grid]#
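
Since the purge has to be executed on each node, a small loop can save some typing. This is a sketch under assumptions: passwordless root ssh between the nodes is assumed, and purge_all_nodes and DRY_RUN are names I made up for the example. By default it only prints the commands.

```shell
# Sketch: run the same purge on every node, one ssh call per node.
# purge_all_nodes and DRY_RUN are hypothetical names; passwordless root
# ssh between the nodes is an assumption.
TFACTL=/opt/oracle.ahf/bin/tfactl

purge_all_nodes() {
    local age="$1"; shift
    local node
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print what would be executed.
            echo "ssh root@${node} ${TFACTL} purge -older ${age} -force"
        else
            # -force skips the Y/N confirmation shown in the output above.
            ssh "root@${node}" "${TFACTL} purge -older ${age} -force"
        fi
    done
}

# Dry run against the two nodes of this cluster:
# purge_all_nodes 5h exdbsrv01 exdbsrv02
# Real run:
# DRY_RUN=0 purge_all_nodes 5h exdbsrv01 exdbsrv02
```

Keeping the dry run as the default is a deliberate choice, because “-force” deletes without asking.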

But that is not all we can do; there are several other things that we can check and enable/disable with the “get” and “set” commands (calling “get” with an invalid option prints the list of valid ones):

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/tfactl get collect


Invalid option specified for get
GET various TFA features
Usage : /opt/oracle.ahf/tfa/bin/tfactl get [ autodiagcollect | trimfiles | tracelevel| reposizeMB
Examples:
/opt/oracle.ahf/tfa/bin/tfactl get autopurge
/opt/oracle.ahf/tfa/bin/tfactl get match-pattern -match
[root@exdbsrv01 oracle.ahf]#

Other important commands print the current collection settings and configuration. Below, I print the report, change a
collection setting, and print the report again (which correctly shows the change):

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl get collect -match


.------------------------------------------------------------------------------.
| exdbsrv01 |
+----------------------------------------------------------------------+-------+
| Configuration Parameter | Value |
+----------------------------------------------------------------------+-------+
| ISA Data Gathering ( collection.isa ) | ON |
| collectTrm | OFF |
| collectAllDirsByFile | ON |
| Auto Diagcollection ( autodiagcollect ) | ON |
| Generation of Mini Collections ( minicollection ) | ON |
| chaautocollect | ON |
| Maximum File Collection Size (MB) ( maxFileCollectionSize ) | 5120 |
| Maximum Collection Size of Core Files (MB) ( maxCoreCollectionSize ) | 200 |
| minTimeForAutoDiagCollection | 300 |
'----------------------------------------------------------------------+-------'
[root@exdbsrv01 grid]#
[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/ahfctl set chaautocollect=OFF -c
Successfully set chaautocollect=OFF
.---------------------------------.
| exdbsrv01 |
+-------------------------+-------+
| Configuration Parameter | Value |
+-------------------------+-------+
| chaautocollect | OFF |
'-------------------------+-------'
[root@exdbsrv01 grid]#
[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl get collect -match
.------------------------------------------------------------------------------.
| exdbsrv01 |
+----------------------------------------------------------------------+-------+
| Configuration Parameter | Value |
+----------------------------------------------------------------------+-------+
| ISA Data Gathering ( collection.isa ) | ON |
| collectTrm | OFF |
| collectAllDirsByFile | ON |
| Auto Diagcollection ( autodiagcollect ) | OFF |
| Generation of Mini Collections ( minicollection ) | ON |
| chaautocollect | OFF |
| Maximum File Collection Size (MB) ( maxFileCollectionSize ) | 5120 |
| Maximum Collection Size of Core Files (MB) ( maxCoreCollectionSize ) | 200 |
| minTimeForAutoDiagCollection | 300 |
'----------------------------------------------------------------------+-------'
[root@exdbsrv01 grid]#
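
Comparing two of these tables by eye is error-prone, so the before/after reports can be saved to files and compared. A minimal sketch, assuming the table layout shown above; changed_params and the /tmp file names are hypothetical.

```shell
# Sketch: show only the parameter rows that differ between two saved
# "tfactl get collect -match" outputs. changed_params is a hypothetical name.
changed_params() {
    # In the default diff format, lines starting with "<" come from the
    # first file and lines starting with ">" from the second one.
    diff "$1" "$2" | grep '^[<>] |'
}

# Possible usage around a change:
# /opt/oracle.ahf/bin/tfactl get collect -match > /tmp/collect.before
# /opt/oracle.ahf/bin/ahfctl set chaautocollect=OFF -c
# /opt/oracle.ahf/bin/tfactl get collect -match > /tmp/collect.after
# changed_params /tmp/collect.before /tmp/collect.after
```

When nothing changed, the function prints nothing and returns a non-zero code.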

A more comprehensive report comes from AHF (note that the Value column is cut off in the output below):

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/ahfctl print config


.------------------------------------------------------------------------------------------------
| exdbsrv01
+------------------------------------------------------------------------------------------------
| Configuration Parameter
+------------------------------------------------------------------------------------------------
| TFA Version ( tfaversion )
| Java Version ( javaVersion )
| Public IP Network ( publicIp )
| Repository current size (MB) ( currentsizemegabytes )
| Repository maximum size (MB) ( maxsizemegabytes )
| Cluster Event Monitor ( clustereventmonitor )
| scandiskmon
| scanacfslog
| File Data Collection ( inventory )
| Automatic Purging ( autoPurge )
| Internal Search String ( internalSearchString )
| ISA Data Gathering ( collection.isa )
| Trim Files ( trimfiles )
| collectTrm
| chmdataapi
| chanotification ( chanotification )
| Skip event if it was flood controlled ( floodcontrol_events )
| Consolidate similar events (COUNT shows number of events occurences) ( consolidate_events )
| Managelogs Auto Purge ( manageLogsAutoPurge )
| scanacfseventlog
| Alert Log Scan ( rtscan )
| debugips
| generateZipMetadataJson
| collectAllDirsByFile
| scanvarlog
| Auto Diagcollection ( autodiagcollect )
| Public IP Network ( publicIp )
| Flood Control ( floodcontrol )
| Generation of Mini Collections ( minicollection )
| odscan
| Disk Usage Monitor ( diskUsageMon )
| Start consuming data provided by SQLTicker ( sqlticker )
| Discovery ( discovery )
| analyze
| indexInventory
| Generation of Telemetry Data ( telemetry )
| chaautocollect
| Granular Tracing ( granulartracing )
| minPossibleSpaceForPurge
| disk.threshold
| mem.swapfree
| mem.util.samples
| inventoryThreadPoolSize
| mem.swaptotal.samples
| maxFileAgeToPurge
| mem.free
| actionrestartlimit
| Minimum Free Space to enable Alert Log Scan (MB) ( minSpaceForRTScan )
| cpu.io.samples
| mem.util
| Maximum single Zip File Size (MB) ( maxZipSize )
| Time interval between consecutive Disk Usage Snapshot(minutes) ( diskUsageMonInterval )
| TFA ISA Purge Thread Delay (minutes) ( tfaDbUtlPurgeThreadDelay )
| firstDiscovery
| TFA IPS Pool Size ( tfaIpsPoolSize )
| Maximum File Collection Size (MB) ( maxFileCollectionSize )
| Time interval between consecutive Managelogs Auto Purge(minutes) ( manageLogsAutoPurgeInterval
| arc.backupmissing.samples
| cpu.util.samples
| cpu.usr.samples
| cpu.sys
| Flood Control Limit Count ( fc.limit )
| Flood Control Pause Time (minutes) ( fc.pauseTime )
| Maximum Number of TFA Logs ( maxLogCount )
| DB Backup Delay Hours ( dbbackupdelayhours )
| cdb.backup.samples
| arc.backupstatus
| purgeFrequency
| TFA ISA Purge Age (seconds) ( tfaDbUtlPurgeAge )
| Maximum Collection Size of Core Files (MB) ( maxCoreCollectionSize )
| cpu.util
| mem.swapfree.samples
| cdb.backupstatus
| mem.swaputl.samples
| arc.backup.samples
| unreachablenodeTimeOut
| Flood Control Limit Time (minutes) ( fc.limitTime )
| mem.swaputl
| mem.free.samples
| Maximum Size of Core File (MB) ( maxCoreFileSize )
| disk.samples
| cpu.sys.samples
| cpu.usr
| arc.backupmissing
| cpu.io
| Archive Backup Delay Minutes ( archbackupdelaymins )
| inventoryPurgeThreadInterval
| Age of Purging Collections (Hours) ( minFileAgeToPurge )
| cpu.idle.samples
| unreachablenodeSleepTime
| cpu.idle
| mem.swaptotal
| TFA ISA CRS Profile Delay (minutes) ( tfaDbUtlCrsProfileDelay )
| cdb.backupmissing
| cdb.backupmissing.samples
| Trim Size ( trimsize )
| Maximum Size of TFA Log (MB) ( maxLogSize )
| minTimeForAutoDiagCollection
| skipScanThreshold
| fileCountInventorySwitch
| TFA ISA Purge Mode ( tfaDbUtlPurgeMode )
| country
| Debug Mask (Hex) ( debugmask )
| Setting for ACR redaction (none|SANITIZE|MASK) ( redact )
| language
| AlertLogLevel
| BaseLogPath
| encoding
| UserLogLevel
| Logs older than the time period will be auto purged(days[d]|hours[h]) ( manageLogsAutoPurgePoli
| isaMode
'------------------------------------------------------------------------------------------------
[root@exdbsrv01 oracle.ahf]#

Another important setting is related to CPU usage by AHF/TFA. In several places we can find reports, questions, and forum posts
about high CPU usage due to TFA. And if you combine that with the automatic collection, you can run into several problems because of the limits. My
example:

[root@exdbsrv01 oracle.ahf]# /opt/oracle.ahf/bin/ahfctl getresourcelimit


Tool: tfa, Resource: cpu, Limit value: 4.0
Tool: tfa, Resource: kmem no resource limit set
Tool: tfa, Resource: swmem no resource limit set
[root@exdbsrv01 oracle.ahf]#

As you can see above, the CPU limit is 4. This means that TFA can use up to 4 CPUs of my server to collect and generate data. This
value can be changed to a more reasonable one. The way to think about it is that 1 represents 100% usage of a single CPU, and 4 means
100% usage of 4 CPUs. So, to limit TFA to 50% of one CPU, you define the value as 0.5 (example from the doc): ahfctl
setresourcelimit -value 0.5.
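
The arithmetic is simply the percentage of one CPU divided by 100. As a small sketch (cpu_limit_value is a hypothetical helper name; only the setresourcelimit call with “-value” comes from the doc example):

```shell
# Sketch: convert "percent of one CPU" into the value passed to
# "ahfctl setresourcelimit -value". cpu_limit_value is a hypothetical helper.
cpu_limit_value() {
    # 100 (%) of one CPU corresponds to 1, so 50 gives 0.5 and 400 gives 4.
    # LC_ALL=C keeps the decimal separator a dot.
    LC_ALL=C awk -v pct="$1" 'BEGIN { printf "%g\n", pct / 100 }'
}

# cpu_limit_value 50     prints 0.5
# cpu_limit_value 400    prints 4
# Applying it on a node where AHF is installed:
# /opt/oracle.ahf/bin/ahfctl setresourcelimit -value "$(cpu_limit_value 50)"
```

After changing the limit, ahfctl getresourcelimit (shown above) can be used to confirm the new value.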

There are several other things to change and set for AHF/TFA. The documentation is good and full of examples. Another good
source of information is the presentation by Markus Flechtner from 2019.

Some other examples of AHF/TFA management:

[root@exdbsrv01 grid]# /opt/oracle.ahf/bin/tfactl toolstatus


Running command tfactltoolstatus on exdbsrv02 ...
.------------------------------------------------------------------.
| TOOLS STATUS - HOST : exdbsrv02 |
+----------------------+--------------+--------------+-------------+
| Tool Type | Tool | Version | Status |
+----------------------+--------------+--------------+-------------+
| Development Tools | exachk | 20.2.2.0.0 | DEPLOYED |
| | oratop | 14.1.2 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED |
| | oswbb | 8.3.2 | NOT RUNNING |
| | prw | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities | alertsummary | 20.2.2.0.0 | DEPLOYED |
| | calog | 20.2.2.0.0 | DEPLOYED |
| | dbcheck | 18.3.0.0.0 | DEPLOYED |
| | dbglevel | 20.2.2.0.0 | DEPLOYED |
| | grep | 20.2.2.0.0 | DEPLOYED |
| | history | 20.2.2.0.0 | DEPLOYED |
| | ls | 20.2.2.0.0 | DEPLOYED |
| | managelogs | 20.2.2.0.0 | DEPLOYED |
| | menu | 20.2.2.0.0 | DEPLOYED |
| | param | 20.2.2.0.0 | DEPLOYED |
| | ps | 20.2.2.0.0 | DEPLOYED |
| | pstack | 20.2.2.0.0 | DEPLOYED |
| | summary | 20.2.2.0.0 | DEPLOYED |
| | tail | 20.2.2.0.0 | DEPLOYED |
| | triage | 20.2.2.0.0 | DEPLOYED |
| | vi | 20.2.2.0.0 | DEPLOYED |
'----------------------+--------------+--------------+-------------'
Note :-
DEPLOYED : Installed and Available - To be configured or run interactively.
NOT RUNNING : Configured and Available - Currently turned off interactively.
RUNNING : Configured and Available.
.------------------------------------------------------------------.
| TOOLS STATUS - HOST : exdbsrv01 |
+----------------------+--------------+--------------+-------------+
| Tool Type | Tool | Version | Status |
+----------------------+--------------+--------------+-------------+
| Development Tools | exachk | 20.2.2.0.0 | DEPLOYED |
| | oratop | 14.1.2 | DEPLOYED |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda | 2.10.0.R6036 | DEPLOYED |
| | oswbb | 8.3.2 | NOT RUNNING |
| | prw | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities | alertsummary | 20.2.2.0.0 | DEPLOYED |
| | calog | 20.2.2.0.0 | DEPLOYED |
| | dbcheck | 18.3.0.0.0 | DEPLOYED |
| | dbglevel | 20.2.2.0.0 | DEPLOYED |
| | grep | 20.2.2.0.0 | DEPLOYED |
| | history | 20.2.2.0.0 | DEPLOYED |
| | ls | 20.2.2.0.0 | DEPLOYED |
| | managelogs | 20.2.2.0.0 | DEPLOYED |
| | menu | 20.2.2.0.0 | DEPLOYED |
| | param | 20.2.2.0.0 | DEPLOYED |
| | ps | 20.2.2.0.0 | DEPLOYED |
| | pstack | 20.2.2.0.0 | DEPLOYED |
| | summary | 20.2.2.0.0 | DEPLOYED |
| | tail | 20.2.2.0.0 | DEPLOYED |
| | triage | 20.2.2.0.0 | DEPLOYED |
| | vi | 20.2.2.0.0 | DEPLOYED |
'----------------------+--------------+--------------+-------------'
Note :-
DEPLOYED : Installed and Available - To be configured or run interactively.
NOT RUNNING : Configured and Available - Currently turned off interactively.
RUNNING : Configured and Available.
[root@exdbsrv01 grid]#

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer’s positions, strategies or
opinions. The information here was edited to be useful for general purposes; specific data and identifying details were removed to
reach a general audience and to be useful for the community. Post protected by copyright.”
