Introduction
Caveat
Batch concepts
Batch Program
Batch Controls
Timed Batch
Level Of Service
Batch Overview
Execution Modes
Worker Initialization
Submitter Initialization
Process
Member Validation
Scheduler Daemon
Timed Batch
EXTENDED Mode
Threading Overview
Parameter Guidelines
Multi-threading Guidelines
Use of Spacenames
Sample Setup
Setup threadpools
JMX Monitoring
Commit Strategies
Enabling BatchEdit
Common Errors
Communication Delays
MAX-ERRORS Exceeded
A foreground process (also known as an online or Web Service transaction) typically performs operations on a single
instance of an object within the product; maintaining a person's contact details or processing a single payment are two such
examples. Background processing, by contrast, performs operations on multiple instances of an object, hence the
term batch: the background process works through batches of objects. The term background processing
implies that the processing is performed in the background with little or no user interaction.
The product ships with a preset number of background processes that may be used and configured to perform the
necessary business functions for your site. These background processes can be extended (just like the rest of the
product functionality) and custom background processes can be added.
This white paper outlines the common and best practices used for the background processing (a.k.a. batch)
component of the Oracle Utilities Application Framework. The advice in this whitepaper is based upon Oracle
internal studies and customer feedback around the world. This information is provided to guide other sites in
implementing or maintaining the product in production.
This document is a companion document to the product documentation and the Performance Troubleshooting
Guidelines – Batch Troubleshooting (Doc Id: 560382.1) whitepaper available from My Oracle Support.
Note: For publishing purposes, the word product will be used to denote all Oracle Utilities Application Framework
based products.
Caveat
While all care has been taken in providing this information, implementation of the practices outlined in this document
may NOT guarantee the same level of (or any) improvement. Not all practices outlined in this document will be
appropriate for your site. It is recommended that each practice be examined in light of your particular organizational
policies and use of the product. If the practice is deemed beneficial to your site, then consider implementing it. If the
practice is not appropriate (e.g. for cost and other reasons), then it should not be considered.
Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V2.2 based products and above.
Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.0 based products and above.
Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.1 based products and above.
Advice or instructions marked with this icon apply to Oracle Utilities Application Framework
V4.2.0.0.0 based products and above.
Note: Advice in this document is primarily applicable to the latest version of the Oracle Utilities Application
Framework at time of publication. Some of this advice may apply to other versions of the Oracle Utilities Application
Framework and may be applied at site discretion.
Note: In some sections of this document the environment variable $SPLEBASE (or %SPLEBASE%) is used. This
denotes the root location of the product install. Substitute the appropriate value for the environment used at your
site.
Note: This document is a companion to the Server Administration Guide (V4.3 and above), Batch Server
Administration Guide (V4.0 - V4.2 only) or Operations And Configuration Guide (V2.x).
Batch concepts
The following section outlines some basic concepts when configuring and executing the batch component of the
Oracle Utilities Application Framework.
Batch Program
The main component of the background process is the batch program. This program contains the logic to select the
batch of records that the process will perform actions upon. The batch of records is progressively passed to the
relevant objects to complete processing. In essence, the program acts as a driver to push individual records to the
relevant objects. Objects in the product are shared across all modes of access to maximize reuse.
Apart from the logic to decide the subset of records, the batch program contains the following additional information:
» The batch program contains the code necessary to interface to the framework to automatically manage its
individual execution and restart (if necessary).
» Some of the batch programs contain the code necessary to multi-thread the processing. In essence the program
determines which slice or subset of the overall data it needs to process. Not all programs support multi-threading
(most do, except the extract-to-file type processes).
The product ships with a preset number of batch programs associated with background processes that may be used
and configured to perform the necessary business functions for your site. These background processes can be
extended (just like the rest of the product functionality) and custom batch programs can be added.
Batch Controls
A batch program and its parameters must be defined in metadata prior to initial execution. This data is stored in a
Batch Control object within the metadata component of the framework.
The Batch Control contains the definition of the batch program and the following additional information:
» A Batch Code used as an identifier. This is used by the framework to identify the job internally and used to denote
output.
» Basic execution statistics for the last execution, including last execution date and time and the latest run number
which is primarily used for extract processing.
» Process specific parameters, which identify whether each parameter is optional or required and may provide a site
specific default value.
The product ships with a preset number of batch controls associated with background processes that may be used
and configured to perform the necessary business functions for your site. Custom batch controls can be added as
needed.
Timed Batch
One of the features of the Oracle Utilities Application Framework is the ability to create timed batch jobs. Most
batch jobs are executed as a single execution at a regular interval, such as once a day or once an hour.
In some implementation scenarios, however, a batch job must run continuously (like a daemon) to process data
as it is found. The Oracle Utilities Application Framework introduced a feature that allows implementations to create
jobs that run continuously.
» The batch program must be designed to run continuously. Existing jobs in the product cannot be converted from
non-timed to timed just using configuration. The Oracle Utilities SDK contains information about writing
continuous batch programs.
» An instance of the timed batch program runs for a timer interval and then completes on the next commit after the
interval is reached. After completion a new instance is automatically started. This simulates a continuous
execution whilst minimizing the performance impact on the overall system 1.
» The batch control for timed jobs must be configured with additional information including:
» Timer Interval - The duration for which each instance of the batch job will execute.
» Timer Active - Controls whether the job is executing. The job will only execute when Timer Active
is set to true/Yes.
» UserId/Batch Language - The user ID used for authorization and the language used for batch messages.
» Email Address - An optional email address that is notified of executions that end in error.
» The job must be started manually the first time to begin processing.
» To stop a timed job, set Timer Active to false.
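As a rough illustration only (this is generic shell, not product code, and the three-tick interval is an arbitrary stand-in for the Timer Interval), the lifecycle described above can be sketched as follows: each instance runs until its interval elapses, ends at the next commit, and a new instance starts while the timer remains active.

```shell
# Conceptual sketch of timed batch behaviour (not product code).
TIMER_ACTIVE=true
instances_run=0
run_instance() {
  ticks=0
  while [ "$ticks" -lt 3 ]; do   # stands in for "interval not yet reached"
    : # select and process records, committing periodically
    ticks=$(( ticks + 1 ))
  done
  instances_run=$(( instances_run + 1 ))  # instance ends at the next commit
}
# The framework restarts a new instance while Timer Active remains true:
for cycle in 1 2 3; do
  [ "$TIMER_ACTIVE" = true ] && run_instance
done
echo "instances run: $instances_run"
```

Setting TIMER_ACTIVE to false before a cycle would stop further instances, mirroring the Timer Active flag on the batch control.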
Level Of Service
Note: This feature is only available in Oracle Utilities Application Framework V4.2.0.2.0 and above.
Note: This feature has been designed for use with Timed Batch only with algorithms optimized for that style only
provided. It is applicable to other styles but requires an algorithm to be developed and configured.
One of the newer features of Batch processing is the ability to configure a level of service check on the batch control
to check the latest execution of the job against some criteria to ascertain whether it meets service expectations. This
facility is designed to provide basic service level feedback.
By default, the facility is disabled, with the message "Disabled - Level of Service reporting is not enabled for this batch
job" displayed on the Batch Control screen. This indicates that the level of service is not configured for this batch control.
» An algorithm of type Batch Level of Service must be created using the Oracle Utilities SDK or using ConfigTools
using Service Scripts. This algorithm must be defined as an algorithm of this type and contain the logic to check
the last execution against a target (the logic is up to your business criteria). A sample, F1-BAT-LSDEF, has been
provided as a guide.
» On the Batch Control Algorithms tab, configure the new algorithm with the appropriate service levels on the
algorithm definition.
Whenever the batch control is displayed, the level of service will be assessed according to the configured algorithm.
1 As timed batch is continuous it may execute concurrently with peak online hours.
Batch Overview
In terms of background processing, the framework wraps the batch program to execute it and manage it from the
operational point of view. This includes the following:
» Providing the interface to track the progress of the threads via the Batch Run Tree transaction.
» Providing the infrastructure for the recording of restart checkpoints.
» Providing the infrastructure to handle other components like algorithms, developed in either COBOL or Java.
Background processes are executed within a JVM which has loaded the Framework components necessary for
batch execution. Each submission method may have other unique elements, but this basic summary is true for
each.
Note: This section does not apply to Oracle Utilities Application Framework V4.3.x and above.
The Oracle Utilities Application Framework supports both Java and COBOL 2 based background processes. Since
the Oracle Utilities Application Framework is Java based, all processes are executed within a batch JVM (e.g.
threadpool). When the process needs to invoke a COBOL based module, the required module is loaded and
executed internally by the JVM. The results are then passed back to the main Java objects. This method differs
from the one used by the Online and Web Service Adapter in that they use separate child JVMs to isolate COBOL
calls. The reason for this difference is that in Online and Web Services there could be a large number of different
COBOL objects called. Batch is usually limited to a smaller number of COBOL objects.
Given that COBOL and Java live in the same batch JVM, there are a number of concepts which must be
understood:
» Any COBOL program that misbehaves (bad data/bad memory management etc) can cause the failure of the
executing batch JVM. This may affect other jobs using the same batch JVM (threadpool) at the same time (or
even future scheduled executions if the batch JVM is not restarted). Regularly checking the threadpool via JMX
and logs is advised to avoid issues.
» COBOL programs are attached to the batch JVM as executed. This can increase the memory footprint over time.
COBOL modules cannot be garbage collected using the default method provided with Java. Over time, a long
running batch JVM that is reused for many batch processes may need to be stopped and restarted to avoid
memory issues.
2 COBOL is not supported as of Oracle Utilities Application Framework V4.3.x and above.
Note: In Oracle Utilities Application Framework V2.2 it is now possible, after implementing Patch 9364072, to use
environment variables when specifying parameters. For example, using ${SPLOUTPUT} in the FILE-PATH
parameter on the online submission. Environment variables must be surrounded by ${}.
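For example, a FILE-PATH parameter value on the online submission screen might look like the following (the extracts subdirectory is a placeholder, not a product default):

```
FILE-PATH=${SPLOUTPUT}/extracts
```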
» Batch Scheduler Submission – A set of utilities is provided to allow third party schedulers to execute
background processes from the product. These allow a scheduler to micro-manage the processes on behalf of
the product and also allow non-product processes to be integrated into your overall schedule.
Most implementations use the methods in the following ways:
» The Command Line method is generally not used by site personnel. Developers may use it during development,
although they may also use the other methods.
» The Online Submission Method is used by most implementations for non-production environments where there is
no scheduler present. This allows testing personnel to submit background processes as necessary. The
implementation has a choice in terms of how the execution via the daemon actually occurs. The various
execution methods are discussed in Online Daemon Or Standalone Daemon.
» The Scheduler Submission Method is commonly used for production and a scheduler test environment (it is
expected that the IT group will become familiar with the scheduler before implementing it into
production). Guidelines for this method are outlined in Scheduler Implementation Guidelines.
» A worker JVM is started (termed a threadpool) using the threadpoolworker[.sh] utility. This starts a JVM
and loads the Oracle Utilities Application Framework ready to accept work. The threadpoolworker[.sh]
utility uses a set of configuration files to determine the characteristics of any threadpool it needs to manage. In
Oracle Utilities Application Framework V2.2, it is possible to manage the threadpools via JMX using a JMX
console or the provided jmxbatchclient[.sh] utility. The threadpool is given a name which is used by the
Oracle Utilities Application Framework to attribute work to it as directed.
» Each thread (or multiple threads) can be submitted to the named threadpool using the submitjob[.sh] utility.
During this process a small submitter JVM is created to initiate communication between the background process
and the named threadpool.
3 Whilst sites will use the implementation of the product to introduce the batch scheduler, it is apparent that the scheduler is best used for enterprise
wide scheduling.
Note: Do not attempt to edit the properties files directly. These are rebuilt from the .template files each time
initialSetup[.sh] is executed (i.e. when patches are applied) and all changes WILL BE lost. All changes
should be made in the threadpoolworker.properties.template and
submitbatch.properties.template respectively, and initialSetup[.sh] executed after each change. Refer to
the Operations And Configuration Guide, Batch Server Administration Guide or Server Administration Guide for your
product for details of how to implement custom templates.
Note: Keep backups of your .template files. Patches and Service packs will overwrite these files with defaults.
When this happens, check the new .templates for any NEW settings which may be necessary for the newly patched
environment(s) prior to restoring your site-specific configuration.
» The threadpoolworker[.sh] and submitjob[.sh] utilities have internal defaults that are used if no configuration
files exist. These defaults should only be used in development testing, as they are usually unsuitable for
production use.
» The internal defaults may be overridden by a properties file associated with the utility. The
threadpoolworker[.sh] utility uses the file $SPLEBASE/etc/threadpoolworker.properties. The
submitjob[.sh] utility uses the file $SPLEBASE/etc/submitbatch.properties. Default templates for
these files are provided with the product.
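An illustrative threadpoolworker.properties fragment is shown below. Only com.splwg.batch.scheduler.daemon appears elsewhere in this document; the threadpool thread-count property name is an assumption drawn from typical installs, so verify the exact names against the administration guide for your product version.

```properties
# Threads offered by this worker to the DEFAULT threadpool (name assumed):
com.splwg.grid.distThreadPool.threads.DEFAULT=5
# Allow this worker to host the scheduler daemon (failover-capable):
com.splwg.batch.scheduler.daemon=true
```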
» For the submitjob[.sh] utility, it is possible to create a job specific configuration file which will contain only
those characteristics unique to a particular background process. This configuration file is usually named
<batchcode>.properties or <batchcode>.properties.xml. The xml version is primarily provided for
character sets other than Western European 4.
» The threadpoolworker[.sh] and submitjob[.sh] utilities support command line options that override the
configuration settings described above.
The figure below illustrates the hierarchies:
4 Western European character set includes USA, Canada, Australia, Europe (except Eastern Europe) and New Zealand.
[Figure: configuration hierarchy - internal defaults are overridden by threadpoolworker.properties (worker) and submitbatch.properties (submitter), which are in turn overridden by command line options.]
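The precedence chain above can be sketched in generic shell (illustrative only): each later, non-empty source overrides the one before it.

```shell
# Resolve a setting: internal default first, then each later source
# (properties file, job-specific properties, command line) wins if non-empty.
resolve() {
  value="internal-default"
  for candidate in "$@"; do
    [ -n "$candidate" ] && value="$candidate"
  done
  echo "$value"
}
resolve "" ""        # nothing configured -> internal-default
resolve "5" ""       # properties file sets 5, no command line -> 5
resolve "5" "8"      # command line overrides -> 8
```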
Execution Modes
Note: The CLUSTERED execution mode applies to Oracle Utilities Application Framework V2.2 SP7 and above only.
Note: A new mode has been introduced called EXTENDED mode that is available for Oracle Utilities Application
Framework V4.1 and above only after applying patch 1173516.
The execution mode is specified in both the threadpoolworker.properties file and the
submitbatch.properties file as a runtime default and is established at configuration time with the
configureEnv.sh utility. It can be overridden at runtime using either submitjob[.sh] or
threadpoolworker[.sh] with the -e option.
The THIN execution mode executes a single thread of a single job in a single JVM. This is primarily designed to be
used by developers to test their batch processes during initial development and testing activities. It can be used for
other purposes in an implementation (testing etc.) but is inefficient compared to other modes because a separate
JVM is required per thread per job.
The DISTRIBUTED execution mode (also known as Classic mode) 5, allows numerous threads from numerous jobs
to be executed by one or more JVMs known as threadpools. A threadpool is made up of worker nodes that process
work as instructed from submitter nodes.
Each worker node offers n number of threads to the grid in a specific thread pool (where n is the number of
concurrent threads the worker can run), and creates and takes out a lease on a THREAD_OFFER entry on the
F1_TSPACE_ENTRY database table to register and stay alive as a participant in the grid. Similarly, a submitter node
leases a WORK_OWNER entry on the table to register its participation. By default, these leases are renewed every 20
seconds by updating a lease expiry timestamp on the respective row. There is therefore some overhead involved,
and it requires adequate database response times to function properly.
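The lease mechanism can be illustrated with a hypothetical sketch. Only the table name F1_TSPACE_ENTRY and the 20-second cadence come from the text above; the column names in the generated SQL are assumptions for illustration.

```shell
# Hypothetical lease renewal: push the expiry timestamp forward so the grid
# knows this participant is still alive. Column names are assumed.
LEASE_SECONDS=20
renew_lease_sql() {
  now=$1   # epoch seconds, passed in so the sketch is deterministic
  entry=$2
  expiry=$(( now + LEASE_SECONDS ))
  echo "UPDATE F1_TSPACE_ENTRY SET LEASE_EXPIRY = ${expiry} WHERE ENTRY_ID = '${entry}'"
}
renew_lease_sql 1000 THREAD_OFFER-1
```

A real worker would issue such an update roughly every 20 seconds; an entry whose expiry passes without renewal is treated as abandoned and cleaned up by the housekeeping process.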
5 The DISTRIBUTED mode is the default mode from Oracle Utilities Application Framework 2.0, 2.1 and 2.2.
A submitter node inserts a GRID_WORK entry into the table to notify the grid that there's work to be done. This entry
contains the batch code and batch parameters for the job, which enables the worker that picks up the request to
execute it. The worker takes out a lease on this GRID_WORK entry to signal its intent to process the work, and
continues to initiate the individual threads for the job. Once the threads have ended, the worker inserts a
WORK_ENDED entry into the table, which notifies the submitter that its work has been processed.
Note: This is a highly simplified overview of the process. In reality, the worker itself also creates GRID_WORK and
WORK_ENDED as well as PER_STATE entries for the individual threads when those are initiated. There is also a
HousekeepingDaemon process that has its own GRID_WORK entry and which periodically monitors the
F1_TSPACE_ENTRY for expired leases, etc., as well as an optional SchedulerDaemon which looks for submissions
from the online system.
[Figure: DISTRIBUTED mode - a submitter inserts a GRID_WORK entry into F1_TSPACE_ENTRY; a worker leases and processes it, then inserts a WORK_ENDED entry.]
For publishing purposes, the following facilities are not depicted in the above figure:
» The PER_STATE rows that contain the ThreadWorkUnit entries (i.e. the data) for each submitted thread.
» The HousekeepingDaemon. This singleton periodically monitors the F1_TSPACE_ENTRY table for expired
entries and deletes them. It also has a GRID_WORK entry on the F1_TSPACE_ENTRY table.
» The SchedulerDaemon. This singleton looks for submissions from the online system by polling the
CI_BATCH_JOB table used for online submission. If found, it submits the job to the grid. It also has a
GRID_WORK entry on the F1_TSPACE_ENTRY table.
The figure below illustrates the flow of data in DISTRIBUTED execution mode:
[Figure: DISTRIBUTED mode data flow - SubmitBatch/StandaloneExecuter submits a JobExecuterWork as a GRID_WORK entry via the SpaceManager/SpaceJDBC layer; a DistributedGridNode's SpaceChangePoller grabs the work via the ThreadOfferManager, the DistributedJobExecuter reads thread work and stores it as PER_STATE entries, and the JavaJob/CobolJob Executers in the ThreadPool execute the individual threads as AbstractGridWork/AbstractBatchWork.]
In DISTRIBUTED mode, the submitter node serializes the appropriate GridWork entry to the F1_TSPACE_ENTRY
table, and then waits for the job completion WORK_ENDED entry.
The worker performs the job portion of the execution by calling either JavaJobExecuter or CobolJobExecuter,
in which the initialization of the job takes place. In the case of JavaJobExecuter, the application’s getJobWork
method is invoked and the thread work units that it collects are divided into thread chunks and stored in
F1_TSPACE_ENTRY PER_STATE rows. The worker then submits the threads by serializing the thread GridWork
entries to F1_TSPACE_ENTRY GRID_WORK rows so that a similar path is followed to execute the individual threads.
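The division of getJobWork results into per-thread chunks can be sketched with a generic example. The real division strategy is internal to the framework; the round-robin assignment below is purely illustrative.

```shell
# Assign work units to threads round-robin; each echoed line stands in for
# a ThreadWorkUnit landing in a thread's PER_STATE chunk.
units="u1 u2 u3 u4 u5"
threads=2
i=0
for u in $units; do
  t=$(( i % threads + 1 ))
  echo "thread $t <- $u"
  i=$(( i + 1 ))
done
```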
If a worker node drops off unexpectedly, as happens if a program crashes, the submitter node is not made aware of
it immediately. To the user it looks like the job is still in progress. When the worker is restarted, the submitter does
get notified at that point, which is not ideal.
Note: In this case the worker node logs the frequently misunderstood message, "Maximum number of grid work
failures was reached (1)".
If a submitter node drops off, the worker nodes are not aware of that and will continue to process the job to the end.
This is a lesser problem, but is also not ideal.
A worker or submitter node’s health is highly dependent on good, consistent database response. Database response
times naturally vary, which can cause intermittent problems that are difficult to troubleshoot.
The most common error indicates that a lease cannot be renewed; the Oracle Utilities Application
Framework self-corrects most of the time, but such errors often lead to further problems.
Oracle Coherence provides a grid-enabled implementation of the IBM and BEA CommonJ Work Manager, which is
the basis for JSR-237. Using a Work Manager, the product can submit a collection of work (a job or set of threads)
that needs to be executed. The Work Manager distributes that work in such a way that it is executed in parallel,
typically across the grid. In other words, if there are ten work items submitted and ten servers in the grid, then each
server will likely process one work item. Further, the distribution of work items across the grid can be tailored, so that
certain servers (e.g. one that acts as a gateway to a particular mainframe service) will be the first choice to run
certain work items, for sake of efficiency and locality of data. The application can then wait for the work to be
completed, and can provide a timeout for how long it is willing to wait.
When a worker starts, the initialization is similar to DISTRIBUTED mode, except that a Coherence based cluster
node is started by creating a WorkManager for each thread in the thread pool. In Coherence these are known as
services, and they specify the number of threads that are offered in that service. This is exactly the same as the
existing concept of thread pools. The objects needed for classic DISTRIBUTED mode to manage thread pools, poll
for work, create and renew leases, and so on, are not initialized as that is redundant in a Coherence cluster.
An important difference to note here is that a CLUSTERED worker plays a far more passive role in the batch grid. A
DISTRIBUTED worker proactively polls for new work, whereas a CLUSTERED node waits for work to be handed to it
from a submitter.
A CLUSTERED submitter also goes through the same initialization as DISTRIBUTED mode and, like the worker,
starts a WorkManager instance to join the cluster. This WorkManager is created with a thread number
specification of zero, which indicates to Coherence that it is a client.
The job is then submitted by scheduling a serializable Work object. This object should be easily created from the
existing JobExecuterWork. This is a WorkManager schedule call, which serializes the object to its destination
worker node and waits for it to finish. This call will perform the "job" portion of the run by calling the standard
JavaJobExecuter or CobolJobExecuter appropriately.
Where the CLUSTERED implementation changes significantly from the DISTRIBUTED implementation is in the
submission of the threads. In DISTRIBUTED mode, a worker submits the threads, in effect becoming a submitter
(i.e. client). In contrast, the CLUSTERED implementation will make this the responsibility of the submitter node, so
that it is physically networked with its worker nodes. This will result in appropriate notifications in the event of nodes
dropping off. The figure below illustrates the concept:
6 CLUSTERED mode is only available with Oracle Utilities Application Framework 2.2 and above. For Oracle Utilities Application Framework 2.2
customers, refer to the Batch Operations and Configuration Guide for the appropriate patches to apply to enable CLUSTERED mode.
[Figure: CLUSTERED mode - a submitter StandardExecuter (submitter=true, batchCode=JOB1, threadPool=DEFAULT, threadCount=4) joins a Coherence cluster of worker StandardExecuters on separate machines (submitter=false, threadPool=DEFAULT, offering 4 and 2 threads respectively), with membership validated via the F1_TSPACE_ENTRY table.]
From the batch application’s perspective, the underlying technology is completely transparent. Whether the
execution mode is THIN, DISTRIBUTED or CLUSTERED has no effect on the application program’s logic.
The CLUSTERED mode utilizes the Oracle Coherence NamedCache feature as a way of sharing information with
members of the cluster. There are two NamedCaches, one for job submission and the other for service information.
The BatchClusterCache moderates all insertions and deletions to these caches.
The process flow for the CLUSTERED mode is illustrated in the figure below:
[Figure: CLUSTERED mode process flow - StandaloneExecuters schedule AbstractGridWork/AbstractBatchWork for execution and communicate through two Coherence NamedCaches: a service information cache and a job submission cache.]
The initialization is then similar to DISTRIBUTED mode, except that a ClusteredNode is created instead of a
DistributedGridNode. The ClusteredNode implements the MemberListener interface to listen for member
events, and the MapListener interface to listen for insertions into and deletions from a NamedCache.
During the initialization process, the ClusteredNode creates a BatchWorkManager for each thread pool
specified in the properties file or command-line argument. With each BatchWorkManager created, the ClusteredNode
is registered as the MemberListener so that it will be notified of any member events, and an entry is inserted into
the service information cache. The service information cache contains all information needed for a particular service
(thread pool), for example what each thread in that particular thread pool is currently running.
The ClusteredNode then initiates the job submission by inserting an entry into the job submission cache. The
submitter waits for job completion by monitoring the job submission cache. The insertion of the job into the job
submission cache fires an entry-inserted event that gets processed by all ClusteredNodes that are servicing the
particular pool name specified for that job submission. A node ‘acquires’ the job by being the first to update the job
submission entry with its member id. The node executes the SubmitBatchRun session executable and schedules
each work entry using a BatchWorkManager client. Each work entry is only scheduled as a thread becomes
available.
Process
When the BatchWorkManager processes work, it updates the service info cache and the job submission cache to
specify the member id doing the processing and batchThreadId being processed. When the work is complete,
that information is removed. Once all the work is completed, the job status of the job entry in the submission cache
is changed to ended, signaling to the submitter that the job completed.
If a worker goes down, a member left event is generated and processed by all registered ClusteredNodes. The
submitter uses this event to get the pending work list of the member and, once the job is complete, the submitter
uses this pending work list to update the thread status to error and to report that a worker unexpectedly went down.
If a submitter goes down, a member left event is also generated for all registered ClusteredNodes. All the nodes
processing any work for that particular submitter immediately cancel the corresponding threads for the submitter and
update each thread’s status to error. The submitter itself then terminates with a non-zero exit code.
Member Validation
An important operation is to validate that any node that joins a grid is supposed to join that particular grid, and not
some other. This is critical to ensure that a submitter node is joined to its intended cluster before submitting a job to
run against that environment. It would be disastrous if, for example, an archiving job meant to run against a test
system were inadvertently submitted to the production system.
To prevent this:
» When joining a cluster, a basic handshake protocol is used to validate that the new member is connected to the
same database as the other members in the cluster. The new member inserts a unique MEMBER_VALID entry in
the F1_TSPACE_ENTRY table and then waits for confirmation from an existing member that it saw that same
entry.
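The handshake can be sketched in simplified form. A temporary directory stands in for the shared F1_TSPACE_ENTRY table; the MEMBER_VALID entry name comes from the text, everything else is illustrative.

```shell
# Simplified handshake sketch: same database implies same shared store, so
# a member that sees its own marker confirmed knows it joined the right grid.
tspace=$(mktemp -d)                # stands in for F1_TSPACE_ENTRY
marker="MEMBER_VALID-$$"           # unique entry inserted by the joining member
touch "$tspace/$marker"
# An existing member on the same store would observe the entry and confirm:
if [ -f "$tspace/$marker" ]; then
  echo "handshake confirmed: member joined the intended cluster"
else
  echo "handshake failed: refusing to join"
fi
```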
Note: This property is optional for Oracle Coherence, in that no error is reported if it is omitted, but the validation
described above guards against arbitrary unions in a cluster. A unique
value must be specified for each environment to separate the clusters and to avoid cluster validation errors.
Scheduler Daemon
The scheduler daemon is what enables online job submissions to be processed. It polls the CI_BATCH_JOB table
for Pending entries and submits them to the batch cluster by inserting appropriate entries into the job submission
cache and then polling for completion. This process is exactly as described above for job submitters; however, the
scheduler daemon also updates the status to Ended on the CI_BATCH_JOB table when the job (i.e. the last thread)
has finished.
Note: This does not indicate success or failure, but merely whether the threads for the job have ended. The job and
thread statuses are held on the CI_BATCH_RUN and CI_BATCH_THD (Batch Run Tree) tables.
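The daemon's cycle can be sketched as follows. This is a hypothetical illustration only: the real daemon runs inside the batch JVM, and the status values used here are assumptions.

```shell
# Hypothetical scheduler daemon cycle: find Pending jobs, submit each to the
# job submission cache, and mark it Ended once its last thread finishes.
pending_jobs="J1 J2"               # stand-in for Pending CI_BATCH_JOB rows
submitted=0
for job in $pending_jobs; do
  echo "$job: submitted to job submission cache"
  echo "$job: last thread finished; CI_BATCH_JOB status set to Ended"
  submitted=$(( submitted + 1 ))
done
```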
The scheduler daemon is a singleton process (i.e. exactly one of them running on the cluster at any one point) with
appropriate failover in the event of the hosting worker dropping off. If this happens, another daemon-enabled
worker will become the host. It is therefore advisable to set the property
com.splwg.batch.scheduler.daemon to true (or use the -d Y command-line option) to enable this failover
capability.
An online application server may also host the scheduler daemon, even if it does not host a batch worker in the
cluster. In other words, these properties, in the spl.properties file for the online application server are perfectly
acceptable:
com.splwg.grid.online.enabled=false
com.splwg.batch.scheduler.daemon=true
Starting with Oracle Coherence 3.1, a warning is issued if the operating system fails to allocate the full size buffer.
On AIX, the relevant network buffer settings can be adjusted as follows:
AIX: no -o rfc1323=1
     no -o sb_max=4194304
Note: AIX only supports specifying buffer sizes of 1MB, 4MB, and 8MB. Additionally there is an issue with IBM's
1.4.2, and 1.5 JVMs which may prevent them from allocating socket buffers larger then 64K. This issue has been
addressed in IBM's 1.4.2 SR7 SDK and 1.5 SR3 SDK.
Note: Network protocols are more effective when the cluster is across more than one machine. On a single machine
there is little difference between the network protocols.
One of the configuration questions that needs to be considered with the CLUSTERED execution mode is whether you will use multicast or unicast for the threadpools. By default, a multicast based configuration is provided by the configuration files shipped with the installed product.
When deciding whether to use multicast or unicast the following advantages and disadvantages should be
considered:
Multicast (default)
» Advantages: Only have to submit to one active node in the cluster. Threadpools can be clustered and the Work Manager can load balance across them. Threadpools communicate across the cluster. Clusters can be shared or dedicated. Cluster nodes can be added dynamically for load fluctuations.
» Disadvantages: Network traffic between clusters is via a multicast address. Some sites do not like the multicast protocol.
Unicast
» Advantages: Can submit to specific nodes (micro management). Clusters can be shared or dedicated. Minimal interaction between nodes.
» Disadvantages: Each node has to be defined to other nodes (no dynamic node support). Increased configuration requirements (Well Known Address support). Nodes should be on different machines (current limitation only).
The relevant settings for each protocol are:
» Multicast – tangosol.coherence.clusteraddress and tangosol.coherence.clusterport
» Unicast – tangosol.coherence.localhost, tangosol.coherence.localport, tangosol.coherence.wka and tangosol.coherence.wka.port
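For example, a unicast setup might include entries like the following in threadpoolworker.properties. The host name and port values here are illustrative assumptions only; use site-specific values.

```properties
# Illustrative host and port values only; substitute your own settings.
tangosol.coherence.localhost=batchhost01
tangosol.coherence.localport=42020
tangosol.coherence.wka=batchhost01
tangosol.coherence.wka.port=42020
```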
Please refer to the Batch Operations And Configuration Guide, Batch Server Administration Guide or Server
Administration Guide for your product for details of these settings.
With the advent of the CLUSTERED execution mode, existing customers of the Oracle Utilities Application
Framework using the DISTRIBUTED execution mode can migrate to the CLUSTERED execution mode. To migrate to
CLUSTERED execution mode the following must be performed:
» Batch RMI Port – Default JMX port for the threadpool. Must be unique per environment.
Refer to the Batch Operations And Configuration Guide, Batch Server Administration Guide or Server Administration
Guide with your product for suggested values for these parameters.
» Execute the initialSetup[.sh] utility to reflect the changes in the product. This may require additional steps
to implement the change for selected platforms. Refer to the Batch Operations And Configuration Guide, Batch
Server Administration Guide or Server Administration Guide with your product for additional advice.
» Very few changes have been made to threadpoolworker.properties and submitbatch.properties.
The tangosol.coherence.* settings should be set in these files.
» Remove any custom com.splwg.grid.executionMode settings from any job specific configuration files (if
used).
» If your site wishes to use unicast rather than multicast then alter the threadpoolworker.properties and
submitbatch.properties files manually as outlined in the Batch Operations And Configuration Guide, Batch
Server Administration Guide or Server Administration Guide provided for your product version.
» You are now migrated from DISTRIBUTED to CLUSTERED execution mode.
Note: In Oracle Utilities Application Framework V4.1 and above, the tangosol parameters have been moved to tangosol-coherence-override.xml. Refer to the Batch Server Administration Guide or Server Administration Guide provided with your product for more information.
» If the threadpoolworker JVM is killed or crashes each related submitter node that was running in that
threadpoolworker is immediately terminated with a non-zero return code and the relevant batch run tree entries
are set to Error status. In DISTRIBUTED mode, a submitter node waiting for work to finish is not made aware of
an event such as a kill or JVM crash until the relevant threadpoolworker is restarted.
» The submitter node can be gracefully ended by killing the submitter node process on the operating system. The affected threads are systematically cancelled and the relevant batch run tree entry statuses updated with appropriate messages. This is not possible in DISTRIBUTED mode. JMX can be used in DISTRIBUTED, as it also can in CLUSTERED mode, but to cancel an entire job requires using the JMX calls to find the appropriate threadpoolworker(s) where the threads are running and cancelling each submitter thread individually.
» Database access is minimized. In DISTRIBUTED mode the entire grid is controlled through a single database
table (F1_TSPACE_ENTRY). This table continually gets polled by listeners for newly submitted work and work that
ended, as well as lease renewal agents that manage the leases for the active nodes. This polling does result in a
significant number of additional database calls and grows as more nodes join the grid. In contrast, CLUSTERED
mode uses shared cache for clustering, which in turn controls membership, so the F1_TSPACE_ENTRY table is
only used in a very minimal capacity.
» Lease renewal issues removed. In DISTRIBUTED mode the lease renewals relied upon good database response. If the database experienced high demand these renewals would error, which could lead to incomplete threads. In CLUSTERED mode the lease renewals are cache based and do not depend on the database.
Note: These settings only apply to a single copy of the product on a single machine. They are suggested for
demonstration or training purposes only and are not recommended for production.
Timed Batch
Note: This facility applies to Oracle Utilities Application Framework V4.0 and above only
One of the features of the Oracle Utilities Application Framework is the ability to support continuous or timed batch
processes. For example, there are monitor processes built into the product. These monitor processes track status
and business rules for specific objects and then process the data according to the status and object configuration.
This is typically a state transition where an object is moved from one state to another via this monitor process according to the object specification. This monitor process will probably be more effective in this situation if it runs continuously.
The facility consists of a Batch Control Type of Timed and a series of attributes to indicate the attributes of the batch
process at execution time. The figure below illustrates the additional attributes available when Batch Control Type is
set to Timed:
The terms continuous and timed use the same facilities within the batch framework but have different processing flows:
Note: If you intend to run Timed or Continuous Batch for any of your monitor jobs then it is recommended that an instance of the DEFAULT threadpool be made available to execute the timed or continuous jobs.
» To stop a timed or continuous running batch job, it is recommended to change the Timer Active value on the Batch Control to No. The job will not restart at the next timer interval.
» As continuous or timed batches are run automatically, additional information must be provided on the batch control:
» User – The user to use for execution of the batch process. This user must have security access to the
necessary objects accessed by the batch process.
» Batch Language – The language for any messages.
» Email Address (optional) – The email address to send the output on completion of execution.
» Thread Count – Number of threads to allocate to the job.
» Override Number Records to Commit (optional) – Commit interval for job. This applies to the complete
execution of the job regardless of what part of the day it is submitted.
One of the new features of the CLUSTERED approach is the ability to optimize Coherence for the batch activities you are performing on the product environment. A new configuration setting has been added in Oracle Utilities Application Framework V4.0.2 and above that determines the mode in which Coherence operates. The setting is tangosol.coherence.mode, set in the submitbatch.properties and threadpoolworker.properties files.
The setting has valid values of prod for Production and dev for Development. These modes do not limit access to
features, but instead alter some default configuration settings. For instance, development mode allows for faster
cluster startup to ease the development process.
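For example, a production environment would typically carry the following line in both files:

```properties
tangosol.coherence.mode=prod
```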
The Oracle Coherence mode setting should be set using the following guidelines:
» During non-production activities not involving a cluster, the mode should be set to dev. It is recommended to use
the development mode for all pre-production activities, such as development and testing. This is an important
safety feature, because Coherence automatically prevents these nodes from joining a production cluster.
» Ensure that the CLUSTERED settings for each environment are unique across all servers in a network. If you are sharing multiple batch servers as a single virtual environment then they can share CLUSTERED settings, but each server will need to be identified uniquely in the configuration.
» Record the use of the ports used for CLUSTERED mode according to your site standards. For example, if your site
requires that all ports are listed in the /etc/services file then this file should be updated with the port numbers
used.
Note: In Oracle Utilities Application Framework V4.1 and above, the tangosol parameters have been moved to tangosol-coherence-override.xml. Refer to the Batch Server Administration Guide or Server Administration Guide provided with your product for more information.
If at any stage the database is shutdown and restarted (i.e. recycled) any active threadpools will become zombies.
This means the threadpools will not process any work and seem to be looping. This can be addressed by changing
the tolerances for the database connection to reconnect successfully after recycling.
In the spl.properties for the batch component the following parameters need to be changed to address
connectivity for database recycles:
EXTENDED Mode
Note: This mode is only supported for specific situations and is only available for Oracle Utilities Application Framework V2.2 and above via Patch 1173571 (FW4.1) and Patch 11683404 (FW2.2).
The CLUSTERED mode is sufficient for most background processing needs, but if you require a large number of submitters then you might want to consider a new mode called EXTENDED. This mode utilizes the Coherence*Extend features to provide an alternative way for threadpoolworkers and submitters to be set up.
Note: EXTENDED mode should only be used if there are a large number of submitters used and the CLUSTERED
mode is reporting communication delays. Communication delays can occur when large numbers of submitters
complete.
Note: The same threadpoolworker can be used for both CLUSTERED and EXTENDED mode if desired.
The idea is to define threadpoolworkers in your configuration as proxies and then define the submitters to point to those proxies to implement the mode. The proxies are defined in the coherence-cache-config.xml configuration files, and the local submitbatch.properties files are then augmented (using an external configuration file) on each node to point to the proxy servers defined. The examples below illustrate a single proxy, but it is possible to define multiple proxies in submitbatch.properties to fail over and balance across proxies.
» <fullpath> – Full path to the configuration file on the local machine, including the configuration file name (usually extend-client-config.xml).
Note: This change alters the way that batch programs initialize context as well as ensures SQL statements are closed at appropriate times. If this causes issues for some custom programs, it is recommended to add com.splwg.submitbatch.useOldExitCodeHandling=true to the submitbatch.properties file. This will enforce the older exit code handling within the execution of the program.
Note: When executing in EXTENDED mode a message outlining that the cache configuration file has been loaded will
appear in the threadpoolworker log files.
Threading Overview
One of the major features of the batch framework is the ability to support multi-threading. The multi-threading support allows a site to increase throughput on an individual batch job by splitting the total workload across multiple individual threads. This means each thread has fine-grained control over a segment of the total data volume at any time.
The idea behind the threading is based upon the notion that "many hands make light work". Each thread takes a
segment of data in parallel and operates on that smaller set. The object identifier allocation algorithm built into the
product randomly assigns keys to help ensure an even distribution of the numbers of records across the threads and
to minimize resource and lock contention.
The best way to visualize the concept of threading is to use a pie analogy. Imagine the total workset for a batch job
is a pie. If you split that pie into equal sized segments, each segment would represent an individual thread.
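The even split described above can be sketched in code. The following is a hypothetical illustration only (it is not the product's actual key allocation algorithm): randomly distributed object identifiers, partitioned by a simple modulo rule, divide into roughly equal "slices of the pie" per thread.

```python
import random

# Hypothetical illustration: randomly allocated object IDs spread
# roughly evenly across threads when partitioned by a modulo rule.
THREADS = 4
object_ids = random.sample(range(1, 1_000_000), 10_000)

# Each thread takes the "slice of the pie" whose IDs map to its number.
slices = {t: [oid for oid in object_ids if oid % THREADS == t]
          for t in range(THREADS)}

for thread, work in sorted(slices.items()):
    print(f"thread {thread + 1}: {len(work)} objects")
```

Because the identifiers are randomly assigned, each of the four slices holds close to a quarter of the 10,000 objects, which is the property the product's allocation algorithm aims for.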
Note: The elapsed runtime of the threads is rarely proportional to the number of threads executed. Even though
contention is minimized, some contention does exist for resources which can adversely affect runtime.
» Threads can be managed individually – Each thread can be started individually and can also be restarted
individually in case of failure. If you need to rerun thread X then that is the only thread that needs to be
resubmitted.
» Threading can be somewhat dynamic – The number of threads that are run on any instance can be varied as
the thread number and thread limit are parameters passed to the job at runtime. They can also be configured
using the configuration files outlined in this document and the relevant manuals.
Note: Threading is not dynamic after the job has been submitted.
» Failure due to data issues with threading is reduced – As mentioned earlier individual threads can be
restarted in case of failure. This limits the risk to the total job if there is a data issue with a particular thread or a
group of threads.
» Number of threads is not infinite – As with any resource there is a theoretical limit. While the thread limit can be
up to 1000 threads, the number of threads you can physically execute will be limited by the CPU and IO
resources available to the job at execution time.
Parameter Guidelines
Given the flexibility of the overriding configuration settings, there are a number of permutations and combinations of configuration files and command line options to suit your site's needs. The following guidelines may assist in deciding the optimal mix for your site:
» Internal defaults should not be relied upon for non-development use. They are provided for developers to unit test
their code for various testing techniques.
» The threadpoolworker.properties and submitbatch.properties files should represent your site's global parameter settings. See Commonly Used Configuration Settings for guidelines on what is commonly set in those files.
» For the submitjob[.sh] utility the provision of a job specific parameter file is not necessary in most cases. The
following are the only exceptions to this rule:
» If the background process requires additional parameters then a job specific parameter file is required.
» If any parameter on the job must be overridden on a regular basis then a job specific parameter file should be
created and only contain the parameters that are to be overridden.
» The command line options should only be used for reruns or off schedule (a.k.a. special runs). This avoids
updating the configuration files for one-off or special processing. The only exception to this rule is that the
Business Date parameter should be specified on the command line to avoid the past midnight issue. For more
details of this see Scheduler Implementation Guidelines.
» While it is possible to override most parameters on the command line, it is not desirable to do so as this is not
efficient. Configuration files are designed to minimize the need for command line overrides unless such overrides
are applicable to the particular execution of the background process.
» Command line options are the only option to be used if threads in a background process need different
parameters. For example, if the record range is available as a parameter then the command line options must be
used to specify that record range on the command line.
While the values are stored in the Batch Control they are not used by the Scheduler Submission Method as the
configuration files are the source of the configuration parameters. This information needs to be extracted to the
configuration files from the batch control.
Note: The parameters used by the Batch Control are stored in the CI_BATCH_CTRL_P table which has Batch Code
(BATCH_CD) and Batch Parameter Name (BATCH_PARM_NAME) as keys. The parameter value is stored in column
BATCH_PARM_VAL. It is not recommended to use BATCH_PARM_NAME='MAX-ERRORS'. Refer to Setting The Error
Tolerance for details. Remember ANY parameter must be in job specific configuration files in the format:
com.splwg.batch.submitter.softParameter.<BATCH_PARM_NAME>=<BATCH_PARM_VAL> and do not
include Blank values.
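For example, a job-specific configuration file entry following this format would look like the line below. The parameter name and value are purely hypothetical and are shown for illustration only.

```properties
# Hypothetical parameter name and value for illustration only.
com.splwg.batch.submitter.softParameter.MY-PARM=MY-VALUE
```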
The reason the Batch Control is not referenced is that the Batch Control is managed by the business, whereas the configuration files are under IT control. Typically, Change Management principles separate the responsibility for these elements. The primary reason is to isolate the physical system from unintentional changes to parameters by the business.
» You need to run an execution of a program where you do not want to alter the existing batch control. The batch
control holds the run number for the next execution and there may be a business situation where you need to run
a one-off execution and do not want it to affect the batch number.
» You want to execute a business process a number of times with different parameters. Some customers use a
common extract format (via a common extract program) but run it multiple times with different parameters to send
to individual interface targets. For example, you may decide a common format for sending collection information
to a number of collection agencies. In this case, a common program would be written and a batch control created
per collection agency that needed the information. This would allow tracking at an individual collection agency
level and a separation of the execution of the process.
» You have a new background based business process you want to trial before you replace an existing background
process. This allows parallel execution.
Note: Environmental settings such as userid, file location and file names are exceptions to this guideline.
» It is not recommended to set MAX-ERRORS at a global level. This should only be specified at an individual thread
of a background process level. Refer to Setting The Error Tolerance for more details.
» The Userid used for background processes should be specified at a global level (in the submitbatch.properties configuration file). This userid is used to mark records processed by background processing and for security purposes.
» It is not recommended that the default SYSUSER be used as the userid specified in background processes. SYSUSER is the initial default user for the Oracle Utilities Application Framework and is only used to input additional users in the initial stages of configuration. Most customers create a dedicated user record (such as BATCH) to delineate background processes from online processes.
» The promptForValues parameter should be set to false in the submitbatch.properties configuration file. Other values are only used for development purposes.
» The executionMode parameter should be set to CLUSTERED in the submitbatch.properties and
threadpoolworker.properties configuration files. For customers using versions of Oracle Utilities
Application Framework not supporting the CLUSTERED execution mode, then an executionMode of
DISTRIBUTED should be used. Refer to the Batch Operations And Configuration Guide, Batch Server
Administration Guide or Server Administration Guide supplied with your product for details of this setting.
» The distThreadPool parameter in the submitbatch.properties configuration file should be set to a
common site specific default threadpool used in your implementation. Refer to Designing Your Threadpools for
more details.
To support a more realistic tolerance, it is possible to set a limit on the number of errors tolerated before the process should be cancelled. At the thread level, the MAX-ERRORS parameter 7 can be used to specify a thread-level error tolerance; the thread will be cancelled when the tolerance is reached or exceeded.
The default value of MAX-ERRORS is 0 (zero) which turns off the facility. Any appropriate non-zero value will then
become the error tolerance limit for the thread.
Setting the appropriate value for your site will require business approval in line with your sites organization business
practices.
Note: The MAX-ERRORS parameter is applicable to business errors only. System errors (SEVERE errors) will
terminate the process immediately.
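The tolerance behavior can be sketched as follows. This is a hypothetical illustration, not product code: business errors are counted per thread and the thread is cancelled once the tolerance is reached, while a severe (system) error terminates processing immediately and a tolerance of zero disables the check.

```python
# Hypothetical sketch of thread-level error tolerance (not product code).
class SevereError(Exception):
    """A system error: terminates the thread immediately."""

def run_thread(records, process, max_errors=0):
    """Process records; cancel the thread when business errors reach
    max_errors. A max_errors of 0 (the default) disables the tolerance."""
    errors = 0
    for record in records:
        try:
            process(record)
        except SevereError:
            raise                  # system errors end the process at once
        except Exception:
            errors += 1            # business error: count it and continue
            if max_errors and errors >= max_errors:
                return "cancelled" # tolerance reached or exceeded
    return "ended"
```

For example, a thread whose data raises five business errors would return "cancelled" with max_errors=3, but "ended" with max_errors=0 or any tolerance above five.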
7 The parameters MAX-ERRORS, MAX_ERRORS and maxErrors are all acceptable names for this parameter. For publishing purposes MAX-ERRORS will be used.
Multi-threading Guidelines
A rule of thumb that may be used is to have three (3) threads per core available. For example if you have a quad
core processor, you can run twelve (12) threads to begin your testing.
This is a rule of thumb because the footprint of each process is different (heavy versus light) and is dependent on
the data in your database. Your hardware configuration (i.e., number of processors, speed of your disk drives,
speed of the network between the database server and the application server) also has an impact on the optimal
number of threads. Please follow these guidelines to determine the optimal number of threads for each background
process:
» Execute the background process using the number of threads dictated by the rule of thumb (described above).
During this execution, monitor the utilization percentage of your application server, database server and network
traffic.
» If you find that your database server has hit 80-100% utilization but your application server has not, one of the following is probably occurring:
» There may be a problematic SQL statement executing during the process. You must capture a database trace to
identify the problem SQL.
» It is also possible that your commit frequency may be too large. Commit frequency is a parameter supplied to
every background process. If it is too large, the database’s hold queues can start swapping. Refer to Parameters
Supplied to Background Processes for more information about this parameter.
» It is normal if you find that your application server has hit 80-100% utilization but your database server has not. This is normal because, in general, such processes are CPU bound rather than IO bound. At this point, you should decrease the number of threads until just under 90-100% of the application server utilization is achieved. This will be the optimal number of threads required for this background process.
» If you find that your application server has NOT hit 80-100% utilization, you should increase the number of threads until you achieve just under 90-100% utilization on the application server. Remember, the application server should achieve 80-100% utilization before the database server reaches 100% utilization. If this proves not to be true, something is probably wrong with an SQL statement and you must capture an SQL trace to determine the culprit.
Note: For the Windows platform, the CPU should not exceed 70-80% to provide enough additional CPU for the
operating system to process.
» Another way to achieve similar results is to start out with a small number of threads and increase the number of
threads until you have maximized throughput. The definition of throughput may differ for each process but can be
generalized as a simple count of the records processed in the Batch Run Tree. For example, in the Billing
background process in product, throughput is the number of bills processed per minute. If you opt to use this
method, it is recommended that you graph a curve of throughput vs. number of threads. The graph should
display a curve that is steep at first but then flattens as more threads are added. Eventually adding more threads
will cause the throughput to decline. Through this type of analysis you can determine the optimum number of
threads to execute for any given process.
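The curve analysis described above can also be done programmatically. The sketch below is hypothetical (the throughput measurements are invented for illustration): it picks the thread count beyond which the marginal throughput gain per added thread drops below a chosen cut-off, i.e. the point where the curve flattens.

```python
# Hypothetical measurements: (threads, records processed per minute).
measurements = [(1, 100), (2, 190), (4, 350), (8, 560), (12, 640),
                (16, 660), (20, 650)]

def optimal_threads(points, min_gain=0.05):
    """Return the thread count after which the marginal throughput
    gain drops below min_gain (5% here) per added thread."""
    best = points[0][0]
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        gain_per_thread = (r1 - r0) / r0 / (t1 - t0)
        if gain_per_thread < min_gain:
            return best
        best = t1
    return best

print(optimal_threads(measurements))
```

With the invented measurements above, throughput growth falls away after 8 threads, so the function reports 8 as the point of diminishing returns. The 5% cut-off is an arbitrary illustrative choice; a real analysis would weigh throughput gains against resource cost.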
There are two parameters that can be altered to control the amount of resources a background process uses when it
executes. While most implementations reuse the default values supplied with the product, it is possible to alter the
values to tune the performance of background processes and allow background processes to be executed during
peak application usage periods.
The higher the Commit Frequency value specified, the fewer commit points are taken. Commit points can be expensive transactions in a database, so most implementations try to minimize them. The lower the value, the more commit points are taken. The latter is desirable when you are running background processes during online hours as it reduces the resource impact the background process has on online processes.
Specifying a large commit frequency can cause larger than normal rollbacks to be performed by the database. This
can cause a strain on the database and hardware. The longer the unit of work the more work the database has to do
to roll it back in case of a failure.
The commit interval can have advantages and disadvantages depending on the situation and the value:
High value for Commit Frequency:
» Less commits in process (less checkpoints)
» Larger unit of work
» Lower concurrency (higher impact on other users)
» Longer rollback in case of failure
» Can increase throughput on a lightly loaded system
Low value for Commit Frequency:
» More commits in process (more frequent checkpoints)
» Smaller unit of work
» Higher concurrency (lower impact on other users)
» Shorter rollbacks in case of failure
» Can allow background processing to work harmoniously with online
The second parameter, Cursor Reinitialization Time (in minutes), controls how long a solution set is held by the process. When the Oracle database processes a set of records it typically holds a snapshot of these records to save processing time. If the set is held too long, the records may not reflect the state of the database, so it is a good idea for the Oracle database to maintain the currency of this data regularly. Within the background process this is controlled by setting this value to prevent the snapshot being discarded by the Oracle database and causing an abort. If the records are held too long, an ORA-1555 Snapshot too old error is generated and the process aborts.
The Cursor Reinitialization and Commit Interval parameters are tunable parameters that affect the impact of the background processes on the other processes running and prevent internal database errors. It is also important to understand their impact to ascertain whether any change is required. The following rules of thumb apply to setting the values:
» It is recommended that the Commit Interval should not be set to one (1) as this value may cause excessive database I/O and therefore performance degradation.
» For light jobs (short duration, single threaded, small numbers of records etc), the default value for Commit Interval
may satisfy your site performance requirements.
» For heavy jobs (long duration, multi-threaded, large number of records etc), then a value for Commit Interval of
between 5 (five) to 20 (twenty) is recommended.
» The value of the Commit Interval directly affects the size of the redo logs allocated to the database. The higher the commit interval, the larger the redo logs need to be to hold the in-process objects. Work with your site's DBA group to come up with a compromise between redo logs and commit interval.
During processing of any background process a main object is used to drive the process. For example in BILLING
the main object is Account. The BILLING process loops through the accounts objects as it processes. For other
processes it is other objects that are considered the main object. This main object type is used to determine when a
transaction is complete.
» When a certain number of main objects have been processed then a database commit is issued to the database.
This number is the Commit Interval. The larger the commit interval the larger the amount of work that the
database has to keep track of between commit points.
» The Cursor Reinitialization parameter is used to minimize issues in the Oracle database where the unit of work is so large it causes a "Snapshot too old" error. The Oracle database stores undo information in the Rollback Segment; when the database recycles the Rollback Segment storage, the read consistent information for the current open cursor is no longer available. In Oracle Utilities Application Framework based products this is prevented by reinitializing the cursor on a regular basis. When this timeout, known as the Cursor Reinitialization, is exceeded then at the end of the current transaction a commit will be issued.
» At any time in a process, a commit for objects processed may be caused by reaching the Commit Interval or the time limit set on Cursor Reinitialization, whichever comes first.
» The settings of Commit Frequency and Cursor Reinitialization have an impact on the amount of JVM memory allocated to the individual threads. Higher values of both require more memory to hold the data.
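The interaction of the two parameters can be sketched as follows. This is a hypothetical illustration, not product code: a commit point is taken whenever the number of main objects processed reaches the commit interval, or the cursor reinitialization time has elapsed, whichever comes first.

```python
import time

# Hypothetical sketch (not product code): commit when the count of main
# objects processed reaches the commit interval, or when the cursor
# reinitialization time has elapsed, whichever comes first.
def process_batch(objects, process, commit, commit_interval=10,
                  reinit_seconds=600, clock=time.monotonic):
    pending = 0
    last_commit = clock()
    for obj in objects:
        process(obj)
        pending += 1
        if pending >= commit_interval or clock() - last_commit >= reinit_seconds:
            commit()               # commit point: also re-opens the cursor
            pending = 0
            last_commit = clock()
    if pending:
        commit()                   # final commit for any remainder
```

For example, processing seven objects with a commit interval of three produces commits after the third and sixth objects plus a final commit for the remainder; a short reinitialization time would force commits on elapsed time instead.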
Note: The Cursor Reinitialization parameter only applies to COBOL based batch processes.
Note: If the daemon is enabled in more than one JVM, then the grid will ensure that only one daemon is active at any time. If the JVM running the daemon fails for any reason, other JVMs in the grid will assume the role of the daemon. As long as at least one JVM is configured to accept the scheduling daemon, the daemon will run on exactly one thread somewhere among the batch grid JVMs. For example, if there are 3 JVMs configured to accept the scheduling daemon, the daemon may end up running in one thread in JVM #2 and not at all on JVMs #1 and #3. If JVM #2 goes down, the scheduler daemon will start running on one thread either in JVM #1 or #3, but not both. This ensures that the scheduler is always running, if at all possible, and duplicate submissions do not happen.
» The daemon can be executed with an existing J2EE Business Application Server shared with the online. This
reserves some capacity from the server to execute the background processes.
» The daemon can be run within a dedicated standalone batch server that is not shared with online. This allows the
execution of background processing to be processed on dedicated server hardware.
While the decision to use an online or standalone daemon is a site specific one there are a number of factors that
should be considered when making this decision:
» Online – Easiest to configure (Installation option). Does not require separate resources. However, if a background process misbehaves it can affect the online application.
» Standalone – Direct management possible (JMX or command line based). Requires additional configuration and management.
Use of Spacenames
By default, all batch jobs submitted are run within the same housekeeping space within the batch grid 9. This
behavior is designed for production use to maximize the efficiency and resource usage of the batch grid. During
development of batch code, it may be desirable to execute each developer's workload in their own spaces to isolate
developers from affecting each other. There is a facility within the product to override the space used by the batch
grid to allow segregation.
To do this the developer must specify the following parameter for the job (in the properties file for the individual
job):
com.splwg.grid.spaceName=<spacename>
where <spacename> is the name of the desired space.
For example:
com.splwg.grid.spaceName=TEST1
Note: This parameter is intended as a development and testing aid only. It provides a hard partition between
workers. Each space name has its own HouseKeepingDaemon and is therefore totally separate from workers with
different space names. For production purposes, distributed thread pools are more flexible and should be used
instead.
8 The LOCAL threadpool applies to Oracle Utilities Application Framework V2.1 only.
9 It is known as the MAIN space.
com.splwg.batch.submitter.maxExecutionAttempts=n
where n is a number greater than 0.
The default is set to 1 and it is highly recommended that it remain at that value unless instructed otherwise by
Oracle Support.
Each Java version from each Java vendor supports a limit on the number of active threads. This value varies from
version to version and vendor to vendor, but is typically in the range of 150-600 active threads per JVM. This might
suggest that you can run 150+ threads of background processing, but this is not the case. Java threads tend to be
short lived (typically associated with online or web services style work) and the limit assumes that style of workload.
Conversely, background process threads tend to be long lived and therefore consume more resources than short
lived threads. This can be explained by the fact that online transactions and web services tend to operate on one
object while background processes operate on multiple objects.
It is recommended that the maximum number of threads you consider per thread pool is 8 for heavy jobs and 10-15
for light jobs.
Customers using COBOL based objects should consider minimizing the number of threads in each pool to
reduce Access Violation Errors generated by the Micro Focus runtime. If your site finds that this occurs, reducing the
number of concurrent threads per threadpool can reduce the occurrence of the problem.
The thread limit is set at a threadpool level as part of the threadpoolworker[.sh] utility. This thread limit is set
to prevent the JVM from using more resources than required. Exceeding this limit will result in background process
execution delays as the process waits for an available thread.
Even though this limit is set, it represents the maximum number of potential threads in the threadpool. Not all
background processes have an equal footprint on the system. Some are heavier (use more resources) than others.
The footprint of a particular background process is not measured by the volume of data but by the throughput of the
background process. The heavier background processes tend to have lower throughput rates than lighter
background processes.
The importance of the footprint relates to the number of threads that can actually be executed at any time within
the threadpool. A threadpool can run fewer heavy background processes than light ones. For example, if the
threadpool limit is set to ten (10) and you try to run ten (10) heavy threads, then the JVM may run the background
processes slower due to the threadpool having capacity issues. If you send ten (10) lighter threads, it may process
them adequately.
» During the testing phase, set all the threadpools used to the 10-15 thread limit. Refer to Designing Your
Threadpools for additional advice.
» Allocate threads to the threadpool up to the limit. Note the run times.
» Decrease the number of threads sent to the pool and note the run times.
Note: The online daemon DEFAULT threadpool should remain at five (5).
» Separate java based processes from COBOL based processes. This will assist in micromanaging the threads in
case you need to stop threads of a background process. The jmxbatchclient[.sh] utility can only kill
individual java based background process threads, so to stop a COBOL based background process it is recommended
to use jmxbatchclient[.sh] to shut down the threadpool. This stops all threads in the threadpool, but it is the
only way to stop a COBOL based background process within the Oracle Utilities Application Framework.
Note: jmxbatchclient[.sh] can be used for any batch process regardless of the language used to write the
functionality.
Note: COBOL support is not available for Oracle Utilities Application Framework V4.3.x and above.
» The number of threads you are simultaneously executing at any point in the schedule will dictate the number of
threadpools to be used for your site. For example, if 80 threads are executing at any time then 8-10 threadpools
may be necessary (this is only a rough calculation).
» Group light footprint background processes into a small number of common threadpools. Typically
background processes that operate on the same type of data are ideal. Consult your product documentation as
the background processes are usually grouped by functional area already for consideration.
» Consider splitting heavy footprint background processes into a number of threadpools. The best way to determine
this is through trial and error by determining the optimal number of threads per threadpool for that particular
background process.
» Name your threadpools appropriately for their function. The DEFAULT threadpool is reserved for the daemon use
only and should not be used if the daemon is active in that environment.
This may sound quite complicated but the process can be simplified in a number of ways:
» Java based background processes are the easiest to manage so can be grouped together in a small number of
threadpools.
» Increasing the number of threads to limit each thread to a smaller unit of work is acceptable for heavy
processes, as subdividing the workload into smaller units can increase throughput. Smaller units of
work reduce the memory and CPU usage of the JVM by reducing work queue length.
» The large heavy background processes that require multi-threading should be separated in their own dedicated
threadpools. This threadpool can be started prior to the first thread of the background process starting and
shutdown after the last thread has completed.
Note: In most product implementations, the number of multi-threaded background processes usually represents
fewer than 10% of the total number of background processes that need to be managed.
» Shut down any threadpools no longer used by your schedule. This may reduce overall resource usage.
Threadpool Cluster Considerations
The CLUSTERED execution mode allows for clustering of threadpools. There are a few guidelines when using this
facility that may assist in configuration and operations:
In Oracle Utilities Application Framework V2.1 and V2.2 and above, a number of configuration settings were
introduced to optimize the memory used by the threadpool based JVMs. These settings control the behavior of the
threadpool based JVMs in terms of their memory usage.
To optimize the settings for background processing it is recommended to set these memory settings to the following:
» Releasing COBOL Memory – COBOL programs only release their thread-bound memory when the thread dies.
This thread-bound memory is primarily memory allocated by the COBOL runtime on the C heap. As threads
return to the thread pool and are used again to process calls to different COBOL programs, the memory footprint
may continue to grow as more and more different COBOL programs are called. In an online processing scenario,
this can cause memory faults in the long run as many COBOL modules are called during the availability of the
product. During background processing, this problem is somewhat reduced as the number of COBOL modules
called is much lower. Therefore the configuration setting that controls this behavior in spl.properties for
background processing should be set as follows (the opposite of what is recommended for online):
spl.runtime.cobol.remote.releaseThreadMemoryAfterEachCall=false
» Minimizing Housekeeping – When using the DISTRIBUTED mode of execution, the threadpool worker polls the
F1_TSPACE* tables to check for new available jobs. By default this poll is performed every 1000ms (1 second).
This can be inefficient for the threadpool worker process, therefore it is recommended to change this tolerance to
5000ms (5 seconds) in production to reduce overheads. This can be implemented by changing the
etc\threadpoolworker.properties file and adding the configuration setting:
com.splwg.grid.polling.minMillisBetweenCycles=5000
Note: Whilst there are numerous variations available for each scenario, the samples used in this section are generic
and simplified to cover the more pertinent aspects of the specific scenario.
» Attach to the environment by issuing the splenviron[.sh] command from the relevant host. If the scenario
spans multiple hosts or environments, then this will be repeated for each host and environment to implement the change.
» The threadpool can be optionally defined in the threadpoolworker.properties file. Definition of the
threadpool in this file can be skipped if the threadpool is dynamically created using the options on the
threadpoolworker[.sh] utility. Refer to the Batch Operations And Configuration Guide, Batch Server
Administration Guide or Server Administration Guide for details of the options for this utility. The location of the
threadpoolworker.properties file varies with Oracle Utilities Application Framework versions (refer to the
Batch Operations And Configuration Guide, Batch Server Administration Guide or Server Administration Guide)
for the location of this file.
» In Oracle Utilities Application Framework V4.1 and above, the cluster configuration is defined in the tangosol-
coherence-override.xml file as opposed to the threadpoolworker.properties file used by Oracle
Utilities Application Framework V4.0.x and below versions of the framework.
The figure below summarizes the steps used in the configuration:
[Figure: Start → Setup Clustered mode (tangosol-coherence-override.xml) → End]
When making changes, please ensure the configuration files conform to the expected format as outlined in the
Batch Operations And Configuration Guide, Batch Server Administration Guide or Server Administration Guide for
the version of the Oracle Utilities Application Framework used.
Note: For publishing purposes both multicast and unicast examples will be shown.
The sample consists of two hosts (host1 and host2) which house an identical copy of the product. There are a
number of threadpools (SCEN1, SCEN2 and SCEN3) spread across the hosts.
Note: For publishing we will assume each threadpool has an arbitrary limit of 8 threads.
Scenario Attributes
A Single threadpool on a single host/environment. This is a common scenario for non-production environment such
as development and initial test environments.
B Multiple threadpools on a single host/environment. This is a scenario used for testing and for executing larger
numbers of jobs and threads simultaneously.
C Multiple threadpools across multiple machines but in a single "environment" (such as production). This is a scenario
where the site may have multiple servers for an environment (e.g. production) and want to run jobs across the
machines.
Setup threadpools
The first part of the process is to setup the threadpoolworker.properties file with the definitions of the
threadpools required for your scenario. This step is optional as the threadpools can be dynamically created using the
threadpoolworker[.sh] utility.
For each threadpool on a host, a threadpool definition can exist in threadpoolworker.properties in the
form:
com.splwg.grid.distThreadPool.threads.<poolname>=<threads>
The following table shows the threadpool file parameter entries for each scenario.
A On host1:
com.splwg.grid.distThreadPool.threads.SCEN1=8
B On host1:
com.splwg.grid.distThreadPool.threads.SCEN2=8
C On host1:
com.splwg.grid.distThreadPool.threads.SCEN3=8
D On host1:
com.splwg.grid.distThreadPool.threads.SCEN1=8
com.splwg.grid.distThreadPool.threads.SCEN2=8
com.splwg.grid.distThreadPool.threads.SCEN3=8
host2:
com.splwg.grid.distThreadPool.threads.SCEN3=8
Setup Clustered Mode
The next step is to define the clustered mode information for the host/environment. There are two approaches to
consider. The clustered mode can use multicast or unicast (see Clustering using Unicast or Multicast for more
information about the different modes).
This information is specified in the tangosol-coherence-override.xml file for Oracle Utilities Application
Framework V4.1.x and above products or the threadpoolworker.properties file for Oracle Utilities Application
Framework V4.0.x and below (including V2.x) customers.
Multicast:
» Set tangosol.coherence.cluster to the cluster name. Refer to the Server Administration Guide, Batch Server
Administration Guide or Batch Operations and Configuration Guide for recommendations.
» Set tangosol.coherence.clusterport to a unique port number for the cluster for the environment.
Unicast:
» Set tangosol.coherence.cluster to the cluster name. Refer to the Server Administration Guide, Batch Server
Administration Guide or Batch Operations and Configuration Guide for recommendations.
» Set tangosol.coherence.wka to the host name or host IP address (if DNS resolution is slow) of the hosts in
the cluster.
A, B – Multicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.clusteraddress=239.128.0.10
tangosol.coherence.clusterport=7810
tangosol.coherence.distributed.localstorage=false

A, B – Unicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.localport=7810
tangosol.coherence.wkaport=7810
tangosol.coherence.wka=10.1.10.1

C, D – Multicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.clusteraddress=239.128.0.10
tangosol.coherence.clusterport=7810
tangosol.coherence.distributed.localstorage=false

C, D – Multicast (host2):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.clusteraddress=239.128.0.10
tangosol.coherence.clusterport=7810
tangosol.coherence.distributed.localstorage=false

C, D – Unicast (host1):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.port1=7820
tangosol.coherence.port2=7830
tangosol.coherence.wkaport=7810
tangosol.coherence.wka1=10.1.10.1
tangosol.coherence.wka2=10.1.10.2

C, D – Unicast (host2):
tangosol.coherence.cluster=FWDEMO.SPLADM
tangosol.coherence.port2=7820
tangosol.coherence.port1=7830
tangosol.coherence.wkaport=7810
tangosol.coherence.wka2=10.1.10.1
tangosol.coherence.wka1=10.1.10.2
Therefore, if using Oracle Utilities Application Framework V4.1 and above, the tangosol-coherence-
override.xml entries for the scenarios are as follows:
<coherence>
  <cluster-config>
    <services>
      <service id="1">
        <init-params>
          <init-param id="4">
            <param-name>local-storage</param-name>
            <param-value system-property="tangosol.coherence.distributed.localstorage">false</param-value>
          </init-param>
        </init-params>
      </service>
    </services>
    <member-identity>
      <cluster-name system-property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>
    </member-identity>
    <multicast-listener>
      <address system-property="tangosol.coherence.clusteraddress">239.128.0.10</address>
      <port system-property="tangosol.coherence.clusterport">7810</port>
    </multicast-listener>
  </cluster-config>
</coherence>
<coherence>
  <cluster-config>
    <member-identity>
      <cluster-name system-property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>
    </member-identity>
    <unicast-listener>
      <well-known-addresses>
        <socket-address id="1">
          <address system-property="tangosol.coherence.wka">host1</address>
          <port system-property="tangosol.coherence.wka.port">7810</port>
        </socket-address>
      </well-known-addresses>
      <address system-property="tangosol.coherence.localhost">localhost</address>
      <port system-property="tangosol.coherence.localport">7810</port>
    </unicast-listener>
  </cluster-config>
</coherence>
<coherence>
  <cluster-config>
    <services>
      <service id="1">
        <init-params>
          <init-param id="4">
            <param-name>local-storage</param-name>
            <param-value system-property="tangosol.coherence.distributed.localstorage">false</param-value>
          </init-param>
        </init-params>
      </service>
    </services>
    <member-identity>
      <cluster-name system-property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>
    </member-identity>
    <multicast-listener>
      <address system-property="tangosol.coherence.clusteraddress">239.128.0.10</address>
      <port system-property="tangosol.coherence.clusterport">7810</port>
    </multicast-listener>
  </cluster-config>
</coherence>
<coherence>
  <cluster-config>
    <member-identity>
      <cluster-name system-property="tangosol.coherence.cluster">FWDEMO.SPLADM</cluster-name>
    </member-identity>
    <unicast-listener>
      <well-known-addresses>
        <socket-address id="1">
          <address system-property="tangosol.coherence.wka1">host1</address>
          <port system-property="tangosol.coherence.wka1.port">7810</port>
        </socket-address>
        <socket-address id="2">
          <address system-property="tangosol.coherence.wka2">host2</address>
          <port system-property="tangosol.coherence.wka2.port">7820</port>
        </socket-address>
      </well-known-addresses>
      <address system-property="tangosol.coherence.localhost">localhost</address>
      <port system-property="tangosol.coherence.localport">7810</port>
    </unicast-listener>
  </cluster-config>
</coherence>
Starting the threadpools
The key now is to start the threadpools using the threadpoolworker[.sh]. Generally the command:
threadpoolworker[.sh]
is sufficient to start all the threadpools defined in the threadpoolworker.properties file, but if you are starting
multiple instances of the same pool (SCEN2) then you need to run additional explicit commands for each instance.
For example, for SCEN2:
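An explicit per-instance start can be sketched as follows; the -p option syntax is an assumption based on common usage of the utility, so verify the exact options in the Server Administration Guide for your version:

```text
threadpoolworker.sh -p SCEN2=8     (first instance of SCEN2)
threadpoolworker.sh -p SCEN2=8     (second instance of SCEN2, started in a separate session)
```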
JMX Monitoring
Note: The JMX capability is only available for Oracle Utilities Application Framework V2.2 and above.
With the implementation of DISTRIBUTED and CLUSTERED modes, individual batch processes can be actively
monitored and managed via Java Management Extensions (JMX). This means a JMX console, such as jconsole,
can be used on an active threadpool and individual batch threads to monitor their progress and manage them
remotely.
Refer to the Batch Server Administration Guide and Batch Operations and Configuration Guide for your product for
more information on how to enable and monitor using JMX.
One of the features of the product is the ability to monitor active batch processes using JMX. This was a feature
introduced in Oracle Utilities Application Framework V2.2 and above to provide monitoring capability via
jmxbatchclient or a JMX console.
The issue is that each instance of a threadpool opens its own JMX port so while it is possible to monitor jobs at a
threadpool level it is not possible to see the jobs across all threadpool instances. In Oracle Utilities Application
Framework V4.2.0.0.0, a global batch view is available. This facility allows for a site to connect to any threadpool
instance and see all other instances with active jobs.
By default, the database connection information via the column MODULE on the V$SESSION view for batch
processes is set to "JDBC Thin Client". This makes the batch sessions harder to differentiate from other sessions.
The product has been altered to now populate the MODULE column to display the Batch control id for the connection
and "TUGBU Idle" when the threadpool has an idle connection.
To implement this change the following setting must be added to the hibernate.properties file contained in
$SPLEBASE/splapp/standalone/config (or %SPLEBASE%\splapp\standalone\config on Windows):
hibernate.connection.release_mode=on_close
The MODULE will now display the batch control on active connections.
Note: For Oracle Utilities Application Framework V4.1 or above, the CLIENT_IDENTIFIER is also populated with
the product user configured to execute the process.
In Oracle Utilities Application Framework V4.2.0.0.0, the database connection information for batch and
online connections has been expanded and the following information is displayed for active batch threads accessing
the database:
Commit Strategies
Note: Commit Strategies were introduced in Oracle Utilities Application Framework V2.1. Versions of Oracle Utilities
Application Framework prior to V2.2 used the Standard Commit Strategy exclusively. To implement any of the
alternative commit strategies ensure that the Oracle Utilities Application Framework is patched to the latest service
pack to include all strategies.
By default, background processes typically commit records every configurable interval (see Commit Interval for more
information). The commit interval defines the size of the work unit in the database as well as defines the granularity
of restart, as the product rolls back to the last commit point on process error or failure. This is known as the
Standard Commit Strategy and is employed by the majority of background processes in the product.
Whilst the product generally uses the Standard Commit Strategy, it is possible for custom background processes to
use alternative strategies, which are coded within the custom background processes themselves.
The table below outlines the valid strategies and their attributes:
Standard Commit (default) StandardCommit This strategy processes each unit of work as part of a group of work units in
one database transaction. The standard maximumCommitRecords
parameter defines the commit interval, defaulted to 200 if not supplied or
not a number > 0. In the event of an exception, the transaction group is
rolled back to the last committed work unit, the exception is logged and
committed, and the successful entries in this transaction group are
reprocessed (up to the current, failed work unit). Processing can resume,
upon restart, at the first work unit after the one that failed.
This strategy is appropriate for batch processes that can tolerate errors in
the execution.
Single Transaction SingleTransaction Process the entire workload in a single committed transaction. Any
exception will cause a rollback of processed work and will be considered
an unsuccessful thread execution.
This strategy is most appropriate to update processes or interfaces that
cannot tolerate any errors within a run. For example, an interface that
requires a complete set of results should consider this strategy.
Commit Every Unit CommitEveryUnit Each successful record processed has its own commit. There is no need
to "back up" and reprocess units rolled back because of an exception.
This strategy can continue to move forward after exceptions. This is
equivalent to a commit interval of one (1).
This strategy is most appropriate to update processes or interfaces that
can tolerate some errors within a run.
Thread Iteration ThreadIteration If there is a requirement for thread pool workers to select their data at
initialization time and to loop and process until the end of the selection
then using the ThreadIteration commit strategy is recommended.
The application data can come from a database table, one or more flat
files, or any other source that the thread worker requires. The opening,
fetching and closing of the data is left entirely up to the application
program. The batch framework’s responsibility is to provide appropriate
context, commit frequency, error handling and restartability, as it does in
the case of the other strategies.
This strategy was introduced to reduce java heap space usage of the
background process and to provide alternatives for restart. When a thread
is restarted after a premature end (for example error or cancellation), its
initialization method will have the opportunity to refresh the selection of its
data for the run. This is in contrast to the existing model in which the data
will always be based on the original selection when the job was first
submitted.
The standard maximumCommitRecords parameter defines the commit
interval, defaulted to 200 if not supplied. Soft parameter maxErrors
controls the number of errors that the program can tolerate. Each error
that is thrown while within this limit causes all updates for that one work
unit to be rolled back. This strategy class uses a JDBC savepoint for each
work unit to avoid also rolling back the successfully processed units of
work when an error is found. If maxErrors is overrun, the thread is
aborted.
Continuous Execution ContinuousExecution Supports continuous batch processes that may run indefinitely. Similar to
the Commit Every Unit strategy. Introduced to support Timed Batch.
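The Standard Commit strategy described above can be illustrated with the following sketch. This is plain Python with illustrative names only (standard_commit, process, commit_interval standing in for maximumCommitRecords); it is not OUAF code, just a simulation of the commit-and-resume behaviour:

```python
# Illustrative sketch of the Standard Commit strategy: units are processed
# in groups and committed every `commit_interval` units; on an exception
# the group is rolled back, the error is logged, the successful entries
# are reprocessed, and processing resumes at the unit after the failure.

def standard_commit(units, process, commit_interval=200):
    """Return (committed_results, errors) after processing all units."""
    committed = []   # stands in for rows committed to the database
    pending = []     # uncommitted results in the current transaction group
    errors = []
    for unit in units:
        try:
            pending.append(process(unit))
        except Exception as exc:
            # Roll back the group, log the error, then "reprocess" the
            # successful entries (here: commit the results gathered so far)
            # and resume at the unit after the failed one.
            errors.append((unit, exc))
            committed.extend(pending)
            pending = []
            continue
        if len(pending) >= commit_interval:
            committed.extend(pending)  # commit point
            pending = []
    committed.extend(pending)          # final commit at end of workload
    return committed, errors
```

The key property, as in the table above, is that a failed unit costs only the current transaction group, not the whole run.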
The online component of the Oracle Utilities Application Framework uses a cache of static data for performance
reasons. The batch component uses a similar cache mechanism (per threadpoolworker) using a Hibernate data
cache. Whilst this cache is automatically refreshed by the product on a regular basis, it can now be manually
refreshed by running the F1-FLUSH background process.
It is recommended to run the F1-FLUSH background process for long continuously running threadpoolworkers to
reflect data changes in configuration data.
To do this, you must add the DIST-THD-POOL parameter to all batch control records you want to run in the external
pool. To save time, you can also provide a default value for this parameter to save the submitter having to specify it.
An example of this setting on the Batch Control is shown below:
It is recommended that long running threadpools be stopped and restarted on a regular basis to release resources
that may be held by those JVMs. This is particularly important for customers using Oracle Utilities
Application Framework based products that contain jobs written in technologies other than java, such as COBOL or
C, as those resources are not released as easily as java resources and a restart of the threadpool will ensure these
resources are released as well.
The frequency of the restart will vary with your site's volume and frequency of jobs but a few guidelines may be
helpful in deciding this frequency:
Restart a threadpool when no jobs are running – It is not a good idea to restart a threadpool whilst the threadpool is
active. Pick a time where jobs are not likely running to restart the threadpool.
Monitoring the memory footprint of the JVM can also be a good idea to see if too many resources are being
held by the JVM (the memory footprint will trend upwards over time). If the threadpool is very active with different
processes, especially processes not written in java, then those resources will be held and not released until the
threadpool is restarted.
The more batch codes actively running in a threadpool, the more often the threadpool should be restarted. Batch
jobs load different classes and resources, and if those resources cannot otherwise be released then they have to be
released by restarting the threadpool.
By default the submitjob and threadpoolworker utilities will create logs in a specific location dictated by the
utility.
For example:
Where:
{batchCode} Batch Control used for job
If your implementation wishes to implement custom log file names then this may be achieved using user exits which
allow custom setting of the file name pattern. In the utilities an environment variable is set to the name and location
of the log file. The user exit may be used to set this environment variable to an alternative. The user exit contains the
script code fragment 11 used to set the log file environment variable.
The table below lists the user exit, environment variable name and the platform:
10 This would vary if the threadpool is very active.
11 The script code fragment must be valid for your operating system.
Additionally, internal session variables are available for use in the user exit (a check mark indicates validity for the
individual utility).
Note: Other environment variables in the session can be used and determined in the user exit script code.
Note: When setting the log file name the location and file name MUST be valid for the security and operating system
used for the product. The directory should be writable by the OS user used to execute the job.
Single submitter for the entire process. Regardless of the number of threads (e.g. 100 threads of BILLING), there
will only be one submitter node for the process in the cluster.
Multiple submitters per batch process, i.e. one per thread. For example, in the case of 100 threads of BILLING, there
would be 100 submitter nodes.
There are advantages/disadvantages for both of the approaches. The single submitter approach is less resource
intensive (each submitter JVM requires 180-256Mb) and results in a smaller cluster in terms of transient members.
With respect to this latter point, it should be noted that submitter nodes are continually entering/exiting the cluster
(hence the term transient), thus requiring acknowledgment from other members and thereby significantly increasing
the required cluster communication as the number of submitter nodes increases. This can be problematic for large
clusters for which there is the recommendation to employ multiple clusters if communication delays become an
issue.
Despite the aforementioned resource and potential communication disadvantages, employing multiple submitters
per batch process (one per thread) does have distinct advantages. Namely, it allows for immediate notification of a
failed thread (the associated submitter node terminates immediately) and the canceling of a specific thread by
terminating the associated submitter node. Real-time feedback of a terminated thread can be critical at some sites
such that the issue can be attended to immediately - as opposed to waiting until all other threads have
ended/abended to receive such feedback as in the case of a single submitter (which, depending on the process, can
be a significant amount of time). Note that the threadpoolworker JMX facilities can also be used to monitor and
cancel individual threads, however the site will need to create the mechanism to issue and interpret the JMX
requests.
Clustered Mode – Dedicated Storage Node Recommendation
By default, all OUAF batch node instances (threadpoolworker and submitter) will maintain the application caches
which can be overridden via the property tangosol.coherence.distributed.localstorage=false. While
this property can be readily specified in the submitbatch.properties file thereby disabling the local storage for
the submitter nodes, it needs to be approached differently with respect to the threadpoolworker nodes (as at least
one threadpoolworker node must have local storage enabled). It is recommended that if Unicast is being utilized that
the cache node(s) be the WKA members. It is further recommended that the cache node(s) not perform any
application processing; therefore they should be assigned a threadpool which is not specified by any job.
Clustered Mode – Roles Recommendation
Coherence provides the property tangosol.coherence.role which can be used to identify the type of node for
added clarity when monitoring the application. By default, the application sets this value to
SplwgBaseApiThreadPoolWorker for the threadpoolworker instances and SplwgBaseApiSubmitBatch for
the submitter nodes. These values can be overridden to provide further specifics regarding the node, for example
the WKA/cache members can be identified, the job/thread associated with the submitter node, etc.
Below is sample output from a submitter node illustrating what can be achieved via specifying
tangosol.coherence.role for the different nodes. As can be seen, the WKA and storage enabled nodes are
denoted by WKACache_SplwgBaseApiTPW, the actual threadpoolworker instance is denoted by
SplwgBaseApiThreadPoolWorker (unchanged from the default), and the submitter nodes are denoted by the
Job/Thread, e.g., SubmitBatch_BILLING_1_OF_8.
WellKnownAddressList(Size=2,
WKA{Address=1.1.1.10, Port=7020}
WKA{Address=1.1.1.10, Port=7010}
)
MasterMemberSet
(
ThisMember=Member(Id=11, Timestamp=2012-07-06 18:07:21.509,
Address=1.1.1.10:7524, MachineId=424,
Location=machine:HOST1,process:5636500,
Role=SubmitBatch_BILLING_8_OF_8)
)
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0, BitSetCount=0
)
)
TcpRing{Connections=[10]}
IpMonitor{AddressListSize=0}
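The role values shown in the output above could be established per node by overriding the property in the corresponding properties file, for example (the role name shown is one of the examples used in this section):

```properties
# threadpoolworker.properties of a WKA/cache member
tangosol.coherence.role=WKACache_SplwgBaseApiTPW
```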
Clustered Mode – Private Network Recommendation
If the batch cluster will be spread across multiple physical machines, ensure that the nodes are communicating via a
private network. This will preclude the possibility of the network becoming saturated by network activity outside the
cluster. Despite this measure, inter-machine communication can still be problematic from a communication
standpoint (indicated by the presence of communication delays), hence the subsequent recommendation of
establishing multiple clusters when more than one physical machine comprises the batch topology.
The above topology can be achieved by setting the property tangosol.coherence.cluster to a unique value in the
threadpoolworker.properties and submitbatch.properties files on each physical server. For example, if two batch
servers were being utilized, this property could be set to GBUPRODA and GBUPRODB respectively. In the case
where multiple clusters will reside on the same machine, these values can be overridden at
submitter/threadpoolworker startup (note that the associated Unicast/Multicast ports will also need to be overridden).
By having separate clusters, the job submission / scheduling mechanism must submit jobs to each cluster explicitly.
For example, in the case of two clusters and a 60 thread job, the job submission mechanism could submit 30
threads to one cluster and 30 to the other.
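The even split described above can be sketched as a small helper. This is purely illustrative: the cluster names are the examples from the text, and the round-robin distribution is one possible policy, not a product mechanism.

```python
def split_threads(total_threads, clusters):
    """Distribute thread numbers 1..total_threads across clusters round-robin.

    Each cluster receives an (almost) equal share of the job's threads;
    the scheduler would then submit each cluster's share explicitly.
    """
    plan = {cluster: [] for cluster in clusters}
    for thread in range(1, total_threads + 1):
        cluster = clusters[(thread - 1) % len(clusters)]
        plan[cluster].append(thread)
    return plan

# A 60-thread job across the two example clusters: 30 threads each.
plan = split_threads(60, ["GBUPRODA", "GBUPRODB"])
```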
Note: Refer to the documentation provided by the JVM vendor for the valid formats of JVM options.
By default the threadpools allocate enough memory to run most batch processes. In some cases though, such as
when out of memory conditions occur, you may need to tweak the setting to provide enough memory to the running
processes. There are a number of techniques available to address this:
» In Oracle Utilities Application Framework V4.x a number of parameters control the java memory settings of the
threadpoolworker using the following settings available from option 51 from the configureEnv utility (using the
–a option):
Setting Usage
BATCH_MEMORY_ADDITIONAL_OPT Additional Java options to be passed to the threadpoolworker JVM. The format of
the options is as expected by the version and JVM vendor.
Note: Avoid duplicating the memory options outlined below.
BATCH_MEMORY_OPT_MAX Maximum heap memory to allocate to the threadpoolworker. Equivalent to the –Xmx
java option.
BATCH_MEMORY_OPT_MIN Minimum heap memory to allocate to the threadpoolworker. Equivalent to the –Xms
java option.
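As an illustration of the settings above (the values shown are examples only, not recommendations), entries such as the following in the configureEnv utility translate into the familiar JVM heap flags on the threadpoolworker command line:

```properties
# configureEnv option 51 values (illustrative)
BATCH_MEMORY_OPT_MIN=1024m     # becomes -Xms1024m
BATCH_MEMORY_OPT_MAX=4096m     # becomes -Xmx4096m
BATCH_MEMORY_ADDITIONAL_OPT=-XX:+HeapDumpOnOutOfMemoryError
```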
» In Oracle Utilities Application Framework V2.2, making changes to the memory arguments requires manual
changes to the following base scripts in the bin directory:
Setting Usage
Note: Any changes should be backed up and noted as they may be overwritten due to upgrades or fixes, and
therefore may need reapplication.
By default the batch logs are named in a standard fashion (refer to the Server Administration or
Configuration/Operations Guide for these standards). The name of the log file can include a custom prefix (for
example, a literal name). This can be set using the following method:
» As the product administrator user, create a file named batch_log_prefix.txt in the etc directory of your
installation. In that file create a single-row entry that contains the prefix to use 12.

12 Do not include the "."
Note: For the programmatic version of this facility refer to the Oracle Utilities SDK documentation.
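A hypothetical setup of the prefix facility described above might look like this (the prefix NIGHTLY is purely an example; SPLEBASE would normally already be set by splenviron[.sh]):

```shell
# Illustrative only: SPLEBASE is normally set by splenviron[.sh]
SPLEBASE=${SPLEBASE:-$(mktemp -d)}
mkdir -p "$SPLEBASE/etc"
# Single-row entry containing the prefix; do not include a "." in it
echo "NIGHTLY" > "$SPLEBASE/etc/batch_log_prefix.txt"
```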
In past releases of Oracle Utilities Application Framework it was possible to programmatically add additional tags to
the JMX interface to provide addition monitoring capabilities. In Oracle Utilities Application Framework V4.3.x and
above this facility is now available via configuration as well as programmatic.
» Add a new configuration entry to the submitbatch.properties files for each additional tag in the following
format:
com.splwg.batch.submitter.softParameter.f1.jmxInfo.<parameter>=<value>
where <parameter> is the name of the additional tag and <value> is the value to be displayed in the JMX interface.
For example:
com.splwg.batch.submitter.softParameter.f1.jmxInfo.foo=bar
» This setting can be set globally or on specific properties files for particular jobs.
Note: It is possible to specify this parameter on the command line using the -x
f1.jmxInfo.<parameter>=<value> option.
One of the most critical parts of the batch architecture is deciding and maintaining the configuration settings
appropriate for your requirements. In past releases this has involved maintaining the following configuration files 13:
Setting Usage
13 All batch configuration files are located in $SPLEBASE/splapp/standalone/config (or %SPLEBASE%\splapp\standalone\config on Windows)
In Oracle Utilities Application Framework V4.2.0.2.0 and above, a new utility bedit[.sh] has been added to
simplify the creation and maintenance of these files to promote stability and flexibility whilst minimizing maintenance
mistakes. The features of the facility are as follows:
» Command driven wizard with simplified interfaces. Customers familiar with Oracle's WebLogic WLST utility will
recognize the design pattern with the utility. For example:
$ bedit.sh -c
Editing file /oracle/demo/splapp/standalone/config/tangosol-coherence-override.xml
using template /oracle/demo/etc/tangosol-coherence-override.ss.be
Batch Configuration Editor 1.0 [tangosol-coherence-override.xml]
-------------------------------------------------------------
Current Settings
cluster (cluster1)
address (127.0.0.1)
port (42020)
loglevel (5)
mode (dev)
> help
tangosol-coherence-override.ss
------------------------------
Topics
Commands
Online help is context sensitive to the options used. If help is not available on a topic then it may not be appropriate
for the option used.
Enabling BatchEdit
For backward compatibility, the BatchEdit facility is disabled by default. To enable BatchEdit, the following process
must be performed:
» Attach to the environment as a valid administrator user using the splenviron[.sh] command.
» Execute the configureEnv[.sh] -a option to invoke the configuration menu.
» Select option 50 and navigate to the Enable Batch Edit Functionality menu option.
» Specify true to enable the functionality.
» Navigate to the main menu of the configuration menu and use the P option to process the change.
BatchEdit is now enabled.
» The first time a command option is used, the default configuration files are created using product-supplied
templates.
» Product supplied templates exist in the etc subdirectory with the suffix be.
Template Usage
submitbatch.be Default submitbatch.properties template for all jobs
» It is possible to create custom templates by copying the base template and adding a cm. prefix. This technique
will be illustrated during the configuration process.
» The templates have been pre-optimized based upon customer experiences, performance engineering and partner
feedback.
BatchEdit will remember your preferences 14, so minimal options are needed when maintaining the existing cluster
and threadpool definitions. For example, when you specify the -t option on the cluster, it is set and not needed for
subsequent invocations of BatchEdit.
Multicast Cluster
Standard Threadpool
Create/Maintain Threadpools
Cache Threadpool
General Submitters
Create/Maintain Submitters
Specific Submitters
Note: Once the configuration is complete, it must be reflected/synchronized across your architecture. If more than
one server is included in your hardware, the configuration files created need to be synchronized.
Type of Cluster - This decision is basically whether you want to implement a single server cluster, a unicast-based
cluster or a multicast-based cluster. This will define the scope of the cluster and how the objects in the cluster
communicate. This decision can be made considering a number of factors, including the scope of the cluster and
your preferred networking method.
Single Server Cluster ss Cluster is restricted to a single host only. The networking is restricted to the
host using a local internal protocol. This type of cluster is useful for simple
environments such as development, demonstration and other non-production
environments you want to restrict to a single server.
Unicast Cluster wka Cluster is across one or more hosts using the unicast networking technique.
This requires each host to be explicitly defined in the cluster using a Well Known
Address (WKA) list.
Multicast Cluster (default) mc Cluster is across one or more hosts using the multicast networking technique.
This is a dynamic configuration with each host in a cluster joining the cluster
using a common multicast setup in a network at startup time. This cluster is
suitable for production or non-production environments where more than one
host is in the cluster. This is the default option if no other is specified.
Once the type has been set the parameters for the cluster must be specified. The table below illustrates the
common parameters available for the cluster configuration:
Note: use the help <parameter> function in BatchEdit for a description of the field and more advice as well as a
full list of parameters.
address ■ ■ ■ IP address or host name for this node in the cluster. Use localhost if possible to
minimize maintenance across hosts.
port ■ ■ ■ Unique port used for the cluster. This must be unique per cluster per host. Its use
will vary from cluster type to cluster type. Refer to the online help for more
information.
loglevel ■ ■ ■ The logging level associated with cluster operations. Refer to the online help to
decide the amount of information you wish to log. The higher the value the more
that is logged. High values are used by developers typically.
mode ■ ■ ■ The Coherence mode that the cluster is to be scoped to. Refer to the online help
for more information.
socket ■ This is a section for each of the hosts in the Well Known Address format. Each
host is a separate socket entry. Refer to the online help for more information.
wkaaddress ■ The IP address or host name of the member of the cluster assigned to this
socket.
wkaport ■ The port number of the cluster member assigned to this socket. Ideally this
value is the same across all hosts in the cluster, but it can be overridden to
overcome port conflicts. The port number on each node must match the number
assigned to the port value.
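For illustration, the socket/wkaaddress/wkaport parameters above correspond to entries along the following lines in the generated tangosol-coherence-override.xml (the host names and port are examples; the exact structure is generated by BatchEdit):

```xml
<well-known-addresses>
  <socket-address id="1">
    <address>host1.example.com</address>
    <port>42020</port>
  </socket-address>
  <socket-address id="2">
    <address>host2.example.com</address>
    <port>42020</port>
  </socket-address>
</well-known-addresses>
```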
Note: The Cluster type may be initially set by specifying the -t option on the command line. After the type has been
set the -t option is no longer needed unless you wish to change the cluster type.
» It is now possible to create different configurations per threadpool. The main differentiators for this are the role of
the threadpool and the JVM parameters. The different configurations are set using the -l option.
Note: Use of multiple configurations is optional. Omit the -l option to use the default configuration.
Parameter Recommendations
minheap Minimum JVM Heap size
daemon Whether the threadpool should run the online submission daemon. This value should be set to false
for production environments.
» Use the default template for the vast majority of threadpools unless there is a need to implement different
parameters for individual threadpools.
» Create at least one cache threadpool per node in your architecture. Use the -l cache label option to achieve
this. For more information about cache threadpool refer to Using Cache Threadpools.
» Create custom templates or use labels to create custom configurations for specialist jobs where parameters differ
for the jobs run in that threadpool. Remember to use the -l <label> option on the threadpoolworker[.sh]
utility to use the label specific parameters.
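A hypothetical label workflow, using an assumed label jumbo, might look like this:

```
$ bedit.sh -l jumbo            # create/maintain the label-specific threadpool configuration
$ threadpoolworker.sh -l jumbo # start a worker using those label-specific parameters
```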
Note: Once the configuration is completed, the thread capacity and threadpool availability are defined by executing
the configuration, that is, by manually starting and stopping the threadpools using the threadpoolworker[.sh] utility.
» Each submitter needs a configuration file that defines the parameters to be used for the individual batch process
being executed. These can be global configuration files or individual configuration files optimized for a particular
batch process.
» Command line options to set or override particular configuration parameters that define the execution parameters
for the individual process or thread.
Parameter Recommendations
poolname Name of threadpool to execute this submitter within
threads Thread limit of the submitter. The number of threads must be equal to or less than the number of threads
allocated to executing instances of the threadpool.
commit Default commit interval. This overrides the commit interval defined internally for the batch job.
user The userid, defined on the User record, used to determine execution permissions and recorded on records
updated and created by this process. This MUST be a validly defined user.
lang The language pack used for the batch process (default is ENG)
storage This sets whether this node is a storage node. Used for submitters that use THIN mode (for developers). This
value is recommended to be set to false for all other submitter types.
role The role used for this submitter. This is used as a filter for the JMX monitoring interface. By default the batch
code is used for this value.
soft Group section for soft parameters. One section per parameter
Note: Other parameters supported by the submitjob[.sh] utility are available as options rather than configuration
parameters.
» Use the generic template for the majority of the batch jobs unless the job requires special parameters.
» Use specific configurations using the -b <batchcode> option on the command line to generate and maintain
job specific configurations.
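Putting the submitter guidelines together, a sketch of the commands involved (the job code BILLING is an example):

```
$ bedit.sh -s                   # maintain the generic submitter configuration
$ bedit.sh -b BILLING           # generate/maintain a BILLING-specific configuration
$ submitjob.sh -b BILLING -t 0  # submit all threads of the job (thread 0 = all)
```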
15 This is achieved by specifying the thread number as 0 (zero) to spawn threads up to a thread limit.
[Figure: batch cluster topology showing a dedicated cache threadpool within each cluster]
The performance advantages of the cache increase with the number of elements the cluster has to manage. Cache
threadpools have the following implementation recommendations:
» Cache threadpools do not execute any threads of any jobs within them. They are exclusively used for
administration, a storage node for the cluster state and a conduit for cluster management.
» Cache threadpools act as Coherence local storage nodes to maintain the integrity of the cluster and allow cluster
management.
» Cache threadpools are ideally suited to allow JMX connections to monitor the performance of the cluster using the
Global JMX interface outlined in the Batch Server Administration Guide.
» At least one cache threadpool per cluster per host is recommended. Multiple cache threadpools can be
implemented where high availability is required or there are a large number of submitters, threads and/or
threadpools to manage.
» If a cache threadpool is shut down and no cache threadpools are active at any time, the cluster will not revert to
individual elements communicating across other elements.
To create cache threadpools, use the bedit[.sh] -l cache command. A prebuilt template is created for the cache
in which storage is enabled and distthds, invocthds and the number of threads are set to 0 (to prevent jobs from
actually running in the cache).
Parameter Process
Single Server Create and configure the single server cluster using the bedit[.sh] -c -t ss command.
Create and configure the threadpool definitions using the bedit[.sh] -w command. In most cases, use the
default template and avoid cache threadpools unless the number of submitters/threads/threadpools is large.
Multicast Create and configure the multicast cluster using the bedit[.sh] -c -t mc command. Allocate an
appropriate multicast IP address and port number.
Create and configure the threadpool definitions for job executing threadpools using the bedit[.sh] -l job
command. Create at least one cache threadpool per cluster host 17 using the bedit[.sh] -l cache
command.
Create and configure the submitter global definitions for the jobs to execute using the bedit[.sh] -s
command. Specify job specific setting using the bedit[.sh] -b <batchcode> command.
Copy all the threadpool and cluster configuration files generated to the hosts in the cluster. The submitter
configuration files only need to be copied to the hosts from which jobs are submitted.
» Create a backup of the following files, located in the splapp/standalone/config subdirectory:
file Recommendations
tangosol-coherence-override.xml Cluster Configuration
» Reflect and recreate your current configuration in the cluster, threadpool and submitter as outlined in BatchEdit
Common Configurations.
17 This is done at runtime not configuration time. Create one cache definition and copy it across nodes.
Business processes and business activity drive the schedule that the implementation will go live with. Some of the
background processes will be removed and some will be added. Individual background processes can be removed if
business process does not require the process or, for any reason, the process is not applicable to the business.
Custom Processes (typically for interfaces) will be added by your implementation team.
Therefore the following goals are applicable to schedule optimization within product:
» Maximize throughput by limiting process concurrency. Do not run too many processes simultaneously; the CPUs
can become saturated if too much runs at the same time.
» Aligning schedule with Bill Cycle and Meter Read Schedules. The Bill cycle and Meter Read cycles, if used, can
influence the schedule. Refer to the Business Process documentation for billing and meter reading for more
details of this.
The following process can be used to optimize the base schedule supplied with the product:
» Remove background processes that are not to be implemented. Check with the business if the process is
applicable or needed for a business process. If in doubt, leave it in.
» Add custom background processes to schedule. For custom interfaces or custom business processes, outside the
scope of the base product, you will be adding a few custom background processes. Your implementation team
will have details of these processes.
» Adjust dependencies for the added and removed background processes. When you take away and add
background processes, the dependencies will change and the overall flow of data will change.
» Run the schedule in test as initially documented – You need to run the new schedule at a basic level to get an idea
of how it hangs together.
» Gather elapsed times and throughput rates - You need to get some stats to determine which background
processes will need optimization and how much optimization is really needed.
» Determine "heavy" background processes – Background processes that take a long time (we will leave that
tolerance up to you) need to be determined, as they will greatly affect the overall schedule. These become
candidates for multiple threading.
» Now that we have the basic information we can start optimization.
» Heavy background processes can be run multi-threaded – Consider multiple threading the heavy background
processes but not too much threading as it can drive up contention. See Multi-threading guidelines for more
information.
» Move scheduling of background processes to minimize number of background processes running in parallel –
Reduce contention around heavy background processes by scheduling other background processes earlier or
later. Also try not to run a lot of light background processes at the same time; collectively they have the same
impact as heavy background processes.
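The triage described above can be sketched as a simple filter over the gathered statistics. The threshold, job names and elapsed times below are all illustrative assumptions, not product data:

```python
def heavy_jobs(elapsed_minutes, threshold=60):
    """Return jobs whose elapsed time exceeds the threshold, worst first.

    These become the candidates for multi-threading per the
    multi-threading guidelines."""
    return sorted(
        (job for job, mins in elapsed_minutes.items() if mins > threshold),
        key=lambda job: elapsed_minutes[job],
        reverse=True,
    )

# Example statistics gathered from a test run (minutes per job).
stats = {"BILLING": 240, "TD-TODO": 15, "ARREARS": 95}
print(heavy_jobs(stats))  # ['BILLING', 'ARREARS']
```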
When altering the base schedule remember the following:
» If in doubt, do NOT leave it out. Keep processes in the schedule that you are not sure about.
» Only run multi-threaded if necessary – remember too much threading can increase contention and therefore
reduce throughput.
» To run background processes during busy business days, reduce the Commit Interval to increase transaction
concurrency.
Once a third-party scheduler has been chosen, it must be configured for submission of background
processes. The following guidelines have been used by a number of sites to successfully implement the scheduler:
» Create a separate operating system account for the scheduler to use to submit background processes. Avoid
using administration accounts (e.g. root) or the product administration account. This account should have
access to the relevant product security groups.
» To run any utility provided with the product, the batch user must execute the splenviron[.sh] utility to set the
context of the execution. This can be achieved in two ways:
» The .profile (or autoexec.bat on Windows) for the administration account can call the
splenviron[.sh] utility automatically. This is the preferred method.
» The command line used in the scheduler for each background process (including threadpoolworker[.sh] as
well as submitjob[.sh]) must be prefixed with the full splenviron[.sh] utility with the –c option. Refer to
the Operations And Configuration Guide or Batch Server Administration Guide for your product for details of the
splenviron[.sh] utility.
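For example, a scheduler command line following the second approach might look like the following (the environment name PRODENV and job code BILLING are illustrative):

```
$ splenviron.sh -e PRODENV -c "submitjob.sh -b BILLING -t 1"
```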
» The background processes must be loaded into the scheduler. This can be done manually or using an import
facility provided with the scheduler. In some products the dependency information is stored in a set of tables.
Refer to the Framework Administration Guide or the Internal Scheduler online help for more information.
18 Not all products have included the scheduler as part of their product. Refer to the relevant product documentation for details.
» It is possible to submit individual threads of a background process as individual jobs within your scheduler. This
will allow micromanagement of the schedule for dependencies.
» It is possible to submit all threads in a single job using the –t 0 (zero) option. This will submit all threads
simultaneously in the same threadpool which may not be desirable.
» Remember to add non-product jobs that are necessary as part of your schedule such as backups and interface
transfers.
Common Errors
There are a number of common errors that can occur from time to time in threadpoolworker and/or submitter. This
section outlines some of the common errors and suggested remedies.
Communication Delays
» Examine the threadpoolworker and/or submitter logs (including the stdout logs) for an indication of an error
message from the COBOL object causing the error.
» If the logs do not contain any information enable tracing on the batch control and rerun the job to assist in finding
the object.
» Alternatively, it is possible to use the Linux/UNIX pmap command against the process to track errors.
Once the error is isolated, then the COBOL object or data causing the error needs to be corrected to enable the job
to be successfully executed.
MAX-ERRORS Exceeded
One of the major features of the batch architecture is the ability to define an error tolerance (MAX-ERRORS). Whilst
the default setting, 0 (zero), disables this facility, it is useful to set it to detect mass data errors. Typically, a batch
process works on batches of data (hence the name), and if there is a data problem across the data it is processing,
MAX-ERRORS can be used to detect data-set-wide issues and prevent large numbers of errors (and any associated
To Do entries) from being created.
The value of MAX-ERRORS will vary from job to job and will depend on the error tolerance and likelihood of errors
in that job, as well as the sensitivity of the business to any errors. For example, jobs that load data from external
sources are ideal candidates for setting MAX-ERRORS, as it will catch cases where the external system has sent
invalid data in the data set. It is also useful for jobs whose processing is heavily reliant on configuration settings,
such as calculations, where error trapping can detect incorrect administration data. For example, if you are using a
rate or calculation that will generate errors if misconfigured, it would be useful to set MAX-ERRORS to catch such
misconfigurations.
CONNECT WITH US
blogs.oracle.com/theshortenspot
facebook.com/oracle
twitter.com/theshortenspot
oracle.com

Copyright © 2007-2015, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and
the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other
warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or
fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are
formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any
means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and
are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are
trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0415