PARALLEL CONCURRENT PROCESSING FAILOVER AND LOAD BALANCING OF E-BUSINESS SUITE RELEASE 11I AND RELEASE 12

Mike Swing, TruTek

Abstract
Parallel Concurrent Processing Failover uses two mechanisms to detect a failure: dead connection detection, and detection of a failure of the process monitor for the Concurrent Managers, otherwise known as PMON (note that this is not the PMON from the database), which was introduced with Patch 6495206. Load balancing of the Concurrent Managers is critical if you expect parallel concurrent processing to function after the failover to the remaining node(s). This paper reviews Concurrent Manager basics before we discuss the topics of failover and load balancing. One of the key components used by Concurrent Processing is Generic Service Management. The use of GSM with multiple nodes and seeded GSM services is discussed. Administering Concurrent Managers, managing control across nodes, starting and stopping the Concurrent Managers, and managing concurrent log files are skills needed to understand the configuration of Parallel Concurrent Processing failover and load balancing. There are a number of ways that an E-Business Suite environment might be configured for failover:
• Database Failover
• Fast Connection Failover (FCF)
• Transparent Application Failover (TAF)
• Parallel Concurrent Processing Failover
• Concurrent Manager Failover

This paper will discuss Parallel Concurrent Processing Failover, ICM Failover, CRM Failover, and Concurrent Manager Failover. We’ll leave the discussion of Database Failover, Fast Connection Failover and Transparent Application Failover for another time. The paper concludes with a discussion of load balancing and the issues that must be considered to properly configure an E-Business Suite environment to take advantage of Oracle’s load balancing features.

Concurrent Processing
Most user interactions with Oracle Applications data are conducted via the HTML interface or the Forms interface. However, reporting and interface programs may need to run periodically or on an ad hoc basis. As these programs may require a large number of computations, they are run in the background at a time, and with a priority, such that the work of interactive users is not impeded. Such programs are run on the Concurrent Processing server and run under Concurrent Managers. When a request is submitted to run a Concurrent Program through an Oracle Applications form or through Oracle Application Manager (OAM), the request inserts a row into the FND_CONCURRENT_REQUESTS table that specifies the program to be run. Concurrent Managers read the requests from the table and start the appropriate Concurrent Programs.
The Concurrent Processing Server:
• Allows scheduling of batch jobs called Concurrent Requests.
• Processes Concurrent Programs as a Concurrent Request.
• Requests can be grouped together into Request Sets.
• Different types of Concurrent Managers handle different types of requests.
• A Concurrent Program can be assigned to a responsibility, and that responsibility can be assigned to users, allowing them permission to run the Concurrent Program.

www.rmoug.org


RMOUG Training Days 2009

• Concurrent Managers may have limits on the Concurrent Programs that can be run, and the times that they can be started. Concurrent Requests have priorities, statuses, and log and out files in $APPLCSF.
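As a rough illustration of the request flow described above, the following Python sketch models a request table and a manager that picks up pending requests it is allowed to run. The `Request` and `Manager` classes, the field names, the sample rows, and the rule that a lower priority number runs first are all assumptions made for the example; they are not the actual FND_CONCURRENT_REQUESTS schema or the real manager logic.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    program: str
    priority: int = 50      # assumption: a lower number is picked up first
    phase: str = "Pending"
    status: str = "Normal"


@dataclass
class Manager:
    name: str
    programs: set           # programs this (hypothetical) manager may run

    def next_request(self, table):
        """Return the best pending request this manager may run, or None."""
        candidates = [
            r for r in table
            if r.phase == "Pending" and r.status == "Normal"
            and r.program in self.programs
        ]
        if not candidates:
            return None
        # Ties on priority are broken by the older (lower) request id.
        return min(candidates, key=lambda r: (r.priority, r.request_id))

    def run(self, table):
        """Pick up one request, mark it Running, and return it (or None)."""
        req = self.next_request(table)
        if req is not None:
            req.phase, req.status = "Running", "Normal"
        return req


# A toy stand-in for the request table (hypothetical rows).
table = [
    Request(101, "REPORT_A", priority=50),
    Request(102, "INTERFACE_B", priority=10),
    Request(103, "REPORT_C", priority=50),
]
standard = Manager("Standard Manager", {"REPORT_A", "INTERFACE_B"})
first = standard.run(table)
```

In this toy model, request 103 stays pending because no manager is allowed to run its program, which is the kind of situation that limits on managers can produce.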

Definitions
The following are some acronyms that we will use throughout this paper:
• CP => Concurrent Processing
• DCD => Dead Connection Detection
• ICM => Internal Concurrent Manager
• IM => Internal Monitor
• CRM => Conflict Resolution Manager
• PCP => Parallel Concurrent Processing
• PMON => Process Monitor for ICM

Concurrent Requests
Figure 1 shows an example of the Concurrent Manager Requests screen.

The Phase and Status tell us what is happening with each Concurrent Program

Figure 1
Phase and Status of Concurrent Requests
Figure 2 shows the various Phases and Statuses that a Concurrent Program can have, with a description of what they mean:

Phase      Status      Description – Action
Pending    Normal      The request is waiting to be picked up by the next available manager.
Pending    Standby     Waiting for CRM to resolve conflict. CRM could be slow or an incompatible program is running.
Running    Normal      The request is running normally.
Completed  Normal      The request has finished successfully.
Completed  Error       The request has finished with an error. Check logs.
Completed  Warning     The request has finished with a Warning. Check the logs.
Inactive   No Manager  Request won’t run without a manager. Specialization rules aren’t configured properly.

Figure 2
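The Phase and Status combinations in Figure 2 lend themselves to a simple lookup. The sketch below is only an illustration: the dictionary holds the descriptions from Figure 2, while the function names and the `needs_attention` triage rule are assumptions added for the example and are not part of Oracle Applications.

```python
# (phase, status) -> description, transcribed from Figure 2.
PHASE_STATUS = {
    ("Pending", "Normal"): "The request is waiting to be picked up by the next available manager.",
    ("Pending", "Standby"): "Waiting for CRM to resolve conflict. CRM could be slow or an incompatible program is running.",
    ("Running", "Normal"): "The request is running normally.",
    ("Completed", "Normal"): "The request has finished successfully.",
    ("Completed", "Error"): "The request has finished with an error. Check logs.",
    ("Completed", "Warning"): "The request has finished with a Warning. Check the logs.",
    ("Inactive", "No Manager"): "Request won't run without a manager. Specialization rules aren't configured properly.",
}


def describe(phase, status):
    """Return the Figure 2 description for a phase/status pair."""
    return PHASE_STATUS.get((phase, status), "Combination not listed in Figure 2")


def needs_attention(phase, status):
    """Assumed triage rule: flag the rows whose description calls for action."""
    return (phase, status) in {
        ("Completed", "Error"),
        ("Completed", "Warning"),
        ("Inactive", "No Manager"),
    }
```

For example, `describe("Pending", "Standby")` returns the CRM conflict description, and `needs_attention("Inactive", "No Manager")` is true.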

Concurrent Managers
Figure 3 shows the Concurrent Manager Administer screen. Oracle seeds a number of Concurrent Managers and assigns Concurrent Programs to those managers. Your Applications System Administrator can also define custom managers and assign Concurrent Programs to those managers.

Figure 3
Figure 4 shows the different types of Concurrent Managers, their Service Instance, and their Program Name. Your Applications System Administrator can adjust the Concurrent Managers and Transaction Managers, but the other types of managers must be left alone.

Manager Type                 Service Instance                     Program
Internal Concurrent Manager  Internal Manager                     FNDLIBR
Conflict Resolution Manager  Conflict Resolution Manager          FNDCRM
Internal Monitor             Internal Monitor:Node                FNDIMON
                             Service Manager                      FNDSM
Concurrent Manager           Standard Manager                     FNDLIBR
Concurrent Manager           Inventory Manager                    INVLIBR
Concurrent Manager           Session History Cleanup              FNDLIBR
Concurrent Manager           PA Streamline Manager                PALIBR
Transaction Manager          CRP Inquiry Manager                  CYQLIB
Transaction Manager          FastFormula Transaction Manager      FFTM
Transaction Manager          PO Document Approval Manager         POXCON
Transaction Manager          Transaction Manager                  FNDTMTST
                             Scheduler/Prerelease Manager         FNDSVC
                             OAM Generic Collection Service:Node  FNDSVC

Figure 4

Concurrent Processing Overview
This diagram provides an overview of how Concurrent Processing works.

[Figure: Concurrent Processing overview. Components shown: Web Browser (HTML interface; JInitiator, Java interface), Web Server, Forms Server, Reports Server, Report Review Agent, Internal Monitor (FNDIMON), ICM (FNDLIBR), Standard Manager (FNDLIBR), Service Manager (FNDSM), FNDCRM, SQL*Net, and the Requests, Log, Out, and .rdx files.]

In the diagram, you can see that:
1. The Concurrent Processing server communicates with the database using Oracle SQL*Net.
2. Log and Out files from Concurrent Programs are generated on the Concurrent Processing server. Log files show what occurred when the program ran, while out files are the output of the program.
3. The Concurrent Program log and output file from a request is passed back as a report to the Report Review Agent.
4. The Report Review Agent passes a file containing the entire report to the forms server.
5. The Forms Services component passes the report back to the user’s browser one page at a time. Profile Options can be used to control the size of the files and pages passed, to suit report volume and available network capacity.

Concurrent Manager Processes
Internal Concurrent Manager
Internal Concurrent Manager (FNDLIBR process) - Communicates with the Service Manager.
• The Internal Concurrent Manager (ICM) starts, sets the number of active processes, monitors, and terminates all other concurrent processes through requests made to the Service Manager, including restarting any failed processes.
• The ICM also starts, stops, and restarts the Service Manager for each node.
• The ICM will perform process migration during an instance or node failure.
• The ICM will be active on a single node. This is also true in a Parallel Concurrent Processing environment, where the ICM will be active on at least one node at all times.
• The ICM really does not have any scheduling responsibilities. It has NOTHING to do with scheduling requests, or deciding which manager will run a particular request. The function of the ICM is to run 'queue control' requests; requests to startup or shutdown other managers.
• The ICM is responsible for startup and shutdown of the whole concurrent processing facility, and it monitors the other managers periodically, and restarts them if they should go down. It can also take over the Conflict Resolution Manager's job, and resolve incompatibilities.
• If the ICM itself should go down, requests will continue to run normally, except for 'queue control' requests. Your Applications System Administrator can restart the ICM by running the 'startmgr' command; there is no need to kill the other managers first.
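The ICM's monitoring role described above (setting the number of active processes and restarting managers that have gone down) can be pictured as a reconciliation between target and actual process counts. The sketch below is a simplified model under that assumption, not the ICM's actual algorithm; the `reconcile` function, its argument shapes, and the manager names are hypothetical.

```python
def reconcile(targets, running):
    """Compare target and running process counts per manager.

    Returns {manager: n}, where n > 0 means "start n processes" and
    n < 0 means "stop -n processes". Managers already at their target
    are omitted.
    """
    actions = {}
    for name in sorted(set(targets) | set(running)):
        diff = targets.get(name, 0) - running.get(name, 0)
        if diff != 0:
            actions[name] = diff
    return actions


# Hypothetical snapshot: two Standard Manager processes have died, and a
# deactivated manager still has processes running.
targets = {"STANDARD": 3, "FNDCRM": 1}
running = {"STANDARD": 1, "FNDCRM": 1, "OLD_MANAGER": 2}
actions = reconcile(targets, running)
```

In the paper's description, the ICM would carry out such actions through requests made to the Service Manager on each node rather than starting the processes itself.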

Figure 5 shows the definition of the Internal Manager.

In this example of the ICM definition, there is a Secondary Node defined for PCP details.

Figure 5
In Release 11i, if there is more than one possible Secondary Node and the Primary Node fails, PCP will fail over to any node that is available. Specifying a Secondary Node limits failover to that node only. An available node is any node, except AUTHENTICATION, in the FND_NODES table whose status is set to ‘Y’.

Figure 6
In Figure 6, the TCP connection to RH9 has been disconnected and it shows a status of ‘N’.
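The Release 11i rule above (an available node is any node in FND_NODES, other than AUTHENTICATION, whose status is ‘Y’) can be sketched as a small filter. This is an illustration only: the rows are plain dictionaries, the key names `node_name` and `status` are assumptions standing in for the table's columns, and the sample data loosely mirrors the situation in Figure 6.

```python
def available_nodes(fnd_nodes):
    """Nodes eligible for failover: status 'Y' and not AUTHENTICATION."""
    return [
        row["node_name"]
        for row in fnd_nodes
        if row["node_name"] != "AUTHENTICATION" and row["status"] == "Y"
    ]


def icm_failover_candidates(fnd_nodes, failed_node, secondary=None):
    """Candidate nodes for the ICM after failed_node goes down (11i rule).

    With no Secondary Node defined, any available node qualifies.
    With a Secondary Node defined, failover is limited to that node.
    """
    candidates = [n for n in available_nodes(fnd_nodes) if n != failed_node]
    if secondary is not None:
        return [n for n in candidates if n == secondary]
    return candidates


# Hypothetical rows: RH9 has been disconnected and shows status 'N'.
fnd_nodes = [
    {"node_name": "RH7", "status": "Y"},
    {"node_name": "RH8", "status": "Y"},
    {"node_name": "RH9", "status": "N"},
    {"node_name": "AUTHENTICATION", "status": "Y"},
]
```

With these rows, a failure of RH7 leaves RH8 as the only candidate; naming RH9 as the Secondary Node would leave no candidate at all.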


Service Manager (FNDSM process) - Communicates with the Internal Concurrent Manager, Concurrent Manager, and non-Manager Service processes.
• The Service Manager (SM) spawns and terminates manager and service processes (these could be Forms, Apache Listeners, Metrics or Reports Server, and any other process controlled through Generic Service Management).
• When the ICM terminates, the SM that resides on the same node with the ICM will also terminate. The SM is “chained” to the ICM.
• The SM will only reinitialize after termination when there is a function it needs to perform (start, or stop a process), so there may be periods of time when the SM is not active, and this would be normal.
• All processes initialized by the SM inherit the same environment as the SM.
• The SM’s environment is set by the APPSORA.env file, and the gsmstart.sh script.
• The TWO_TASK setting used by the SM to connect to a RAC instance must match the instance_name from GV$INSTANCE.
• The apps_<sid> listener must be active on each Concurrent Processing node to support the Service Manager connection to the local instance.
• There should be a Service Manager active on each node where a Concurrent or non-Manager service process will reside.

FNDSM failover as noted in the Concurrent Manager log:

Could not contact Service Manager FNDSM_RH8_VIS. The TNS alias could not be located, the listener process on RH3 could not be contacted, or the listener failed to spawn the Service Manager process.
Found dead process: spid=(962754), cpid=(2259578), Service Instance=(1045)
CONC-SM TNS FAIL Call to PingProcess failed for WFMAILER
CONC-SM TNS FAIL Call to StopProcess failed for WFMAILER
CONC-SM TNS FAIL Call to PingProcess failed for FNDCPGSC
CONC-SM TNS FAIL Call to StopProcess failed for FNDOPP
CONC-SM TNS FAIL Call to PingProcess failed for OAMGCS
CONC-SM TNS FAIL Call to StopProcess failed for OAMGCS
Found dead process: spid=(716870), cpid=(2259579), Service Instance=(2009)
Found dead process: spid=(1442020), cpid=(2259580), Service Instance=(2010)
Starting WFMGSMD Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFMGSMDB Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFALSNRSVCB Concurrent Manager : 15-AUG-2008 13:28:57
Starting STANDARD Concurrent Manager : 15-AUG-2008 13:30:31
Starting Internal Concurrent Manager Concurrent Manager : 15-AUG-2008 13:30:32

Internal Monitor (FNDIMON process) - Communicates with the Internal Concurrent Manager.
• This manager/service is used to implement Parallel Concurrent Processing. You do not need to run this manager/service unless you are using Parallel Concurrent Processing.
• It monitors whether the ICM is still running, and if the ICM crashes, it will restart it on another node.
• The Internal Monitor (IM) monitors the Internal Concurrent Manager, and restarts any failed ICM on the local node.
• During a node failure in a PCP environment, the IM will restart the ICM on a surviving node (multiple ICMs may be started on multiple nodes, but only the first ICM started will eventually remain active; all others will gracefully terminate).
• There should be an Internal Monitor defined on each node where the ICM may migrate.

Standard Manager (FNDLIBR process) - Communicates with the Service Manager and any client application process.

• The Standard Manager is a worker process that initiates, and executes client requests on behalf of Applications batch and OLTP clients.

Figure 7 shows the Administer Concurrent Managers screen:

Notice that there are two nodes defined, RH7 and RH8

Figure 7
You can also see the Concurrent Managers from the OAM web page:

Figure 8
In Figure 9, the Standard Manager is active on RH9, even though no Primary Node is defined:

3 processes will run if the Standard Manager fails over

Figure 9
Since no Secondary Node is defined, the Standard Manager will not failover.

Notice in Figure 8 that in the Work Shifts definition, there are now Failover Processes, in order to specify the number of processes that will run when the Standard Manager fails over to the Secondary Node.

Transaction Manager
Transaction Managers communicate with the Service Manager, and any user process initiated on behalf of Forms, or a Standard Manager request.
A Transaction Manager:
• Supports synchronous processing of requests from a client program.
• Gets a request for a client program to run a server-side program synchronously.
• Returns a status/results to the client program.
• Doesn’t poll the concurrent request table for a new request.
• You only need 1 Transaction Manager per database, not 1 per instance.
• At runtime, it starts a number of these managers as defined.

Figure 10 shows some of the Transaction Managers in Release 12:

Figure 10
Note that between Release 11i and Release 12, the way that Transaction Managers work has changed:
Release 11i Transaction Managers use DBMS_PIPE
• This does not work across RAC instances
• RAC users must perform additional configuration.
• Requires complicated configuration or additional hardware
Release 12 Transaction Managers use AQ
• Works across RAC Instances
• Simplifies configuration
• Reduces complexity
• Profile Option can switch between mechanisms
• DBMS_PIPE can be used for non-RAC users if performance becomes an issue

Transaction Managers allow a client to make a request for a program to be run on the server immediately. The client then waits for the program to complete and can receive program results from the server. As the client and server are two separate database sessions, the communication between them for Release 11i has been handled using the DBMS_PIPE package. Unfortunately the DBMS_PIPE package does not extend to communications between sessions on different RAC instances. On an Applications instance using RAC, the client and server are very likely to be on different instances, causing transactions to time out for long periods or fail completely. The current workaround is to manually set up Transaction Managers to connect to all RAC instances, which not only takes up additional resources, but may also require additional middle-tier hardware or a complicated configuration that is difficult to maintain.

In Release 12, the Transaction Managers use the AQ mechanism. With AQ, the Transaction Managers work on RAC connected to either instance. This greatly simplifies the configuration and reduces the complexity for RAC administrators. A Profile Option has been introduced to allow users to switch between the two transports, DBMS_PIPE or AQ.

The client-side flow is:
1. The Client gets an active Concurrent Processor Id which can process the transaction request.
2. The Client returns if it can’t find any processor id.
3. The Client places a message containing the transaction details on the transaction AQ with the processor id as the correlation id. This message is addressed by any available Transaction Manager that can process the client request.
4. The Client listens on the return queue for a return message until one arrives or a timeout period expires.

The server-side flow is:
1. The Transaction Manager listens for any transaction requests addressed to its processor id.
2. The Transaction Manager will process the transaction request if there is any, and puts the results back in the return AQ.
3. The Transaction Manager will repeat steps 1 and 2 until it shuts down.

[Figure: Client and Server (TM) process flows. Server: listen for transaction requests, receive request, process request, place results on return queue, check for shut down, exit. Client: start transaction, get Concurrent Processor, place message on AQ, receive return message or time out, retrieve transaction results or return with error.]
Here we see the Client and Server Process flows for the AQ Transaction Managers.

TO SET UP TRANSACTION MANAGERS FOR PCP WHEN USING RAC
These steps apply to both 11i and R12:
1. Shut down the application tier services on all the nodes.
2. Shut down all the database instances cleanly in the RAC environment, using the command: SQL>shutdown immediate
3. Edit $ORACLE_HOME/dbs/<context_name>_ifile.ora and add these parameters:

_lm_global_posts=TRUE
_immediate_commit_propagation=TRUE
4. Start the instance on each database node.
5. Start up the Application tier services on all nodes.
6. Navigate to Profile > System and change the Profile Option ‘Concurrent: TM Transport Type’ to ‘QUEUE’, and verify that the Transaction Manager works across the RAC instance.
7. Navigate to the Concurrent > Manager > Define screen, and set up the Primary and Secondary Node names for the Transaction Managers.
8. Restart the Concurrent Managers.
Note: 240818.1

From note: 362135.1
Profile Option “Concurrent:TM Transport Type” can be set to PIPE or QUEUE. Pipes are more efficient but require a Transaction Manager to be running on each database Instance. ATG RUP3 (4334965) or higher provides an option to use AQs in place of Pipes.

Conflict Resolution Manager
Concurrent Managers read requests to start Concurrent Programs. When a Concurrent Program is started, Concurrent Managers read the request information from the FND Concurrent Request tables. The Conflict Resolution Manager checks Concurrent Program definitions for incompatibility rules. If a program is identified as Run Alone, then the Conflict Resolution Manager prevents the Concurrent Managers from starting other programs in the same conflict domain. When a program lists other programs as being incompatible with it, the Conflict Resolution Manager prevents the program from starting until any incompatible programs in the same domain have completed running.

TO ENABLE/DISABLE THE CONFLICT RESOLUTION MANAGER
• Use the system Profile Option 'Concurrent: Use ICM'.
  o Setting it to 'Yes' causes the CRM to be shut down, and the Internal Manager (ICM) will take over the conflict resolution duties.
  o Setting it to 'No' allows the CRM to be started.
• Note that using the ICM to resolve conflicts is not recommended. The CRM's sole purpose is to resolve conflicts, while the ICM has other functions to perform as well. Only set this option to 'Yes' if you have a good reason to do so.
• If a Concurrent Program cannot run on any Concurrent Manager, perhaps because it has been assigned to a Concurrent Manager that is disabled, then the Concurrent Request will stack up in the Conflict Resolution Manager.

Internal Scheduler/Prereleaser Manager
The short name for this manager is FNDSCH. It is also known as the Advanced Scheduler/Prereleaser Manager. Its job is to determine when a scheduled request is ready to run. This manager is intended to implement Advanced Schedules. Advanced Schedules were not fully implemented in Release 11.0. They are implemented in Release 11i, but are not widely used by the various Applications modules. General Ledger uses FNDSCH for financial schedules based on different calendars and period types. It is then possible to schedule AutoAllocation sets, Recurring Journals, MassAllocations, Budget Formulas, and MassBudgets to run according to the General Ledger schedules that have been defined. If financial schedules in GL are not being used then it is not a problem to deactivate this manager.

Internal Concurrent Manager Failover Definition
Release 11i
Define Primary and Secondary Nodes in Release 11i

Figure 11
By not specifying a secondary node the ICM can failover to any node that is available. Consider a system that has three or more concurrent processing nodes and two nodes go down, including primary node RH3. If the secondary node was specified, there would be a chance the secondary node would not be available. This capability, to failover to an un-named secondary node, is available for all managers in 11i. In Release 12 this works differently.

Release 12
In Release 12, for failover to function properly, both primary and secondary nodes must be specified. If a secondary node is not defined, the manager will not failover. Most managers won’t start if a primary node is not assigned. However, a few managers, for example, the Internal Concurrent Manager and the Conflict Resolution Manager, will start on any available node.
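The difference between the two releases can be made concrete with a small decision function. This is a sketch of the rules as stated in this paper, not Oracle's implementation: the function name, its arguments, and the choice of the alphabetically first surviving node for the un-named 11i secondary are assumptions made so the example is deterministic.

```python
def failover_target(release, primary, secondary, up_nodes):
    """Return the node a manager should run on, or None if it cannot run.

    release   -- "11i" or "12"
    primary   -- the manager's Primary Node
    secondary -- the manager's Secondary Node, or None if not defined
    up_nodes  -- set of concurrent processing nodes that are available
    """
    if primary in up_nodes:
        return primary
    if secondary is not None:
        # A named Secondary Node limits failover to that node only.
        return secondary if secondary in up_nodes else None
    if release == "11i":
        # 11i: no Secondary Node named, so any available node will do.
        # Picking the first name alphabetically is an arbitrary assumption.
        return sorted(up_nodes)[0] if up_nodes else None
    # Release 12: without a Secondary Node the manager does not fail over.
    return None


# Three nodes, two of them down, including primary node RH3.
up_nodes = {"RH9"}
```

With RH3 and RH8 down, an 11i manager that names RH8 as its Secondary Node has nowhere to go, while the same manager with no Secondary Node lands on RH9. In Release 12 the manager only reaches RH9 if RH9 was explicitly defined as its Secondary Node.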

Figure 12

Release 12 Generic Services and Request Processing Managers

Figure 13
GENERIC SERVICES
Generic Services include the Internal Concurrent Manager and Conflict Resolution Manager.

Figure 14
REQUEST PROCESSING MANAGERS
Request Processing Managers include the Standard Manager and other Concurrent Managers.

Figure 15
GENERIC SERVICE MANAGEMENT
An E-Business Suite system depends on a variety of services, such as Forms Listeners, HTTP Servers, Concurrent Managers, and Workflow Mailers. These services are composed of one or more processes. Management of these processes is complicated, since these services can be distributed across multiple host machines. In the past, many of these processes had to be individually started and monitored by the Applications System Administrator. The introduction of Generic Service Management in Release 11i helped simplify the management of these processes by providing a fault tolerant service framework and a central management console built into Oracle Applications Manager (OAM). Service Management is an extension of Concurrent Processing, and provides a framework for managing processes on multiple host machines. With Service Management, virtually any application tier service can be integrated into this framework.

Figure 16
Figure 16 shows that beginning with Release 11i, services such as the Oracle Forms Listener, Oracle Reports Server, Apache Web listener, and Oracle Workflow Mailer can be run under Service Management. With Service Management, the Internal Concurrent Manager (ICM) manages the various service processes across multiple hosts. On each host, a Service Manager acts on behalf of the ICM, allowing the ICM to monitor and control service processes on that host. Applications System Administrators can then configure, monitor, and control services through a management console that communicates with the ICM. Figure 17 shows the Oracle Application Manager (OAM) screen that an Applications System Administrator can use to manage the Concurrent Managers.

Figure 17
Service Management provides a fault tolerant system. If a service process exits unexpectedly, the ICM will automatically attempt to restart the process. If a host fails, the ICM may start the affected service processes on a secondary host. The ICM itself is monitored and kept alive by Internal Monitor processes located on various hosts.

TEST – KILL SERVICES TO SEE IF GSM RESTARTS THEM
In this example, we will kill the FNDSM process and the FNDCRM process to see if the Generic Services Manager correctly restarts the process:

Kill FNDSM
applvis   9007     1  0 11:53 ?        00:00:00 FNDSM
applvis   9159  9155  0 11:55 ?        00:00:00 FNDLIBR
applvis   9161  5683  0 11:55 pts/3    00:00:00 grep FND
[applvis@rh9 scripts]$ kill -9 9007
[applvis@rh9 scripts]$ ps -ef |grep FND
applvis   9159  9155  0 11:55 ?        00:00:00 FNDLIBR
applvis   9169     1  0 11:55 ?        00:00:00 FNDSM
applvis   9249  5683  0 11:57 pts/3    00:00:00 grep FND

Kill FNDCRM
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis   8886     1  0 11:52 ?        00:00:00 FNDCRM APPS/ZGA13053E1E1B7BA773417089054DA88F194EAC0D687728CC2551870E6B78C4B439EADB287342795115A88DBC85788CCB4 FND FNDCRM N 10 c LOCK Y RH9 1302318
[applvis@rh9 scripts]$ kill -9 8886
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis   9457  9392  0 12:09 ?        00:00:00 FNDCRM APPS/ZG26430816FA3570354BC57DE47FF105D145F8DE226EFE58CE04B416633DCB901267BFECFA7585114F7090060EFE1147BE FND FNDCRM N 10 c LOCK Y RH9 1302343

In each case, both of these services were started before I could enter the grep command to find the corresponding process.

Figure 18 shows that the entire set of system services may be started or stopped with a single action.

Choose an action from the pulldown to start or stop services

Figure 18
GSM AND MULTIPLE NODES
GSM enables users to manage Applications services across multiple middle-tier nodes. Users configuring GSM in a multiple-node system should be sure to have followed the instructions for setting up Parallel Concurrent Processing. This includes setting the environment variable APPLDCP=ON and assigning a Primary Node for all defined managers and services (if not already defined). This includes services on Web/Forms nodes that previously have had no concurrent processing footprint.

SEEDED GSM SERVICES
When configuring GSM the following GSM Services are seeded automatically:

• Forms Listener
• Metrics Server
• Metrics Client
• Reports Server
• Apache Listener
• LINUX users should not Activate the Reports Server under GSM
These services, once seeded, may be managed under GSM and controlled via the Oracle Applications Manager.

FNDSVCRG – SERVICE CONTROLLER UTILITY
FNDSVCRG is an executable introduced as a part of the Seeded GSM Services. It provides improved coordination between the GSM monitoring of these services and their command-line control scripts. The $FND_TOP/bin/FNDSVCRG executable is triggered from the concurrent processing control script before and after the script starts or stops the service. FNDSVCRG connects to the database and validates the configuration of the Seeded GSM Service. If a service is not enabled to be managed under GSM, the FNDSVCRG executable does nothing and exits. The script then continues to perform its normal start/stop actions. If a service is enabled for GSM management, the FNDSVCRG executable will update the service information in the database including the environment context, the current service log file location, and the current state of the service.

VERIFY GSM
• To verify that GSM is working, start the Concurrent Managers.
• Once GSM is enabled, the ICM uses Service Managers to start all Concurrent Managers and activated services.
• If the ICM successfully starts the managers, then GSM has been configured properly.
• If managers and/or services fail to start, errors should appear in the ICM log file.
Each Service Manager maintains its own log file named FNDSMxxxx.mgr, located in the same directory as the Concurrent Manager log files. It is useful to examine these log files when there are problems starting services. If you cannot locate the Service Manager log file, it is likely that the Service Managers are not starting properly and there is a configuration issue that needs troubleshooting.

Parallel Concurrent Processing APPLDCP Profile Option
Starting with Release 11.5.10, FND.H, the APPLDCP environment variable is ignored. The value is hard-coded in afpcsq.lpc version 115.35, thereby ignoring the value of APPLDCP. Release 12 GSM requires the value of APPLDCP to be set to “ON”. According to Oracle’s ATG Development in Note 753678.1: “As of file "afpcsq.lpc" version 115.35 or higher, APPLDCP is internally hard-coded to "ON" when the Generic Service Management (GSM) is enabled--"keeping in mind, at "afpcsq.lpc" version 115.35 or higher with the GSM enabled, use of the GSM is required". NOTE: As per ARU, "Patch 11i.FND.H" (3262159) and "Oracle Applications Release 11.5.10" (3140000) contains "afpcsq.lpc" version 115.37. In short, the setting of the APPLDCP environment variable is ignored--this is the "default behavior on all Release 12 releases.”

Parallel Concurrent Processing
• In a Release 11i or Release 12 environment with Parallel Concurrent Processing enabled, the Primary Node assignment is optional for the Internal Concurrent Manager.
• The Internal Concurrent Manager can be started from any of the nodes (host machines) identified as concurrent processing server enabled.

• In the absence of a Primary Node assignment for the Internal Concurrent Manager, the Internal Concurrent Manager will stay on the node (host machine) where it was started.
• If a Primary Node has been assigned to the Internal Concurrent Manager, the Internal Concurrent Manager will migrate to that node if it was started on a different node.
• If the node on which the Internal Concurrent Manager is currently running becomes unavailable, the Internal Concurrent Manager will be restarted on an alternate concurrent processing node.
• If a Primary Node is not assigned, the Internal Concurrent Manager will continue to operate on the node where it was restarted.
• If a Primary Node is assigned, then it will be migrated back to that node whenever the node becomes available.

Parallel Concurrent Processing
Parallel concurrent processing allows distribution of Concurrent Managers across multiple nodes. Benefits are improved performance, availability and scalability (load balancing). Concurrent Managers migrate to the surviving node when one of the concurrent nodes goes down. There should be only one ICM and CRM, at any given time. However, the ICM and CRM could be configured to run on several of the nodes.

Release 11i Parallel Concurrent Processing
• In releases before Release 11i, there must be an assigned Primary and Secondary Node for each Concurrent Manager.
• Parallel Concurrent Processing (PCP) is activated along with Generic Service Management (GSM); it can not be activated independently of GSM. With parallel concurrent processing implemented with GSM, the Internal Concurrent Manager (ICM) tries to assign valid nodes for Concurrent Managers and other service instances.
• Primary and Secondary Nodes need not be explicitly assigned. However, you can assign Primary and Secondary Nodes for directed load and failover capabilities.
• In Release 11i, with three or more nodes in the concurrent processing tier, it is recommended to not specify the Secondary Node for failover. This is because the specified Secondary Node may not be available when the Primary Node goes down.
• By not specifying the Secondary Node, GSM can find an available node with Concurrent Processing services that can be used during failover.

Release 12 Parallel Concurrent Processing
• With Release 12, if a Secondary Node is not specified, the processes will not failover as they do in Release 11i. This is a critical difference between Release 11i and Release 12.

Parallel Concurrent Processing
The following diagram shows how Parallel Concurrent Processing works:

[Diagram: Parallel Concurrent Processing architecture. Labels: Web Browser (HTML Interface, JAVA Interface, JInitiator); Web Server; Forms Server; Reports Server; two concurrent processing nodes, each showing Service Manager (FNDSM), Internal Monitor (FNDIMON), FNDCRM, ICM (FNDLIBR), Standard Manager (FNDLIBR), Report Review Agent, Requests, Logs, Out and SQL*Net; Database.]

Internal Concurrent Manager: The Internal Concurrent Manager can run on any node, and can activate and deactivate Concurrent Managers on all nodes. Since the Internal Concurrent Manager must be active at all times, it needs high fault tolerance. To provide this fault tolerance, parallel concurrent processing uses Internal Monitor Processes.

Internal Monitor Processes: The sole job of an Internal Monitor Process is to monitor the Internal Concurrent Manager and to restart that manager should it fail. The first Internal Monitor Process to detect that the Internal Concurrent Manager has failed restarts that manager on its own node. Only one Internal Monitor Process can be active on a single node. You decide which nodes have an Internal Monitor Process when you configure your system. You can also assign each Internal Monitor Process a Primary and a Secondary Node to ensure failover protection. Internal Monitor Processes, like Concurrent Managers, have assigned work shifts, and are activated and deactivated by the Internal Concurrent Manager.

Automatic activation of PCP, however, does not additionally require that Primary Nodes be assigned for all Concurrent Managers and other GSM-managed services. If no Primary Node is assigned for a service instance, the Internal Concurrent Manager (ICM) assigns a valid Concurrent Processing Server Node as the Target Node. In general, this node will be the same node where the Internal Concurrent Manager is running. In the case where the ICM is not on a Concurrent Processing Server Node, the ICM chooses an active Concurrent Processing Server Node in the system. If a Concurrent Processing Server Node is not available, a Target Node will not be assigned.

If a Concurrent Manager does have an assigned Primary Node, it will only try to start up on that node. If the Primary Node is down, it will look for its assigned Secondary Node, if one exists. If both the Primary and Secondary Nodes are unavailable, the Concurrent Manager will not start (the ICM will not look for another node on which to start the Concurrent Manager). This strategy prevents overloading any node in the case of failover.

The Concurrent Managers are aware of many aspects of the system state when they start up. When an ICM successfully starts

up, it checks the TNS listeners and database instances on all remote nodes. If an instance is down, the affected managers and services switch to their Secondary Nodes. Processes managed under GSM will only start on nodes that are in Online mode. If a node is changed from Online to Offline, the processes on that node will be shut down and switched to a Secondary Node if possible.

Concurrent processing provides database instance-sensitive failover capabilities. When an instance is down, all managers connecting to it switch to a secondary middle-tier node. However, if you prefer to handle instance failover separately from such middle-tier failover (for example, using the TNS connection-time failover mechanism instead), use the Profile Option Concurrent:PCP Instance Check. When this Profile Option is set to OFF, Parallel Concurrent Processing will not provide database instance failover support; however, it will continue to provide middle-tier node failover support when a node goes down.

To Set Up PCP with RAC
The following assumes a 2 node RAC cluster, where node1 is known as vip1 and node2 is known as vip2:
1. Check the configuration files tnsnames.ora and listener.ora located under the 8.0.6 ORACLE_HOME at $ORACLE_HOME/network/admin/<context>. Ensure that you have information of all the other concurrent nodes for FNDSM and FNDFS entries.
2. Restart the Applications listener processes on each application node.
3. Log in to Oracle E-Business Suite Release 11i as SYSADMIN and choose the System Administrator Responsibility. Navigate to the Install > Nodes screen, and ensure that each node in the cluster is registered.
4. Verify that the Internal Monitor for each node is defined properly, with the correct Primary and Secondary Node specifications and work shift details. For example, Internal Monitor: Host2 might have the Primary Node as vip2 and Secondary Node as vip1. Confirm that the Internal Monitor manager is activated from Concurrent > Manager > Administrator, activating the manager as required.
5. For the Internal Concurrent Manager, you assign the Primary Node only.
6. On all Concurrent Processing nodes, set the $APPLCSF environment variable to point to a log directory on a shared file system.
7. On all Concurrent Processing nodes, set the $APPLPTMP environment variable to the value of the UTL_FILE_DIR entry in the init.ora file on the database nodes. This value should be a directory on a shared file system.
8. Do not use a load balanced TNS entry for the value of s_cp_twotask. The request may hang if the sessions are load balanced. Worker 1 connected to DB Instance 1 places a message in the pipe, and expects Worker 2 (which is connected to DB Instance 2) to consume the message. Worker 2 never gets the message since pipes are instance private.
Optimizing the E-Business Suite with Real Application Clusters (RAC) - Ahmed Alomari
9. Set Profile Option 'Concurrent: PCP Instance Check'
o to 'ON', which means that Concurrent Managers will fail over to a secondary application tier node if the database instance to which they are connected goes down.

o to 'OFF' if instance-sensitive failover is not required.

Oracle Network Basics
There are four failover methods (and one method that we haven’t tested yet) that can be used once a TCP failure is detected: Dead Connection Detection, TCP Keepalive, ICM Process Monitor (PMON), Connection Failure Recovery (Release 12), and 10g Timeout Parameters (our untested method).

1. Dead Connection Detection – sqlnet.expire_time=1 (minute)
Dead Connection Detection (DCD) is a feature of SQL*Net 2.1 and later, including Oracle Net8. DCD detects when a partner in a SQL*Net V2 client/server or server/server connection has terminated unexpectedly, and releases the resources associated with it.

DCD is initiated on the server when a connection is established. At this time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an alarm. The timer interval is set by providing a non-zero value in minutes for the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file. When the timer expires, SQL*Net on the server sends a "probe" packet to the client. The probe is an empty SQL*Net packet and does not represent any form of SQL*Net level data, but it creates data traffic on the underlying protocol. If the client end of the connection is still active, the probe is discarded, and the timer mechanism is reset. If the client has terminated abnormally, the server will receive an error from the send call issued for the probe, and SQL*Net on the server will signal the operating system to release the connection's resources.

This is a server feature only. The client may be running any supported SQL*Net V2 release. On Unix servers, the sqlnet.ora file must be in either $TNS_ADMIN or $ORACLE_HOME/network/admin. Neither /etc nor /var/opt/oracle alone is valid. DCD is much more resource-intensive than similar mechanisms at the protocol level.

With DCD enabled, if the connection is idle for the duration of the time interval specified in minutes by the SQLNET.EXPIRE_TIME parameter, the Server-side process sends a small 10-byte packet to the client. This packet is sent using TCP/IP. TCP/IP, for example, is a connection-oriented protocol, and as such, the protocol will implement some level of packet timeout and retransmission in an effort to guarantee the safe and sequenced order of data packets. If a timely acknowledgement is not received in response to the probe packet, the TCP/IP stack will retransmit the packet some number of times before timing out. After TCP/IP gives up, then SQL*Net receives notification that the probe failed.

• Both the Internal Concurrent Manager and the Internal Monitor can use the DCD functionality of the Network (TCP sqlnet).
• The ICM is a client process connected to a DCD-enabled DB dedicated server process.
• The ICM holds the named PL/SQL Lock, the “ICM lock”.
• The IM is continuously trying to check whether it can get the same named PL/SQL Lock.
• As soon as the “ICM lock” is released by the DB / DCD, FNDIMON pings the ICM node, and the IM deduces that the ICM has crashed.
o If the ping succeeds, we conclude that the ICM is fine. Obviously, the ICM can be down, even if TCP is working, so this is bad logic that can lead to false positives.
o If the ping fails, we further check if it has been over four PMON cycles since the ICM updated the work_start column in the FND_CONCURRENT_QUEUES table.
o If it has been more than four PMON cycles we conclude that the ICM is dead.
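The lock, ping and PMON-cycle checks described above can be sketched as a small decision function. This is an illustrative Python sketch only, not Oracle's implementation: the function name `icm_status` and all of its parameters are hypothetical, the 30-second PMON sleep is an assumed default (it matches the `sleep=30 pmon=4` values reported by the test instance later in this paper), and the behavior when the ping fails but WORK_START is still recent is an assumption, since the text does not state it.

```python
from datetime import datetime, timedelta


def icm_status(lock_acquired: bool, ping_ok: bool,
               last_work_start: datetime, now: datetime,
               pmon_sleep_seconds: int = 30, pmon_cycles: int = 4) -> str:
    """Classify the ICM as 'alive' or 'dead' following the checks in the text.

    All names here are hypothetical; the real logic lives inside FNDIMON.
    """
    # The Internal Monitor obtained the "ICM lock": the database (via DCD)
    # has released it, so the ICM is deduced to have crashed.
    if lock_acquired:
        return "dead"
    # Lock still held: ping the ICM node. A successful ping is taken to mean
    # the ICM is fine (the text notes this can give a false positive).
    if ping_ok:
        return "alive"
    # Ping failed: the ICM is declared dead only if WORK_START has not been
    # updated for more than `pmon_cycles` PMON cycles.
    stale_after = timedelta(seconds=pmon_sleep_seconds * pmon_cycles)
    if now - last_work_start > stale_after:
        return "dead"
    # Assumption: the text does not cover this case; treat the ICM as alive.
    return "alive"
```

With a 30-second sleep and four cycles, the staleness threshold is 120 seconds, which is consistent with the two minutes this paper quotes for the PMON method.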

DCD comes into the picture after the ICM has crashed and the database needs to identify that the ICM is gone. The database needs to clean up the dedicated server process resource corresponding to the ICM client process.

To Configure Dead Connection Detection (DCD)
Implement by adding SQLNET.EXPIRE_TIME = 1 (minutes) to the sqlnet.ora file.

With DCD enabled, if the connection is idle for the duration of the time interval specified in minutes by the SQLNET.EXPIRE_TIME parameter, the Server-side process sends a small 10-byte packet to the client. This packet is sent using TCP/IP. If the client side connection is still connected and responsive, the client sends a response packet back to the database server, resetting the timer, and another packet will be sent when next interval expires (assuming no other activity on the connection).

If the client fails to respond to the DCD probe packet:
• The Server side process is marked as a dead connection and
• PMON performs the clean up of the database processes / resources and
• The client OS processes are terminated

TCP/IP is a connection-oriented protocol. This protocol implements a level of packet timeout and retransmission to help guarantee the safe and sequenced order of data packets. If a timely acknowledgement is not received in response to the probe packet, the TCP/IP stack will retransmit the packet some number of times before timing out. After TCP/IP gives up, then SQL*Net receives notification that the probe failed.

Dead Connection Detection:
1. DCD initiates clean up of OS and database processes that have disconnected / terminated abnormally
2. DCD will not initiate clean up of sessions that are still connected, but are idle / abandoned / inactive.

2. TCP Keepalive
Keep-Alive is a TCP/IP mechanism that allows a connection to detect if the partner has unexpectedly died. It is a function of the TCP stack in use and is NOT an Oracle mechanism, although Oracle can request for KeepAlive to be enabled or disabled for a given connection. SQL*Net connections do not enable keepalive for TCP connections by default. However, it is possible to enable this by adding a parameter to the sqlnet.ora file. Adding this parameter turns on a TCP level facility which can detect the loss of a server. If the server dies then keepalive will notice this and signal an error to Oracle Net code. In a RAC environment, TAF notices this error and performs fail-over as if the remote instance had been aborted.

TCP KEEPALIVE PARAMETERS FOR LINUX:
tcp_keepalive_time    the time between the last data packet sent and the first keepalive probe
tcp_keepalive_intvl   the time between keepalive probes
tcp_keepalive_probes  the number of probes to be sent before declaring the connection dead

Initial Settings
tcp_keepalive_time = 200 seconds
tcp_keepalive_intvl = 20 seconds
tcp_keepalive_probes = 2
After 200 seconds of no response, TCP sends the first of 2 probes, 20 seconds apart. Then, TCP notifies SQL*Net of the failure, and SQL*Net removes the offending connection.

tcp_retries1 (default: 3)
The number of times TCP will attempt to retransmit a packet on an established connection normally, without the extra effort of getting the network layers involved.

tcp_retries2 (default: 15)
The maximum number of times a TCP packet is retransmitted in established state before giving up.
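The initial keepalive settings listed above imply a fixed worst-case detection time for an idle dead connection: the idle period before the first probe, plus one probe interval for each unanswered probe. The short Python sketch below works through that arithmetic. The function name is hypothetical, and the Linux default values used in the second call (7200, 75, 9) are the stock kernel defaults rather than values taken from this paper.

```python
def keepalive_detection_seconds(keepalive_time: int,
                                keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Idle seconds before TCP keepalive declares the peer dead.

    The first probe goes out after `keepalive_time` idle seconds; each
    unanswered probe then waits `keepalive_intvl` seconds, and the connection
    is dropped once `keepalive_probes` probes have gone unanswered.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Settings used in this paper's tests: 200 s idle, 2 probes, 20 s apart.
print(keepalive_detection_seconds(200, 20, 2))    # 240

# Stock Linux defaults, for comparison (just over two hours).
print(keepalive_detection_seconds(7200, 75, 9))   # 7875
```

The 240-second result agrees with the figure quoted later in this paper for how long TCP keepalive takes before DCD starts, and the default case illustrates why an untuned system can take more than two hours to notice a dead peer.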

The client waits a specified amount of time (OS configurable usually) like 200ms. the timeout period was reduced to about 20 seconds. one failover was initiated at a measured time of 6 seconds.Parallel Concurrent Processing Failover and Load Balancing Swing tcp_syn_retries (default: 5) The maximum number of times initial SYNs for an active TCP connection attempt will be retransmitted. client side SQL*Net connections do not enable keepalive for TCP connections by default. If there are a lot of IDLE connections on your network. corresponds to approximately 180 seconds. Keepalive enables dead connections to be discovered and closed more quickly.tcp_keepalive_time 3000 net.it can take more than 2 hours to notice a dead server even if keepalive is enabled).ora file.tcp_syn_retries 1 By changing some of these parameters. To make keepalive useful for PCP and TAF the keepalive interval needs to be reduced to a smaller value (such as 2 minutes). However. When configured correctly. Multiple measurements at 5 seconds recorded no change in connection status. The measured average was 8 seconds.1 net.2 seconds have passed by. Receiving no response. Again receiving no response.1 Six seconds is very close to the time measured during tests with tcp_syn_retries and tcp_retries2 set to 2.e. freeing resources used on the server more quickly.. with the following breakdown for the timeout: • • • • • • • • The client initiates a TCP/IP three-way handshake. **WARNING** Keepalive intervals can typically be set to 2 hours or more (i. The default value is 5.org 23 RMOUG Training Days 2009 . By now 6. At the time of this document. then reducing keepalive can increase network traffic significantly.ipv4. On Sun this interval is tcp_ip_abort_cinterval and defaults to 3 minutes (180000ms). the time to initialize the PCP failover was an average of 8 seconds after changing these TCP parameters.tcp_retries2 5 net. 
it is possible to enable this by adding the ENABLE=BROKEN parameter to the SQL*Net connect string.” Note: 249213. It sends the SYN packet again. Now let’s consider an example where the following TCP parameters are changed from their default values: tcp_retries1 = 2 tcp_retries2 = 2 tcp_syn_retries = 2 In this example.rmoug. it waits 1600ms and tries again. However. the client gives up. but still gets no response. Therefore it keeps trying every 3200ms until a magic interval occurs and it stops. It waits 400ms and tries again. but there is no response. Sample TNS alias to enable keepalive (notice the ENABLE=BROKEN clause) VIS_BALANCE = (DESCRIPTION = (ENABLE=BROKEN) www.ipv4. After another wait of 3200ms. We found the following Linux parameters listed in the Metalink note: 249213. by adding this parameter to the sqlnet. it waits 800ms and tries again.ipv4.

    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521)))

3. ICM Process Monitor (PMON) – once TCP fails, this method, introduced with Patch 6495206, takes 2 minutes
• If the “ICM lock” is not available, FNDIMON will now ping the node of the ICM.
• If the ping succeeds, we conclude that the ICM is fine.
• If the ping fails, we further check if it has been over four PMON cycles since the ICM updated the WORK_START column of the FND_CONCURRENT_QUEUES table.
• If it has been more than four PMON cycles we conclude that the ICM is dead.
The PMON method is included in Release 12. Release 11i only uses PMON if patch 6495206 has been applied.

DEFAULT PMON SETTINGS
Figure 19 shows the Oracle Application Manager screen with the PMON settings for this instance:
[Figure 19 callout: Click here to edit the PMON parameters]
Figure 19

4. Connection Failure Recovery (Release 12)
When Concurrent Managers fail due to a loss of the database connection:
• A Reviver process will be started

• When the database connection is possible, the Reviver will restart Concurrent Processing
• Concurrent Processing can be started / stopped when the network or database is down
• This should reduce processing down time because Concurrent Processing restarts as soon as possible
• This should reduce the Applications System Administrator’s workload, since he will no longer need to take the extra step of restarting the Concurrent Managers

Of the first three methods, the method that recognizes the failure first depends on the timeout settings of each method. Method 4 is used to perform failover.

5. 10g Timeout Parameters (Untested Solution)
With the release of Oracle 10g, Oracle can time out within a desired period, instead of waiting for the TCP timeout to occur. To achieve this, the following settings can be used in the sqlnet.ora file on the client or server:
• sqlnet.inbound_connect_timeout (server)
• sqlnet.send_timeout (client and/or server)
• sqlnet.recv_timeout (client and/or server)

This method should provide automated recovery for Concurrent Managers after network or database failures.

When a network failure occurs on a concurrent processing node, resulting in a loss of database connectivity, all Concurrent Managers running on that node will eventually be forced to shut down. In cases where multiple Concurrent Processing nodes are being used, and these other nodes retain their database connection, the managers will migrate to the working nodes. In the case where only a single Concurrent Processing node is being used, or when all Concurrent Processing nodes lose their database connection (for example if the database node suffers a network failure), all running Concurrent Managers on the entire instance will be forced to shut down. Without this feature, in Release 11i, when the network comes back up, the managers must be restarted manually, as there is no automatic restart facility. This can lead to lost productivity between the time the network is restored and when the managers are restarted.

In Release 12, when a connection failure situation arises, a new monitor process, the Reviver, is started. This process will remain alive until it is able to obtain a database connection and restart Concurrent Processing. With this new feature, the Concurrent Managers will restart automatically as soon as connectivity is restored. In addition this allows the Applications System Administrator to maintain control over Concurrent Processing even when network or database failure has brought down Concurrent Processing. When the connection is down, an administrator can still start CP using the adcmctl.sh script and by doing so it will start a Reviver process. When Concurrent Processing is down and a Reviver process is actively waiting to restart Concurrent Processing, the adcmctl.sh script can be used to stop Concurrent Processing, as it will detect the Reviver and shut it down.

There is no additional setup required to use Connection Failure Recovery. If you wish to disable Connection Failure Recovery you can do so by setting the Concurrent Processing Reviver Process context file variable to “Disabled”.

[Flowchart, from Aaron Weisberg at Oracle. ICM side: ICM Starts to Shutdown; Lost DB Connection?; Spawn Reviver; Exit. Reviver side: REVIVER Start; Attempt to Get DB Connection; Sleep; Receive Shutdown?; Kill Previous DB Session; ICM Started?; Start ICM; Exit.]

reviver.sh – code summary
Sleep 30
Test_connection
Kill_old_icm
    Get session
    Alter system kill session
Check_running_icm
    Fnd_conc.ecm_alive
start_icm
    startmgr.sh

This example shows the reviver.log:
reviver.sh starting up...
[ Mon Jan 12 20:02:15 MST 2009 ] - Read APPS username/password.
[ Mon Jan 12 20:02:45 MST 2009 ] - Attempting database connection...
[ Mon Jan 12 20:02:45 MST 2009 ] - Successful database connection...
[ Mon Jan 12 20:02:45 MST 2009 ] - Killing previous ICM session...
1 row updated.
Commit complete.
[ Mon Jan 12 20:02:45 MST 2009 ] - Looking for a running ICM process...
[ Mon Jan 12 20:02:45 MST 2009 ] - ICM now running, reviver.sh complete.

Reviver Context variables
• Concurrent Processing Reviver Process: s_cp_reviver
• Reviver Process PID Directory Location: s_cp_fndreviverpiddir. Writable directory location to create a pid file for ICM reviver process
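The reviver.sh code summary and flowchart above amount to a retry loop: sleep, try to connect, and once connected kill the old ICM session and make sure a new ICM is running. The Python sketch below illustrates that control flow only. It is not the real reviver.sh; every function name and parameter is hypothetical, and the database and ICM actions are passed in as callables so the loop can be exercised without an Oracle environment.

```python
import time
from typing import Callable


def reviver(test_connection: Callable[[], bool],
            kill_old_icm: Callable[[], None],
            icm_alive: Callable[[], bool],
            start_icm: Callable[[], None],
            shutdown_requested: Callable[[], bool],
            sleep: Callable[[float], None] = time.sleep,
            interval: float = 30.0) -> str:
    """Wait for a database connection, then restart the ICM.

    Returns "shutdown" if asked to stop while waiting, otherwise
    "icm running" once a new ICM is confirmed.
    """
    while True:
        sleep(interval)              # "Sleep 30" in the code summary
        if shutdown_requested():     # "Receive Shutdown?" in the flowchart
            return "shutdown"
        if test_connection():        # "Test_connection"
            break
    kill_old_icm()                   # release locks held by the old session
    while not icm_alive():           # "Check_running_icm"
        start_icm()                  # "start_icm" (startmgr.sh in the source)
    return "icm running"
```

Passing the actions in as callables is a design choice for this illustration; it keeps the loop testable and makes clear that only the ordering of steps, not their implementation, is taken from the source.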

As part of its shutdown process, the ICM will detect that it is being forced to shut down due to losing its database connection. This is done by looking for specific error messages ORA-3113, ORA-3114 or ORA-1041. If one of these errors is detected:
• The ICM will assume that it has lost its database connection and will spawn the reviver process.

The ICM will pass the Apps username/password to the script using a secure protocol, along with the Oracle session id of the current ICM process. When the script starts, it will attempt to make a database connection using sqlplus. If unsuccessful, it will sleep for 30 seconds before trying again. It will continue this until it either successfully makes a connection or it receives a signal to shut itself down. When it successfully makes a connection, it will first kill the old ICM database session to make sure any locks are released, then start a new ICM using the normal startmgr script. It then checks to make sure an ICM is successfully running; it will not exit until a new ICM is running. Once the ICM is restarted, it will start up any other managers that had been shut down and normal processing will resume.

PCP Failover
Failover is the process of migrating the Concurrent Managers from the Primary Node to the Secondary Node because of a concurrent processing tier failure or listener failure. Failback is when the Primary Node becomes available again and the Concurrent Managers need to migrate back to their original Primary Node.

If the Concurrent Managers are set up for PCP fail-over:
• Failover is triggered when a node running the ICM goes down
• When the ICM goes down, the connected database server process clears its resources (including named PL/SQL “ICM lock”)
• The database server process cleanup is dependent on the DCD mechanism of the network (sql*net)
• sql*net determines that a connected client has closed down through the DCD mechanism, and triggers the database server process cleanup

For example, if:
Primary Node = HOST1 – The Managers assigned to the Primary Node are ICM (FNDLIBR-cpmgr), and FNDCRM
Secondary node = HOST2 – The Manager assigned to the Secondary Node is Standard Manager (FNDLIBR)

When HOST1 becomes unavailable (this means TCP is no longer working), both the ICM and FNDCRM are migrated to HOST2, after the PMON cycle. This can be seen from the Administer Concurrent Manager screen in the System Administrator Responsibility. The $APPLCSF/log/.mgr logfile shows that HOST1 is being added to the unavailable list. The Log and Out directories must be on a shared disk.

On HOST2, FNDICM, FNDCRM, and FNDLIBR are now migrated and running. FNDIMON and FNDSM run independently on each concurrent processing node. FNDSM is not a persistent process, and FNDIMON is a persistent process local to each node.

Be aware that if a TCP failure is not detected, failover will not occur.

ICM Failover in Release 11i
• ICM and IM use the DCD functionality of the Network (TCP sqlnet).
• ICM is a client process connected to a DCD enabled DB dedicated server process.
• ICM holds the named PL/SQL Lock, the “ICM lock”.
• IM is continuously trying to check whether it can get the same named PL/SQL Lock.
• As soon as the “ICM lock” is released by the DB / DCD from the ICM crash, FNDIMON pings the ICM node, and the IM deduces that the ICM has crashed.
• The DCD works after the ICM has crashed and DB needs to identify that the ICM is gone. Then, the DB needs to clean up the dedicated server process resource corresponding to the ICM client process
• If the “ICM lock” is not available, FNDIMON will now ping the node of the ICM.
• If the ping succeeds, we conclude that the ICM is fine.
o Obviously, the ICM can be down, even if TCP is working, this is bad logic.
• If the ping fails, we further check if it has been over four PMON cycles since the ICM updated the WORK_START column in the FND_CONCURRENT_QUEUES table.
• If it has been more than four PMON cycles we conclude that the ICM is dead.
• Fail over is triggered when node running the ICM goes down
• This ICM going down would lead to connected database server process clearing its resources (including named PL/SQL lock)
• In turn, the database server process cleanup is dependent on DCD mechanism of network (sqlnet). That is, sqlnet determines that connected client has closed down through DCD mechanism and triggers database server process cleanup

The following excerpt from a Concurrent Manager log shows the case where a failure is detected:
fdpsrp() (running_processes correction): ICM cannot obtain exclusive lock on FND_CONCURRENT_QUEUES
Oracle error code returned: 1
This message is information and does not indicate a problem with CP functionality.
remote call function (FNDIMON) 15-AUG-2008 10:06:02 - Function to call: PingProcess
The PingProcess at the end of this log continues until the concurrent manager processes resume.

11i PCP Failure
The following steps occur in the order indicated:
• TCP Failure
• ICM Lock is released, or a TCP failure is detected. Then, FNDIMON pings ICM node; if ping fails, check PMON
• PMON detects a “dead process”, crashed ICM, and failover is begun.
• reviver.sh
• DCD

R12 PCP Failure
• TCP Failure
• PMON detects a “dead process”
• ICM Shutdown
o Look for error messages ORA-3113, ORA-3114 or ORA-1041
• reviver.sh
• DCD

Test PCP Failover Components
Test to explore effect of DCD, PMON and TCP failover methods. Variables: sqlnet.expire_time, PMON sleep and number of cycles, and the following TCP Keepalive parameters:

• tcp_keepalive_time
• tcp_keepalive_intvl
• tcp_keepalive_probes
• tcp_retries1 (default: 3, new value 2)
• tcp_retries2 (default: 15, new value 2)
• tcp_syn_retries (default: 5, new value 2)

Failover /     Expire_time  PMON     PMON    tcp_KA  tcp_KA  tcp_KA  tcp       tcp       tcp_syn
Failback time  (minutes)    Sleep    Cycles  time    intvl   probes  retries1  retries2  retries
(seconds)
241 /          1            30 secs  4       200     20      2       3         15        5
250 / 50       5            30 secs  4       200     20      2       3         15        5
262 / 100      10           30 secs  4       200     20      2       3         15        5
300 / 75       1            15 secs  2       200     20      2       3         15        5
285 / 35       10           30 secs  4       1000    60      10      3         15        5
8 / 105        1            30 secs  4       1000    60      10      2         2         2
10 / 42        1            30 secs  4       200     20      2       2         2         2
7 / 40         10           30 secs  4       200     20      2       2         2         2
6 / 34         1            15 secs  2       200     20      2       2         2         2

Test the Failover and Failback of Parallel Concurrent Processing
In Figure 20, Oracle Application Manager (OAM) shows the details of the Internal Manager (ICM) Activated on RH9:
Figure 20
In Figure 21, the ICM, CRM and Standard Managers all have their Primary Node as RH9.

Figure 21
In Figure 22, we can see that the Standard Manager is configured to failover to the Secondary Node RH7:
Figure 22

Disconnect TCP Connection from RH9
The Internal Concurrent Manager has encountered an error.
Review concurrent manager log file for more detailed information. : 12-JAN-2009 15:22:55 -

OAM shows node RH9 is down. Spawned reviver process 1541. Actual=0 and Target=1. Node RH9 is down! Figure 23 Figure 24 shows the CRM is down.Parallel Concurrent Processing Failover and Load Balancing Swing Shutting down Internal Concurrent Manager : 12-JAN-2009 15:22:55 12-JAN-2009 15:22:55 The ICM has lost its database connection and is shutting down. Adding it to the list of unavailable nodes. cpid=(1301550). cpid=(1302176). manager=(0/1) Process monitor session started : 12-JAN-2009 15:18:27 Internal Concurrent Manager found node RH9 to be down.ora Database Listener SQL*Net Client SQL*Net Client TCP_KEEPALIVE takes 240 seconds before starting DCD Found dead process: spid=(1185). as well as all the application services on RH9. Found dead process: spid=(17963). www.org 31 RMOUG Training Days 2009 . Spawning reviver process to restart the ICM when the database becomes available again. The VIS_0112@VIS internal concurrent manager has terminated with status 1 . ORA pid=(78). manager=(0/1) DB Node – RH8 RH7 PCP RH9 PCP Database sqlnet. ORA pid=(26).giving up.rmoug. CONC-SM TNS FAIL Call to PingProcess failed for XDPCTRLS CONC-SM TNS FAIL Call to PingProcess failed for XDPQORDS In Figure 23.

rmoug. we can see defunct processes: The CRM and two other FNDLIBRs are shutting down. and the Internal Manager is failed over to RH7. Service Instance=(1051) If we run the command ps-ef | grep applvis. but can’t.Parallel Concurrent Processing Failover and Load Balancing Swing The Conflict Resolution Manager is down! Figure 24 The ICM tries to restart the CRM and other failed processes. RH9 is shown as down. cpid=(1301562). as shown in Figure 25: www. CONC-SM TNS FAIL Found dead process: spid=(999999). TCP is disconnected. but the FNDSM is still running. The ICM is still running in another FNDLIBR. show below: The FNDSM Service Manager is still running. Service Instance=(1050) Starting XDP_Q_EVENT_SVC Concurrent Manager : 12-JAN-2009 15:19:21 CONC-SM TNS FAIL Found dead process: spid=(999999).org 32 RMOUG Training Days 2009 . cpid=(1301563).

[Figure 25 callout: RH7 is now running the Internal Manager]
Figure 25
RH7 starts up the Conflict Resolution Manager in Figure 26:
[Figure 26 callout: RH7 starts up the Conflict Resolution Manager]
Figure 26
In Figure 27, the Concurrent Managers have started processing Concurrent Requests on the Secondary Node, RH7:

Figure 27
Figure 28 shows the Oracle Applications Manager screens with RH7 activated:
Figure 28
It is important to note that, unlike Release 11i, Release 12 doesn’t failover a manager if there is no Secondary Node defined. In Figure 29, only the Session History Cleanup, Standard Manager and WMS Task Archiving Manager have Secondary Nodes defined. In this case, the Primary Node is RH9 and the Secondary Node is RH7.

The Inventory Manager, MRP Manager and OAM Metrics Collection Manager will not failover unless they are defined to do so.

Figure 29

ICM Failover

    Starting Internal Concurrent Manager Concurrent Manager : 12-JAN-2009 15:19:45 : Started ICM on Target RH7.
    Process monitor session ended : 12-JAN-2009 15:21:15 : Migration of ICM has completed.
    Shutting down Internal Concurrent Manager : 12-JAN-2009 15:21:45
    The VIS_0112@VIS internal concurrent manager has terminated successfully - exiting.

Figure 30 shows the Internal Manager process migrating back to the Primary Node, RH9.

Figure 30

In Figure 31, the Internal Manager is up for RH9 and the Conflict Resolution Manager is starting up on RH9:

rmoug. www. from RH9 to RH7.Parallel Concurrent Processing Failover and Load Balancing Swing Figure 31 Figure 32 Failover is complete. In the next section the TCP is reconnected and the failback from RH7 to RH9 is documented.org 36 RMOUG Training Days 2009 . for the ICM and CRM. Connect TCP connection Failback from RH7 to RH9 Failback from RH7 to RH9 is starting: Start of Failback Starting Internal Concurrent Manager Concurrent Manager : 12-JAN-2009 15:12:35 : Started ICM on Target RH9. Process monitor session ended : 12-JAN-2009 15:14:05 : Migration of ICM has completed.

    Shutting down Internal Concurrent Manager : 12-JAN-2009 15:14:35
    The VIS_0112@VIS internal concurrent manager has terminated successfully - exiting.
    =======================================================================
    Starting VIS_0112@VIS Internal Concurrent Manager -- shell process ID 14927
    logfile=/d01/oracle/VIS/inst/apps/VIS_rh8/logs/appl/conc/log/VIS_0112.mgr
    PRINTER=noprint
    mailto=applvis
    restart=N
    diag=N
    sleep=30
    pmon=4
    quesiz=1
    Reviver is ENABLED

End of Failback

Administer Concurrent Managers

Figure 33

Target Nodes

Using the Services Instances page in Oracle Applications Manager (OAM) or the Administer Concurrent Managers form, you can view the Target Node for each Concurrent Manager in a parallel concurrent processing environment. The Target Node is the node on which the processes associated with a Concurrent Manager should run. It can be the node that is explicitly defined as the Concurrent Manager's Primary Node in the Concurrent Managers window or, if no Primary Node is defined, the node assigned by the Internal Concurrent Manager.

Figure 34

If you have defined Primary and Secondary Nodes for a manager, then when its Primary Node and ORACLE instance are available, the Target Node is set to the Primary Node. Otherwise, the Target Node is set to the manager's Secondary Node (if that node and its ORACLE instance are available). During process migration, processes migrate from their current node to the Target Node.

Control Across Nodes

Using the Application Services category on the Site Map page in Oracle Applications Manager or the Administer Concurrent Managers form, it is possible to start, stop, abort, restart, and monitor Concurrent Managers and Internal Monitor Processes running on multiple nodes from any node in your parallel concurrent processing environment.
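The Target Node rule described under Target Nodes can be sketched in a few lines of Python. This is only an illustration of the decision order stated in the text, not Oracle's implementation. The function name, its parameters and the idea of passing in a set of available nodes are assumptions made for the example; "available" here means that both the node and its ORACLE instance are up.

```python
from typing import Optional, Set


def target_node(primary: Optional[str],
                secondary: Optional[str],
                available: Set[str],
                icm_choice: Optional[str] = None) -> Optional[str]:
    """Return the node a manager's processes should run on, or None.

    `available` holds the nodes whose host and ORACLE instance are both up.
    `icm_choice` stands in for the node the Internal Concurrent Manager
    would assign when no Primary Node is defined (hypothetical parameter).
    """
    if primary is None:
        # No Primary Node defined: the ICM assigns the node.
        return icm_choice
    if primary in available:
        # Primary Node and its instance are up: run there.
        return primary
    if secondary is not None and secondary in available:
        # Primary is down: fall back to the Secondary Node.
        return secondary
    # No usable node. In Release 12 a manager without a Secondary Node
    # is not failed over.
    return None


# Node names RH9 and RH7 are the ones used in the failover test.
print(target_node("RH9", "RH7", {"RH7", "RH9"}))
print(target_node("RH9", "RH7", {"RH7"}))
print(target_node("RH9", None, {"RH7"}))
```

Calling the function again once RH9 is back in the available set returns RH9, which corresponds to the failback (process migration) described above.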

It is possible to terminate the Internal Concurrent Manager or any other Concurrent Manager from any node in your parallel concurrent processing environment using Oracle Applications Manager:

Figure 35

Figure 36 shows that it is not necessary to log onto a node to control concurrent processing on it.

Figure 36

Starting the Concurrent Managers

The Internal Concurrent Manager starts first, followed by the Conflict Resolution Manager and then the other Generic Managers, Concurrent Managers and Transaction Managers.

Figure 37

Start up parallel concurrent processing by running the adcmctl.sh script from the operating system prompt, as shown below:

    adcmctl.sh start apps/apps

The Internal Concurrent Manager starts up on the node where the adcmctl.sh script is run. If it is assigned to a different node, the ICM will migrate to the Primary Node, when available.

After the Internal Concurrent Manager starts up, it starts all the Internal Monitor Processes and all the Concurrent Managers. It attempts to start Internal Monitor Processes and Concurrent Managers on their Primary Nodes, and resorts to a Secondary Node only if a Primary Node is unavailable. From the Concurrent Manager logs:

    Starting VIS_0815@VIS_BALANCE Internal Concurrent Manager -- shell process ID 978956
    logfile=/VIS/logs/apps/log/VIS_0815.mgr
    PRINTER=noprint
    mailto=VIS
    restart=N
    diag=Y
    sleep=15
    pmon=4
    quesiz=1 (default)

Edit the ICM Runtime Parameters

Figure 38 shows that you can edit the ICM Runtime Parameters from Oracle Applications Manager:

Figure 38

In Figure 39, the defaults for the PMON settings are initially displayed:

Figure 39

Figure 40 shows that you can change the Sleep Interval to 15 seconds and keep the PMON cycles at 4. With these settings the ICM should recognize a failure 1 minute after TCP finds a "dead peer".
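The one-minute figure follows from the parameter units given in the batchmgr comments: SLEEP is in seconds, PMON is counted in manager iterations (sleep cycles), and QUESIZ is counted in process monitor checks. A minimal sketch of that arithmetic, with function names that are purely illustrative:

```python
def pmon_interval_seconds(sleep_seconds: int, pmon_cycles: int) -> int:
    """Seconds between process monitor checks: PMON is counted in sleep cycles."""
    return sleep_seconds * pmon_cycles


def quesiz_interval_seconds(sleep_seconds: int, pmon_cycles: int, quesiz: int) -> int:
    """Seconds between worker quantity checks: QUESIZ is counted in PMON checks."""
    return quesiz * pmon_interval_seconds(sleep_seconds, pmon_cycles)


# Settings from Figure 40: sleep=15, pmon=4.
tuned = pmon_interval_seconds(15, 4)
# Settings seen in the failback log earlier in the paper: sleep=30, pmon=4.
logged = pmon_interval_seconds(30, 4)

print(f"tuned PMON interval:  {tuned} s")
print(f"logged PMON interval: {logged} s")
print(f"worker quantity check (quesiz=1): {quesiz_interval_seconds(15, 4, 1)} s")
```

Halving the sleep interval halves the time between process monitor checks, at the cost of more frequent polling of the request table.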

Figure 40

Once you've saved your changes, Figure 41 shows a screen that confirms that you made changes:

Figure 41

Make sure the PMON changes are made in the $FND_TOP/bin/batchmgr.sh file.

    # FILENAME
    #   batchmgr
    # DESCRIPTION
    #   fire up Internal Concurrent Manager process
    # USAGE
    #   batchmgr arg1=val1 arg2=val2 ...
    #
    #   Parameters may be sent via the environment.
    #
    # ARGUMENTS                                   DEFAULT
    #   [appmgr|sysmgr]=username/password
    #   [sleep=sleep_seconds]                     15
    #   [mgrname=manager_name]                    icm
    #   [logfile=log_filename]                    $FND_TOP/$APPLLOG/$mgrname.mgr
    #   [restart=N|mim minutes between restarts]  N
    #   [mailto="user1 user2..."]                 current user
    #   [PRINTER=printer_name]
    #   [pmon=iterations]                         4
    #   [quesiz=pmon_iterations]                  1
    #   [diag=Y|N]                                N
    #
    # SYSMGR holds the Oracle user as whom the manager should run
    # and its password.
    #
    # SLEEP holds the number of seconds that the manager should wait
    # between checks for new requests.
    #
    # MGRNAME is the name of the manager for locking and log purposes.
    #
    # LOGFILE is a filename in which the manager's own log is stored.
    #
    # RESTART is set to N if the manager should not restart itself after
    # a crash. Otherwise, it is an integer number of minutes. The
    # manager will attempt a restart after an abnormal termination
    # if the past invocation lasted for at least RESTART minutes.
    #
    # MAILTO is a list of users who should receive mail whenever
    # the manager terminates.
    #
    # PMON is the duration of time between process monitor
    # checks (checks for failed workers). The unit of time
    # is concurrent manager iterations (request table checks).
    #
    # QUESIZ is the duration of time between worker quantity
    # checks (checks for number of active workers). The unit
    # of time is process monitor checks.

Concurrent Processing is typically started from the command line by using one of these start scripts, startmgr.sh or adcmctl.sh:

startmgr.sh
• Schema logon is passed using sysmgr parameter
• Apps logon may be passed using appmgr parameter
• Apps user must have System Administrator responsibility

The startmgr.sh script accepts the schema logon when passed as the sysmgr parameter. Now it will also accept an Applications user sign on via the appmgr parameter. Note that the Applications User must have System Administrator responsibility in order to be able to successfully start Concurrent Processing.

adcmctl.sh

• Accepts a single username/password combination
• By default it is the schema logon
• Context File variable: Concurrent Processing Password Type
  o AppsSchema or AppsUser

The adcmctl.sh script is more commonly used. It will accept a single username/password combination. There is a context file variable that determines whether this script expects a schema logon or an Applications logon. By default the schema logon is expected. To start using the Application Sign On instead, edit the context file variable Concurrent Processing Password Type and set its value to AppsUser. Then run autoconfig to regenerate the adcmctl.sh script. The script will then begin to expect an Applications Username and Password.

• Schema logon style:
  o CONCSUB apps/appspass SYSADMIN 'System Administrator' SYSADMIN CONCURRENT FND FNDSCARU <parameters>
• New Apps User Sign On Style
  o CONCSUB Apps:User SYSADMIN 'System Administrator' User/UserPass CONCURRENT FND FNDSCARU <parameters>

For this example we will use the Concurrent Program FNDSCARU, the schema logon apps/appspass and the Applications User logon of User/UserPass. Previously, to submit a request to run FNDSCARU using CONCSUB, you would run the CONCSUB program from the command line as shown here. Now you can choose to authenticate instead using an Applications username and password. To do so, in place of the schema logon you should specify Apps:User as shown here. This indicates that an Applications User Sign On will be used. Then for the Applications username parameter you should append the corresponding password. If you pass the Apps:User parameter but do not supply a password for your specified Applications username, you will be prompted to enter the password. After the Applications username and password are authenticated, Functional Security is enforced for Request Submission. CONCSUB will verify that the user has the appropriate permission to submit the Concurrent Request. If the security check fails, an error message will be printed to the screen.

Shutting Down Managers

You shut down parallel concurrent processing by issuing a "Stop" command in the OAM Service Instances page or a "Deactivate" command in the Administer Concurrent Managers form. All Concurrent Managers and Internal Monitor processes are shut down before the Internal Concurrent Manager shuts down. From the command line, run the adcmctl.sh script from $COMMON_TOP/admin/scripts/<Context Name> (Release 11i) or $INST_TOP/admin/scripts (Release 12):

    adcmctl.sh stop apps/apps

After the failover test, sometimes the services would not failback on RH9. Remember, the test pulls the TCP cable from the host. Figure 42 shows the OAM Dashboard and indicates that RH9 and the applications services are unavailable.

Figure 42

In order to restart the services on RH9, first stop all the services on RH9 with:

    adstpall.sh apps/apps

(sometimes a kill -9 -1 is necessary as the APPLMGR user)

By stopping the services, GSM is able to restart the services, except the concurrent processing, which was stopped.

Figure 43

In order to start the Concurrent Managers use:

    adcmctl.sh start apps/apps

This starts the concurrent processing on all nodes.

Figure 44

Concurrent Manager Log and Out Directories

The Concurrent Manager first looks for the environment variable $APPLCSF. If this is set, it creates a path using two other environment variables: $APPLLOG and $APPLOUT. It places log files in $APPLCSF/$APPLLOG, and output files go in $APPLCSF/$APPLOUT. So, for example, if you have this environment set:

    $APPLCSF = /d01/oracle/VIS/inst/apps/VIS_rh9
    $APPLLOG = log
    $APPLOUT = out

Then:
• Log files go to: /d01/oracle/VIS/inst/apps/VIS_rh9/log
• Out files go to: /d01/oracle/VIS/inst/apps/VIS_rh9/out
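The directory lookup can be expressed as a small helper. This is a sketch of the rule as this paper describes it (use $APPLCSF when it is set, otherwise fall back to the product top of the application that owns the request), not a copy of Oracle's code. The function name and the PO_TOP value in the example are hypothetical.

```python
import posixpath
from typing import Mapping, Tuple


def conc_dirs(env: Mapping[str, str], product_top_var: str) -> Tuple[str, str]:
    """Return (log_dir, out_dir) for a concurrent request.

    env             -- environment variables as a mapping
    product_top_var -- name of the owning product's top variable, e.g. "PO_TOP"
    """
    # $APPLCSF wins when it is set and non-empty; otherwise use the product top.
    base = env.get("APPLCSF") or env[product_top_var]
    log_dir = posixpath.join(base, env["APPLLOG"])
    out_dir = posixpath.join(base, env["APPLOUT"])
    return log_dir, out_dir


example_env = {
    "APPLCSF": "/d01/oracle/VIS/inst/apps/VIS_rh9",
    "APPLLOG": "log",
    "APPLOUT": "out",
    "PO_TOP": "/apps/po",  # hypothetical value, for illustration only
}

print(conc_dirs(example_env, "PO_TOP"))

# The same request with $APPLCSF unset falls back to the product top.
no_csf = {k: v for k, v in example_env.items() if k != "APPLCSF"}
print(conc_dirs(no_csf, "PO_TOP"))
```

A check like this is handy when requests complete with missing log or output files, since a directory that does not exist or has the wrong permissions is a common cause.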

If $APPLCSF is not set, it places the files under the product top of the application associated with the request. For example, a PO report would go under $PO_TOP/$APPLLOG and $PO_TOP/$APPLOUT. All these directories must exist and have the correct permissions. All concurrent requests produce a log file, but not necessarily an output file. Concurrent Manager log files follow the same convention, and will be found in the $APPLLOG directory.

Concurrent Processing Tables

Major tables that contain information about concurrent processing:

Table                          Description
FND_CONCURRENT_REQUESTS        Details of user requests, including status, start date and completion date
FND_CONCURRENT_PROGRAMS        Details of Concurrent Programs, including execution method, whether the program is constrained and whether there are incompatibilities
FND_CONCURRENT_PROCESSES       Cross reference between concurrent requests and queues; includes a history of Concurrent Manager requests
FND_CONCURRENT_QUEUES          Details about the Concurrent Manager queues
FND_NODES                      Node info including availability status
FND_CONCURRENT_QUEUE_PARAMS    PMON and Reviver parameters

Load Balancing

Types of Load Balancing

There are several types of load balancing:
• Concurrent Processing Load Balancing
• JSP-JDBC Load Balancing
• JVM Load Balancing
• Functionally Referenced Nodes

For this paper, we'll only discuss Concurrent Processing Load Balancing.

Concurrent Processing Load Balancing
• Load Balancing with both nodes running – no failover
• Load Balancing during failover

Parallel Concurrent Processing has many benefits. Key among these is its capability to provide failover in case of node failure. When a node fails, the processes that were running on that node are restarted on Secondary Nodes (as defined by the System Administrator). This helps maintain throughput and keep the business running during node failures. However, a resource intensive node (one with many processes) may inadvertently overtax the system when it fails over. A Secondary Node may not be able to handle its normal workload and the additional burden of managers/processes from a failed node. Release 11i has no mechanism for decreasing the number of processes a manager can run during a failover. Release 12 introduces Failover Sensitive Workshifts. This enhancement allows the System Administrator to configure how many processes failover for each workshift. With this added control, Applications System Administrators are able to enjoy the benefits of PCP failover while reducing the risk of performance issues through overloaded resources.

Figure 45

Processing capabilities during failover may be severely degraded on the remaining hosts, unless failover processes are restricted. A typical production environment may have two application tiers, each running Apache, Forms, and Concurrent Processing. Each node supports half the JSP and Forms users and half the Concurrent Requests, and has 70% average CPU utilization. A host may be considered underutilized if the CPU utilization is less than 70%. If too many processes are running on the Secondary Node when the Primary Node fails over, the Secondary Node may not have the capacity to process the requests from additional Concurrent Managers. It is clearly not possible to process 140% of the workload on the one remaining apps tier.

Figure 46

EXAMPLE OF DECREASING THE NUMBER OF "FAILOVER PROCESSES" IN RELEASE 12

In order to compensate for further failovers, the hosts have received hardware upgrades that allow them to process 100% more workload. Now each host has an average CPU utilization of 35%. The combined average workload during failover is 70%. This is approaching the limit where queuing theory indicates minor increases in the number of running processes can cause major increases in wait times. It's clear that, in order to really run a Release 11i or Release 12 system during a failover, there are two choices:
• Run the servers at 35% or less utilization
• Reduce the number of processes that are allowed during failover

For most businesses the second option is the most practical.
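The capacity argument above is simple addition, and a short sketch makes the trade-off between the two choices explicit. The code assumes that CPU utilization adds linearly when work moves between two identical hosts, and it treats 70% as the ceiling, following the figure used in this paper. Both functions and their names are illustrative only.

```python
def failover_utilization(survivor_pct: float,
                         failed_pct: float,
                         failover_fraction: float = 1.0) -> float:
    """Utilization of the surviving host after it absorbs a share of the failed host's load."""
    return survivor_pct + failed_pct * failover_fraction


def max_failover_fraction(survivor_pct: float,
                          failed_pct: float,
                          ceiling_pct: float = 70.0) -> float:
    """Largest share of the failed host's load that keeps the survivor at or below the ceiling."""
    if failed_pct <= 0:
        return 1.0
    headroom = ceiling_pct - survivor_pct
    return max(0.0, min(1.0, headroom / failed_pct))


# Before the hardware upgrade: both hosts at 70%.
print(failover_utilization(70, 70))
print(max_failover_fraction(70, 70))

# After the upgrade: both hosts at 35%.
print(failover_utilization(35, 35))
print(max_failover_fraction(35, 35))

# An in-between case: both hosts at 50%, so only part of the load can move.
print(max_failover_fraction(50, 50))
```

The fraction returned by the second function is the kind of number a System Administrator could use as a starting point when deciding how many failover processes to allow per workshift in Release 12.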

Figure 47

Conversely, if a failover occurs from node 1 to node 2, we may want to reduce the failover processes; however, this doesn't work. Only if the node fails does the "failover processes" setting take effect.

Figure 48

Figure 49

Application Affinity – How to Define Application Affinity

Define a Concurrent Manager to handle requests for a specific module. Related module requests should be directed to a specialized Concurrent Manager. GL reports are commonly run under a GL Manager, while Payroll requests typically run using a Payroll Manager. Specialization rules allow requests to be excluded from managers and included in the appropriate manager at the Application level. By defining specialized managers and their Primary/Secondary Nodes, it's possible to direct concurrent requests to a specific concurrent processing node. This manager can have a Primary concurrent processing node that will use sqlnet to direct the database traffic to a related node in a RAC cluster.

Quick note: It seems a little silly to go to all the trouble to create the RAC cluster and then figure out ways to direct traffic to a specific node. Why not just get a bigger, monolithic, SMP machine for the database server? For a more complete, serious discussion, please refer to Optimizing the E-Business Suite with Real Application Clusters (RAC) by Ahmed Alomari.
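The effect of application-level specialization rules can be pictured as a lookup from application to manager, where each manager carries its own Primary and Secondary Node. The sketch below is a conceptual model only. It does not use any Oracle API, the class and function names are invented for the example, and the node assignments are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SpecializedManager:
    name: str
    primary_node: str
    secondary_node: Optional[str] = None


def route_request(application: str,
                  include_rules: Dict[str, SpecializedManager],
                  default: SpecializedManager) -> SpecializedManager:
    """Pick the manager for a request based on its owning application.

    An application with an include rule goes to its specialized manager,
    which models excluding it from the default manager.
    """
    return include_rules.get(application, default)


# Hypothetical setup: GL work is pinned to RH9, Payroll work to RH7.
standard = SpecializedManager("Standard Manager", "RH9", "RH7")
rules = {
    "GL": SpecializedManager("GL Manager", "RH9", "RH7"),
    "Payroll": SpecializedManager("Payroll Manager", "RH7", "RH9"),
}

for app in ("GL", "Payroll", "Inventory"):
    mgr = route_request(app, rules, standard)
    print(f"{app}: {mgr.name} on {mgr.primary_node}")
```

Because each manager's Primary Node decides where its processes run, pinning a module to a manager also pins its database traffic to whichever RAC instance that node connects to.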

References

249213.1 - Performance problems with Failover when TCP Network goes down
241370.1 - Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
240818.1 - Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment
210062.1 - Generic Service Management (GSM) in Oracle Applications 11i
271090.1 - Parallel Concurrent Processing Failover/Failback Expectations
602899.1 - Some More Facts On How to Activate Parallel Concurrent Processing
362135.1 - Configuring Oracle Applications Release 11i with Oracle10g Release 2 Real Application Clusters and Automatic Storage Management
364171.1 - TAF Session Hangs, Select Fails To Complete W/ Loss Of NIC: Tune TCP Keepalive
291201.1 - How To Remove a Dead Connection to the Target Database
211362.1 - Process Monitor Session Cycle Repeats Too Frequently
R12 ATG - Concurrent Processing Functional Overview – Aaron Weisberg
Optimizing the E-Business Suite with Real Application Clusters (RAC) - Ahmed Alomari