Troubleshooting Guide: Replication Propagation [ID 1035874.6]

Modified 10-MAY-2010    Type TROUBLESHOOTING    Status PUBLISHED

In this Document

  Purpose
  Scope & Application
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details
    1. Overview of how data moves between Replicated Sites
    2. Checking the replication propagation environment is configured correctly
      2.1 Replication group status
      2.2 Replicated object triggers, packages and status
      2.3 Status of existing admin requests
    3. Checking the automatic propagation mechanism is working
      3.1 Checking for errors
      3.2 Check job_queue_processes is set
      3.3 Has a push / purge job been scheduled
      3.4 Check the status of the propagation jobs
      3.5 Check the propagator and their private database links
      3.6 Check if the push job is currently running
      3.7 Terminating a deferred queue push job that is currently running
    4. Diagnosing the progress and status of replicated transactions
      4.1 Verify transactions can be manually pushed
      4.2 Verifying transactions are being propagated
      4.3 Deferred transaction propagated but not applied to receiving site
        4.3.1 Checking for deferred errors in the queue
        4.3.2 Stop on Error
      4.4 Verify the purge transaction operation
        4.4.1 Verify transactions can be manually purged
        4.4.2 Verifying transactions are being purged
      4.5 Large deferred transactions and slow propagation
    5. Diagnosing hanging propagation
      5.1 Check for locks
      5.2 Check the wait events
      5.3 Check if a large error is being queued
      5.4 Advanced Analysis
    6. How to clear down large deferred queues
    7. Example of how deferred transactions are propagated
      7.1 Data is inserted into the table
      7.2 Interrogating the deferred queue
      7.3 Identify the next transaction to be pushed
      7.4 Manually push transactions to REP901 and interrogate the deferred queue
      7.5 Identify unpurged transactions

Applies to:
Oracle Server - Enterprise Edition - Version: 7.2.2.0 to 10.2.0.1 [Release: 7.2.2 to 10.2]
Information in this document applies to any platform.
Checked for relevance on 27-Nov-2007
Checked for relevance on 16-Mar-2010

Purpose

The purpose of this article is to provide basic steps for troubleshooting advanced replication propagation and the underlying mechanism it uses: the deferred queue. Additional notes are referenced throughout this article that address specific issues or provide additional information on a particular component used by Advanced Replication.

Scope & Application
To be used by Oracle support analysts and replication DBAs to understand and employ basic troubleshooting techniques for advanced replication propagation.

Last Review Date
March 16, 2010

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details
1. Overview of how data moves between Replicated Sites
Oracle uses internalised triggers to capture and store any data changes made to tables that have been defined as replicated. It stores the data changes as remote procedure calls (RPCs) in a deferred transaction queue table (defcall) for propagation at a predefined time and interval. Propagation to remote master sites is automated using the database job queue mechanism. The job either applies each transaction to the remote master or, if the transaction fails, places it in the remote deferred transaction error queue for later analysis.
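The flow described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Oracle's implementation: the class and function names (DeferredTran, RemoteSite, push) are hypothetical, and a call whose value is None is used as an artificial stand-in for a call that fails at the remote site.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredTran:
    """One queued transaction; each call stands in for a deferred RPC."""
    tran_id: str
    delivery_order: int
    calls: list  # (table, key, value) tuples


@dataclass
class RemoteSite:
    rows: dict = field(default_factory=dict)
    error_queue: list = field(default_factory=list)

    def apply(self, tran):
        """Apply every call of the transaction, or none of them."""
        staged = dict(self.rows)
        for table, key, value in tran.calls:
            if value is None:  # hypothetical marker for a failing call
                # The whole transaction goes to the remote error queue.
                self.error_queue.append(tran)
                return False
            staged[(table, key)] = value
        self.rows = staged  # all calls succeeded, keep the changes
        return True


def push(queue, remote):
    """Push transactions in delivery order; return the IDs that were applied."""
    applied = []
    for tran in sorted(queue, key=lambda t: t.delivery_order):
        if remote.apply(tran):
            applied.append(tran.tran_id)
    return applied
```

In this model a transaction containing one failing call leaves the remote rows untouched and appears whole in error_queue, while later transactions are still applied, which corresponds to a push with stop_on_error = false.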

2. Checking the replication propagation environment is configured correctly
The large majority of the problems encountered by replication users are due to incorrect setup and configuration of the replicated environment. Refer to section 1 of note:122039.1 and note:117434.1 before checking the following:

2.1 Replication group status
For propagation to be successful the status of the replication groups must be normal:

  select gname, status from dba_repgroup;

  GNAME                          STATUS
  ------------------------------ ---------
  GROUP1                         NORMAL

2.2 Replicated object triggers, packages and status
For the propagation and queuing of data changes to remote sites to be successful, replicated tables (objects) must display as valid at all replication sites. They must also have the associated internalised triggers and packages defined. In releases prior to Oracle 8.1.x, run the following query to check the replicated tables (objects):

  select oname, status, generation_status
  from dba_repobject
  where status != 'VALID';

In Oracle 8.1.x and above, run the following query to check the replicated tables (objects):

  select oname, gname, status, generation_status, replication_trigger_exists, internal_package_exists
  from dba_repobject
  where status != 'VALID'
     or replication_trigger_exists != 'Y'
     or internal_package_exists != 'Y';

If the above query returns replication objects that do not have the associated internalised triggers or packages, it may be necessary to re-generate replication support. If after regeneration the replication objects still show as invalid, run the following statement to ensure that all dependent SYS and SYSTEM objects are valid:

  column owner format a25
  column object_name format a30
  column object_type format a15
  select owner, object_name, object_type
  from dba_objects
  where status != 'VALID' and owner in ('SYS', 'SYSTEM');

If problems persist, contact Oracle Support Services.

2.3 Status of existing admin requests
Data cannot move between replicated sites if there are outstanding admin requests for the associated master group, and there should not be any such requests if the group displays with status normal. Use the following query to check for admin requests:

  column gname format a18
  select gname, request, status, errnum from dba_repcatlog order by id;

For additional analysis of the replication administrative requests queue please refer to note:180014.1 and note:122039.1.

2.4 Check database links
The correct configuration of database links is essential for replication propagation to operate correctly. The following articles cover the links required by replication:

  note:50593.1 Initial steps required to a create Multi Master Replication environment v8.0
  note:117434.1 Initial steps required to create Multi Master and Snapshot Replication v8.1 / v9.x / v10.x

Check the existence of the public and private links with the following query at each site involved in propagation:

  column owner format a15
  column db_link format a45
  column username format a15
  select owner, db_link, username from dba_db_links;

  OWNER           DB_LINK                                       USERNAME
  --------------- --------------------------------------------- ---------------
  PUBLIC          DB2
  REPADMIN        DB2                                           REPADMIN

Test each of the links and ensure that the global name matches the link name with the following query:

  connect as database link owner
  select * from global_name@<dblink>;

It is important that the correct links exist for the user who owns the job that performs the replication push.

3. Checking the automatic propagation mechanism is working
Oracle Replication uses the job queue to automate the propagation and purging of deferred transactions. If the jobs are not configured properly, they will not run as expected and their associated tasks will not be completed. Administrators usually become aware that there may be a problem with the job queue mechanism when they discover that the replication deferred transaction queue is building up. To check if the queue is growing run:

  select count(*) from deftrandest;

If the queue appears to be growing to an unusual size, use the following 3.x sections to ensure that the automated jobs are not the cause. Do not attempt to quiesce the replication system with SUSPEND_MASTER_ACTIVITY, as that will just try to push the queue first.

3.1 Checking for errors
If a job fails while attempting to push or purge replication data changes, errors will be written to the alertSID.log. Additional and more detailed information will go into the following files referred to by the alert.log:

  - Pre V9 : SID_snpx_nnnnn.TRC
  - V9 and above : SID_cjq0_nnnnn.TRC and SID_jnnn_nnnn.TRC

The format of these files may vary between operating systems. Their location can be determined by running the following from SQLPLUS:

  SQL> show parameter dump_dest

3.2 Check job_queue_processes is set
The init.ora parameters that control the Oracle job queues must be set. Execute the following statement to check that they are correctly set at each replication site:

  SQL> show parameter job

  NAME                                 TYPE    VALUE
  ------------------------------------ ------- -------
  job_queue_interval                   integer 30
  job_queue_processes                  integer 4       <= must be > 0

job_queue_processes : A job queue process executes a single job at a time and this parameter determines the maximum concurrent number of these. In most replication environments configure this to be:

  = number of replicated destinations (or connection qualifier destinations)
  + 1 (for administrative requests)
  + enough to service other concurrent non replication jobs if they exist

job_queue_interval : This parameter only applies to versions prior to Oracle9i. It specifies in seconds how frequently each SNPn background process of the instance wakes up and checks if there is a job to run. Its default value is 60; if jobs need to execute more frequently, reduce this value.

3.3 Has a push / purge job been scheduled
For transactions to be propagated to remote master sites automatically, there needs to be a job in the job queue to push them. There should be one push job per remote master site. Check the job exists with:

  column what format a64
  select job, what from dba_jobs where upper(what) like '%DBMS_DEFER_SYS%';

  JOB      WHAT
  -------- ----------------------------------------------------------------
  43       declare rc binary_integer; begin rc := sys.dbms_defer_sys.purge(
           delay_seconds=>0); end;
  44       declare rc binary_integer; begin rc := sys.dbms_defer_sys.push(
           destination=>'DB2.WORLD', stop_on_error=>FALSE,
           delay_seconds=>0, parallelism=>2); end;
  45       declare rc binary_integer; begin rc := sys.dbms_defer_sys.push(
           destination=>'DB3.WORLD', stop_on_error=>FALSE,
           delay_seconds=>0, parallelism=>2); end;

If the push job exists it is important to know when it last ran and when it is next scheduled to run. Check the job's schedule with:

  column dblink format a24
  column last_date format a9
  column next_date format a9
  column interval format a22
  select job push_job, dblink,
         SUBSTR(TO_CHAR(LAST_DATE,'MM/DD/YY HH24:MI:SS'),1,20) LAST_DATE,
         SUBSTR(TO_CHAR(NEXT_DATE,'MM/DD/YY HH24:MI:SS'),1,20) NEXT_DATE,
         interval
  from defschedule
  where job in (select job from dba_jobs
                where upper(what) like '%DBMS_DEFER_SYS.PUSH%');

  PUSH_JOB   DBLINK                   LAST_DATE NEXT_DATE INTERVAL
  ---------- ------------------------ --------- --------- ----------------------
          44 DB2.WORLD                05/15/02  05/15/02  sysdate + 10/(60*24)
                                      14:27:22  14:37:22
          45 DB3.WORLD                05/15/02  05/15/02  sysdate + 1/1440
                                      14:30:33  14:31:33

There may be two reasons that the next_date has been passed. The first is that the job has failed; to check this see section 3.4. The second is that the job is still running; to check this see section 3.6. It is important to note that the next_date is set when the job completes, so if the interval is 10 minutes and the job took 9 minutes to run, the next_date will be 19 minutes from the time the job started. Please refer to DBMS_JOB.CHANGE in note:61730.1 if the next date of the job needs to be altered.

Oracle uses a lazy algorithm to purge deferred transactions from the local queue. It is important that the purge job runs regularly to clear down this queue, because the same underlying table is used for transactions waiting to go to remote sites as for those which have been pushed but not purged. If there is a large number of transactions to be purged it can affect the performance of propagation. Check the scheduled purge job exists with:

  column last_date format a9
  column next_date format a9
  column interval format a40
  select job purge_job,
         SUBSTR(TO_CHAR(LAST_DATE,'MM/DD/YY HH24:MI:SS'),1,20) LAST_DATE,
         SUBSTR(TO_CHAR(NEXT_DATE,'MM/DD/YY HH24:MI:SS'),1,20) NEXT_DATE,
         interval
  from defschedule
  where job in (select job from dba_jobs
                where upper(what) like '%DBMS_DEFER_SYS.PURGE%');

  PURGE_JOB  LAST_DATE NEXT_DATE INTERVAL
  ---------- --------- --------- ----------------------------------------
          43 05/15/02  05/15/02  /*1:Hr*/ sysdate + 1/24
             14:03:16  15:03:16

3.4 Check the status of the propagation jobs
If a job returns an error during execution, Oracle will try it again. The first attempt is made after one minute, the second attempt after two minutes, the third after four minutes, etc. This happens up to the original interval, so if the interval is two minutes, the retries come after 1 minute, 2 minutes, 2 minutes, and so on. If the job fails 16 times it is marked as broken. Use the following to check how many failures there have been and if the job has been marked as broken:

  column broken format a6
  select job, broken, failures from dba_jobs where upper(what) like '%DBMS_DEFER_SYS%';

  JOB        BROKEN FAILURES
  ---------- ------ ----------
          26 N               0
          25 N               3

If the job is showing failures follow the advice in section 3.1. Once the underlying problem has been resolved, unbreak the job with DBMS_JOB.BROKEN (see note:1018453.102). Please note that the job may show failures = 0 and broken = Y if the job was manually broken with DBMS_JOB.BROKEN.

3.5 Check the propagator and their private database links
The owner of the job that performs the push must be the replication propagator, and that user must have a private database link to the site where the job is pushing data to. See section 2.4 for details of the required database links and how to check they are working correctly. Use the following SQL to check that the propagator, links and push jobs all match up:

  select * from defpropagator;

  USERNAME                       USERID     STATUS  CREATED
  ------------------------------ ---------- ------- ---------
  REPADMIN                               31 VALID   18-APR-02

  select db_link from dba_db_links where owner = (select username from defpropagator);

  DB_LINK
  ---------------------------------
  DB2.WORLD
  DB3.WORLD

  column pushed_site_by_propagator format a25
  select job, dblink pushed_site_by_propagator
  from defschedule
  where job in (select job from dba_jobs
                where log_user in (select username from defpropagator)
                  and upper(what) like '%DBMS_DEFER_SYS.PUSH%');

  JOB        PUSHED_SITE_BY_PROPAGATOR
  ---------- -------------------------
          43 DB2.WORLD
          44 DB3.WORLD

3.6 Check if the push job is currently running
Oracle replication only allows a single push operation to run to a master site at a time, although multiple push operations can occur concurrently provided they are to different remote master sites. If connection qualifiers are being used then there can be multiple push jobs to the same master site, but with different database links (see note:1024982.6). It may also be difficult to know if data is moving between replicated sites by using the deftrandest table, because new transactions will be added all the time and transactions with many calls may take some time to process. The following query identifies push jobs that are currently running:

  column dblink format a30
  select /*+ ORDERED */ j.job, j.sid, d.dblink,
         SUBSTR(TO_CHAR(J.THIS_DATE,'MM/DD/RRRR HH24:MI:SS'),1,20) START_DATE
  from defschedule d, dba_jobs_running j
  where j.job in (select job from dba_jobs
                  where upper(what) like '%DBMS_DEFER_SYS.PUSH%')
    and j.job = d.job;

  JOB        SID        DBLINK                         START_DATE
  ---------- ---------- ------------------------------ -------------------
          44          9 DB2.WORLD                      05/16/2002 12:14:47

When a job runs it creates a Job Queue Lock to protect it from being run more than once (i.e. run manually from a user's session). Use the following query to identify the Job Queue (JQ) lock:

  select s.sid, s.serial#, s.username
  from v$lock l, v$session s
  where l.type = 'JQ' and l.sid = s.sid;

  SID        SERIAL#    USERNAME
  ---------- ---------- ------------------------------
           9        164 REPADMIN

It is important to note that the above queries will not identify manual push operations that have been initiated from a user's session; see section 4.1 for more details of identifying these. After evaluating the above and section 4 it may be necessary to terminate the push process. Please follow the steps defined in section 3.7 to do this.

3.7 Terminating a deferred queue push job that is currently running
There will be situations where the running push job needs to be terminated and prevented from running again until the current problem that is being encountered is resolved. Perform the following steps:

  - Break the job with : execute dbms_job.broken(<job>, true);
  - Kill the Job Queue Process from the Operating System. To do this use the sid from section 3.6 to identify the process in v$session, v$process and v$bgprocess. The process will generally be named SNPx or Jxxx. Killing the background process from the operating system will release the Job Queue lock and the User Lock used to protect the push operation; there have been conditions observed where the lock is still held by the Job Queue Process due to network failures.
  - After killing the process, wait approximately 1 minute, to ensure the job is removed from dba_jobs_running.

Once the underlying problem has been resolved by working through sections 4, 5 and 6, restart the jobs with:

  execute dbms_job.broken(<job>, false);

4. Diagnosing the progress and status of replicated transactions
The following section provides database administrators with a step by step guide to diagnosing the progress of deferred transactions. Prior to following these steps ensure the job queue is correctly configured, see section 3 above.

4.1 Verify transactions can be manually pushed
If it has been established that the job queue is configured correctly, but there are still problems with propagation, the first step should be to analyse the problem by trying a manual push of the deferred transaction queue. Remember to make sure that the job that would normally perform the push is not currently running and has been prevented from running while the manual push is being tested (see sections 3.6 and 3.7). Log on as the propagation user and run the following:

  declare
    x integer;
  begin
    x := sys.dbms_defer_sys.push('<dblink>');
  end;
  /

If you get errors during the manual push, use My Oracle Support to search for known problems that relate to these errors. If the manual push hangs, check through the following 4.x sections and if they do not identify the cause use section 5 to diagnose the problem. If the manual push completes without errors but the entries in deftrandest remain unchanged, the following could be the cause:

  - Another user's session is performing the same push operation. When a manual push starts it allocates a User Lock to ensure there is only one push at a time. If this is not the case the manual push will normally return immediately without pushing any rows. Use note:1059290.6 to identify the blocking session.
  - BUG:734902 (fixed in 8.0.6), which may manifest itself as a hang; generally rows will not be pushed.

If neither of the above are the cause, check through the following 4.x sections and if the cause cannot be identified, raise a call with Oracle Support Services.

4.2 Verifying transactions are being propagated
In Oracle release 7.x, the replication routines deleted all transactions from the deferred transaction queue once they had been pushed. From Oracle 8.x onwards, transactions (and calls) are no longer immediately deleted after application at the remote site; they are instead purged from the local queue on a regular basis (interval). There are two different views of the deferred transaction queue:

  - DEFTRAN : contains all unpurged transactions
  - DEFTRANDEST : contains all transactions that have not yet been pushed to a remote master site. Transactions appear once per master site they have to be pushed to.

To view all unpurged transactions:

  select * from deftran order by delivery_order;

  DEFERRED_TRAN_ID               DELIVERY_ORDER D START_TIM
  ------------------------------ -------------- - ---------
  9.13.904                               280185 R 16-MAY-02
  1.5.559                                280654 R 16-MAY-02
  3.11.549                               281112 R 16-MAY-02

** Note the deftran view also includes transactions from the deferred error queue, see sections 4.3.1 and 4.4 for additional information.

To view all transactions yet to be propagated:

  column deferred_tran_id format a16
  column dblink format a30
  select * from deftrandest order by delivery_order;

  DEFERRED_TRAN_ID DELIVERY_ORDER DBLINK
  ---------------- -------------- ------------------------------
  1.5.559                  280654 DB2.WORLD
  3.11.549                 281112 DB2.WORLD

The transaction with the lowest delivery order will be the next transaction to be pushed to the remote replication site. Prior to Oracle9 the DEFTRANDEST view was the only way to identify how propagation was progressing. From Oracle8 onwards use the following to investigate the current push:

  - Transactions that are currently being pushed appear in the target site's DEF$_ORIGIN table.
  - A transaction has been pushed if system.def$_destination.last_delivered is greater than system.def$_aqcall.cscn.

Oracle9 includes a mechanism for identifying how far through the current transaction the current push is. In Oracle9 and above, use the following query to identify if a transaction is being pushed and how many rows have been pushed:

  column xid format a12
  column dblink format a30
  select /*+ ORDERED */ p.xid, p.dblink,
         (MAX(c.callno) + 1) "Calls in Tran",
         (p.sequence/(MAX(c.callno) + 1)) * 100 "% Processed Calls"
  from v$replprop p, defcall c
  where p.name like '%Slave%'
    and c.deferred_tran_id = p.xid
  group by p.xid, p.dblink, p.sequence;

  XID          DBLINK                         Calls in Tran % Processed Calls
  ------------ ------------------------------ ------------- -----------------
  10.20.581    DB2.WORLD                               7000         15.446489
  6.6.556      DB2.WORLD                               9999         32.153031

The current implementation of v$replprop only applies to transactions that are pushed using parallel propagation; however, Oracle recommends all customers use parallel propagation. On systems that are CPU bound pushing the deferred queue, it may be better to run the following query:

  column xid format a12
  column dblink format a30
  select p.xid, p.dblink, p.sequence "Processed Calls"
  from v$replprop p
  where p.name like '%Slave%';

In Oracle9 and above running the following query should assist database administrators in monitoring the overall activity in the deferred transaction queue:

  select * from v$replqueue;

  TXNS_ENQUEUED CALLS_ENQUEUED TXNS_PURGED LAST_ENQU LAST_PURG
  ------------- -------------- ----------- --------- ---------
           6543          21299         300 17-MAY-02 17-MAY-02

See Section 7 for an example of a transaction being propagated to a remote site. For further information on replication parallel propagation, refer to:

  note:76447.1 Replication Parallel Propagation

If the current transaction being pushed appears to be hung or running very slowly, check the following 4.x sections of this document. If they do not help, section 5 contains a more detailed analysis method.

4.3 Deferred transaction propagated but not applied to receiving site
If deferred transactions are leaving the local master site (being removed from the DEFTRANDEST view) and the data changes do not appear to be applied at the remote master site, the most likely cause is that they are failing at the remote site. The most common reasons for this are:

  - Data conflicts, where another user has updated a row that is being pushed to this replication site.
  - Database space management or rollback problems.

When the replication propagation mechanism encounters an error in applying a transaction to a remote master site, it rolls back the transaction and writes the whole transaction (all rows/calls) to the deferred error queue at the site to which the transaction was being pushed. The whole transaction is copied
even if only one row causes the problem.

4.3.1 Checking for deferred errors in the queue
To obtain information on a transaction in error, query deferror with:

  column deferred_tran_id heading 'Deferred|Transaction|ID' format a11
  column callno heading 'Call|Number' format 99999999
  column origin_tran_db heading 'Origin |Database' format a16
  column destination heading 'Destination|Database' format a16
  column time_of_error heading 'Date Of|Error' format a9
  column error_number heading 'Oracle|Error|Number' format 999999
  select deferred_tran_id, callno, origin_tran_db, destination,
         TO_CHAR(start_time, 'MM/DD/YY HH24:MI:SS') TIME_OF_ERROR,
         error_number
  from deferror
  order by start_time;

  Deferred                                                           Oracle
  Transaction      Call Origin           Destination      Date Of     Error
  ID             Number Database         Database         Error      Number
  ----------- --------- ---------------- ---------------- --------- -------
  5.11.152            0 DB3.WORLD        DB4.WORLD        05/16/02     1403
                                                          20:03:07
  8.10.153            2 DB3.WORLD        DB4.WORLD        05/16/02     1403
                                                          20:08:01

In general, to resolve the errors that the above query returns, search MetaLink for ORA-<error number>. Use note:2065172.102 to discover details of the call that is in error. If the error is an ORA-1403 as in the above example, data between the replication sites' tables has diverged. This data will have to be manually resynchronised, and Oracle would recommend customers implement a conflict resolution mechanism.

4.3.2 Stop on Error
The majority of customers configure their push jobs with stop_on_error = false. The consequence of this is:

  - If one transaction fails and is written to the error queue, and the following transactions then succeed, there is a possibility that data at the remote master site could be logically inconsistent, because the earlier transaction has been overtaken.

If a customer has configured stop_on_error = true, then the first transaction to fail will be written to the error queue and subsequent transactions will not be pushed. This makes it much easier to resolve divergent data. Use the following query to identify what stop_on_error has been set to:

  column what format a64
  select job, what from dba_jobs where upper(what) like '%STOP_ON_ERROR%';

  JOB        WHAT
  ---------- ----------------------------------------------------------------
          44 declare rc binary_integer; begin rc := sys.dbms_defer_sys.push(
             destination=>'DB2.WORLD', stop_on_error=>FALSE,
             delay_seconds=>0, parallelism=>2); end;

4.4 Verify the purge transaction operation
As stated in section 4.2, deferred transactions are not immediately purged from the local replication site after they are propagated. It is very important that the purge happens at a regular interval, such that the queue to purge does not grow too large. If the queue to purge gets too large it will start to impact the performance of propagation. If the purge job is failing or the number of transactions to be purged does not appear to be decreasing, follow the steps in this section.

4.4.1 Verify transactions can be manually purged
Use section 3.3 to check that the purge job is correctly scheduled and section 3.4 to identify if the job is failing. The purge job should be owned (log_user) by a replication administrator, so log on as a replication administrator user (normally repadmin) and run the following:

  declare
    x integer;
  begin
    x := sys.dbms_defer_sys.purge(delay_seconds=>0);
  end;
  /

If the manual purge raises errors, check MetaLink for likely causes, address the errors and run the purge again. If the manual purge returns without error, follow the steps described in section 4.4.2 to check that transactions are being correctly purged.

4.4.2 Verifying transactions are being purged
The easiest way to check if the queue of deferred transactions to be purged is decreasing is to use the following query, which eliminates deferror entries in the local deferred queue:

  select min(delivery_order) "Oldest Unpurged" from deftran where destination_list != 'D';

  Oldest Unpurged
  ---------------
           781633

In Oracle9 and above it may be better to run the following query, particularly if the deferred queue is very large. It shows the number of transactions that have been purged since the database was last started:

  select txns_purged from v$replqueue;

  TXNS_PURGED
  -----------
          356

There are two types of purge that Oracle can perform:

  - Lazy purge (the default). The lazy purge will purge transactions with a system.def$_aqcall.cscn lower than the local low water mark for propagated transactions (this is calculated based on the minimum last_delivered in the local system.def$_destination). This low water mark can be lower than the cscn numbers of some previously pushed transactions, so they will not be purged immediately. This can happen if not all the push jobs have run and still have active transactions. In this case, the transactions will remain in def$_aqcall until the low water mark rises above the cscn for the transaction.
  - Precise purge. A precise purge will purge transactions with a cscn lower than the low water mark for propagated transactions to its specific destination. This means that the purge will query the last_delivered for each dblink destination. All transactions that have been pushed from the local site to that destination will usually fall below the low water mark and be purged. To perform the precise purge execute:

  declare
    x integer;
  begin
    x := sys.dbms_defer_sys.purge(purge_method=>0);
  end;
  /

4.5 Large deferred transactions and slow propagation
The two main reasons for slow propagation of the deferred queue are:

  - Limited system resources, usually caused by CPU or network bottlenecks.
  - Large deferred transaction or error queues.

For the replication propagation mechanism to achieve maximum throughput the deferred queue needs to be kept as small as possible and transactions need to be propagated at regular intervals. There are two types of large queue:

  - Queues with one or two transactions with tens or hundreds of thousands of calls, usually caused by bulk update operations (DML) or SQL loads.
  - Queues with tens or hundreds of thousands of transactions, usually caused by a failure in the propagation job due to network outage or space management issues at the remote site.

In the majority of cases the queue will have already become large before the database administrator becomes aware of the problem, and running queries against the DEF... views will prove difficult because with enormous queues the views are slow. The following queries should help the database administrator to make a decision about what to do with the large deferred queue. Please note that on some systems it may not be practical to run these queries.
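The difference between the lazy and precise purge described in section 4.4.2 can be sketched in a few lines. This is a simplified model under stated assumptions, not Oracle's purge code: the function names and the dictionary layout are hypothetical, last_delivered stands in for the per-destination value in def$_destination, and each queued transaction carries a cscn plus the list of destinations it must reach.

```python
def lazy_purge(trans, last_delivered):
    """Keep every transaction whose cscn is not below the single local
    low water mark (the minimum last_delivered over all destinations)."""
    low_water = min(last_delivered.values())
    return [t for t in trans if t["cscn"] >= low_water]


def precise_purge(trans, last_delivered):
    """Keep a transaction only if at least one of its own destinations
    has not yet been delivered past its cscn."""
    return [
        t for t in trans
        if not all(t["cscn"] < last_delivered[d] for d in t["dests"])
    ]


# Hypothetical queue state: DB3.WORLD is far behind DB2.WORLD.
last_delivered = {"DB2.WORLD": 500, "DB3.WORLD": 100}
trans = [
    {"id": "a", "cscn": 50, "dests": ["DB2.WORLD", "DB3.WORLD"]},
    {"id": "b", "cscn": 300, "dests": ["DB2.WORLD"]},
    {"id": "c", "cscn": 300, "dests": ["DB3.WORLD"]},
]

remaining_lazy = [t["id"] for t in lazy_purge(trans, last_delivered)]
remaining_precise = [t["id"] for t in precise_purge(trans, last_delivered)]
```

In this example transaction "b" has already been delivered to its only destination, but the lazy purge keeps it because the slower destination holds the low water mark down, while the precise purge removes it.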

Check how many rows are in the current or next transaction to be pushed:

  select d.deferred_tran_id, count(*) calls
  from defcall c, deftrandest d
  where d.deferred_tran_id = c.deferred_tran_id
    and d.delivery_order = (select min(delivery_order) from deftrandest)
  group by d.deferred_tran_id;

  DEFERRED_TRAN_ID               CALLS
  ------------------------------ ----------
  1.21.704                             4999

Check how many rows exist in transactions to come (ordered by rows):

  select deferred_tran_id, count(callno) calls
  from defcall
  group by deferred_tran_id
  order by calls;

  DEFERRED_TRAN_ID               CALLS
  ------------------------------ ----------
  1.21.704                             4999
  1.13.691                             3430
  1.0.725                               112
  10.0.669                              102

Oracle 9.0.x and above, check how many rows have been propagated from the current transaction so far:

  column xid format a12
  column dblink format a30
  select p.xid, p.dblink, p.sequence "Processed Calls"
  from v$replprop p
  where p.name like '%Slave%';

  XID          DBLINK                         Processed Calls
  ------------ ------------------------------ ---------------
  9.7.1016     DB2.WORLD                                 4999
  4.10.628     DB2.WORLD                                  636

Oracle 9.0.x and above, check the overall number of transactions and rows that have been queued since the instance was last started:

  select * from v$replqueue;

  TXNS_ENQUEUED CALLS_ENQUEUED TXNS_PURGED LAST_ENQU LAST_PURG
  ------------- -------------- ----------- --------- ---------
          72438        2179964        2400 22-MAY-02
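The figures returned by the queries above can be combined into a rough progress estimate. The sketch below restates the percentage formula from section 4.2, (sequence / (max callno + 1)) * 100, and adds a simple queue summary. The function names and the example figures are hypothetical, and the 10,000 threshold is only an assumed reading of "tens of thousands".

```python
def push_progress(sequence, max_callno):
    """Percentage of calls already processed in the transaction being
    pushed, using the formula from section 4.2."""
    calls_in_tran = max_callno + 1  # callno is counted from 0
    return sequence / calls_in_tran * 100


def summarise_queue(calls_per_tran, threshold=10_000):
    """Summarise a {deferred_tran_id: calls} mapping and label which of
    the two kinds of large queue (section 4.5) it resembles, if either."""
    count = len(calls_per_tran)
    largest = max(calls_per_tran.values(), default=0)
    if count >= threshold:
        kind = "many transactions"
    elif largest >= threshold:
        kind = "few very large transactions"
    else:
        kind = "not large"
    return {
        "transactions": count,
        "calls": sum(calls_per_tran.values()),
        "largest": largest,
        "kind": kind,
    }


# Hypothetical figures: 636 calls processed out of a 4999-call transaction.
progress = push_progress(636, 4998)
summary = summarise_queue({"t1": 4999, "t2": 3430, "t3": 112, "t4": 102})
```

A progress value that does not change between two runs of the v$replprop query is a hint to move on to the hang diagnosis in section 5.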

After making the above assessment it may be the case that the only real option is to clear down the deferred queue and manually resynchronise the data. See Section 6 for details of how to perform this operation.

To avoid large deferred queues building up in the future:

  - Monitor the push and purge jobs with Enterprise Manager events. As soon as a failure occurs the database administrator will then be alerted and can address the problem before the queues build up.
  - Ensure conflict resolution handlers are defined for tables that receive large updates. By handling the conflicts we avoid the overhead of rolling back the transaction and re-pushing it into the remote error queue. This operation can take considerably longer than the original transaction.
  - There is no easy way to avoid large transactions that are generated by mistake or ad hoc user access, but for planned batch operations consider using procedural replication.

5. Diagnosing hanging propagation
If after completing the analysis described in section 4, rows do not appear to be moving between replication sites, perform the diagnostic steps described in this section.

5.1 Check for locks
If a row being pushed to a remote master is locked by a user's session for an extended period of time, propagation will appear to be hung. Use the following select statement to identify sessions blocking propagation at the site where propagation is pushing data to:

  column holder format 99999
  column holder_name format a16
  column rowno format 9999999999
  column object_name format a25
  column owner format a16
  select w.holding_session holder, h.username holder_name,
         s.row_wait_row# rowno, o.object_name, o.owner
  from dba_waiters w, v$session s, dba_objects o, v$session h
  where w.waiting_session = s.sid
    and s.username = 'REPADMIN'
    and w.holding_session = h.sid
    and s.row_wait_obj# = o.object_id;

  HOLDER HOLDER_NAME      ROWNO       OBJECT_NAME               OWNER
  ------ ---------------- ----------- ------------------------- ----------------
      13 SCOTT                     32 T2                        REPDBA

For this query to be executed successfully, replace 'REPADMIN' with the user that the pushing site's replication propagator pushes to (receiver user) and make sure CATBLOCK.SQL has been run. If the ROWNO / OBJECT_NAME column does not change the blocking user's session will
sid and sw.job and jr. replace REPADMIN with the propagation user: column event format a44 select sw.sid in (select qs.SQL contain additional information about identifying locks.wait_time = 0 and s. sw.2 Check the wait events If propagation is not waiting for a lock then the next step in diagnosing the hang is to find out what the pushing and receiving sessions are waiting for. sw. sw.sid in ( select qs.what) like '%DBMS_DEFER_SYS. dba_jobs j.sid = s.event. The easiest way to do this is to use the V$SESSION_WAIT view. Under some circumstances more than one row may be returned because each query slave used by parallel propagation opens a separate session at the remote site.username = 'REPADMIN'.sid = qs.p1. 5.p1.p2 from v$session_wait sw where sw. note:76447.p2 from v$session_wait sw. Run the following query at the pushing site if propagation is being executed by a users session.qcsid and upper(j. sw.p2 from v$session_wait sw.sid. sw.job = j.PUSH%') and sw. note each parallel propagation slave will appear as a separate session: column event format a44 select sw. sw.1 contains additional information about Parallel Propagation note:62354. v$px_session qs where jr. Run the following query at the pushing site if propagation is being executed from DBMS_JOB: column event format a44 select sw. customers should note that Job Queue processes (DBMS_JOB) are not always recorded in this view.sid from dba_jobs_running jr. v$session s .sid. v$session s where sw.p1.1 and UTLLOCKT. sw.sid from v$px_session qs) and sw.event. sw.event.sid. Run the following query at the site where data is being pushed to and replace 'REPADMIN' with the user that the propagation user pushes to (receiver user).wait_time = 0. sw.have to be killed to allow propagation to continue.

Replace 'REPADMIN' with the user that the pushing sites replication propagator pushes to (receiver user). Use the query defined in section 5.sid = s..sql_address = a.4 Advanced Analysis If after following all other sections in this article the push still seems to be hung or stuck.username = 'REPADMIN' and sw....username = 'REPADMIN'. The second step is to collect an errorstack from all sessions that are applying changes at the remote site.sql_hash_value = a. however it will not assist in identifying if there is a single transaction with thousands of rows being written to the error queue.2 to identify these. this must include all query slaves used by the push.sid and s. as sysdba oradebug setospid . If the sessions appear to be stuck on the same event for a long period of time. failed transactions often take four or more times as long to write to the error queue as they did to create.hash_value and a.3 can be used to identify when small transactions are being written to the error queue. It may be necessary to raise a call with Oracle Support Services to assist in collecting the required information. The first step is to collect an errorstack from the pushing session. Section 4. 5. Collect the errorstack by running the following from SQLPLUS for each session: connect .2 to identify these..3 Check if a large error is being queued Writing failed replicated transactions to the remote deferred transaction error queue is one of the longest operations replication can perform. Collect the required trace by running the following from SQLPLUS for each session: connect . v$session s where s.where sw. as sysdba oradebug setospid oradebug unlimit oradebug dump errorstack 3 As hung or stuck push operations may in fact be spinning or looping operations.address and s. consult note:61998.1 or raise a call with Oracle Support Services.sql_text like '%DEF$_AQERROR%' and s.wait_time = 0. it may also be necessary to collect in depth sql_trace of all sessions listed above.. 
The following query can be used to identify if a large error is being queued at the remote master site: select count(*) from v$sqlarea a. Use the query defined in section 5. 5. collect the information described in this section and supply it to Oracle Support Services.

oradebug unlimit oradebug dump 10046 12 If after analysis the sessions need to be killed refer to section 3.Terminate the current push operation and prevent it from re-running.'MARKETING'.5 for additional information. DEFERRED_TRAN_ID DELIVERY_ORDER D START_TIM .1 Data is inserted into the table: insert into scott. if the push is running from dbms_job see section 3.dept values (90.WORLD to REP9I. 7. make sure they are up to date. Once the decision has been made to clear down the queue and resynchronise the data. .2 Interrogating the deferred queue: After applying DML to replicated tables. by analyzing the schema with compute statistics. commit. Example of how deferred transactions are propagated The following is an example of a transaction being replicated from ORA9I. The basic steps to clear down the queue are listed below.def$_aqcall table. to remove transactions from the deferred queue.1 How to Clear Down the Deferred Queue and DBMS_DEFER_SYS. .dept values (80.From SQLPLUS : execute dbms_defer_sys. 7. .'ORLANDO'). null).DO NOT attempt to suspend or quiesce the replicated environment. follow : note:190885. commit. See section 4. . modifications are logged in the system. slow or hanging refer to note:190885. insert into scott.WORLD. 6.7.'MARKETING'.7.'LONDON'). that will try to push the queue again and introduce admin requests that also need to be cleaned out. In general Oracle does not recommend adding statistics to the system schema tables.DELETE_TRAN Oracle normally recommends customers use dbms_defer_sys.Stop remote sites from replicating to the local site. These changes can be seen through the deftran and deftrandest views: select * from deftran.delete_tran with all arguments set to null. 7.delete_tran(null. How to clear down large deferred queues Database administrators frequently have to make the decision to terminate propagation and manually resynchronise their replicated environments when the deferred queue has become large and or slow. 
However when the queue is very large this may not be the most efficient mechanism for clearing the queue.1: . if the queue is large.If the system schema has optimiser statistics defined.

DEF$_AQCALL.-----------------------------4.WORLD 478878 At this point.-----------------------------.495 479001 REP9I..495 has not yet been pushed because its DELIVERY_ORDER OR CSCN of 479001 is greater than the current LAST_DELIVERED value for all transactions going to REP9I. DEFERRED_TRAN_ID OR ENQ_TID 4.LAST_DELIVERED means the transaction has NOT been pushed.494 478972 R 22-JAN-02 column deferred_tran_id format a10 column dblink format a30 select * from deftrandest order by delivery_order.--------4.WORLD which is 478992. DBLINK LAST_DELIVERED -----------------------------.-------------REP9I. cscn from system.push(destination=>'REP9I.DEF$_DESTINATION: select dblink.def$_aqcall where cscn is not null order by cscn.49.49.-------------.WORLD 478992 ORA9I.4 Manually push transactions to REP901 and interrogate the deferred queue Push the queue: . Check the views: select * from deftrandest.3 Identify the next transaction to be pushed: Find out what the current LAST_DELIVERED value is for SYSTEM. no rows selected .WORLD 7. DEFERRED_T DELIVERY_ORDER DBLINK ---------.WORLD').-------------.. last_delivered from system. --OR-select enq_tid..def$_destination.49. 7..CSCN > DEF$_DESTINATION. dbms_defer_sys.495 479001 R 22-JAN-02 3..0.
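DBMS_DEFER_SYS.PUSH is a function, so in SQL*Plus the push in this step has to be called from a PL/SQL block that receives its return value. The following is a minimal sketch under that assumption, not the exact text of the original note: the variable name rc is illustrative, and the meaning of each return code should be taken from the DBMS_DEFER_SYS documentation for the release in use.

```sql
-- Sketch only: push the deferred queue to REP9I.WORLD and show the return code.
set serveroutput on

declare
  rc binary_integer;  -- illustrative name for the value returned by PUSH
begin
  rc := dbms_defer_sys.push(destination => 'REP9I.WORLD');
  dbms_output.put_line('push returned: ' || rc);
end;
/

-- Confirm nothing is left to push for this destination.
select count(*) from deftrandest where dblink = 'REP9I.WORLD';
```

A count of zero from the final query corresponds to the "no rows selected" result for deftrandest in this example.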

select * from deftran;

DEFERRED_TRAN_ID               DELIVERY_ORDER D START_TIM
------------------------------ -------------- - ---------
4.49.495                               479001 R 22-JAN-02
3.0.494                                478972 R 22-JAN-02

As expected, the unpurged queue view deftran shows all transactions and the deftrandest view is empty because all transactions have been pushed.

select enq_tid, cscn from system.def$_aqcall;

ENQ_TID                        CSCN
------------------------------ ----------
4.49.495                           479001
3.0.494                            478972

select dblink, last_delivered from system.def$_destination;

DBLINK                         LAST_DELIVERED
------------------------------ --------------
REP9I.WORLD                            479017
ORA9I.WORLD                            478878

The LAST_DELIVERED column has increased and all transactions with DEF$_AQCALL.CSCN < DEF$_DESTINATION.LAST_DELIVERED have been pushed, which is all of the transactions currently in deftran.

7.5 Identify unpurged transactions

The following query identifies the number of transactions that have been pushed but not yet purged:

select count(*)
from system.def$_aqcall
where cscn < (select last_delivered from system.def$_destination where dblink = '[dblink]');

COUNT(*)
----------
         2

Related Products
• Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
