
GoldenGate Performance Troubleshooting - Part 1
What is a Lag?
Latency, or lag, is the period of time that has passed between when a change (DML or DDL) occurred in the
source database and the current point in time.

Extract latency is
the time elapsed between a change occurring on the source table and the moment the change is extracted and written to the trail file.
Replicat latency is
the time elapsed between the source table change and the moment it is applied to the target database.

Time Since Checkpoint and Lag at Checkpoint

→ Time Since Checkpoint


Each process has its own checkpoint file.
Whenever a process sees a commit in the transaction, a checkpoint is made in the checkpoint file.

You can see Time Since Checkpoint increase when the Extract/pump/Replicat has been stopped, when the Extract or Replicat is processing a
long-running transaction, or when it is blocked by other sessions.

→ Lag at Checkpoint
For Extract, lag is the difference, in seconds, between the time that a record was processed by Extract (based on the system clock) and the
timestamp of that record in the data source.
For Replicat, lag is the difference, in seconds, between the time that the last record was processed by Replicat (based on the system clock)
and the timestamp of the record in the trail.
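Both definitions reduce to the same arithmetic: the system-clock time at which the process handled the record, minus the record's own timestamp. A minimal sketch (the timestamps are hypothetical, for illustration only):

```python
from datetime import datetime

def lag_seconds(processed_at: datetime, record_ts: datetime) -> float:
    """Lag at checkpoint: time the process handled the record (system clock)
    minus the record's timestamp in the data source (for Extract) or in the
    trail (for Replicat)."""
    return (processed_at - record_ts).total_seconds()

# hypothetical timestamps for illustration
print(lag_seconds(datetime(2024, 1, 6, 11, 12, 35),
                  datetime(2024, 1, 6, 11, 12, 29)))  # 6.0
```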
But why is there a lag? Let's investigate...

GGSCI (test) > info all


Program     Status    Group    Lag at Chkpt   Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING   EXT01    00:00:06       00:00:08
EXTRACT     RUNNING   EXT02    00:00:05       00:00:03
EXTRACT     RUNNING   PXT01    00:00:06       00:00:08
EXTRACT     RUNNING   PXT02    00:00:06       00:00:08
REPLICAT    RUNNING   REP03    02:00:07       00:10:07

For Replicat,

→ Check the number of trail files left to process.


A spike in the number of pending trail files means there was a large data load on the source side.
Run info <replicat_name> (check the RBA).

→ Time Since Chkpt is increasing:


This can be due to the Replicat session being blocked or a long-running transaction.
Run send <replicat_name> status (check the RBA).
Investigation Cont...

Let's identify what the Replicat is doing in the database:

select inst_id, sid, serial#, username, sql_id, sql_child_number ch#, status,
       logon_time, last_call_et, module, event
from gv$session
where module like '%PROCESS_NAME%'
order by module;

select inst_id, sid, serial#, username, sql_id, sql_child_number ch#, status,
       logon_time, last_call_et, module, event
from gv$session
where username = '<GGUSER>' and process = '<PROCESSID>'
order by module;

PROCESSID = the OS process ID shown in the info <replicat_name> output.


[Screenshot: info all output]

To check the SQL_ID statistics in the cursor cache:

select inst_id inst#, sql_id, object_status status, parsing_schema_name, plan_hash_value phv, child_number ch#,
       sql_profile, sql_plan_baseline, executions,
       round((elapsed_time / decode(executions, 0, 1, executions)) / 1000000, 4) per_elapsed_sec,
       round(buffer_gets / decode(executions, 0, 1, executions), 0) per_buffer_gets,
       round(rows_processed / decode(executions, 0, 1, executions), 0) per_rows_processed
from gv$sql
where sql_id = '<SQLID>';
If the query is using a bad plan, we can check previous executions of the SQL_ID
and try to baseline a good plan.
The steps to follow are:
1. Stop the Replicat
2. Flush the bad plan from the cursor cache
3. Create a SQL plan baseline for the good plan
4. Start the Replicat

We can also try to tune the SQL_ID.

SQL Tuning Advisor:


@?/rdbms/admin/sqltrpt
A few recommendations it can give:
→ SQL profile
→ Index recommendation
→ Gathering fresh stats
→ In the Replicat report file, we can identify how many rows the Replicat is processing:
REPORTCOUNT EVERY 1000 RECORDS, RATE
REPORTCOUNT EVERY 15 MINUTES, RATE

→ Replicat performed badly due to conversion of VARCHAR to NVARCHAR on the target.


Use SOURCECHARSET PASSTHRU.

→ Split the Replicat into multiple Replicats to divide the workload.

→ Check the I/O performance of the filesystem holding the trail files.

→ Swap space/memory usage alerts: reduce CACHEMGR CACHESIZE (e.g. to 8GB; the default is 64GB).


→ Replicat abended with no error in the report file or ggserr.log:

This can be due to a software failure.


Core dump issue

Often the stack trace of the core dump file shows a segmentation fault.
gghome> ls -lart | grep -i core
Eg:
gdb ./ggsci core.134021

→ BATCHSQL

Batches similar SQL statements into arrays.

When to Use BATCHSQL

When Replicat is in BATCHSQL mode, smaller row changes show a higher performance gain than larger row changes.

Usage Restrictions:

SQL statements that are treated as exceptions include:


Statements that contain LOB or LONG data.
Statements that contain rows longer than 25KB.
Statements where the target table has one or more unique keys besides the primary key. Such statements cannot be processed in
batches because BATCHSQL does not guarantee the correct ordering for non-primary keys if their values could change.
(SQL Server) Statements where the target table has a trigger.
Statements that cause errors.
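The core idea of batching similar statements into arrays can be sketched in a few lines of Python. This is a deliberate simplification: real BATCHSQL also tracks dependencies and ordering, which this toy grouping ignores.

```python
from collections import defaultdict

def batch_dmls(dmls):
    """Toy version of BATCHSQL's grouping: collect the bind values of DMLs
    that share the same SQL text, so each distinct statement can be executed
    once as an array (e.g. via cursor.executemany) instead of row by row."""
    batches = defaultdict(list)
    for sql, binds in dmls:
        batches[sql].append(binds)
    return dict(batches)

dmls = [
    ("INSERT INTO t (id, val) VALUES (:1, :2)", (1, "a")),
    ("INSERT INTO t (id, val) VALUES (:1, :2)", (2, "b")),
    ("UPDATE t SET val = :1 WHERE id = :2", ("c", 3)),
]
for sql, rows in batch_dmls(dmls).items():
    print(len(rows), sql)  # the two inserts batch together; the update stands alone
```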
2013-01-06 11:12:29 WARNING OGG-01137 Oracle GoldenGate Delivery for Oracle, rrp71.prm: BATCHSQL suspended, continuing in normal mode.
2013-01-06 11:12:29 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, rrp71.prm: Repositioning to rba 121688 in seqno 0.
2013-01-06 11:12:29 WARNING OGG-00869 Oracle GoldenGate Delivery for Oracle, rrp73.prm: Aborting BATCHSQL transaction. Database error 1 (ORA-00001: unique
constraint (MYSCHEMA.TABLEB) violated).
2013-01-06 11:12:29 WARNING OGG-01137 Oracle GoldenGate Delivery for Oracle, rrp73.prm: BATCHSQL suspended, continuing in normal mode.
2013-01-06 11:12:29 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, rrp73.prm: Repositioning to rba 120891 in seqno 0.
2013-01-06 11:12:29 WARNING OGG-00869 Oracle GoldenGate Delivery for Oracle, rrp62.prm: Aborting BATCHSQL transaction. Database error 1 (ORA-00001: unique
constraint (MYSCHEMA.TABLEC) violated).

When enabled, BATCHSQL sorts the DMLs into batches before applying the transaction. Because the DMLs are reordered, a batch can
occasionally fail (for example with ORA-00001, as in the warnings above); Replicat then aborts the transaction and reattempts it with
BATCHSQL disabled, re-enabling BATCHSQL for the next transaction.

BATCHSQL statistics:

Batch operations: 55782064
Batches: 47026
Batches executed: 47026
Queues: 47027
Batches in error: 1682
Normal mode operations: 0
Immediate flush operations: 0
PK collisions: 0
UK collisions: 0
FK collisions: 0
Thread batch groups: 0
Commits: 10045
Rollbacks: 1682
Queue flush calls: 1695
Ops per batch: 1186.20
Ops per batch executed: 1186.20
Ops per queue: 1186.17
Parallel batch rate: N/A

The stats to look for:

Commits: 10045
Rollbacks: 1682
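The derived figures in the report follow directly from the counters. For instance, with the numbers from the statistics above:

```python
# Counters taken from the BATCHSQL statistics above
batch_operations = 55_782_064
batches = 47_026
commits = 10_045
rollbacks = 1_682

ops_per_batch = batch_operations / batches
rollback_ratio = rollbacks / (commits + rollbacks)

print(round(ops_per_batch, 2))         # 1186.2 -> matches "Ops per batch"
print(round(rollback_ratio * 100, 1))  # 14.3 -> percent of transactions retried in normal mode
```

A rollback ratio this high suggests BATCHSQL is frequently falling back to normal mode, which erodes its performance benefit.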
→ GROUPTRANSOPS:
It controls the number of records that are grouped into a single Replicat transaction.

Example: three source transactions arrive with

25 operations (transaction 1)
50 operations (transaction 2)
60 operations (transaction 3)
If GROUPTRANSOPS = 100, Replicat commits the 25 + 50 + 60 operations under one target transaction:
25 < 100, so Replicat collects more transactions.
25 + 50 < 100, so Replicat collects more transactions.
25 + 50 + 60 > 100, so Replicat now commits.

A smaller value for GROUPTRANSOPS results in more commits, more checkpointing, and more frequent updates
to reporting --> more overhead and less throughput. It will reduce the speed of the Replicat and increase lag.

The GROUPTRANSOPS parameter is ignored by Integrated Replicat when parallelism is greater than 1.
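The grouping rule described above can be sketched as follows (a hypothetical helper, not GoldenGate code):

```python
def group_commits(txn_ops, group_trans_ops):
    """Simulate GROUPTRANSOPS: accumulate source transactions until the
    pending operation count reaches the threshold, then commit them all
    as one target transaction."""
    commits, pending = [], 0
    for ops in txn_ops:
        pending += ops                  # 25, then 75, then 135...
        if pending >= group_trans_ops:  # threshold reached: commit the group
            commits.append(pending)
            pending = 0
    if pending:
        commits.append(pending)         # a trailing partial group still commits
    return commits

print(group_commits([25, 50, 60], 100))  # [135]: one commit covering 25+50+60 ops
```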

→ MAXTRANSOPS:
It is used to split large source transactions into smaller ones on the target system.
This parameter can be used when the target database is not configured to accommodate large transactions. For
example, if the Oracle rollback segments are not large enough on the target to reproduce a source transaction
that performs one million deletes, you could specify MAXTRANSOPS 10000, which forces Replicat to issue a commit
after each group of 10,000 deletes.
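The rollback-segment example works out as follows (a sketch of the arithmetic, not GoldenGate code):

```python
def target_commits(source_ops, max_trans_ops):
    """MAXTRANSOPS splits one large source transaction into target
    transactions of at most max_trans_ops operations each; return how
    many target commits that produces."""
    full, remainder = divmod(source_ops, max_trans_ops)
    return full + (1 if remainder else 0)

# one million source deletes with MAXTRANSOPS 10000
print(target_commits(1_000_000, 10_000))  # 100 commits of 10,000 deletes each
```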
Integrated Replicat:

→ The STREAMS_POOL_SIZE needs to be large enough:


1GB of STREAMS_POOL_SIZE per Integrated Replicat process, plus an additional 25 percent.
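That sizing rule translates into a simple calculation (hypothetical helper name; the 1 GB and 25% figures come from the guideline above):

```python
def streams_pool_gb(num_integrated_replicats, per_process_gb=1, headroom=0.25):
    """Rule of thumb from the text: 1 GB of STREAMS_POOL_SIZE per Integrated
    Replicat process, plus an additional 25 percent headroom."""
    return num_integrated_replicats * per_process_gb * (1 + headroom)

print(streams_pool_gb(4))  # 5.0 -> four Integrated Replicats need about 5 GB
```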

→ The following features are applied in direct mode by Replicat:


DDL operations
Sequence operations
SQLEXEC parameters within a TABLE or MAP parameter
EVENTACTIONS processing

Because transactions are applied serially in direct apply mode, heavy use of such operations may reduce
the performance of integrated Replicat mode.

So classic Replicat may be the better choice when you have many such operations.


THANK YOU
