Oracle GoldenGate is a tool provided by Oracle for transactional data replication among Oracle databases and other RDBMS platforms (SQL Server, DB2, etc.). Its modular architecture gives you the flexibility to decouple or combine components to build the best solution for the business requirements.
Typical uses:
• High Availability
• Data Integration
Main components:
● Extract
● Data pump
● Replicat
● Trails or extract files
● Checkpoints
● Manager
● Collector
Below is the architecture diagram of GG:
Figure: gg.01
The Oracle GoldenGate server runs on both the source and target servers. Oracle GoldenGate is installed as an external component to the database, so it does not use database resources and in turn does not affect database performance. Oracle Streams, by contrast, uses built-in packages provided by Oracle; these consume more database resources, so there is a chance of a performance slowdown on both the source and target databases.
EXTRACT:
Extract runs on the source system and is the extraction (capture) mechanism of Oracle GoldenGate: it captures the changes that happen at the source database.
The Extract process extracts the necessary data from the database transaction logs; for an Oracle database, the transaction logs are nothing but the REDO log file data. Unlike Streams, which runs inside the Oracle database itself and needs access to the database, Oracle GoldenGate does not need that level of database access, and it extracts only committed transactions from the online redo log files.
A long-running transaction that generates a large volume of redo data will force redo log file switches, and in turn more archive logs will be generated. In these cases the Extract process needs to read the archive log files to get the data.
The Extract process captures all the changes that are made to objects configured for synchronization. Multiple Extract processes can operate on different objects at the same time. For example, one process could continuously extract transactional data changes and stream them to a decision support database while another performs batch extracts for periodic reporting. Or, two Extract processes could extract and transmit in parallel to two Replicat processes (with two trails) to minimize target latency when the databases are large.
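As a hedged sketch of the parallel-Extract scenario described above (the group names and trail paths here are illustrative, not from this tutorial), two Extract groups writing to two separate trails could be registered in GGSCI as follows:

```
GGSCI> ADD EXTRACT exta, TRANLOG, BEGIN NOW
GGSCI> ADD RMTTRAIL /u01/oracle/ggs/dirdat/aa, EXTRACT exta
GGSCI> ADD EXTRACT extb, TRANLOG, BEGIN NOW
GGSCI> ADD RMTTRAIL /u01/oracle/ggs/dirdat/bb, EXTRACT extb
```

Each group would then get its own parameter file (via EDIT PARAMS) listing only the tables it is responsible for.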
DATAPUMP
Datapump is a secondary Extract process within the source Oracle GoldenGate configuration. You can configure the source side without a Datapump process, but in that case the primary Extract process has to send the data directly to the trail file on the target. If a Datapump is configured, the primary Extract process writes the data to a source (local) trail file, and the Datapump reads this trail file and propagates the data over the network to the target trail file.
The Datapump adds storage flexibility and isolates the primary Extract process from TCP/IP activity.
You can configure the primary Extract process and the Datapump to extract continuously (online) or during batch processing.
REPLICAT
The Replicat process runs on the target system. Replicat reads the extracted transactional data changes and DDL changes (if configured) that are specified in the Replicat configuration, and then replicates them to the target database.
TRAILS OR EXTRACT FILES
To support the continuous extraction and replication of source database changes, Oracle GoldenGate stores the captured changes temporarily on disk in a series of files called a TRAIL. A trail can exist on the source or target system, and even on an intermediate system, depending on the configuration. On the local system it is known as an EXTRACT TRAIL and on the remote system as a REMOTE TRAIL.
The use of a trail also allows extraction and replication activities to occur independently of each other. Since the two (source trail and target trail) are independent, you have more choices for how data is delivered.
CHECKPOINT
Checkpoints store the current read and write positions of a process to disk for recovery purposes. These checkpoints ensure that data changes marked for synchronization are in fact extracted by Extract and replicated by Replicat.
Checkpoints work with inter-process acknowledgments to prevent messages from being lost in the network. Oracle GoldenGate has a proprietary guaranteed-message-delivery technology.
Checkpoint information is maintained in checkpoint files within the dirchk sub-directory of the Oracle GoldenGate directory. Optionally, Replicat checkpoints can also be maintained in a checkpoint table within the target database, in addition to the standard checkpoint file.
MANAGER
The Manager process runs on both the source and target systems and is the heart, or control process, of Oracle GoldenGate. Manager must be up and running before you create an EXTRACT or REPLICAT process. Manager performs monitoring, restarts Oracle GoldenGate processes, reports errors and events, maintains trail files and logs, etc.
COLLECTOR
Collector is a process that runs in the background on the target system. Collector receives extracted database
changes that are sent across the TCP/IP network and it writes them to a trail or extract file.
Installation (Oracle 11g on Linux)
GoldenGate software is also available on OTN but for our platform we need to download the required software from the Oracle
E-Delivery web site.
Select the Product Pack “Oracle Fusion Middleware” and the platform Linux X86-64.
$ unzip V18159-01.zip
Archive: V18159-01.zip
inflating: ggs_redhatAS50_x64_ora11g_64bit_v10.4.0.19_002.tar
$ export PATH=$PATH:/u01/oracle/ggs
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/u01/oracle/ggs
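Between unzipping the software and starting GGSCI, the tar file is extracted and the GoldenGate working directories are created. A sketch of those steps, assuming /u01/oracle/ggs as the install location used in this tutorial:

```
$ mkdir -p /u01/oracle/ggs && cd /u01/oracle/ggs
$ tar -xvf /path/to/ggs_redhatAS50_x64_ora11g_64bit_v10.4.0.19_002.tar
$ ggsci
GGSCI> CREATE SUBDIRS
GGSCI> EXIT
```

CREATE SUBDIRS builds the standard working directories (dirprm, dirdat, dirchk, dirrpt, and so on) under the GoldenGate home.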
We next create the GoldenGate database user (this tutorial uses ggs_owner for both the username and the password):
SQL> create user ggs_owner identified by ggs_owner;
User created.
SQL> grant connect, resource, select any dictionary, select any table, create table, flashback any table, execute on
dbms_flashback, execute on utl_file to ggs_owner;
We can then confirm that the GoldenGate user we have just created is able to connect to the Oracle database:
$ ggsci
GGSCI> DBLOGIN USERID ggs_owner, PASSWORD ggs_owner
We also enable supplemental logging at the database level (this step is referred to again later in the tutorial):
SQL> alter database add supplemental log data;
Database altered.
Configuring the Manager process
The Oracle GoldenGate Manager performs a number of functions like starting the other GoldenGate processes, trail log file
management and reporting.
The Manager process needs to be configured on both the source as well as target systems and configuration is carried out via a
parameter file just as in the case of the other GoldenGate processes like Extract and Replicat.
The only mandatory parameter that we need to specify is the PORT which defines the port on the local system where the
manager process is running. The default port is 7809 and we can either specify the default port or some other port provided the
port is available and not restricted in any way.
Another recommended optional parameter is AUTOSTART, which automatically starts the Extract and Replicat processes when the Manager starts.
The Manager process can also clean up trail files from disk when GoldenGate has finished processing them, via the PURGEOLDEXTRACTS parameter. Used with the USECHECKPOINTS clause, it ensures that trail files are not deleted until all processes have finished using the data they contain.
PORT 7809
USERID ggs_owner, PASSWORD ggs_owner
PURGEOLDEXTRACTS /u01/oracle/ggs/dirdat/ex, USECHECKPOINTS
The Manager can be stopped and started via the GGSCI commands START MANAGER and STOP MANAGER.
Information on the status of the Manager can be obtained via the INFO MANAGER command.
This example illustrates using the GoldenGate direct load method to extract records from an Oracle 11g database on a Red Hat Linux platform and load them into an Oracle 11g target database on an AIX platform.
On Source
Since this is a one-time data extract task, the source of the data is not the transaction log files of the RDBMS (in this case the online and archive redo log files) but the table data itself; that is why the keyword SOURCEISTABLE is used.
EXTRACT load1
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST devu007, MGRPORT 7809
RMTTASK replicat, GROUP load2
TABLE sh.products;
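The one-time extract task itself would be registered with the SOURCEISTABLE keyword, matching the parameter file above:

```
GGSCI> ADD EXTRACT load1, SOURCEISTABLE
```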
On Target
Since this is a one-time data load task, we are using the keyword SPECIALRUN.
REPLICAT: name of the Replicat group created for the initial data load
USERID/PASSWORD: database credentials for the Replicat user (this user is created in the target database)
ASSUMETARGETDEFS: this means that the source table structure exactly matches the target database table structure
MAP: with GoldenGate the target database structure can differ entirely from that of the source, in terms of table names as well as the column definitions of the tables. This parameter provides the mapping between the source and target tables, which is the same in this case.
REPLICAT load2
USERID ggs_owner, PASSWORD ggs_owner
ASSUMETARGETDEFS
MAP sh.customers, TARGET sh.customers;
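The matching one-time Replicat task on the target is created with the SPECIALRUN keyword (a sketch consistent with the parameter file above):

```
GGSCI> ADD REPLICAT load2, SPECIALRUN
```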
On Source
5) Start the initial load data extract task on the source system
We now start the initial data load task load1 on the source. Since this is a one-time task, we will initially see that the extract process is running, and after the data load is complete it will be stopped. We do not have to manually start the Replicat process on the target, as that is done automatically when the Extract task is started on the source system.
On Source
On Target
GoldenGate change synchronization means that changes that occur on the source are applied in near real time on the target.
The table on the source is the EMP table in SCOTT schema which is being replicated to the EMP table in the target database SH
schema.
GoldenGate maintains its own checkpoints: a checkpoint is a known position in the trail file from which the Replicat process will resume processing after any kind of error or shutdown. This ensures data integrity, and a record of these checkpoints is maintained either in files stored on disk or in a table in the database, the latter being the preferred option.
We can also create a single checkpoint table that can be used by all Replicat groups from a single or from many GoldenGate instances.
In one of the earlier tutorials we created the GLOBALS file. We now need to edit that GLOBALS file, via the EDIT PARAMS command, and add a CHECKPOINTTABLE entry naming the checkpoint table that will be available to all Replicat processes.
GGSCHEMA GGS_OWNER
CHECKPOINTTABLE GGS_OWNER.CHKPTAB
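With this GLOBALS entry in place, the checkpoint table itself can be created from GGSCI after logging in to the target database:

```
GGSCI> DBLOGIN USERID ggs_owner, PASSWORD ggs_owner
GGSCI> ADD CHECKPOINTTABLE GGS_OWNER.CHKPTAB
```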
We now create a trail. Note that this path pertains to the GoldenGate software location on the target system: this is where the trail files will be created, with the prefix ‘rt’, and they will be used by the Replicat process also running on the target system.
EXTRACT ext1
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST devu007, MGRPORT 7809
RMTTRAIL /u01/oracle/software/goldengate/dirdat/rt
TABLE scott.emp;
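A sketch of the GGSCI commands that would register this change-capture group and its remote trail, consistent with the parameter file above:

```
GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
GGSCI> ADD RMTTRAIL /u01/oracle/software/goldengate/dirdat/rt, EXTRACT ext1
GGSCI> START EXTRACT ext1
```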
ON TARGET SYSTEM
Note that the EXTTRAIL location which is on the target local system conforms to the RMTTRAIL parameter which we used when
we created the parameter file for the extract process on the source system.
REPLICAT rep1
ASSUMETARGETDEFS
USERID ggs_owner, PASSWORD ggs_owner
MAP scott.emp, TARGET sh.emp;
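The Replicat group would then be added against that trail, optionally using the checkpoint table configured earlier (a sketch consistent with the files above):

```
GGSCI> ADD REPLICAT rep1, EXTTRAIL /u01/oracle/software/goldengate/dirdat/rt, CHECKPOINTTABLE GGS_OWNER.CHKPTAB
GGSCI> START REPLICAT rep1
```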
ON SOURCE
ON TARGET
Start the Replicat process
.....
In addition to providing replication support for all DML statements, we can also configure the GoldenGate environment to
provide DDL support as well.
A number of prerequisite setup tasks need to be performed, which we will highlight here.
Run the following scripts from the directory where the GoldenGate software was installed.
SQL> @marker_setup
You will be prompted for the name of a schema for the GoldenGate database objects.
NOTE: The schema must be created prior to running this script.
NOTE: Stop all DDL replication before starting this installation.
.......
SQL> @ddl_setup
You will be prompted for the name of a schema for the GoldenGate database objects.
NOTE: The schema must be created prior to running this script.
NOTE: On Oracle 10g and up, system recycle bin must be disabled.
NOTE: Stop all DDL replication before starting this installation.
SQL> @role_setup
You will be prompted for the name of a schema for the GoldenGate database objects.
NOTE: The schema must be created prior to running this script.
NOTE: Stop all DDL replication before starting this installation.
SQL> @ddl_enable
We need to set the recyclebin parameter to OFF, via the ALTER SYSTEM SET RECYCLEBIN=OFF command, to prevent the error that appears if we configure DDL support and then start the Extract process with the recycle bin still enabled.
Note: we had earlier enabled supplemental logging at the database level. Using the ADD TRANDATA command we now enable it at the table level as well, as this is required by GoldenGate for DDL support.
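The table-level step mentioned here would look like this in GGSCI (schema and table per this tutorial):

```
GGSCI> DBLOGIN USERID ggs_owner, PASSWORD ggs_owner
GGSCI> ADD TRANDATA scott.emp
```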
Edit the parameter file for the Extract process to enable DDL synchronization
We had earlier created a parameter file for an Extract process ext1. We now edit that parameter file and add the entry
DDL INCLUDE MAPPED
This means that DDL support is now enabled for all tables that have been mapped; in this case it applies only to the SCOTT.EMP table, as that is the only table being processed here. We can also use INCLUDE ALL, EXCLUDE ALL, or wildcard characters to specify which tables DDL support is enabled for.
EXTRACT ext1
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST 10.53.100.100, MGRPORT 7809
RMTTRAIL /u01/oracle/software/goldengate/dirdat/rt
DDL INCLUDE MAPPED
TABLE scott.emp;
Configuring Data Pump process
The Data Pump is an optional secondary Extract group that is created on the source system. When Data Pump is not
used, the Extract process writes to a remote trail that is located on the target system using TCP/IP. When Data Pump
is configured, the Extract process writes to a local trail and from here Data Pump will read the trail and write the data
over the network to the remote trail located on the target system.
The advantages of this are as follows. It protects against network failure: in the absence of a storage device on the local system, the Extract process must hold extracted data in memory before it is sent over the network, and any failure in the network could then cause the Extract process to abort (abend); with a local trail, the captured data is safely on disk and transmission can resume once the network recovers. Also, if we are doing any complex data transformation or filtering, the same can be performed by the Data Pump. It is also useful when we are consolidating data from several sources into one central target, where a Data Pump on each individual source system can write to one common trail file on the target.
Using the ADD EXTRAIL command we will now create a local trail on the source system where the Extract process
will write to and which is then read by the Data Pump process. We will link this local trail to the Primary Extract group
we just created, ext1
On the source system, create the Data Pump group and, using the EXTTRAILSOURCE keyword, specify the location of the local trail that will be read by the Data Pump process.
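A sketch of those GGSCI commands, using the local trail lt referenced in this section and a pump group named dpump (the name that appears in the monitoring output further below; the remote trail path is illustrative and must match whatever the target-side Replicat reads):

```
GGSCI> ADD EXTTRAIL /u01/oracle/software/goldengate/dirdat/lt, EXTRACT ext1
GGSCI> ADD EXTRACT dpump, EXTTRAILSOURCE /u01/oracle/software/goldengate/dirdat/lt
GGSCI> ADD RMTTRAIL /u01/oracle/software/goldengate/dirdat/rt, EXTRACT dpump
```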
EXTRACT ext1
USERID ggs_owner, PASSWORD ggs_owner
EXTTRAIL /u01/oracle/software/goldengate/dirdat/lt
TABLE MONITOR.WORK_PLAN;
Use RMTTRAIL to specify the location of the remote trail and associate it with the Data Pump group, as it will be written to over the network by the Data Pump process.
Note: the PASSTHRU parameter signifies the mode being used for the Data Pump; it means the names of the source and target objects are identical and no column mapping or filtering is being performed here.
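A Data Pump parameter file consistent with this description might look as follows (a sketch: the group name dpump matches the monitoring output shown later, while the host and trail values reuse this tutorial's examples; in PASSTHRU mode no database login is needed):

```
EXTRACT dpump
PASSTHRU
RMTHOST devu007, MGRPORT 7809
RMTTRAIL /u01/oracle/software/goldengate/dirdat/rt
TABLE MONITOR.WORK_PLAN;
```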
ON TARGET SYSTEM
The EXTTRAIL clause indicates the location of the remote trail and should be the same as the RMTTRAIL value that
was used when creating the Data Pump process on the source system.
REPLICAT rep1
ASSUMETARGETDEFS
USERID ggs_owner, PASSWORD ggs_owner
MAP MONITOR.WORK_PLAN, TARGET MONITOR.WORK_PLAN;
ON SOURCE
On the source system, now start the Extract and Data Pump processes.
Note: the Data Pump process is reading from the local trail file – /u01/oracle/software/goldengate/dirdat/lt000000.
ON TARGET SYSTEM
In example 1, we will filter the records that are extracted on the source and applied on the target: only rows where the JOB column value equals ‘MANAGER’ in the MYEMP table will be considered for extraction.
In example 2, we will deal with a case where the table structure is different between the source database and the
target database and see how column mapping is performed in such cases.
Example 1
Initial load of all rows that match the filter, from source to target. The target database MYEMP table will only be populated with rows from the EMP table where the filter criterion JOB=’MANAGER’ is met.
On Source
EXTRACT myload1
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST devu007, MGRPORT 7809
RMTTASK replicat, GROUP myload1
TABLE scott.myemp, FILTER (@STRFIND (job, "MANAGER") > 0);
On Source (ongoing change synchronization with the same filter)
EXTRACT myload2
USERID ggs_owner, PASSWORD ggs_owner
RMTHOST 10.53.200.225, MGRPORT 7809
RMTTRAIL /u01/oracle/software/goldengate/dirdat/bb
TABLE scott.myemp, FILTER (@STRFIND (job, "MANAGER") > 0);
On Target
On Target
............................
In the source MYEMP table we have a column named SAL whereas on the target, the same MYEMP table has the
column defined as SALARY.
Create a definitions file on the source using DEFGEN utility and then copy that definitions file to the target
system
DEFSFILE /u01/oracle/ggs/dirsql/myemp.sql
USERID ggs_owner, PASSWORD ggs_owner
TABLE scott.myemp;
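The DEFGEN utility is then run from the shell, pointing at a parameter file containing the lines above (the parameter file name here is an assumption for illustration):

```
$ defgen paramfile dirprm/defgen.prm
```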
***********************************************************************
Oracle GoldenGate Table Definition Generator for Oracle
.........
We then ftp the definitions file from the source to the target system – in this case to the dirsql directory located in the
top level GoldenGate installed software directory
We now go and make a change to the original Replicat parameter file, changing the parameter ASSUMETARGETDEFS to SOURCEDEFS, which provides GoldenGate with the location of the definitions file.
The other parameter included is COLMAP, which specifies how the column mapping is performed. The USEDEFAULTS keyword denotes that all the other columns in both tables are identical; only the columns SAL and SALARY differ, so we map the SAL column in the source to the SALARY column on the target.
REPLICAT myload2
SOURCEDEFS /u01/oracle/software/goldengate/dirsql/myemp.sql
USERID ggs_owner, PASSWORD ggs_owner
MAP scott.myemp, TARGET sh.myemp,
COLMAP (usedefaults,
salary = sal);
................
Monitoring commands
GGSCI> info all
MANAGER RUNNING
EXTRACT RUNNING DPUMP 00:00:00 00:00:04
EXTRACT RUNNING EXT1 00:00:00 00:00:09
GGSCI (devu007) 24> status extract ext1
EXTRACT EXT1: RUNNING
View latency between the records processed by GoldenGate and the timestamp in the data source
A recent client had a problem which required the use of the OGG Logdump utility to verify whether a particular table row had been extracted and placed in the Replicat trail file for replication to the target. The Logdump session below illustrates the basic commands used.
$ logdump
Logdump 3 >detail data <— adds hex and ASCII data values to the column information
Logdump 4 >next <— moves to the first record; advances by one record
Scan for timestamp >= 2010/08/04 04:00:00.000.000 GMT <— timestamp found
>pos FIRST
>pos <rba>
>log to <filename>.txt
Similarly, only one Replicat group is typically needed to apply data to a target database if using
Replicat in coordinated mode. (See Section 14.7.2, "About Coordinated Replicat Mode" for more
information.) However, even in some cases when using Replicat in coordinated mode, you may be
required to use multiple Replicat groups. If you are using Replicat in classic mode and your
applications generate a high transaction volume, you probably will need to use parallel Replicat
groups.
Because each Oracle GoldenGate component — Extract, data pump, trail, Replicat — is an
independent module, you can combine them in ways that suit your needs. You can use multiple trails
and parallel Extract and Replicat processes (with or without data pumps) to handle large transaction
volume, improve performance, eliminate bottlenecks, reduce latency, or isolate the processing of
specific data.
Memory
The system must have sufficient swap space for each Oracle GoldenGate Extract and Replicat process that will be running.
Isolating processing-intensive tables
You can use multiple process groups to support certain kinds of tables that tend to interfere with
normal processing and cause latency to build on the target. For example:
Extract may need to perform a fetch from the database because of the data type of the column, because
of parameter specifications, or to perform SQL procedures. When data must be fetched from the
database, it affects the performance of Extract. You can get fetch statistics from the STATS
EXTRACT command if you include the STATOPTIONS REPORTFETCH parameter in the Extract
parameter file. You can then isolate those tables into their own Extract groups, assuming that
transactional integrity can be maintained.
In its classic mode, the Replicat process can be a source of performance bottlenecks because it is a single-threaded process that applies operations one at a time by using regular SQL. Even with BATCHSQL enabled, Replicat may take longer to process tables that have large or long-running transactions, heavy volume, a very large number of changing columns, or LOB data. You can isolate those tables into their own Replicat groups, assuming that transactional integrity can be maintained.
@RANGE is safe and scalable. It preserves data integrity by guaranteeing that the same row will
always be processed by the same process group.
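For example, one table could be split across three parallel Replicat groups with @RANGE (group and table names here are illustrative):

```
-- Parameter file of Replicat group 1 of 3:
MAP scott.emp, TARGET sh.emp, FILTER (@RANGE (1, 3));
-- Group 2 of 3:
MAP scott.emp, TARGET sh.emp, FILTER (@RANGE (2, 3));
-- Group 3 of 3:
MAP scott.emp, TARGET sh.emp, FILTER (@RANGE (3, 3));
```

Because @RANGE hashes the row identifier, a given row always lands in the same group, which is what preserves integrity.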
It might be more efficient to use the primary Extract or a data pump to calculate the ranges than to
use Replicat. To calculate ranges, Replicat must filter through the entire trail to find data that meets
the range specification. However, your business case should determine where this filtering is
performed.
1. Establish benchmarks for what you consider to be acceptable lag and throughput volume for Extract
and for Replicat. Keep in mind that Extract will normally be faster than Replicat because of the kind
of tasks that each one performs. Over time you will know whether the difference is normal or one that
requires tuning or troubleshooting.
2. Set a regular schedule to monitor those processes for lag and volume, as compared to the benchmarks.
Look for lag that remains constant or is growing, as opposed to occasional spikes. Continuous, excess
lag indicates a bottleneck somewhere in the Oracle GoldenGate configuration. It is a critical first
indicator that Oracle GoldenGate needs tuning or that there is an error condition.
To view volume statistics, use the STATS EXTRACT or STATS REPLICAT command. To view lag statistics, use the LAG EXTRACT or LAG REPLICAT command.
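For instance, using the group names from this tutorial:

```
GGSCI> STATS EXTRACT ext1
GGSCI> LAG REPLICAT rep1
```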
There is a network bottleneck if the status of Replicat is either in delay mode or at the end of the trail file
and either of the following is true:
You are only using a primary Extract and its write checkpoint is not increasing or is increasing too slowly.
Because this Extract process is responsible for sending data across the network, it will eventually run out
of memory to contain the backlog of extracted data and abend.
You are using a data pump, and its write checkpoint is not increasing, but the write checkpoint of the
primary Extract is increasing. In this case, the primary Extract can write to its local trail, but the data pump
cannot write to the remote trail. The data pump will abend when it runs out of memory to contain the
backlog of extracted data. The primary Extract will run until it reaches the last file in the trail sequence and
will abend because it cannot make a checkpoint.
Check the RAID configuration. Because Oracle GoldenGate writes data sequentially, RAID 0+1 (striping
and mirroring) is a better choice than RAID 5, which uses checksums that slow down I/O and are not
necessary for these types of files.
Use the CHECKPOINTSECS parameter to control how often Extract and Replicat make their routine
checkpoints.
Use the GROUPTRANSOPS parameter to control the number of SQL operations that are
contained in a Replicat transaction when operating in its normal mode. Increasing the
number of operations in a Replicat transaction improves the performance of Oracle
GoldenGate by reducing the number of transactions executed by Replicat, and by reducing
I/O activity to the checkpoint file and the checkpoint table, if used. Replicat issues a
checkpoint whenever it applies a transaction to the target, in addition to its scheduled
checkpoints.
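As a sketch, the two tuning parameters described above would simply be added to a process parameter file; the values below are illustrative, not recommendations:

```
REPLICAT rep1
USERID ggs_owner, PASSWORD ggs_owner
-- Routine checkpoint interval, in seconds
CHECKPOINTSECS 10
-- Minimum number of source operations grouped into one target transaction
GROUPTRANSOPS 2000
ASSUMETARGETDEFS
MAP scott.emp, TARGET sh.emp;
```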