
Oracle GoldenGate Interview Questions
Q. What is the significance of Oracle GoldenGate Manager?

To give users control over Oracle GoldenGate processes, Manager provides a command-line interface for a variety of administrative, housekeeping, and reporting activities, including:
 Setting parameters to configure and fine-tune Oracle GoldenGate processes
 Starting, stopping, and monitoring capture and delivery modules
 Critical, informational event, and threshold reporting
 Resource management
 Trail File management
Q. Why is it highly desirable that the tables you want to replicate have a primary key?

In simple words, GoldenGate requires a primary key to uniquely identify a record. If no primary key exists on the source table, GoldenGate builds its own unique identifier by concatenating all the table columns together. This is inefficient, because the volume of data that has to be extracted from the redo logs grows dramatically. When a table has a primary key, the GoldenGate process fetches only the primary key and the changed data (before and after images in the case of an update statement).

The GoldenGate process will also warn you that no primary key exists on the target table, and you may see the following warning in the GoldenGate error log:

WARNING OGG-xxxx No unique key is defined for table 'TARGET_TABLE_NAME'. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key.

Having a primary key also ensures fast data lookups when the Replicat recreates and applies the DML statements against the target database. But keep in mind that it is not mandatory for a primary key to be present on the table.

Q. Is it a must that the source database be in archivelog mode?

It is not mandatory that the source database be in archivelog mode, but for any serious, mission-critical GoldenGate system, running the source in archivelog mode is practically essential.

Q. Without going into details, explain high level steps of setting up GoldenGate.

Below are the key steps to install/configure the GoldenGate system.


 Download the software from the Oracle website and upload to server
 Unpack/Unzip the installation zip file
 Prepare source and target system
 Install the software on the source and target system (for 12c use OUI)
 Prepare the source database (some DB parameters need to be adjusted)
 Configure the Manager process on the source and target system
 Configure the Extract process on the source system
 Configure the data pump process on the source system
 Configure the Replicat process on the target system
 Start the Extract process
 Start the data pump process
 Start the Replicat process
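For illustration only, a minimal GGSCI command sequence for such a classic one-way setup could look like the following; the process names (ext1, pmp1, rep1), trail prefixes and schema are made-up placeholders, not values from this article.

On the source:
GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT ext1
GGSCI> ADD EXTRACT pmp1, EXTTRAILSOURCE ./dirdat/lt
GGSCI> ADD RMTTRAIL ./dirdat/rt, EXTRACT pmp1
GGSCI> START EXTRACT ext1
GGSCI> START EXTRACT pmp1

On the target:
GGSCI> ADD CHECKPOINTTABLE ggadmin.ggs_checkpoint
GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt, CHECKPOINTTABLE ggadmin.ggs_checkpoint
GGSCI> START REPLICAT rep1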
Q. When creating the GoldenGate database user for database 12c, what special precautions do you need to take?

You must grant the GoldenGate admin user access to all database containers on the
source side so that GoldenGate can access the redo logs for all the databases
(container and pluggable)

You must also grant the DBA role with the container=all option.

SQL> GRANT DBA TO C##GOLDENADMIN CONTAINER=ALL;
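For illustration, the complete user setup on a 12c multitenant source could look like the following; the user name C##GOLDENADMIN and password are placeholders, and the exact DBMS_GOLDENGATE_AUTH call may vary by database version.

SQL> CREATE USER c##goldenadmin IDENTIFIED BY password CONTAINER=ALL;
SQL> GRANT DBA TO c##goldenadmin CONTAINER=ALL;
SQL> EXEC DBMS_GOLDENGATE_AUTH.GRANT_ADMIN_PRIVILEGE('C##GOLDENADMIN', CONTAINER=>'ALL');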

Q. What is Downstream capture mode of GoldenGate?

Traditionally, log mining for the source data happens on the source database. In downstream capture mode, the Oracle Data Guard redo transport mechanism is used instead: redo is shipped continuously, in real time, to standby redo logs on the target/downstream database, and the log mining work of extracting DDL/DML transactions happens on that side rather than on the source.

Q. How do you take backup of GoldenGate?

You can back up the source database easily with tools like Oracle Recovery Manager (RMAN), but to back up GoldenGate itself you need to back up the GoldenGate home and the subdirectories that contain the trail files, checkpoint files and so on. Without these key files GoldenGate cannot recover from the last checkpoint, which means that if you lose them you will have no option but to perform a new initial load. RMAN simply has no capability to back up OS or non-database files.

So either keep all your GoldenGate-related files on some kind of SAN that is backed up daily at the storage level, or use Unix shell commands in a cron job to take filesystem backups.
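As a simple illustration (paths are made up), a nightly cron entry that archives the GoldenGate home could be as basic as:

00 02 * * * oracle tar -czf /backup/ogg/gg_home_$(date +\%Y\%m\%d).tar.gz /u01/app/oracle/gg_home

A consistent backup is easier to take when the processes are stopped or when the trail and checkpoint files sit on snapshot-capable storage.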

Q. What is a checkpoint table? In which capture mode is it used: classic or integrated?

Oracle GoldenGate Extract and Replicat processes perform checkpoint operations. In the event of an unexpected failure, the checkpoint file or checkpoint table ensures that Extract and Replicat restart from the point of failure and avoid re-capturing and re-applying transactions.

The checkpoint table allows the checkpoint to be included within Replicat's own transaction, ensuring complete recovery from all failure scenarios.

You use the GGSCI add checkpointtable command to create the checkpoint table.

The checkpoint table is used with classic (non-integrated) Replicat.

For integrated Replicat, a checkpoint table is not required and should not be created.
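As an illustration (the owner/table name ggadmin.ggs_checkpoint is a placeholder), a checkpoint table for a classic Replicat is typically created and referenced like this:

GGSCI> DBLOGIN USERID ggadmin, PASSWORD password
GGSCI> ADD CHECKPOINTTABLE ggadmin.ggs_checkpoint
GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt, CHECKPOINTTABLE ggadmin.ggs_checkpoint

Alternatively, the table name can be set once in the GLOBALS file with the CHECKPOINTTABLE parameter so that new Replicats pick it up by default.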

Q. What transaction types does Golden Gate support for Replication?

Goldengate supports both DML and DDL Replication from the source to target.

Q. What are the supplemental logging pre-requisites?


The following supplemental logging is required:
1. Database-level (minimal) supplemental logging
2. Object-level (table-level) supplemental logging
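For illustration, the commands usually involved are the following (the schema and table names are placeholders):

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
GGSCI> DBLOGIN USERID ggadmin, PASSWORD password
GGSCI> ADD TRANDATA scott.emp
GGSCI> ADD SCHEMATRANDATA scott

ADD TRANDATA enables object-level logging for a single table, while ADD SCHEMATRANDATA does it for a whole schema.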
Q. Why is Supplemental logging required for Replication?

Supplemental logging ensures that the extra column data GoldenGate needs, in particular the key columns used to identify each row, is written to the redo log. By default Oracle logs only the changed columns, which is not enough for Replicat to locate and apply the corresponding rows on the target.

Q. What are the main characteristics of Integrated Capture (IC)?

1. In the Integrated Capture mode, GoldenGate works directly with the database
log mining server to receive the data changes in the form of logical change
records (LCRs).
2. IC mode does not require any special setup for the databases using ASM,
transparent data encryption, or Oracle RAC.
3. This feature is only available for Oracle databases version 11.2.0.3 or higher.
4. It also supports various object types which were previously not supported by
Classic Capture.
5. This Capture mode supports extracting data from source databases using
compression.
Integrated Capture can be configured in an online or downstream mode.

Q. List the minimum parameters that can be used to create the extract process?

The following are the minimum required parameters which must be defined in the
extract parameter file.
1. EXTRACT NAME
2. USERID
3. EXTTRAIL
4. TABLE
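Put together, a minimal Extract parameter file built only from these parameters might look like this (all names and paths are illustrative):

EXTRACT ext1
USERID ggadmin, PASSWORD password
EXTTRAIL ./dirdat/lt
TABLE scott.emp;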
Q. I want to configure multiple Extracts to write to the same exttrail file. Is this
possible?

Only one Extract process can write to one exttrail at a time. So, you can’t configure
multiple extracts to write to the same exttrail.

Q. What type of Encryption is supported in Goldengate?

Oracle GoldenGate provides 3 types of encryption:

1. Data encryption using Blowfish.
2. Password encryption.
3. Network encryption.
Q. What are some of the key features of GoldenGate 12c?

The following are some of the more interesting features of Oracle GoldenGate 12c:
1. Support for Multitenant Database
2. Coordinated Replicat
3. Integrated Replicat Mode
4. Use of Credential store
5. Use of Wallet and master key
6. Trigger-less DDL replication
7. Automatic adjustment of threads on RAC node failure/restart
8. Supports RAC PDML Distributed transaction
9. RMAN Support for mined archive logs
Q. If I have created a Replicat process in OGG 12c and forgot to specify the DISCARDFILE parameter, what will happen?

Starting with OGG 12c, if you do not specify a DISCARDFILE, the OGG process generates a discard file with default values whenever the process is started with the START command through GGSCI.

Q. Is it possible to start OGG EXTRACT at a specific CSN?

Yes, Starting with OGG 12c you can now start Extract at a specific CSN in the
transaction log or trail.

Example:

START EXTRACT fin ATCSN 12345

START EXTRACT finance AFTERCSN 67890

Q. List a few parameters which may help improve the replicat performance?

Below are some parameters that can be used to improve Replicat performance (a sample parameter file follows the list):
1. BATCHSQL
2. GROUPTRANSOPS
3. INSERTAPPEND
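As a hedged sketch (group, schema and value choices are illustrative, not tuning recommendations), these parameters could appear in a Replicat parameter file as follows:

REPLICAT rep1
USERID ggadmin, PASSWORD password
BATCHSQL
GROUPTRANSOPS 2000
INSERTAPPEND
MAP scott.*, TARGET scott.*;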
Q. What are the most common reasons of an Extract process slowing down?
Some of the possible reasons are:
 Long running batch transactions on a table.
 Insufficient memory on the Extract side. Uncommitted, long-running
transactions can force Extract to spill transaction data to a temporary area (dirtmp)
on disk; once the transaction commits, the data is read back from the temporary
location on the file system and written to the trail files.
 Slow or overburdened Network.
Q. What are the most common reasons of the Replicat process slowing down?

Some of the possible reasons are:


 Large amount of transactions on a particular table.
 Blocking sessions on the destination database where non-Goldengate
transactions are also taking place on the same table as the replicat
processing.
 If using DBFS, writing & reading of trail files may be slow if SGA
parameters are not tuned.
 For a slow Replicat, latency may be due to missing indexes on the target.
 Replicat having to process updates and deletes of rows in very large tables.
Q. My extract was running fine for a long time. All of a sudden it went down. I
started the extract processes after 1 hour. What will happen to my committed
transactions that occurred in the database during last 1 hour?

OGG checkpoints provide fault tolerance and make sure that each committed transaction is captured, and captured only once. Even though the Extract went down abnormally, when you start the process again it reads the checkpoint file to provide read consistency and transaction recovery, so the committed transactions from the last hour will still be captured.

Q. I have configured Oracle GoldenGate integrated capture process using the


default values. As the data load increases I see that extract starts lagging behind
by an hour (or more) and database performance degrades. How you will resolve
this performance issue?

When operating in integrated capture mode, you must make sure that sufficient memory has been assigned to STREAMS_POOL_SIZE. An undersized streams pool can cause both Extract lag and database performance problems.
The best practice is to set STREAMS_POOL_SIZE at the instance level and to limit the memory of each GoldenGate process with MAX_SGA_SIZE, as below:

SQL> alter system set STREAMS_POOL_SIZE=3G;

TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 2048, PARALLELISM 4)

Q. Why would you segregate the tables in a replication configuration? How would you do it?

In OGG you can configure replication at the schema level or at the table level using the TABLE parameter of Extract and the MAP parameter of Replicat.

For replicating the entire database, you can list all the schemas in the database in the
extract/replicat parameter file.

Depending on the amount of redo generation, you can split the tables of a schema across multiple Extracts and Replicats to improve replication performance. Alternatively, you can group sets of tables in the configuration by application functionality.

You may also need to move tables with long-running transactions into a separate Extract process to eliminate lag on the other tables.

Let's say you have a schema named SCOTT with 100 tables.

Out of these hundred tables, 50 tables are heavily utilized by application.

To improve the overall replication performance, you create 3 extracts and 3 replicats
as follows:

Ext_1/Rep_1 –> 25 tables

Ext_2/Rep_2 –> 25 tables

Ext_3/Rep_3 –> 50 tables

Ext_1/Rep_1 and Ext_2/Rep_2 contain 25 tables each, which are heavily utilized or generate more redo.

Ext_3/Rep_3 contains all the other 50 tables which are least used.
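Expressed in parameter files, the split is simply a matter of which TABLE clauses go into which Extract (table names below are placeholders):

Ext_1 parameter file: TABLE scott.orders; TABLE scott.order_items; ... (25 busy tables)
Ext_2 parameter file: TABLE scott.payments; TABLE scott.invoices; ... (another 25 busy tables)
Ext_3 parameter file: TABLE scott.ref_codes; TABLE scott.lookup_values; ... (the remaining 50 low-activity tables)

The corresponding Replicats carry matching MAP statements.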
Q. How do you view the data which has been extracted from the redo logs?

The logdump utility is used to open the trail files and look at the actual records that
have been extracted from the redo or the archive log files.
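A typical logdump session might look like the following (the trail file name is illustrative):

$ ./logdump
Logdump> OPEN ./dirdat/lt000000001
Logdump> FILEHEADER ON
Logdump> GHDR ON
Logdump> DETAIL DATA
Logdump> NEXT
Logdump> COUNT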

Q. Why should I upgrade my GoldenGate Extract processes to Integrated Extract?

By moving the extraction work into the database, Oracle can integrate new database features faster. Because of this, Integrated Extract supports a number of features, such as capturing from compressed tables, that classic Extract does not. Going forward, preference should be given to creating new Extracts as Integrated Extracts and to upgrading existing classic Extracts.

Q. What is the minimum Database version which supports Integrated Delivery?

Oracle Database 11.2.0.4 is the minimum required version for Integrated Replicat (Integrated Delivery).

Q. Which databases support GoldenGate Integrated Delivery?

Oracle Integrated Delivery is only available for Oracle Databases.

Q. With Integrated Delivery, where can we look for the performance stats?

With 12c, performance statistics are collected in the AWR repository and the data is available through the normal AWR reports.

Q. What are the steps required to add a new table to an existing replication
setup?

The steps to be executed would be the following:


 Include the new table in the Extract and data pump parameter files.
 Obtain the starting database SCN and copy the source table data to the target database.
 Start the Replicat on the target from that source SCN.
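One possible command sequence for these steps is shown below; the process, table, SCN and CSN values are placeholders.

GGSCI> DBLOGIN USERID ggadmin, PASSWORD password
GGSCI> ADD TRANDATA scott.new_table
Add TABLE scott.new_table; to the Extract and pump parameter files and the corresponding MAP statement to the Replicat parameter file.
SQL> SELECT current_scn FROM v$database;
Copy the table to the target as of that SCN (for example with Data Pump using FLASHBACK_SCN), then:
GGSCI> START REPLICAT rep1 AFTERCSN 1234567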
Q. What is the purpose of the DEFGEN utility?
When the source and the target schema objects are not identical (different DDL), the Replicat process needs to know the source definitions of the objects. The output of the DEFGEN utility is used together with the trail data to determine which column value in the trail belongs to which column.
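A hedged example of generating and using a definitions file (file and table names are illustrative):

Contents of ./dirprm/defgen.prm:
DEFSFILE ./dirdef/source.def
USERID ggadmin, PASSWORD password
TABLE scott.emp;

Run it from the OGG home:
$ ./defgen paramfile ./dirprm/defgen.prm

Copy source.def to the target and reference it in the Replicat parameter file with:
SOURCEDEFS ./dirdef/source.def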

Q. We want to set up one-way data replication for an online transaction processing application. However, there are compressed tables in the environment. Please suggest how this can be achieved.

You must use OGG 11.2 or later and configure the GoldenGate Integrated Capture process to extract data from compressed tables.

Note: releases prior to OGG 11.2 do not support extracting data from compressed tables, and neither does classic capture.

Q. What are the different OGG Initial load methods available?

OGG has two functions: online data replication and initial loading.

If you are replicating data between two homogeneous databases, the best method is to use a database-specific method (Exp/Imp, RMAN, transportable tablespaces, physical standby and so on). Database-specific methods are usually faster than the other methods.

If you are replicating data between two heterogeneous databases, or your replication involves complex transformations, a database-specific method cannot be used. In those cases you can always use Oracle GoldenGate to perform the initial load.

Within Oracle GoldenGate you have 4 different ways to perform the initial load:
1. Direct Load – fast, but does not support LOB data types (12c adds support for LOBs).
2. Direct Bulk Load – uses the SQL*Loader API for Oracle and SSIS for MS SQL Server.
3. File to Replicat – fast, but the rmtfile limit is 2 GB. If the table does not fit in one rmtfile you can use MAXFILES, but the Replicat needs to be registered in the target OGG home to read the rmtfiles from the source.
4. File to database utility – depending on the target database, uses SQL*Loader for Oracle, SSIS for MS SQL Server, and so on.
Q. I have a table called ‘TEST’ on source and target with same name, structure
and data type but in a different column order. How can you setup replication for
this table?

OGG by default assumes that the source and target tables are identical. Tables are considered identical only if the table structure, data types and column order are the same on both the source and the target.

If the tables are not identical you must use the parameter ‘SOURCEDEFS’ pointing to
the source table definition and ‘COLMAP’ parameter to map the columns from source
to target.
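For example, assuming illustrative file and table names, the Replicat parameter file could contain:

SOURCEDEFS ./dirdef/source.def
MAP scott.test, TARGET scott.test, COLMAP (USEDEFAULTS);

USEDEFAULTS maps columns with matching names regardless of their order; explicit target = source pairs inside COLMAP are only needed when column names differ.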

Q. What is the best practice to delete the extract files in OGG?

Use the Manager process to purge the trail files after they have been consumed by the data pump/Replicat processes, for example by adding the following to the Manager parameter file:

PURGEOLDEXTRACTS /u01/app/oracle/dirdat/et*, USECHECKPOINTS, MINKEEPHOURS 2

Q. I have a one-way replication setup. The system administration team wants to apply an OS patch to both the OGG source host and the target servers. Provide the sequence of steps that you will carry out before and after applying this patch.

Procedure:

1. Check that the Extract has processed all the records in the data source (online redo/archive logs):

GGSCI> SEND EXTRACT <extract_name>, LOGEND

(The above command should print YES.)

2. Verify that the Extract, pump and Replicat have zero lag:

GGSCI> SEND EXTRACT <extract_name>, GETLAG

GGSCI> SEND EXTRACT <pump_name>, GETLAG

GGSCI> SEND REPLICAT <replicat_name>, GETLAG

(The above commands should print "At EOF, no more records to process.")

3. Stop all application and database activity.

4. Make sure that the primary extract is reading the end of the redolog and that there is
no LAG at all for the processes.

5. Now proceed with stopping the processes:

Source:

1. Stop the primary extract

2. Stop the pump extract

3. Stop the manager process

4. Make sure all the processes are down.

Target:

1. Stop replicat process

2. Stop mgr

3. Make sure that all the processes are down.

4. Proceed with the maintenance

5. After the maintenance, proceed with starting up the processes:

Source:

1. Start the manager process

2. Start the primary extract

3. Start the pump extract

(Or simply start all the Extract processes with: GGSCI> START EXTRACT *)

4. Make sure that all the processes are up.

Target:

1. Start the manager process

2. Start the replicat process.

3. Make sure that all the processes are up.

Q. What are the basic resources required to configure Oracle GoldenGate high
availability solution with Oracle Clusterware?

There are 3 basic resources required:


1. Virtual IP
2. Shared storage
3. Action script
Q. How can you determine if the parameters for a process were recently changed?

Whenever a process is started, the parameters in its .prm file are written to the process report. You can look at older process reports to see the parameters that were used to start the process; by comparing the older and current reports you can identify which parameters changed.

Q. Is there a way to check the syntax of the commands in the parameter file
without actually running the GoldenGate process?

Yes, you can place the CHECKPARAMS parameter in the parameter file and start the process; it verifies the syntax, writes any errors to the report file, and then stops without processing data.

Q. What are macros?

A macro is an easier way to build your parameter files. Once a macro is written it can be called from different parameter files. Common parameters such as the user name/password can be placed in macros. A macro can be defined in the parameter file itself or stored in a separate macro library file.
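A small sketch of both forms follows (macro, file and object names are illustrative):

Contents of the macro library ./dirprm/login.mac:
MACRO #dblogin
BEGIN
USERID ggadmin, PASSWORD password
END;

Using it in an Extract parameter file:
INCLUDE ./dirprm/login.mac
EXTRACT ext1
#dblogin()
EXTTRAIL ./dirdat/lt
TABLE scott.emp;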
1. What is the difference between 11g and 12c GoldenGate?
- 11g supports only classic extract.
- In 12c, integrated extract was introduced.
- 12c supports internal parallelism for Replicat to increase apply performance.
- Automatic CDR (conflict detection and resolution).
2. What is the difference between the integrated method and the classic method?
3. What is the advantage of using a pump process?
- By using a pump we can reduce the impact of network issues; if a trail file is corrupted on the target host we can resend it using the pump, since a copy of the trail file is kept on the source host.
- Recovery is easier if any trail file is corrupted on the target.
4. What are trail files?
- Trail files are binary files maintained by GoldenGate to store the replicated data.
5. What are the different checkpoints in GoldenGate?
- Capture (Extract) checkpoints and Replicat checkpoints.
6. What is the difference between "Lag at Chkpt" and "Time Since Chkpt"?
7. What is the logdump utility?
- Logdump is a GoldenGate utility used to inspect trail files.
8. What is the RBA in a trail file?
9. How do you find the number of transactions in a trail file?
- Using the COUNT command in the logdump utility.
10. From which database version is integrated extract supported?
- RDBMS 11.2.0.3.
11. Which database process supports integrated extract?
- The log mining server process.
12. Which database process supports integrated replicat?
- The inbound server process.
13. Is a checkpoint table required for integrated replicat?
- No.
14. What is the difference between coordinated and integrated replicat?
15. What is TRANLOG at the GoldenGate level?
16. What is the PASSTHRU parameter?
17. What is the ASSUMETARGETDEFS parameter?
18. What is a credential store?
19. What is a discard file, and what data does it store?
20. What is BATCHSQL mode?
21. What is the difference between "Lag at Chkpt" and "Time Since Chkpt"?
22. What is CDR?
23. What can the Manager process do?

Q1: What OGG software version should be used if the source DB is Oracle 12c R2 on Linux and the destination is DB2 10.1 (32-bit) on a 32-bit OS?
Does it mean we have to download and install separate OGG software versions on the source and target servers? Will they be able to communicate with each other?
A1: In any case, you have to install and configure GoldenGate on both the source and the target even if the versions are the same; they will be two separate sets of binaries (one per platform/database), and they will still be able to communicate with each other.
Q2: Can the source platform be 32-bit and target platform on 64 bit OS for OGG
implementation?
A2: Yes

Q3: We ran an update statement in the source database which updates ten million records, committed it, and immediately ran "shutdown abort" on the source database. Will the data get replicated correctly to the target database by OGG?
A3: If the DB is down, GoldenGate will abend. As long as the data has been written to the logs, when you restart the process GoldenGate will pick up from the point where it stopped. So everything depends on what got written to the logs, and GoldenGate only picks up committed transactions.

Q4: Instead of running ‘add trandata’, if I directly run ‘alter table add supplemental logging’ at
SQL prompt, will OGG still work?
A4: Yes

Q5: What happens if we add trandata for a table which do not have a primary key or unique
key, but has invisible columns:
a) Will the invisible column will be considered for uniqueness while enabling supplemental
logging?
b) What happens when we make the invisible column of the table visible?

A5: No, invisible columns will not be considered for uniqueness. If invisible columns are made visible, they will be treated as normal columns.

Q6: When we talk about an OGG initial load, the target tables should be empty but the metadata should be present. What about the indexes corresponding to the tables in the target database? Should they also be defined at the target database before starting the OGG initial load?
A6: The GoldenGate initial load takes care of the data only, so yes, the metadata should be present. For a faster load it is advisable to disable indexes, but it is not mandatory.


Q7: Can the OGG "SKIPTRANSACTION" option be provided only for the Replicat process? Can it be used for Extract and data pump processes also?
A7: Replicat only.

Q8: If both the source database and the target database are RAC databases, will the OGG instances also be RAC?
A8: There is no such thing as OGG RAC; only the database is RAC.

Q9: Under which circumstances do we need to run DBLOGIN in GGSCI?
A9: Whenever any change to the database is required from GGSCI.

Q10: If the file system on which the trail files are stored gets filled up:
a) How will the OGG processes behave? Will they get ABENDED?
A: Yes, the OGG processes get ABENDED.
b) How will the underlying source and target databases behave?
A: The source and target databases are independent of OGG, so there is no effect on the databases.
c) How will the apply gap be detected and resolved?
A: As soon as you restart the processes it will be taken care of; this is done using the GoldenGate checkpoint mechanism.

============================================================

Oracle Data Guard Interview Questions
What are the advantages in using Oracle Data Guard?
Following are the different benefits in using Oracle Data Guard feature in your environment.

 High Availability.

 Data Protection.

 Off-loading Backup operation to standby database.

 Automatic Gap detection and Resolution in standby database.

 Automatic Role Transition using Data Guard Broker.


What are the different services available in Oracle Data Guard?
Following are the different Services available in Oracle Data Guard of Oracle database.

 Redo Transport Services.

 Log Apply Services.

 Role Transitions.
What are the different Protection modes available in Oracle Data Guard?
Below are the protection modes available in DG

1. Maximum Protection
2. Maximum Availability
3. Maximum Performance => This is the default protection mode. It provides the highest level of
data protection that is possible without affecting the performance of a primary database. This is
accomplished by allowing transactions to commit as soon as all redo data generated by those
transactions has been written to the online log.

How do you check the protection mode of the primary database in your Oracle Data Guard setup?
SELECT PROTECTION_MODE FROM V$DATABASE;
How to change protection mode in Oracle Data Guard setup?
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE [PROTECTION |
AVAILABILITY | PERFORMANCE];
What are the advantages of using a Physical standby database in Oracle Data Guard?

1. High Availability.
2. Load balancing (Backup and Reporting).
3. Data Protection.
4. Disaster Recovery.

What is the usage of the DB_FILE_NAME_CONVERT parameter in an Oracle Data Guard setup?
DB_FILE_NAME_CONVERT is used when the standby database uses a different directory structure for its data files than the primary database; it can also be used when duplicating a database to generate the files in a different location.
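For example (the paths are illustrative), on the standby you might set:

SQL> ALTER SYSTEM SET db_file_name_convert='/u01/oradata/prim/','/u01/oradata/stby/' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET log_file_name_convert='/u01/oradata/prim/','/u01/oradata/stby/' SCOPE=SPFILE;

Both parameters are static, so the standby instance has to be restarted for them to take effect.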
What are the services required on the primary and standby databases?
The services required on the primary database are:

1. Log Writer Process (LGWR): Collects redo information and updates the online redo logs. It
can also create local archived redo logs and transmit online redo to standby databases.
2. Archiver Process (ARCn): One or more archiver processes make copies of online redo
logs either locally or remotely for standby databases.
3. Fetch Archive Log (FAL) Server: Services requests for archive redo logs from FAL clients
running on multiple standby databases. Multiple FAL servers can be run on a primary database, one
for each FAL request.
4. Log network server (LNS): LNS is used on the primary to initiate a connection with the
standby database.

The services required on the standby database are:

1. Fetch Archive Log (FAL) Client: Pulls archived redo log files from the primary site. Initiates
transfer of archived redo logs when it detects a gap sequence.
2. Remote File Server (RFS): Receives archived and/or standby redo logs from the primary
database.
3. Archiver (ARCn) Processes: Archives the standby redo logs applied by the managed
recovery process (MRP).
4. Managed Recovery Process (MRP): Applies archived redo log information to the standby
database.

What is RTS (Redo Transport Services) in Data Guard?
Redo transport services control the automated transfer of redo data from the production database to one or more archival destinations. They perform the following tasks:
Transmit redo data from the primary system to the standby systems in the configuration.
Manage the process of resolving any gaps in the archived redo log files due to a network failure.
Automatically detect missing or corrupted archived redo log files on a standby system and automatically retrieve replacement archived redo log files from the primary database or another standby database.
How to delay the application of logs to a physical standby?
A standby database automatically applies redo logs when they arrive from the primary
database. But in some cases, we want to create a time lag between the archiving of a redo
log at the primary site, and the application of the log at the standby site.
Modify the Log_Archive_Dest_n initialization parameter on the primary database to set a
delay for the standby database.
Example: For 60min Delay:
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2=’SERVICE=stdby_srvc DELAY=60′;
The DELAY attribute is expressed in minutes.
The archived redo logs are still automatically copied from the primary site to the standby
site, but the logs are not immediately applied to the standby database. The logs are applied
when the specified time interval expires.

Oracle Data Guard Interview Questions
How many standby databases we can create (in 10g/11g)?
Till Oracle 10g, 9 standby databases are supported.
From Oracle 11g R2, we can create 30 standby databases.
What are the differences between physical, logical, snapshot standby and ADG (or) what are the different types of standby databases?
Physical standby – in MOUNT state; the MRP process applies the archived logs.
ADG (Active Data Guard) – in READ ONLY state while MRP applies the archived logs.
Logical standby – in READ ONLY state; the LSP process runs.
Snapshot standby – a physical standby database can be converted to a snapshot standby, which is in READ WRITE mode and can be used for any kind of testing; it can then be converted back to a physical standby and MRP restarted, which will apply all pending archived logs.
What are the parameters we’ve to set in primary/standby for Data Guard?
DB_UNIQUE_NAME
LOG_ARCHIVE_CONFIG
LOG_ARCHIVE_MAX_PROCESSES
DB_CREATE_FILE_DEST
DB_FILE_NAME_CONVERT
LOG_FILE_NAME_CONVERT
LOG_ARCHIVE_DEST_n
LOG_ARCHIVE_DEST_STATE_n
FAL_SERVER
FAL_CLIENT
STANDBY_FILE_MANAGEMENT
What is the use of fal_server & fal_client, is it mandatory to set these?
FAL_SERVER
specifies the FAL (fetch archive log) server for a standby database. The value is an Oracle
Net service name, which is assumed to be configured properly on the standby database
system to point to the desired FAL server.
FAL_CLIENT
specifies the FAL (fetch archive log) client name that is used by the FAL service, configured
through the
FAL_SERVER initialization parameter, to refer to the FAL client.
The value is an Oracle Net service name, which is assumed to be configured properly on
the FAL server system to point to the FAL client (standby database).
How to find out the backlog of the standby?
select round((sysdate - a.next_time)*24*60) as "Backlog (min)", m.sequence#-1 "Seq Applied", m.process, m.status
from v$archived_log a,
     (select process, sequence#, status from v$managed_standby where process like '%MRP%') m
where a.sequence# = (m.sequence#-1);
If you didn't have access to the standby database and you wanted to find out what
error has occurred in a data guard configuration, what view would you check in the
primary database to check the error message?
You can check the v$dataguard_status view.
select message from v$dataguard_status;
How can you recover a standby which is far behind the primary (or) how can we sync the standby without the archive logs?
By using an RMAN incremental backup (sketched below).
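At a high level the roll-forward could be sketched as follows; the SCN and paths are placeholders, and depending on the version the standby control file may also need to be refreshed.

On the standby: SQL> SELECT current_scn FROM v$database;   (assume it returns 1234567)
On the primary: RMAN> BACKUP INCREMENTAL FROM SCN 1234567 DATABASE FORMAT '/backup/stby_%U';
Copy the backup pieces to the standby host, then on the standby:
RMAN> CATALOG START WITH '/backup/';
RMAN> RECOVER DATABASE NOREDO;
Finally restart managed recovery (MRP).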
What is a snapshot standby (or) How can we give a physical standby to a user in READ WRITE mode, let him do updates, and revert back to standby?
Till Oracle 10g: create a guaranteed restore point, open the standby in read write mode, let him do the updates, flash back to the restore point, and start MRP.
From Oracle 11g: convert the physical standby to a snapshot standby, let him do the updates, convert it back to a physical standby, and start MRP (see the commands below).
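The 11g commands are along these lines (run with managed recovery stopped and the database mounted):

SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
(the user now tests in read/write mode)
SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;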
What is Active Data Guard? Does it need additional licensing?
Active Data Guard means the standby database is open in read-only mode while the redo logs are being applied in real time. It is a separately licensed option.
Below are the benefits of using Active Data Guard:

 Reporting queries can be offloaded to the standby database.

 Physical block corruptions are repaired automatically, either at the primary or at the physical standby database.

 RMAN backups can be initiated from the standby instead of the primary, which reduces the CPU load on the primary.
What is Active Data Guard duplicate?
Starting from 11g we can duplicate a database in two ways: 1) active database duplication, and 2) backup-based duplication.
Active database duplication copies the live TARGET database over the network to the AUXILIARY destination and then creates the duplicate database. In an active duplication the target database's online image copies and archived redo log files are copied through the auxiliary instance service name, so there is no need for a backup of the target database.
Node eviction happens quite often in RAC environments on any platform, and troubleshooting and finding the root cause of a node eviction is very important for DBAs so it can be avoided in the future. Two RAC processes essentially decide about node evictions and initiate them on almost all platforms.
1. OCSSD: This process is primarily responsible for inter-node health monitoring and instance endpoint recovery. It runs as the oracle user. It also provides basic cluster locking and group services, and can run with or without vendor clusterware. Abnormal termination or killing of this process will reboot the node via the init.cssd script. If the init.cssd script itself is killed, the ocssd process survives and the node keeps functioning; the script is called from an /etc/inittab entry, and when init respawns it, it tries to start its own ocssd process. Since one ocssd process is already running, this second attempt fails and the second init.cssd script reboots the node.

2. OPROCD: This process checks for hangs and driver freezes on the machine. On Linux it is not present up to 10.2.0.3, where the same function is performed by the Linux hangcheck-timer module. Starting from 10.2.0.4 it is started as part of the clusterware startup and runs as root. Killing this process reboots the node. If a machine hangs for a long time, this process kills the node to stop I/O to disk so that the remaining nodes can remaster the resources. The executable sets a signal handler and schedules itself based on an interval in milliseconds; it takes two parameters:

a. Timeout value -t: the length of time between executions. By default it is 1000 (ms).
b. Margin -m: the acceptable difference between dispatches. By default it is 500 (ms).

When we set diagwait to 13, the margin becomes 13 - 3 (reboottime seconds) = 10 seconds, so the value of -m becomes 10000.

There are two kinds of heartbeat mechanisms which are responsible for node reboot and reconfiguration of the remaining clusterware nodes.

a. Network heartbeat: This indicates that the node can participate in cluster activities such as group membership changes. When it is missing for too long, cluster membership changes as a result of a reboot. "Too long" is determined by the css misscount parameter, which is 30 seconds on most platforms but can be changed depending on the network configuration of a particular environment. If it needs to be changed, it is advisable to contact Oracle Support and follow their recommendations.
b. Disk heartbeat: This is the heartbeat to the voting disk file, which holds the latest information about node members. Connectivity to a majority of voting files must be maintained for a node to stay alive. The voting disk file uses kill blocks to notify nodes that they have been evicted; the remaining nodes then go through reconfiguration, and generally, per the Oracle algorithm, the node with the lowest node number becomes the master. By default this value is 200 seconds, controlled by the css disktimeout parameter. Again, changing this parameter requires Oracle Support's recommendation. When a node can no longer communicate through the private interconnect but the other nodes can still see its heartbeats in the voting file, it is evicted using the voting disk kill block functionality.
Network split resolution: When the network fails and nodes cannot communicate with each other, one side has to fail to maintain data integrity. The surviving nodes should form an optimal subcluster of the original cluster. Each node writes its own vote to the voting file and the reconfiguration manager component reads these votes to calculate an optimal subcluster. Nodes that are not to survive are evicted via communication through the network and the disk.

Causes of reboot by clusterware processes


======================================
Now we will briefly discuss the causes of reboot by these processes and, at the end, which files to review and upload to Oracle Support for further diagnosis.
Reboot by OCSSD.
============================
1. Network failure: 30 consecutive missed check-ins will reboot a node; heartbeats are issued once per second. You will see messages in ocssd.log such as "heartbeat fatal, eviction in xx seconds". Here there are two things to check:
a. If the node eviction time in the messages log file is less than the missed check-ins, the node eviction is likely not due to missed check-ins.
b. If the node eviction time in the messages log file is greater than the missed check-ins, the node eviction is likely due to missed check-ins.
2. Problems writing to the voting disk file: some kind of hang in accessing the voting disk.
3. High CPU utilization: when the CPU is heavily utilized, the css daemon does not get CPU time to ping the voting disk; as a result it cannot write its own vote to the voting disk file and the node is rebooted.
4. Disk subsystem is unresponsive due to storage issues.
5. Killing ocssd process.
6. An oracle bug.

Reboot by OPROCD.
============================
When a problem is detected by oprocd, it will reboot the node for the following reasons:
1. An OS scheduler algorithm problem.
2. High CPU utilization, due to which oprocd does not get CPU time to perform its hang checks at the OS level.
3. An Oracle bug.

Also, to share an experience: at one client site the LMS processes were running at a low scheduling priority and were not getting CPU time while CPU utilization was high, so LMS could not communicate through the clusterware processes, node eviction was delayed, and oprocd ended up rebooting the node; this should not have happened if LMS had been running at its intended scheduling priority.
Determining cause of reboot by which process
==============================================

1. If there are the following kinds of messages in the log files, the reboot was likely caused by the ocssd process:
a. "Reboot due to cluster integrity" in the syslog or messages file.
b. Any error prior to the reboot in the ocssd.log file.
c. Missed check-ins in the syslog file, with the eviction time prior to the node reboot time.
2. If there are the following kinds of messages in the log files, the reboot was likely caused by the oprocd process:
a. A "Resetting" message in the messages log file on Linux.
b. Any error in the oprocd log, matching the timestamp of the reboot or prior to it, in the /etc/oracle/oprocd directory.
3. If there are other messages, such as Ethernet issues or other errors in the messages or syslog file, check with the sysadmins. On AIX, the errpt -a output gives a lot of information about the cause of a reboot.
Log files to collect after a node reboot
==============================================
Whenever a node reboot occurs in a clusterware environment, review the log files below to find the reason for the reboot; these files are also the ones to upload to Oracle Support for node eviction diagnosis.
a. CRS log files (for release 10.2.0 and above)
=============================================
1. $ORACLE_CRS_HOME/log/<hostname>/crsd/crsd.log
2. $ORACLE_CRS_HOME/log/<hostname>/cssd/ocssd.log
3. $ORACLE_CRS_HOME/log/<hostname>/evmd/evmd.log
4. $ORACLE_CRS_HOME/log/<hostname>/alert<hostname>.log
5. $ORACLE_CRS_HOME/log/<hostname>/client/cls*.log (not all files, only the latest files matching the timestamp of the node reboot)
6. $ORACLE_CRS_HOME/log/<hostname>/racg/ (check for files and directories matching the timestamp of the reboot; copy them only if found)
7. The latest <hostname>.oprocd.log file from /etc/oracle or /var/opt/oracle/oprocd (Solaris)

Note: We can use $ORACLE_CRS_HOME/bin/diagcollection.pl to collect the above files, but it does not collect the OPROCD log files, OS log files or OS Watcher log files, and it may take a long time to run and consume resources, so it is better to copy them manually.
b. OS log files (This will get overwritten so we need to copy soon)
====================================================
1. /var/log/syslog
2. /var/adm/messages
3. errpt –a >error_.log (AIX only)

c. OS Watcher log files (This will get overwritten so we need to copy soon)
=======================================================
Please check in crontab where OSwatcher is installed. Go to that directory and then archive folder
and then collect files from all directory matching with timestamp of node reboot.
1. OS_WATCHER_HOME/archive/oswtop
2. OS_WATCHER_HOME/archive/oswvmstat
3. OS_WATCHER_HOME/archive/oswmpstat
4. OS_WATCHER_HOME/archive/oswnetstat
5. OS_WATCHER_HOME/archive/oswiostat
6. OS_WATCHER_HOME/archive/oswps
7. OS_WATCHER_HOME/archive/oswprvtnet

Introduction - Physical and Logical Standby
A Physical standby database is an exact copy of the primary database. It is always
kept in a managed recovery mode and is unusable as long as primary is up and
functional.

The prominent difference with a logical standby database is that the latter is not an exact replica of the primary database. A logical standby can be a subset or a superset of the primary and is a fully operational database used for reporting and similar purposes; unlike a physical standby, the tables in a logical standby can also be queried.

In this particular scenario we are using a physical standby database where some
datafiles are missing causing the managed recovery process to stop and hence forcing
it to get out of sync with primary database.

Environment

Let us review the db environment which we are going to use for demonstrating this
scenario. 

1. Primary has 200 datafiles and standby has only 166 datafiles

2. Primary is a 3 node cluster and Standby is a 2 node cluster

3. The DB name is mydb

4. MRP on standby is not running

Problem and Symptoms 

Here is a detailed description of the actual problem and symptoms/indications which helped us
choose the appropriate corrective measures. 

1. On discovering that physical standby is out of sync, when we tried to start the
MRP on standby, it reported the following error in alert log:

**************************************************************

Errors in file /u01/app/oracle/admin/mydb/bdump/mydb1_mrp0_21189.trc
ORA-01111: name for data file 167 is unknown - rename to correct file
ORA-01110: data file 167: '/u01/app/oracle/product/9.2.0/dbs/UNNAMED00167'
ORA-01157: cannot identify/lock data file 167 - see DBWR trace file
ORA-01111: name for data file 167 is unknown - rename to correct file
ORA-01110: data file 167: '/u01/app/oracle/product/9.2.0/dbs/UNNAMED00167'
*************************************************************

2. On further investigation, standby's alert log also shows following errors:

************************************************************************

Tue Sep 9 04:05:03 2015


Media Recovery Log /u03/oradata/mydb/arc_backup/mydb_2_2173.arc
Media Recovery Log /u03/oradata/mydb/arc_backup/mydb_1_1896.arc
WARNING: File being created with same name as in Primary
Existing file may be overwritten
File #167 added to control file as 'UNNAMED00167'. Originally created as:
'/u07/oradata/mydb/myfile_1.dbf'
Recovery was unable to create the file as:
'/u07/oradata/mydb/myfile_1.dbf'
MRP0: Background Media Recovery terminated with error 1274
Tue Sep 9 04:05:06 2015
Errors in file /u01/app/oracle/admin/mydb/bdump/mydb1_mrp0_7175.trc:
ORA-01274: cannot add datafile '/u07/oradata/mydb/myfile-1.dbf' - file could
not be created
ORA-01119: error in creating database file '/u07/oradata/mydb/myfile_1.dbf'
ORA-27054: Message 27054 not found; product=RDBMS; facility=ORA
Linux-x86_64 Error: 13: Permission denied

************************************************************************** 

3. On checking the view v$archived_log, there were a lot of log sequence# entries with APPLIED=NO

4. Also note that there is no gap in the sequence#

What caused the missing datafile(s) condition on Standby?

The parameter db_file_name_convert was not set on the standby database. As long as the files were created on /u02 and /u03 on the primary, there was no problem on the standby, because the standby also had /u02 and /u03. But when file #167 was added under /u07 on the primary (on Sep 9 04:05:03 2015), it could not be mapped to a /u07 mount point on the standby, because /u07 does not exist on the standby and db_file_name_convert was also not set. As indicated by the alert log, file #167 was registered in the standby's control file as "UNNAMED00167" at the default location of $ORACLE_HOME/dbs, but the file was not physically created on the standby database.
Action Plan: How to resolve this 

1. At the standby:
Please set the db_file_name_convert parameter at the Standby for the /u07
folder at the Primary to the corresponding folder at the Standby.

          Since this parameter is a Static parameter, you need to bounce the Standby DB.

******************************************************************************

As an alternative to step #1, you can do the following instead:

At the standby:

Create a /u07 soft link pointing to /u02, to eliminate the bounce of the standby DB caused by adding the db_file_name_convert init.ora parameter.

******************************************************************************

2. At the standby :
SQL> alter system set standby_file_management=manual;

3. At the Primary for the datafile 167 :


SQL> alter tablespace < tablespace name> begin backup ;

          Copy the Datafile from the Primary to Standby to the correct location.
          SQL> Alter tablespace <tablespace name > end backup; 

4. At the Standby:
SQL> alter database rename file '.......UNNAMED00167' to
'< actual location of the datafile >';

******************************************************************************

You can skip steps #3 and #4 and instead run the following step after step #2:

At the Standby:

SQL> ALTER DATABASE CREATE DATAFILE '<....UNNAMED00167>' AS '<datafile name with the correct path>';

******************************************************************************

5. To create the remaining datafiles at the Standby automatically:


SQL> alter system set standby_file_management=auto;

6. Start the MRP at the Standby


SQL> alter database recover managed standby database;

At standby database ensure the MRP is running as expected

SQL>select process, status , sequence# from v$managed_standby;

Word of Caution: Prevent this from happening again

Before adding datafiles on the primary, make sure: 

1. The corresponding mount point exists on the standby


2. Or there should be an appropriate mapping between the primary's and standby's
mount points using the parameter db_file_name_convert
3. Or create a soft link on standby server with the same name as that of primary's
mount point if it does not exist on the standby.

Important notes: When Primary and Standby are RAC databases 

1. On Standby: You can see multiple copies of some or all logs transported and
applied on standby when you check the view v$archived_log. 

2. On Standby: All sequence# should have APPLIED=YES in v$archived_log for all threads. This ensures that all logs from all threads were transported and applied on the standby, and hence keeps the standby in sync with the primary.

3. On Standby: In the view v$archived_log you may not see the same number of copies of all logs. For example, if the primary is a 3-node cluster, you may or may not have 3 copies of each log, i.e. you may not have the same sequence# on the standby for all 3 threads. Of course, the reason is that the number of logs generated on the 3 nodes of the primary will differ. The current sequence# transported from a node of the primary RAC database can be seen by querying v$archived_log on the standby:

          SQL> select max(sequence#) from v$archived_log where thread#=1;


As explained above, the output will differ for all 3 threads.

Why an extra standby redo log group?

"Step 2 Determine the appropriate number of standby redo log file groups.
Minimally, the configuration should have one more standby redo log file group
than the number of online redo log file groups on the primary database....
(maximum number of logfiles for each thread + 1) * maximum number of threads
Using this equation reduces the likelihood that the primary instance's log
writer (LGWR) process will be blocked because a standby redo log file cannot be
allocated on the standby database. For example, if the primary database has 2
log files for each thread and 2 threads, then 6 standby redo log file groups
are needed on the standby database."

I think it says that if you have groups #1 and #2 on primary and #1, #2 on
standby, and if LGWR on primary just finished #1, switched to #2, and now it
needs to switch to #1 again because #2 just became full, the standby must catch
up, otherwise the primary LGWR cannot reuse #1 because the standby is still
archiving the standby's #1. Now, if you have the extra #3 on standby, the
standby in this case can start to use #3 while its #1 is being archived. That
way, the primary can reuse the primary's #1 without delay

In a previous article we covered the basics of Oracle Data Guard. In this article we will look at the Oracle Data Guard architecture in depth.
Let us understand the architecture (shown in the original diagram) with an example. Assume our primary database is in Delhi, India and the standby database is in Bangalore, India.

When we have just a primary database, changes fill the log buffer, the LGWR (log writer) process writes them to the online redo log files, and when a log switch occurs the ARCn process wakes up and the contents of the online redo log files are archived to the archived redo log files.

Now, how does this work with a standby database configured?

1. LNS process of primary database captures redo from redo log buffer.
2. Send it to RFS process of standby database through oracle net.
3. RFS process then writes that redo information to standby redo log files.
4. If the LNS process is not fast enough to capture the redo from the log buffer before it is written to the online redo log files (for example when redo is generated very quickly), the LNS process reads from the online redo log files and sends the redo to the RFS process through Oracle Net.
5. If a network outage occurs and a log switch moves the redo into the archived redo log files before it has been written to the standby redo log files, the RFS process communicates directly with the ARCn process and performs archive log gap resolution.
6. Once the redo has reached the standby redo log files by any of these paths, the MRP process (for a physical standby) or the LSP process (for a logical standby) applies that redo, or the equivalent SQL, to the standby database.
7. As redo data is applied on the standby database, the ARCn process of the standby database also generates archived logs.
You can use your standby database for resource intensive task like backup and also it is
usable for reporting tasks.

Primary Database Processes for Dataguard Environment  :

1. LGWR : LGWR collects transaction redo information and updates online redo log files.

2. LNS process [Log Writer Network Server]: The LNS process works in two ways:

1. SYNC mode: When you have configured your Data Guard environment with the SYNC redo transport service, LGWR passes redo to the LNS process, which transfers it directly to the RFS process on the standby database. LGWR waits for confirmation from the LNS process, and the LNS process waits for confirmation from the RFS process that the redo has been written to the standby redo log files, before the commit is acknowledged.
2. ASYNC mode: When you have configured the ASYNC redo transport service, LGWR does not wait for the LNS process, regardless of whether LNS reads from the redo log buffer or from the online redo log files. Data Guard simply starts an asynchronous LNS process, and beyond that LGWR has no interaction with the asynchronous standby destinations. In simple terms, Data Guard does not wait for any acknowledgement from the standby database that the redo has been received or applied, so it is much faster than SYNC mode.
3. ARCn Process  : As we know ARCn process creates a copy of the online redo log files .
ARCn is also responsible for shipping redo data to an RFS process at a standby database
and for pro-actively detecting and resolving gaps on all standby database.

Standby Database Processes :

1. RFS [Remote File Server]: As we have seen above, the RFS process can receive redo data either from the LNS process or from the ARCn process of the primary, and it writes the redo information to the standby redo log files.

Each LNS and ARCn process that communicates with the standby database has its own RFS process.

2. ARCn [Archiver] : The ARCn process archives standby redo logs.


3. MRP [Managed Recovery Process]: For a physical standby the MRP process comes into play. It applies archived redo log information to the physical standby database. You can start managed recovery with "ALTER DATABASE RECOVER MANAGED STANDBY DATABASE", in which case the foreground session performs the recovery; if you want recovery to run in the background, you can optionally add the DISCONNECT FROM SESSION clause, and the MRP background process will be started.

If you use DG BROKER to manage your dataguard , it always starts MRP background
process.

4. LSP [Logical Standby ]: It comes into play for logical dataguard only. It controls the
application of archived redo log information to the logical standby database. LSP process
will transform redo data into sql statements and then these sql statements will be applied to
logical standby database.

Usage of the HANDLECOLLISIONS parameter

The GoldenGate HANDLECOLLISIONS parameter is configured on the target side, in the Replicat process, to handle collisions. It enables processing of the data when duplicate-data (integrity) or no-data-found conditions are encountered in the destination database.

There could be a number of reasons which could cause this condition. Some of them include the
following.

1. Duplicate data exists in the source table.
2. Misconfiguration of the Extract or Replicat processes.
3. Data overlap – the table data was instantiated at a particular CSN (Commit Sequence Number) in the destination database, but the Replicat process was started at a CSN prior to that table-load CSN.
4. The row does not exist on the target.

The HANDLECOLLISIONS parameter is used to overcome these collisions. There are 3 types of


Collisions:

1. Insert collision – a row is inserted on the source database whose key column value already exists on the target DB.
2. Update collision – a row is updated on the source whose key column value does not exist on the target DB.
3. Delete collision – a row is deleted on the source whose key column value does not exist on the target DB.

Without the use of this parameter, the Replicat will ABEND when it tries to process the inserts from
the trail into the table which already has the rows (PK or unique constraint violation).

It will also ABEND when the Replicat tries an update or delete rows which are not present in the
destination tables. To overcome this normally the RBA of the trail has to be moved forward one
transaction before the Replicat can be restarted and will stay running.

The following is the behavior of the Replicat process when the Goldengate HANDLECOLLISIONS
parameter is enabled.

INSERT collision – an insert from the source whose key column value already exists on the target: the insert is converted to an UPDATE.
UPDATE collision – an update from the source for a row that is not present on the target: the missing row is converted to an INSERT.
DELETE collision – a delete from the source for a row that is not present on the target: the delete is ignored.

Resolution :

Enabling HANDLECOLLISIONS

 Goldengate HANDLECOLLISIONS should be used only when and where necessary.


 It should be removed from the Oracle Goldengate Replication configuration as soon as
possible.
 If it has to be enabled, it should be done only for the tables requiring it.

This can be achieved by enabling HANDLECOLLISIONS only for the MAP statements that need it and turning it off with NOHANDLECOLLISIONS for the remaining tables, as shown below.

Set Globally
Enable global HANDLECOLLISIONS for ALL MAP statements

HANDLECOLLISIONS
MAP pdb1.ggtraining1.dept11, TARGET pdb2.ggtraining2.dept22;
MAP pdb1.ggtraining1.emp11, TARGET pdb2.ggtraining2.emp22;
MAP pdb1.ggtraining1.hr11, TARGET pdb2.ggtraining2.hr22;
MAP pdb1.ggtraining1.revenue11, TARGET pdb2.ggtraining2.revenue22;
Set for Group of MAP Statements

Enable HANDLECOLLISIONS for some MAP statements

HANDLECOLLISIONS
MAP pdb1.ggtraining1.dept11, TARGET pdb2.ggtraining2.dept22;
MAP pdb1.ggtraining1.emp11, TARGET pdb2.ggtraining2.emp22;
NOHANDLECOLLISIONS
MAP pdb1.ggtraining1.hr11, TARGET pdb2.ggtraining2.hr22;
MAP pdb1.ggtraining1.revenue11, TARGET pdb2.ggtraining2.revenue22;
Set for Specific Tables

Enable global HANDLECOLLISIONS but disable for specific tables

HANDLECOLLISIONS
MAP pdb1.ggtraining1.dept11, TARGET pdb2.ggtraining2.dept22;
MAP pdb1.ggtraining1.emp11, TARGET pdb2.ggtraining2.emp22;
MAP pdb1.ggtraining1.hr11, TARGET pdb2.ggtraining2.hr22, NOHANDLECOLLISIONS;
MAP pdb1.ggtraining1.revenue11, TARGET pdb2.ggtraining2.revenue22,
NOHANDLECOLLISIONS;

Remove the HANDLECOLLISIONS parameter after the Replicat has moved past the CSN where it
was abending previously.

Also make sure to restart the Replicat after removing this parameter.
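For illustration, assuming a Replicat group named REP1, HANDLECOLLISIONS can also be switched off dynamically with SEND REPLICAT before the parameter is removed from the parameter file and the process is restarted:

GGSCI> SEND REPLICAT REP1, NOHANDLECOLLISIONS
GGSCI> STOP REPLICAT REP1
-- edit the parameter file and remove HANDLECOLLISIONS
GGSCI> EDIT PARAMS REP1
GGSCI> START REPLICAT REP1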

HANDLECOLLISIONS parameter substitutes


Since HANDLECOLLISIONS is not recommended for use during ongoing
replication, you can use a different set of parameters instead (a sketch follows this list):

1. You may use UPDATEINSERTS to handle insert collisions, with some of the limited functionality of
HANDLECOLLISIONS.
2. To handle update collisions, use the parameter INSERTMISSINGUPDATES.
3. To capture rows which are either duplicate INSERTS or do not exist in the destination to be
updated or deleted, REPERROR can be used to record these rows into a discard file.
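As a minimal sketch only (the group name, credential alias, discard file path and size are assumptions), a Replicat parameter file using REPERROR with a discard file might look like this:

REPLICAT REP1
USERIDALIAS gg_target
-- names and paths below are illustrative
-- rows that fail with duplicate-record or no-data-found errors go to the discard file
DISCARDFILE ./dirrpt/rep1.dsc, APPEND, MEGABYTES 100
REPERROR (DEFAULT, DISCARD)
MAP pdb1.ggtraining1.dept11, TARGET pdb2.ggtraining2.dept22;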
Why are Kernel parameters critical for Oracle
Database?
Installing Oracle Database software is one of our regular activities as DBAs. There might
be supporting notes in each project we work on to complete the installation quicker and
more efficiently. One of the prerequisites is to set appropriate
kernel parameters at the operating system level on UNIX/Linux platforms. But it is
really worth understanding the reason behind those parameters, because incorrect
values will lead to performance issues in the database as well. The
Oracle installation documents clearly list the parameters to set and their
recommended values.

This blog explains the purpose of the kernel parameters we set when installing the
database software and the side effects when they are not set correctly. It will help you
when you debug and tune performance at the OS level.
List of Parameters:
Below is the list of parameters Oracle advises in its documentation for a Linux 64-
bit environment. We will use this set of parameters in this blog to understand them in
detail.

fs.aio-max-nr = 1048576

fs.file-max = 6815744

kernel.shmall = 2097152

kernel.shmmax = 4294967295

kernel.shmmni = 4096

kernel.sem = 250 32000 100 128

net.ipv4.ip_local_port_range = 9000 65500

net.core.rmem_default = 262144

net.core.rmem_max = 4194304

net.core.wmem_default = 262144

net.core.wmem_max = 1048576
Categories:
These parameters can be categorized into three sections, as the first part of each name indicates:

1. fs – file handles: limits on handling files.
2. kernel – kernel specifics: limits on resource usage at the kernel level, such as
memory and CPU.
3. net – network specifics: limits on network usage.
Let us explore:
1. fs:
fs.aio-max-nr – This parameter defines the maximum number of
ASYNCHRONOUS I/O requests that the system can handle on the server, while fs.aio-nr shows the
number of requests currently allocated.
If this parameter value is insufficient for the Oracle Database, the error you will see
in the alert log is:

ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O

fs.file-max – This parameter defines the maximum number of file handles,
i.e., how many open files the system can support at any instant.
It is recommended to have a minimum value of 256 for every 4 MB of RAM.
So for 8 GB of RAM: 8 GB / 4 MB = 2048, and 2048 * 256 = 524288.

So if you are growing the RAM on the server, remember to re-check this
parameter (a quick way to check the current values is sketched below).
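For reference, on most Linux systems you can inspect and adjust these values with sysctl; the values shown here are just the documented examples from the list above:

# check the current limits and the async I/O requests currently in use
sysctl fs.aio-max-nr fs.file-max
cat /proc/sys/fs/aio-nr
# set them for the running kernel (persist them in /etc/sysctl.conf or /etc/sysctl.d/)
sysctl -w fs.aio-max-nr=1048576
sysctl -w fs.file-max=6815744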

2. kernel:
SHMMNI, SHMALL, SHMMAX – Before we describe each of these, note that all of them
define limits on the use of shared memory on the server. In UNIX terms,
shared memory is simply memory segments shared between multiple application
processes on the server, and the Oracle Database SGA is one of them.
SHMMNI – It sets the maximum number of shared memory segments that the server can
handle. Oracle recommends a value of at least 4096, meaning that no more
than 4096 shared memory segments can exist on the server at any instant.
Note that the SHMMNI value is a plain count.
SHMALL – It defines the total amount of shared memory PAGES that can be used
system-wide. To be able to use all of the physical memory, this value (converted to bytes) should be less
than or equal to the total physical memory size. For DBAs, it means that the sum of all SGA
sizes on the server should be less than or equal to the memory covered by SHMALL. Note that the SHMALL
value is a number of pages.
SHMMAX – It defines the maximum size of a single shared memory segment that the server
can allocate. Note that the SHMMAX value is in bytes. Oracle recommends that this value
should be greater than half of the physical memory on the server (the current values can be checked as shown below).
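You can check the shared memory limits currently in effect with standard Linux tools, for example:

# current kernel settings
sysctl kernel.shmmni kernel.shmall kernel.shmmax
# shared memory limits and the segments currently allocated (the SGAs show up here)
ipcs -lm
ipcs -m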
Case study:
Let us run through a case study to understand the effect of these parameters better.

Consider a server with 8 GB of physical memory (RAM). Let's define the best
possible SHMMNI, SHMALL, and SHMMAX values for this system.
SHMMNI – No change; it should stay at 4096. It would only need to be increased if you were
running more than about a quarter of that limit (roughly 1024) Oracle databases on the server, which we never
recommend.
SHMALL – By default the page size on Linux is 4 KB, and the total RAM is 8 GB. Let us
leave at least 1 GB of RAM for the Linux kernel to run, so consider that about 7 GB can be used
for Oracle databases. The value of SHMALL can then be:
(7*1024*1024) KB / 4 KB = 1835008 pages

SHMMAX – If you want the maximum SGA size on this server to be 5 GB, then this
parameter value should be 5*1024*1024*1024 = 5368709120 bytes. This, in turn, means
that you should not have any database with more than 5 GB of SGA, but you can have
multiple databases each with an SGA of 5 GB or less. This is why Oracle
recommends setting this value to more than half of the physical memory, so it can be utilized for the SGA(s).
If your SGA size is larger than SHMMAX, say 7 GB against a 5 GB limit, the SGA will be split across
two shared memory segments, one of 5 GB and one of 2 GB, which does not perform
well. The resulting settings are sketched below.
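Putting the case-study numbers together, the entries in /etc/sysctl.conf for this hypothetical 8 GB server might look like the following (values taken from the calculation above; apply them with sysctl -p):

# 8 GB server, ~7 GB usable for Oracle, maximum single SGA of 5 GB
kernel.shmmni = 4096
# pages of 4 KB each, ~7 GB
kernel.shmall = 1835008
# bytes, 5 GB
kernel.shmmax = 5368709120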

3. net:
net.ipv4.ip_local_port_range – This parameter defines the range of local (ephemeral) port
numbers the system can hand out to programs that connect to the server
without requesting a specific port number.
Now it makes sense if you have come across somebody advising you not to use port
numbers beyond 9000 for the listener 😊. Also, if you look back at the documents on OEM
installation, Oracle uses and advises default port numbers below 9000 😊, as far as I
have observed.

net.core.rmem_default / net.core.rmem_max – These parameters define the default and maximum RECEIVE
socket buffer sizes for TCP.
net.core.wmem_default / net.core.wmem_max – These parameters define the default and maximum SEND socket
buffer sizes for TCP.
Oracle recommends setting these values because, by default, Linux is not tuned to
send or receive large volumes of data over TCP. These parameters are quite important
considering the amount of data that flows between the database and the application: this can
be BLOBs, CLOBs, Data Guard redo transport, and so on (see the quick check below).
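A quick way to review the socket buffer and port range settings currently in effect, using the recommended values from the list above as targets:

sysctl net.ipv4.ip_local_port_range
sysctl net.core.rmem_default net.core.rmem_max
sysctl net.core.wmem_default net.core.wmem_max
# example: raise the maximum receive buffer for the running kernel
sysctl -w net.core.rmem_max=4194304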

Conclusion:
 When you create a new Oracle database instance, do not just check the free physical
memory on the server; also make sure the SHMALL, SHMMNI, and
SHMMAX parameters are re-configured accordingly.
 When the volume of data transferred between the application and the database grows, run
through the network parameters and check whether the receive and send socket buffers are the
reason behind network delays.
 As your database grows, so does the number of data files. Do not just make sure the DB_FILES
parameter is set to support the number of data files; verify that the kernel parameters for
file handles are also configured accordingly.

What about semaphores in the kernel?

A semaphore can loosely be thought of as a slot for a process, and a group of semaphores
is called a semaphore set (or array).
The Oracle-recommended setting is:
semaphores = semmsl semmns semopm semmni
kernel.sem = 250 32000 100 128
250 – SEMMSL, the maximum number of semaphores (processes) per semaphore set (or array).
32000 – SEMMNS, the total number of semaphores available system-wide, meaning all Oracle
instances on the server together can use at most 32000.
100 – SEMOPM, the maximum number of semaphore operations per semop() call.
128 – SEMMNI, the maximum number of semaphore sets (arrays) system-wide; Oracle recommends at least 128.
These values may vary on RAC machines.
The calculation behind these values, to prevent performance issues, is:
semmsl = 250 (per set)
semmni = 128 sets (the minimum Oracle recommends for managing its databases
and cores)
semmns = semmsl * semmni = 32000
semopm = 100 (can be adjusted based on the number of processes you
have)
A quick check of the current limits is shown below.
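To verify the semaphore limits that are actually in effect, you can use standard Linux commands, for example:

# current semaphore limits: semmsl, semmns, semopm, semmni
sysctl kernel.sem
ipcs -ls
# persist a change by adding the line below to /etc/sysctl.conf, then reload
# kernel.sem = 250 32000 100 128
sysctl -p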
