
Optimizing Large Database Imports

Jianing Fan
August 08, 2002

IBM Informix data movement utilities provide an easy way to move databases from one
machine to another. Large databases, especially those with a large number of associated
stored procedures and triggers, can pose special challenges. This article provides a sample
script that you can modify to make the job easier and faster.

© 2002 International Business Machines Corporation. All rights reserved.

Introduction
Dbexport and dbimport are IBM Informix® data movement utilities that move databases from one
machine to another. The whole process is simple and straightforward. Dbexport creates a special
directory, writes into it a schema file that records the current database structure, and then unloads
the data for every database object into ASCII files. You can keep those schema and data files on
disk or on tape and then easily transfer them to another machine using UNIX® tar or other utilities.
At the same time, dbexport creates a message file, dbexport.out, in the user's current directory to
record any errors and warnings that occur during the dbexport operation.

The dbimport utility uses the schema and data files created by dbexport to create exactly the
same database on another machine. Like dbexport, dbimport creates a message file, dbimport.out,
in the user's current directory to log errors and warnings during the dbimport operation.
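As a rough illustration, a round trip with the two utilities looks something like this; the database
name mydb, the directory /work/export, and the dbspace datadbs1 are placeholders:

# On the source machine: export mydb into /work/export/mydb.exp
dbexport mydb -o /work/export

# Transfer /work/export/mydb.exp to the target machine (tar, ftp, and so on),
# then recreate the database there, with logging, in the dbspace datadbs1
dbimport mydb -d datadbs1 -l -i /work/export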

You use these utilities for logical backups. By logical backup, I mean the backup of the database
structure or schema, and logical database objects such as tables, indexes, views, triggers and
stored procedures. A logical backup is quite different from physical backups produced by such
utilities as ontape, on-archive, or on-bar, which back up the entire Informix instance and raw disk
space.

One big advantage of logical backup is its flexibility. The schema file created by dbexport is an
ASCII file and can be edited at any time before importing the database. There are quite a few areas
you can modify:

• Extent size for tables. You can enlarge the first and next extent sizes so that a table has fewer
but larger extents. This can improve your overall performance.


• Physical location for a table or an index. Based on your studies of your operating system's
disk I/O, you may want to move some tables and indexes to another location so as to reduce
I/O load.
• Names for indexes and constraints. You may want to replace automatically generated index
and constraint names with more meaningful names.
• Structures of database objects. You may apply a different fragmentation strategy for tables
and indexes by editing the schema file, so that when the database gets imported, tables and
indexes will be fragmented across different disks to achieve better performance.
In addition, you can change the name of the exported database so that you can keep more than
one copy of the same database on the same Informix instance. This is really useful for developers
and testers who like to keep an original database around for comparison.
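To give a feel for the kind of editing involved, here is a hypothetical fragment of a dbexport
schema file for a table named orders, with the extent sizes enlarged by hand and a round-robin
fragmentation clause added; the table, column, and dbspace names are invented for illustration:

{ TABLE "informix".orders }

create table "informix".orders
  (
    order_num serial not null,
    customer_num integer,
    order_date date,
    ship_charge money(6,2)
  )
  fragment by round robin in datadbs1, datadbs2, datadbs3
  extent size 2048 next size 1024 lock mode page;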

Another advantage of logical backup is its finer granularity. When you restore, you don't need to
shut down the whole Informix instance; instead, you can restore at the database level. In this way,
you avoid interfering with other people who are working with other databases.

Because of those advantages, we use the dbexport and dbimport utilities frequently when
developing and during unit testing, box testing, and integration testing.

Problems we ran into


Most of our databases are pretty small, only a few hundred megabytes in size. But a few of them
are quite large, not only in the amount of data but also in the number of triggers and stored
procedures. One of the databases exceeds two gigabytes and contains more than four hundred
stored procedures and almost a thousand triggers.

Because IBM Informix Stored Procedure Language (SPL) is a handy mid-tier API, a lot of
developers and programmers like to use it to implement business rules and as a link between
front-end applications and the database server. We expect the number of stored procedures
and triggers to grow from release to release. We had no problem exporting and importing the
small databases of a few hundred MB, but when we imported the large databases, those with a
large amount of data and a large number of triggers and stored procedures, we ran into problems
with long transactions and long compilation times.

Long transactions
IBM Informix treats the whole process of importing a database as a single transaction. When
data is large and triggers and stored procedures are many, it is easy to get into a "long transaction"
state. A long transaction is defined by Informix as a transaction that starts in one logical log file and
is not committed by the time the database server needs to reuse that same logical log file. When a
long transaction hits the long-transaction high-water mark, it starts to roll back; when it hits the
exclusive-access high-water mark, the Informix server is blocked while the long transaction is
rolled back. In other words, other database operations on the server such as SELECT, INSERT,
DELETE, or UPDATE are held until the long transaction is rolled back. The common way to avoid
this is either to turn off database logging or to add enough logical log space to allow larger
transactions to go through.
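For example, before a big import you might check the logical-log situation and, if necessary, add
log space or switch the database to no logging; the dbspace and database names below are
placeholders, and the exact options can vary by Informix version:

# Show logical-log usage and status
onstat -l

# Add another logical log file in the dbspace logdbs (size in KB)
onparams -a -d logdbs -s 10000

# Or turn off logging for the database before the import and restore it afterward
# (ontape needs a usable TAPEDEV; the script later fakes one with /dev/null)
ontape -s -N mydb
ontape -s -U mydb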


Long compilation time


When importing large databases, most of the time is spent compiling stored procedures,
especially when the stored procedures are long and complicated. The compile times are extremely
long, and memory core dumps can occur during compilation. This often forces us to drop the half-
imported database and restart the whole import from the beginning, which is quite a time-consuming
process.

Resolving the problems


To resolve these problems, we wrote a UNIX shell script to facilitate the database import process.
This script makes the process of importing large databases much easier and smoother, and, most
importantly, it improves the speed significantly: it makes the whole import process at least four to
five times faster. Before we had this script, we needed four to five hours to import our large
databases; with the script, we cut the time down to one hour and avoid the errors I described above.

Description of our system


To understand the logic of the script, you need to understand a bit about our system configuration.
We have two types of systems, the primary and the secondary. On the primary system we have
four databases: big_db, small_db1, small_db2, and small_db3. Of these, big_db is the largest;
it holds about 2 GB of data and has more than 500 stored procedures and over one thousand
triggers. This is the database that caused us the most problems when importing and exporting.
The other databases are very small, only a few megabytes in size, and have almost no stored
procedures or triggers.

On the secondary system, there are only three small databases: small_db1, small_db2 and
small_db3.

The script's logic


The script consists of many functions, each of which contributes to a smooth database export and
import operation. Figure 1 shows the relationships among those functions:


Figure 1. Export and import operations

Note: P stands for Primary and S stands for Secondary.

As you can see from the flowchart, the heart of this script is two functions: export_db and
import_db. The basic logic is divide and conquer. In export_db, we take advantage of the
dbexport/dbimport utilities, which allow us to edit the exported schema. After we export big_db,
we split its exported schema into three parts as follows:

• Tables - includes all table structures and data only
• Procedures - includes all stored procedures
• Rest - includes all views, indexes, constraints, and triggers

The import_db function uses a different method to process each of them:

• For tables, we use the dbimport utility to create the database and then import the tables with data
only. This part goes very fast, because there are no indexes or constraints on the tables yet.
• The second part, the stored procedures, used to be the bottleneck, because the dbimport utility
compiles stored procedures line by line. Because the database has already been created by
dbimport, we can now feed this part through a UNIX pipe into dbaccess, which improves the speed
significantly because the stored procedures are compiled in one batch.
• We use the same method for the third part to create all views, indexes, constraints, and triggers.

I'll explore this in more detail later.
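Put together, the special handling of big_db boils down to three commands along these lines;
the export path matches the script's EXPORT_PATH, while the dbspace and the temporary database
name big_db_locked are placeholders used only for illustration:

# 1. Create the database and load table structures plus data only (fast: no indexes yet)
dbimport -q big_db_locked -d datadbs1 -i /usr/gsm/db

# 2. Compile all the stored procedures in one batch through dbaccess
cat /usr/gsm/db/big_db_locked.exp/proc.sql | dbaccess big_db_locked

# 3. Create the views, indexes, constraints, and triggers the same way
cat /usr/gsm/db/big_db_locked.exp/rest.sql | dbaccess big_db_locked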

Understanding the script in detail


Let's have a close look at the script in Listing 1 and see how this divide-and-conquer logic gets
implemented. Remember, this code is presented to you "as is" without warranty of any kind.

Listing 1. The stop_start_informix and rename_db functions


#!/bin/ksh

. /usr/informix/.profile

DBACCESS=${INFORMIXDIR}/bin/dbaccess
DBEXPORT=${INFORMIXDIR}/bin/dbexport
DBIMPORT=${INFORMIXDIR}/bin/dbimport
EXPORT_PATH="/usr/gsm/db"
STAT_DB=stat_db
WEN_DB=wen_db
CONFIG_BIN=/SYS_SETUP/SYSCONFIG
LOGFILE="/usr/gsm/logs/cutover/db_exim.$$"

#######################
# stop_start_informix #
#######################
stop_start_informix() {

echo ""
echo "Stopping/Starting Informix."
echo "This may take up to 200 seconds. Please Wait..."
/usr/bin/su - informix -c "onmode -ky" >/dev/null 2>&1
#Wait for 50 seconds, then start the Informix engine back up
sleep 50
/usr/bin/su - informix -c "oninit; exit 0" >/dev/null 2>&1
rc=1
while [ $rc -ne 5 ]
do
echo "Informix oninit return code is $rc"
echo "Waiting for 50 seconds..."
sleep 50
echo "Checking if the Informix Engine is Online..."
/usr/bin/su - informix -c "onstat -" >/dev/null 2>&1
rc=$?
#
# onstat return codes:
# 0 = Initialization
# 1 = Quiescent
# 2 = Fast Recovery
# 3 = Archive Backup
# 4 = Shutting Down
# 5 = Online
# 6 = System Aborting
#
done
echo "The Informix Engine is Online."
echo ""


#############
# rename_db #
#############
rename_db() {

rc=1
while [ $rc -ne 0 ]
do
echo "Trying to rename the database $1 to $2"
echo "rename database $1 to $2" | $DBACCESS
rc=$?
if [ $rc -ne 0 ]
then
echo "WARNING: Failed to rename the database $1 to $2."
echo "Retrying. Please wait..."
#
# bounce the Informix engine
#
stop_start_informix

fi
done
#
# bounce the Informix engine
#
stop_start_informix
}

The purpose of the two functions in Listing 1 is to ensure that all database connections are killed
before exporting the database. When you export databases, you need exclusive access to the
database; otherwise you will get an error from Informix and won't be able to export. The logic here
is to bounce Informix to get rid of all database connections and then when the Informix engine
comes back, immediately rename the database so that other application programs cannot access
the database we are going to export. Then we can export the database without any interference.
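For example, the export flow for big_db calls these functions roughly like this; the temporary name
big_db_locked is an illustrative placeholder, since the real script derives both names from its menu:

# Rename the database so other applications cannot reconnect to it, then export it
rename_db big_db big_db_locked
dbexport -q big_db_locked -o /usr/gsm/db -ss

# Give the database its original name back once the export has finished
rename_db big_db_locked big_db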

Listing 2 shows the most important function of the script, the split_schema function.

Listing 2. split_schema function


###########################
# split schema #
###########################
split_schema() {

#define the file separators


file_seperator0="^CREATE PROCEDURE"
file_seperator1="^{ TABLE"
file_seperator2="^create view"

TABLE_SQL=table.sql
PROC_SQL=proc.sql
REST_SQL=rest.sql

#copy old schema file

echo "Copying $EXPORTED_SQL ..."

cp ${EXPORTED_SQL} ${EXPORTED_SQL}.orig


if [ "$?" -ne 0 ]
then
echo "error with creating files.please check directory permission"
exit 1
fi

echo "Parsing ${EXPORTED_SQL}..."

#parse into three parts

position0=$(( $(cat ${EXPORTED_SQL} | grep -n -i "$file_seperator0" | head -1 |cut -f1 -d:) - 1 ))

position1=$(( $(cat ${EXPORTED_SQL} | grep -n -i "$file_seperator1" | head -1 |cut -f1 -d:) - 1 ))

position2=$(( $(cat ${EXPORTED_SQL} | grep -n -i "$file_seperator2" | head -1 |cut -f1 -d:) - 1 ))

if [ $position0 -lt 0 ]
then
echo "First separator $file_seperator0 was not found in the file."
exit 1
elif [ $position1 -lt 0 ]
then
echo "Second separator $file_seperator1 was not found in the file."
exit 1
elif [ $position2 -lt 0 ]
then
echo "Third separator $file_seperator2 was not found in the file."
exit 1
fi

echo "$position0 $position1 $position2"

diff_12=`expr $position2 - $position1`
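#
# Carve the schema into the three files using the line offsets found above:
# table.sql gets the lines before the first separator plus the block between
# the second and third separators; proc.sql gets the block between the first
# and second separators; rest.sql gets everything from the third separator on.
#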

head -${position0} ${EXPORTED_SQL}> ${EXPORTED_DIR}/${TABLE_SQL}

head -${position1} ${EXPORTED_SQL} | tail +${position0} > ${EXPORTED_DIR}/${PROC_SQL}

tail +${position1} ${EXPORTED_SQL} | head -${diff_12} >>${EXPORTED_DIR}/${TABLE_SQL}

tail +${position2} ${EXPORTED_SQL} > ${EXPORTED_DIR}/${REST_SQL}

mv ${EXPORTED_DIR}/${TABLE_SQL} ${EXPORTED_SQL}
}

This is the heart of the script, and the logic is explained in detail in The script's logic. The function
is called right after big_db is exported in the export_db function, as shown in Listing 3.

Listing 3. export_db function


#############
# export_db #
#############
export_db() {

date

DB=$1
lockedDB=$2
EXPORTED_SQL="$EXPORT_PATH/$lockedDB.exp/$lockedDB.sql"
EXPORTED_DIR="$EXPORT_PATH/$lockedDB.exp"

if [ -d "$EXPORT_PATH/$lockedDB.exp" ]; then


echo "Moving $EXPORT_PATH/$lockedDB.exp to another directory."


mv $EXPORT_PATH/$lockedDB.exp $EXPORT_PATH/$lockedDB.exp.old
fi

if [ "$DB" = "small_db2" ]
then
su - informix -c "echo 'grant dba to root' | $DBACCESS $DB; exit 0"
fi

rename_db $DB $lockedDB

echo "Exporting the $lockedDB database. Please Wait..."

$DBEXPORT -q $lockedDB -o $EXPORT_PATH -ss

ok=`grep "dbexport comleted" dbexport.out`

if [ "$ok" = "dbexport completed" ]


then
echo "dbexport completed with no error."
else
echo "Check dbexport.out file; your export has problems!"
tail dbexport.out
exit 1;
fi

rename_db $lockedDB $DB

if [ "$DB" = "big_db" ]
then
split_schema
fi
}

As we mentioned before, export_db is one of the major functions of the script, so let's explore it a
little more.

First, the function calls the rename_db function. The purpose is to obtain exclusive access
to the database to be exported, so that we can export it without any interference from other
programs. Then the function exports the database. Note that it uses quiet mode (-q) when
exporting. Because quiet mode does not show errors and warnings on the screen during the
export, it is very important to check dbexport.out after the database has been exported; otherwise,
you may miss errors and warnings. Right after the export, the function calls rename_db again
to give the database its original name back so that other programs can access it. The function
then checks the database name argument to see whether it is big_db. If it is big_db, it splits the
exported schema into three parts. For the small databases, we don't split the exported schemas.
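The menu portion of the script (not shown here) supplies the two arguments, the real database
name and the temporary locked name; a hypothetical call would be:

# Export big_db under the temporary name big_db_locked (placeholder name)
export_db big_db big_db_locked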


Listing 4. Function to turn logging back on


###################
# turn on logging #
###################
turnonlogging() {

infxcon="/usr/informix/etc/onconfig"

cp ${infxcon} ${infxcon}.ori

cat $infxcon.ori |sed 's/^TAPEDEV.*$/TAPEDEV \/dev\/null/' > $infxcon

/usr/bin/su informix -c "ontape -s -U $1;exit 0"

cp ${infxcon}.ori ${infxcon}
}

The function in Listing 4 turns database logging back on after the large database has been
imported. As Listing 5 shows, big_db is imported with logging turned off (its dbimport call omits
the -l flag), so that the dbimport utility uses a minimum number of logical logs and the import
cannot get into a "long transaction" state and roll back. Once the import is done, turnonlogging
temporarily points TAPEDEV at /dev/null (so the archive that ontape requires is simply discarded),
runs ontape -s -U to switch the database back to unbuffered logging, and then restores the
original onconfig file.
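If you want to confirm that logging really is back on after the import, one way, assuming your
version's sysmaster:sysdatabases table has these columns, is to query it through dbaccess:

# is_logging = 1 and is_buff_log = 0 indicate unbuffered logging
echo 'select name, is_logging, is_buff_log from sysmaster:sysdatabases where name = "big_db"' | dbaccess sysmaster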

Listing 5 shows how this function is called.

Listing 5. import_db
#############
# import_db #
#############
import_db() {

date

DB=$1
lockedDB=$2
DBSPACE=$3
EXPORTED_SQL="$EXPORT_PATH/$lockedDB.exp/$lockedDB.sql"
EXPORTED_DIR="$EXPORT_PATH/$lockedDB.exp"

echo "Importing the $lockedDB database. Please Wait..."

if [ "$DB" = "big_db" ]
then
$DBIMPORT -q $lockedDB -d $DBSPACE -i $EXPORT_PATH >$TEMP3 2>&1
/usr/bin/su - informix -c "cat ${EXPORTED_DIR}/proc.sql | $DBACCESS $DB; exit 0"
/usr/bin/su - informix -c "cat ${EXPORTED_DIR}/rest.sql | $DBACCESS $DB; exit 0"
turnonlogging $DB
else
$DBIMPORT -q $lockedDB -d $DBSPACE -l -i $EXPORT_PATH >$TEMP3 2>&1
fi

ok=`grep "dbimport completed" dbimport.out`

if [ "$ok" = "dbimport completed" ]


then
echo "dbimport completed with no errors."
else
echo "Check dbimport.out; problems in dbimport."
tail dbimport.out


exit 1;
fi

rename_db $lockedDB $DB
}

Import_db (Listing 5) is the other major function of the script. It first checks the database name
passed in as an argument. If it is big_db, it uses the dbimport utility to load the tables part of the
exported schema and then pipes the stored procedures and the remaining objects (views, indexes,
constraints, and triggers) into dbaccess. The purpose is to speed up the whole process; as I
mentioned before, doing it this way reduced the time needed to import big_db by at least 50-60%.
For big_db, we also turn off logging before the import and turn it back on afterward to avoid a
"long transaction." We do this only for big_db; for the other databases, we believe we have plenty
of logical logs and will never run into a "long transaction."
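Like export_db, import_db takes the real and locked database names, plus the dbspace in which
to create the database; a hypothetical call would be:

# Import big_db into the dbspace datadbs1 under the temporary name big_db_locked
# (the names and the dbspace are placeholders)
import_db big_db big_db_locked datadbs1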

The rest of the script creates a user-friendly, easy-to-use menu, which I don't describe in this
article; you can see it for yourself in the downloadable script.

Conclusion
As you have seen, the heart of the script is the logic that splits an exported schema into three
parts and processes each part with a different tool: the Informix dbimport utility for the tables and
data, and UNIX cat piped into dbaccess for the stored procedures and the remaining schema
objects. For us, this method is more efficient than simply using the Informix dbimport utility to
import large databases; it greatly speeds up the whole database import process and makes better
use of Informix and system resources such as logical logs and memory. Because the script is
composed of many functions, it is very flexible and can easily be modified to meet your specific
requirements. It should give you some practical and useful ideas on how to import large databases
more efficiently.


Downloadable resources
new.sh (HTTP | FTP), 14KB

© Copyright IBM Corporation 2002 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)

