
Using Informatica to Load Teradata at Cisco

Using Informatica With Teradata Load Utilities

Cisco uses Informatica for data extracts and loads. Informatica can load data into
Teradata databases either through an ODBC connection or by building and launching
Teradata load utility scripts. The Teradata load utilities are designed to load massive
amounts of data in a short amount of time; loading through ODBC should be considered
for very small tables only. This document discusses three Teradata utilities, Fastload,
Multiload, and Tpump, and is valid through Informatica version 7.1.2.

Fastload: Fastload inserts large volumes of data very rapidly into Teradata tables. It
can load one table from multiple input files. The biggest restriction with Fastload is that
the table being loaded must be empty. This makes it useful for initial loads, or for loading
tables that are emptied prior to scheduled loads, but it can’t be used for incremental
updates. Fastload will not load duplicate rows into a table, even if the table is created as a
multiset table. Completely duplicate input rows don’t cause errors; they are simply dropped
during the load process. A table being fastloaded is not available to users for queries.

Multiload: Multiload supports insert, update, delete, and upsert operations for up to
five target tables. It can apply conditional logic to determine what updates to apply. Its
speed approaches that of Fastload. Multiload is limited to one input file. Tables being
multiloaded are available for select access only.

Tpump: Tpump is generally used for low-volume maintenance of large tables
and/or near-real-time maintenance. It does row-at-a-time processing using SQL, and is
slower than Fastload and Multiload. A table being maintained by Tpump remains available
for other updates while the Tpump job is running against it. Tpump does not support
multiple input files.

When deciding which load utility to select, you must consider the volume of data, the
frequency of the load, and what type of availability is needed for the table while it is
being loaded. All three utilities provide some level of restartability following errors.
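
The selection criteria above can be sketched as a small decision function. This is only a rough illustration of the trade-offs described in this section, not a Cisco or Informatica rule; the function name and its parameters are hypothetical, and what counts as a "very small" table is left to the caller.

```python
def choose_loader(target_is_empty, insert_only, needs_concurrent_updates,
                  very_small_table=False):
    """Suggest a load method from the trade-offs described above.

    All parameter names are illustrative assumptions, not Informatica settings.
    """
    if very_small_table:
        # ODBC should be considered for very small tables only.
        return "odbc"
    if needs_concurrent_updates:
        # Only Tpump leaves the table available for other updates.
        return "tpump"
    if target_is_empty and insert_only:
        # Fastload requires an empty target and performs inserts only.
        return "fastload"
    # Multiload handles insert, update, delete, and upsert on populated tables.
    return "multiload"


print(choose_loader(target_is_empty=True, insert_only=True,
                    needs_concurrent_updates=False))
```

An incremental load into a populated table that users only need to query during the load would fall through to Multiload.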

The following table compares the features of the three load utilities.

11/07/2005 Page 1

Feature                   Fastload       Multiload     Tpump

DDL Functions             Limited        All           All
DML Functions             Insert         Ins/Upd/Del   Ins/Upd/Del
Multiple DML              No             Yes           Yes
Multiple Tables           No             Yes           Yes
Multiple Sessions         Yes            Yes           Yes
Protocol Used             FASTLOAD       MULTILOAD     SQL
Conditional Expressions   No             Yes           Yes
Arithmetic Calculations   No             Yes           No
Data Conversion           1 per column   Yes           Yes
Error Files               Yes            Yes           Yes
Error Limits              Yes            Yes           Yes
User-written Routines     Yes            Yes           Yes

Informatica/Teradata Connections

The load method for an Informatica mapping is set on the mapping tab of the session,
under TARGET. For Teradata load utilities, Writer is set to File Writer, Connection Type
is set to Loader, and Value is set to the name of the connection. Connections are set up
using the Connections tab in Workflow Designer.

Attribute         Description                         Fastload       Multiload        Tpump

TDPID             Teradata server                     varies         varies           varies
                                                      (td0 for POC)  (td0 for POC)    (td0 for POC)

Database Name     Database containing the target      varies         varies           varies
                  table.

Date Format       Leave blank, assuming the value     N/A            blank            N/A
                  loaded into a date column in a
                  target is a date/time type in
                  Informatica.

Error Limit       Max # of rows that can be rejected  0              0                0
                  before the job is aborted.
                  (0 = no limit)

Checkpoint        # of rows (>= 60) or minutes        0              0 not staged,    0
                  (1-59) between checkpoints. If IS                  >=10,000 staged
                  STAGED is selected, select a
                  reasonable # of records or amount
                  of time based on the size of the
                  output file. If the connection is
                  not staged, this should be set to
                  0 (no checkpoints).

Tenacity          # of hours the job will keep        4              4                4
                  trying to log on the required
                  sessions.

Load Mode         Insert, Update, Delete, Upsert, or  N/A            Upsert           Upsert
                  Data Driven. Data Driven uses the
                  property set in the update
                  strategy transformation in the
                  mapping.

Drop Error        Specifies whether or not to drop    No             No               No
Tables            the error tables prior to starting
                  the loader.

External Loader   Name of the loader executable.      fastload       mload            tpump
Executable

Max Sessions      Default to one per AMP              80 for POC     80 for POC       10 for POC

Sleep             # of minutes between logon tries.   6              6                6

Packing Factor    # of statements to pack into a      N/A            N/A              1
                  multi-statement request. Max is
                  600, default is 20.

Statement Rate    Maximum rate at which statements    N/A            N/A              blank
                  are sent to Teradata per minute.
                  Unlimited if not specified.

Serialize         If set, actions to a given row are  N/A            N/A              On
                  executed in order.

Robust            If off, simple restart logic is     N/A            N/A              Off
                  used (restart after last
                  checkpoint).

No Monitor        If set, prevents Tpump from         N/A            N/A              On
                  checking for statement rate
                  changes to send to the monitor.

Truncate Target   If set, all rows in target table    Off            Off              Off
Table             are deleted prior to the load job
                  starting.

Is Staged         Data is written to a flat file      Off            Off              Off
                  before the load job starts.

Error Database    Database where error tables will    Varies         Varies           Varies
                  be created.                         (dw_errlog)    (dw_errlog)      (dw_errlog)

Work Table        Database where work tables will     N/A            Varies           N/A
Database          be created.                                        (dw_errlog)

Log Table         Database where log table will be    N/A            Varies           N/A
Database          created.                                           (dw_errlog)
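
The Checkpoint attribute is overloaded: 0 means no checkpoints, 1-59 is read as minutes, and 60 or more is read as a row count. The sketch below only restates that rule and the staged/not-staged guidance from the table in code form; the function names are hypothetical and are not part of Informatica or the Teradata utilities.

```python
def interpret_checkpoint(value):
    """Classify a Checkpoint setting: 0 = none, 1-59 = minutes, >= 60 = rows."""
    if value < 0:
        raise ValueError("checkpoint must be zero or a positive number")
    if value == 0:
        return ("none", 0)
    if value < 60:
        return ("minutes", value)
    return ("rows", value)


def checkpoint_warnings(is_staged, checkpoint):
    """Return warnings when a setting departs from the guidance in the table."""
    warnings = []
    if not is_staged and checkpoint != 0:
        # A connection that is not staged should use 0 (no checkpoints).
        warnings.append("connection is not staged: checkpoint should be 0")
    return warnings


print(interpret_checkpoint(10000))     # treated as a row count
print(checkpoint_warnings(False, 30))  # not staged with checkpoints set
```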

Staged vs. Not Staged

When a loader connection has IS STAGED selected, Informatica will write output to a
flat file on the Informatica server. Data is sent to the target database only after
Informatica has completed creating the flat file. Informatica does not delete the flat file
after the loader has completed.

If a loader connection is not staged, Informatica will start sending data to the target
database using named pipes as soon as it has data to send. After job completion, there is
no flat file.

Source disk space requirements and restartability requirements need to be considered
when choosing which option to use.

Restarting Load Jobs

Multiload

Staged: If a job abends prior to the application phase, you can choose to restart the job,
or abandon the job. If it is restarted, it will pick up after the last checkpoint. To abandon
the job, execute a RELEASE MLOAD statement against the target table, and drop the
error and log tables. If the job has entered the application phase, you either have to restart
it, or drop the target table, recreate it, and restore the data from a backup.
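
As an illustration of abandoning a staged Multiload job, the sketch below builds the statements described above: a RELEASE MLOAD against the target table, followed by a DROP TABLE for each error and log table. The function name is hypothetical, and the table names in the example are placeholders; the actual error and log table names must be taken from the failed job.

```python
def abandon_mload_statements(target_table, support_tables):
    """Build the SQL to abandon a Multiload job that failed before the
    application phase. Table names are supplied by the caller."""
    statements = [f"RELEASE MLOAD {target_table};"]
    # Drop the error and log tables left behind by the failed job.
    statements += [f"DROP TABLE {name};" for name in support_tables]
    return statements


# Placeholder names for illustration only.
for sql in abandon_mload_statements(
        "mydb.my_target",
        ["dw_errlog.my_error_table", "dw_errlog.my_log_table"]):
    print(sql)
```

The statements are returned as strings rather than executed, so they can be reviewed before being run against the database.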

Not Staged: If a job abends prior to the application phase, it can’t be restarted. Since
there isn’t an input file, there’s no way to guarantee that the input will match the original
input, and data corruption can occur. If the job abends in the application phase, it must be
restarted, or the target table must be dropped and recreated.

Fastload

The same considerations apply regarding staged and not staged input. It’s usually easiest
to drop/recreate the table and start from the top.

Tpump

Staged: Restart the tpump job. It will use the error and log tables to determine where it
left off.

Not Staged: The job can’t be restarted.
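
The restart rules in this section can be summarized as a lookup. This is a simplified restatement of the text above, with a hypothetical function name and option strings; Fastload is treated the same as Multiload because the document says the same considerations apply, and an empty list means the job cannot be restarted.

```python
def restart_options(utility, staged, in_application_phase=False):
    """Return the recovery options described above for a failed load job."""
    utility = utility.lower()
    if utility == "tpump":
        # Tpump uses its error and log tables to find where it left off,
        # but only when the input was staged to a flat file.
        return ["restart"] if staged else []
    if utility in ("multiload", "fastload"):
        if in_application_phase:
            # Once the application phase has begun, the only alternative
            # to restarting is rebuilding the target table.
            return ["restart", "drop and recreate the target table"]
        if staged:
            return ["restart from the last checkpoint", "abandon the job"]
        # Not staged: there is no input file to replay.
        return []
    raise ValueError(f"unknown load utility: {utility}")


print(restart_options("multiload", staged=True))
print(restart_options("tpump", staged=False))
```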


Troubleshooting

When Informatica launches a Teradata load job, the session waits for a return code from
the load job. If a zero return code is received, the session will be reported as successful;
non-zero will result in a failure. But a successful load job doesn’t necessarily mean that
all rows were loaded successfully. Some or all of the rows may have been rejected and
sent to the error table, or rows that were assumed to be inserts may actually have been
applied as updates because of duplicate keys in the input data.

Following any load job, check its log to determine the actual results of the job. The log
files are written to the …/TgtFiles directory, with an extension of ‘ldrlog’. The relevant
information is in two places: the application section of the log reports the number of
inserts, updates, and deletes, and the clean-up section reports the number of rows sent to
the error table(s).
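
To make the log check routine, a small helper can list the most recent loader logs in the target file directory. This is a minimal sketch: the function name is hypothetical, the directory is passed in by the caller because the full path is site-specific, and the only assumption about the files is the ‘ldrlog’ extension mentioned above. It does not parse the log contents.

```python
from pathlib import Path


def latest_loader_logs(tgt_dir, limit=5):
    """Return the most recently modified *.ldrlog files in tgt_dir, newest first."""
    logs = sorted(
        Path(tgt_dir).glob("*.ldrlog"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return logs[:limit]


# Example: list loader logs in the current directory (may be empty).
for log in latest_loader_logs("."):
    print(log.name)
```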

The error tables are created in the database specified in the Informatica connection. They
are dropped at the end of the job if they are empty, so the existence of an error table after
a load job indicates that at least one row was rejected. Look at the rows in the error table
to find the error code.
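
One way to look at the error codes is to group the rejected rows by code. The sketch below only builds the query text; it assumes the error table exposes an ErrorCode column, which should be confirmed against the actual error table before use, and the function and table names are hypothetical.

```python
def error_summary_query(error_database, error_table):
    """Build a query that counts rejected rows per error code.

    Assumes the error table has an ErrorCode column (an assumption to verify).
    """
    return (
        "SELECT ErrorCode, COUNT(*) AS RejectedRows "
        f"FROM {error_database}.{error_table} "
        "GROUP BY ErrorCode "
        "ORDER BY RejectedRows DESC;"
    )


# Placeholder table name for illustration only.
print(error_summary_query("dw_errlog", "my_error_table"))
```

The same query gives a quick row count when a job is running slowly and heavy rejects are suspected.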

When a load job is running much more slowly than expected, it’s a good idea to check the
number of rows in the associated error tables. Rows are written one at a time into the
error table, as opposed to the much faster writes to the target tables. If all or most of the
rows are being rejected, the writes to the error tables will slow down the load job. If this
number is very high, you may want to abort the load job, fix the problem, then rerun it.
The most common causes of rejected rows are NOT NULL violations resulting from
failed lookup transformations, and data conversion errors.
