Using Informatica With Teradata Load Utilities
Cisco uses Informatica for data extracts and loads. Informatica can load data into
Teradata databases either through an ODBC connection or by building and launching
Teradata load utility scripts. The Teradata load utilities are designed to load massive
amounts of data in a short amount of time; loading through ODBC should be considered
for very small tables only. This document discusses the use of three Teradata utilities,
Fastload, Multiload, and Tpump, and is valid through Informatica version 7.1.2.
Fastload: Fastload inserts large volumes of data very rapidly into Teradata tables. It
can load one table from multiple input files. Its biggest restriction is that the table
being loaded must be empty, which makes it useful for initial loads or for tables that
are emptied prior to scheduled loads, but rules it out for incremental updates.
Fastload will not load duplicate rows into a table, even if the table is created as a
multiset table; completely duplicate input rows do not cause errors, they are simply
dropped during the load process. A table being fastloaded is not available to users for
queries.
Multiload: Multiload supports insert, update, delete, and upsert operations for up to
five target tables. It can apply conditional logic to determine what updates to apply. Its
speed approaches that of Fastload. Multiload is limited to one input file. Tables being
multiloaded are available for select access only.
Tpump: Tpump is generally used for low-volume maintenance of large tables and/or
near-real-time maintenance. It does row-at-a-time processing using SQL, and is slower
than Fastload and Multiload. A table being maintained by Tpump remains available for
other updates while Tpump is running against it. Tpump does not support multiple
input files.
When deciding which load utility to select, you must consider the volume of data, the
frequency of the load, and what type of availability is needed for the table while it is
being loaded. All three utilities provide some level of restartability following errors.
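As a rough illustration of these selection criteria, the decision can be sketched as a small helper in Python. The 10,000-row cutoff is an invented threshold for illustration only, not Teradata or Cisco guidance; tune it for your own volumes.

```python
def choose_load_utility(row_count, table_is_empty, needs_concurrent_updates):
    """Pick a Teradata load utility from the criteria in this document.

    The 10,000-row cutoff is an illustrative assumption, not official
    guidance: adjust it for your own data volumes and load windows.
    """
    if needs_concurrent_updates or row_count < 10_000:
        # Low volume or near-real-time maintenance: row-at-a-time SQL.
        return "tpump"
    if table_is_empty:
        # Empty target table: fastest bulk path, but insert-only.
        return "fastload"
    # Incremental insert/update/delete/upsert on a populated table.
    return "multiload"


print(choose_load_utility(5_000_000, table_is_empty=True,
                          needs_concurrent_updates=False))  # fastload
```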
The table on the next page compares the features of the three load utilities.
11/07/2005 Page 1
Using Informatica to Load Teradata at Cisco
Informatica/Teradata Connections
The load method for an Informatica mapping is set on the mapping tab of the session,
under TARGET. For Teradata load utilities, Writer is set to File Writer, Connection Type
is set to Loader, and Value is set to the name of the connection. Connections are set up
using the Connections tab in Workflow Designer.
When a loader connection has IS STAGED selected, Informatica will write output to a
flat file on the Informatica server. Data is sent to the target database only after
Informatica has completed creating the flat file. Informatica does not delete the flat file
after the loader has completed.
If a loader connection is not staged, Informatica will start sending data to the target
database using named pipes as soon as it has data to send. After job completion, there is
no flat file.
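The difference between the two modes can be sketched conceptually in Python. The in-memory buffer stands in for the staged flat file and the generator stands in for the named pipe; this illustrates only the delivery semantics, not Informatica's actual implementation.

```python
import io

def staged_delivery(rows, loader):
    """IS STAGED: write every row to a 'flat file' first, then hand the
    complete file to the loader. The file persists after the load."""
    staging = io.StringIO()             # stands in for the flat file on disk
    for row in rows:
        staging.write(row + "\n")
    staging.seek(0)
    loader(staging.read().splitlines()) # loader starts only after the file is complete
    return staging.getvalue()           # the "flat file" Informatica does not delete

def piped_delivery(rows, loader):
    """Not staged: stream rows to the loader as they are produced,
    the way a named pipe would; nothing remains afterwards."""
    loader(rows)                        # loader consumes rows as they arrive

received = []
flat_file = staged_delivery(["a|1", "b|2"], received.extend)
piped_delivery(iter(["c|3"]), received.extend)
print(received)         # ['a|1', 'b|2', 'c|3']
print(repr(flat_file))  # 'a|1\nb|2\n' -- the staged copy survives the load
```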
Multiload
Staged: If a job abends prior to the application phase, you can choose to restart the job
or abandon it. If restarted, it will pick up after the last checkpoint. To abandon
the job, execute a RELEASE MLOAD statement against the target table, and drop the
error and log tables. If the job has entered the application phase, you either have to restart
it, or drop the target table, recreate it, and restore the data from a backup.
Not Staged: If a job abends prior to the application phase, it cannot be restarted. Since
there is no input file, there is no way to guarantee that a rerun's input will match the
original input, and data corruption can occur. If the job abends in the application phase,
the job must be restarted, or the target table dropped, recreated, and restored from a
backup.
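The abandon procedure for a staged job can be scripted as plain SQL statements. A minimal sketch: the caller supplies the table names, because Informatica's generated error and log table names vary by configuration, so the ET_/UV_/ML_ prefixes in the example are illustrative assumptions, not fixed conventions.

```python
def mload_abandon_statements(target, error_tables, log_table):
    """Build the SQL to abandon a Multiload job that abended before the
    application phase: release the MLOAD lock on the target table, then
    drop the error and log tables. Table names are caller-supplied."""
    stmts = [f"RELEASE MLOAD {target};"]
    stmts += [f"DROP TABLE {t};" for t in error_tables]
    stmts.append(f"DROP TABLE {log_table};")
    return stmts

# Example with hypothetical table names:
for s in mload_abandon_statements(
        "edw.orders", ["edw.ET_orders", "edw.UV_orders"], "edw.ML_orders"):
    print(s)
```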
Fastload
The same considerations apply regarding staged and not staged input. It’s usually easiest
to drop/recreate the table and start from the top.
Tpump
Staged: Restart the tpump job. It will use the error and log tables to determine where it
left off.
Troubleshooting
When Informatica launches a Teradata load job, the session waits for a return code from
the load job. If a zero return code is received, the session will be reported as successful;
non-zero will result in a failure. But a successful load job does not necessarily mean that
all rows were loaded successfully. Some or all of the rows may have been rejected and
sent to the error table, or rows assumed to be inserts may actually have been applied as
updates because of duplicate keys in the input data.
Following any load job, check its log to determine the actual results of the job. The log
files are written to the …/TgtFiles directory, with an extension of ‘ldrlog’. Two areas of
the log hold the relevant information: the application section reports the number of
inserts, updates, and deletes, and the clean-up section reports the number of rows sent
to the error table(s).
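A quick way to automate that log check is a small parser. The line formats in the sample below are illustrative stand-ins, not the exact wording the utilities write to the ldrlog file; adapt the patterns to the real log output on your server.

```python
import re

def summarize_ldrlog(text):
    """Pull row counts out of a loader log. The patterns match generic
    'Inserts: N' style lines; real utility logs label these counts in
    the application and clean-up phase sections."""
    counts = {}
    for key, pat in {
        "inserts": r"Inserts:\s+(\d+)",
        "updates": r"Updates:\s+(\d+)",
        "deletes": r"Deletes:\s+(\d+)",
        "errors":  r"rows in error table\D+(\d+)",
    }.items():
        m = re.search(pat, text, re.IGNORECASE)
        counts[key] = int(m.group(1)) if m else 0
    return counts

# Hypothetical log excerpt for illustration:
sample = """\
**** Application phase
     Inserts:  1200   Updates:  34   Deletes:  0
**** Clean-up phase
     rows in error table ET_orders: 7
"""
print(summarize_ldrlog(sample))
# {'inserts': 1200, 'updates': 34, 'deletes': 0, 'errors': 7}
```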
The error tables are created in the database specified in the Informatica connection. They
are dropped at the end of the job if they are empty, so the existence of an error table after
a load job indicates that at least one row was rejected. Look at the rows in the error table
to find the error code.
When a load job is running much more slowly than expected, it is a good idea to check the
number of rows in the associated error tables. Rows are written one at a time into the
error table, as opposed to the much faster writes to the target tables, so if all or most
of the rows are being rejected, the error-table writes will slow down the load job. If
this count is very high, you may want to abort the load job, fix the problem, and then
rerun it. The most common causes of rejected rows are not-null violations resulting from
failed lookup transformations, and data conversion errors.
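That check can be scripted as well. A minimal sketch: the error-table name is caller-supplied, and the 50% abort threshold is an invented rule of thumb, not a Teradata recommendation.

```python
def error_count_query(error_table):
    """Query to run while a load is crawling; a large and growing
    count here usually means most rows are being rejected."""
    return f"SELECT COUNT(*) FROM {error_table}"

def should_abort(error_rows, rows_read, threshold=0.5):
    """Illustrative rule of thumb (the 50% default is an assumption):
    if most rows read so far landed in the error table, aborting,
    fixing the data, and rerunning is usually cheaper than letting
    the row-at-a-time error writes grind on."""
    return rows_read > 0 and error_rows / rows_read >= threshold

# Hypothetical error table name:
print(error_count_query("edw.ET_orders"))   # SELECT COUNT(*) FROM edw.ET_orders
print(should_abort(error_rows=9_000, rows_read=10_000))  # True
```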