Teradata Utilities MultiLoad
Table of Contents
Chapter 4: Multiload........................................................................................................................1
Why it is Called "Multi" Load..................................................................................................1
Two MultiLoad Modes: IMPORT and DELETE................................................................1
Block and Tackle Approach.............................................................................................2
MultiLoad Imposes Limits.................................................................................................3
Error Tables, Work Tables and Log Tables...........................................................................3
Supported Input Formats.......................................................................................................4
MultiLoad Has Five IMPORT Phases....................................................................................5
Phase 1: Preliminary Phase.............................................................................................5
Phase 2: DML Transaction Phase...................................................................................6
Phase 3: Acquisition Phase.............................................................................................6
Phase 4: Application Phase.............................................................................................7
Phase 5: Clean Up Phase................................................................................................7
MultiLoad Commands............................................................................................................8
Two Types of Commands................................................................................................8
Parameters for .BEGIN IMPORT MLOAD.............................................................................9
Parameters for .BEGIN DELETE MLOAD...........................................................................12
A Simple Multiload IMPORT Script......................................................................................12
Building our Multiload Script................................................................................................13
Executing Multiload..............................................................................................................14
Another Simple MultiLoad IMPORT Script...........................................................................15
MultiLoad IMPORT Script....................................................................................................18
Error Treatment Options for the .DML LABEL Command....................................................19
An IMPORT Script with Error Treatment Options................................................................21
A IMPORT Script that Uses Two Input Data Files...............................................................22
Redefining the INPUT..........................................................................................................24
A Script that Uses Redefining the Input...............................................................................24
DELETE MLOAD Script Using a Hard Coded Value...........................................................26
A DELETE MLOAD Script Using a Variable........................................................................27
An UPSERT Sample Script..................................................................................................28
What Happens when MultiLoad Finishes.............................................................................29
MultiLoad Statistics........................................................................................................29
Troubleshooting Multiload Errors.........................................................................................30
RESTARTing Multiload........................................................................................................31
RELEASE MLOAD: When You DON'T Want to Restart MultiLoad.....................................31
MultiLoad and INMODs........................................................................................................32
How Multiload Compares with FastLoad.............................................................................32
Chapter 4: Multiload
"In the end we'll remember not the sound of our enemies, but the silence of our
friends."
- Martin Luther King Jr.
This feature-rich utility can perform multiple types of DML tasks, including INSERT, UPDATE,
DELETE and UPSERT on up to five (5) empty or populated target tables at a time. These DML
functions may be run either solo or in combinations, against one or more tables. For these reasons,
MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment.
As the volume of data being loaded or updated in a single block increases, the performance of MultiLoad
improves. MultiLoad shines when it can impact more than one row in every data block. In other
words, MultiLoad looks at massive amounts of data and says, "Bring it on!"
Leo Tolstoy once said, "All happy families resemble each other." Like happy families, the Teradata
load utilities resemble each other, although they may have some differences. You are going to be
pleased to find that you do not have to learn all new commands and concepts for each load utility.
MultiLoad has many similarities to FastLoad. It has even more commands in common with TPump.
The similarities will be evident as you work with them. Where there are some quirky differences, we
will point them out for you.
The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table. The
reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be
RESTARTed if an error causes it to terminate prior to finishing. When performing in DELETE mode,
the DELETE SQL statement cannot reference the Primary Index in the WHERE clause. This is due to
the fact that a primary index access goes to a specific AMP, whereas this is a global operation.
The other factor that makes a DELETE mode operation so good is that it examines an entire block
of rows at a time. Once all the eligible rows have been removed, the block is written one time and a
checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from the next block
without a checkpoint. This is a smart way to continue. Remember, when using the TJ all deleted
rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish than
the delete. MultiLoad does not do a rollback; it does a restart.
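The DELETE mode described above can be sketched as a minimal script. Everything here is hypothetical (the logon string, the CDW_Log table, the SQL01.Employee_Dept table and its Del_Date column); it simply illustrates the shape of a .BEGIN DELETE MLOAD job:

```
.LOGTABLE SQL01.CDW_Log;            /* checkpoint/restart log                  */
.LOGON cdw/sql01,password;
.BEGIN DELETE MLOAD TABLES SQL01.Employee_Dept;

DELETE FROM SQL01.Employee_Dept
WHERE Del_Date < '2002-02-01';      /* non-PI column: a global, all-AMP delete */

.END MLOAD;
.LOGOFF;
```

Because the WHERE clause does not use the Primary Index, every AMP removes its eligible rows a block at a time, and no Transient Journal entries are written.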
Reprinted for ibmkvskumar@in.ibm.com, IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited
Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 2
In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited
to four months, monthly data is rotated in and out. At the end of every month, the oldest month of
data is removed and the new month is added. The cycle is "add a month, delete a month, add a
month, delete a month." In our illustration, that means that January data must be deleted to make
room for May's data.
Here is a question for you: What if there was another way to accomplish this same goal without
consuming all of these extra resources? To illustrate, let's consider the following scenario: Suppose
you have TableA that contains 12 billion rows. You want to delete a range of rows based on a date
and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad
DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final step would be
to INSERT the new rows for May using MultiLoad IMPORT.
Amazingly, MultiLoad has full RESTART capability in all of its five phases of operation. Once again,
this demonstrates its tremendous flexibility as a load utility. Is it pure magic? No, but it almost
seems so. MultiLoad makes effective use of two error tables to save different types of errors and a
LOGTABLE that stores built-in checkpoint information for restarting. This is why MultiLoad does not
use the Transient Journal, thus averting time-consuming rollbacks when a job halts prematurely.
Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access
Module Processor) fails and the system administrators say that the AMP is "down" or "offline."
When using FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can
continue running when an AMP fails, if the table is fallback protected. At the same time, you can
use the AMPCHECK option to make it work like FastLoad if you want.
Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are
defined with Referential Integrity (RI). Like a USI, this requires the AMPs to communicate with each
other. So, RI constraints must be dropped from the target table prior to using MultiLoad.
Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables based
upon what happens in a target table. Again, this is a multi-AMP operation and to a different table. To
keep MultiLoad running smoothly, disable all Triggers prior to using it.
Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this
because it could impact a restart if the files were concatenated in a different sequence or data
was deleted between runs.
Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If you
need data conversions or math, you might be better off using an INMOD to prepare the data prior to
loading it.
HINT: Sometimes a company wants all of these load support tables to be housed in a particular
database. When these tables are to be stored in any database other than the user's own default
database, then you must give them a qualified name (<databasename>.<tablename>) in the script
or use the DATABASE command to change the current database.
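As a sketch of the two approaches in the HINT, assuming a hypothetical work database named CORP_WORK:

```
/* Option 1: qualify the support table explicitly */
.LOGTABLE CORP_WORK.CDW_Log;

/* Option 2: change the current database so unqualified
   worktables and error tables are created in CORP_WORK */
DATABASE CORP_WORK;
```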
Where will you find these tables in the load script? The Logtable is generally identified immediately
prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD
statement. Do not underestimate the value of these tables. They are vital to the operation of
MultiLoad. Without them a MultiLoad job cannot run. Now that you have had the "executive
summary", let's look at each type of table individually.
Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require
the use of two error tables per target table. MultiLoad will automatically create these tables. Rows
are inserted into these tables only when errors occur during the load process. The first error table is
the acquisition Error Table (ET). It contains all translation and constraint errors that may occur
while the data is being acquired from the source(s).
The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for
Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one
occurrence into a table. Any duplicate value will be stored in the UV error table. For example, you
might see a UPI error that shows a second employee number "99." In this case, if the name for
employee "99" is Kara Morgan, you will be glad that the row did not load since Kara Morgan is
already in the Employee table. However, if the name showed up as David Jackson, then you know
that further investigation is needed, because employee numbers must be unique.
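When a load leaves rows behind, ordinary SQL is enough to inspect them. The statements below assume the default error table names against a hypothetical Employee target table; the ErrorCode and ErrorField columns are the ones MultiLoad normally records, but verify the layout on your system:

```sql
SELECT ErrorCode, ErrorField, COUNT(*)
FROM   ET_Employee        /* acquisition (translation/constraint) errors */
GROUP  BY 1, 2;

SELECT *
FROM   UV_Employee;       /* rows that duplicated a UPI value            */
```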
You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you
do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In
either case, MultiLoad will not accept error table names that are the same as target table names. It
does not matter what you name them. It is recommended that you standardize on the naming
convention to make it easier for everyone on your team. For more details on how these error tables
can help you, see the subsection in this chapter titled, "Troubleshooting MultiLoad Errors."
Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each
phase of the load so that MultiLoad knows the proper point from which to RESTART. There is one
LOGTABLE for each run. Since MultiLoad will not resubmit a command that has been run
previously, it will use the LOGTABLE to determine the last successfully completed step.
Work Table(s): MultiLoad will automatically create one worktable for each target table. This means
that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only
have one worktable since that mode only works on one target table. The purpose of worktables is to
hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs.
The worktables are created in a database using PERM space. They can become very large. If the
script uses multiple SQL statements for a single data record, the data is sent to the AMP once for
each SQL statement. This replication guarantees fast performance and that no SQL statement will
ever be done more than once. So, this is very important. However, there is no such thing as a free
lunch; the cost is space. Later, you will see that using a FILLER field can help reduce this disk
space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run
is in your hands.
BINARY: Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the
smallest unit of storage for Teradata.
FASTLOAD: This format is the same as BINARY, plus a marker (X'0A' or X'0D') that specifies
the end of the record.
TEXT: Each record has an arbitrary number of bytes and is followed by an end-of-record
marker.
UNFORMAT: The format for these input records is defined in the LAYOUT statement of the
MultiLoad script using the components FIELD, FILLER and TABLE.
VARTEXT: This is a variable-length text RECORD format separated by delimiters such as a
comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or
VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter
characters in a row will result in a null value between them.
Figure 5-1
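As an illustration of the VARTEXT format, here is a sketch of a comma-delimited layout; the layout name, field names and the INSERTS label are made up for the example:

```
.LAYOUT FILEIN;
.FIELD Employee_No  * VARCHAR(11);   /* VARTEXT fields must be VARCHAR,   */
.FIELD Last_Name    * VARCHAR(20);   /* LONG VARCHAR or VARBYTE           */
.FIELD Salary       * VARCHAR(10);

.IMPORT INFILE employee.txt
 FORMAT VARTEXT ','                  /* the comma is the delimiter        */
 LAYOUT FILEIN
 APPLY INSERTS;
```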
Let's take a look at each phase and see what it contributes to the overall load process of this
magnificent utility. Should you memorize every detail about each phase? Probably not. But it is
important to know the essence of each phase because sometimes a load fails. When it does, you
need to know in which phase it broke down since the method for fixing the error to RESTART may
vary depending on the phase. And if you can picture what MultiLoad actually does in each phase,
you will likely write better scripts that run more efficiently.
Second, all MultiLoad sessions with Teradata need to be established. The default is the number of
available AMPs. Teradata will quickly establish this number, using a factor of 16 as the basis for
the number of sessions to create. The general rule of thumb for the number of sessions on
smaller systems is the following: use the number of AMPs plus two more. For larger systems with
hundreds of AMP processors, the SESSIONS option is available to lower the default. Remember,
these sessions are running on your poor little computer as well as on Teradata.
Each session loads the data to Teradata across the network or channel. Every AMP plays an
essential role in the MultiLoad process. They receive the data blocks, hash each row and send the
rows to the correct AMP. When the rows come to an AMP, it stores them in worktable blocks on
disk. But, lest we get ahead of ourselves, suffice it to say that there is ample reason for multiple
sessions to be established.
What about the extra two sessions? Well, the first one is a control session to handle the SQL and
logging. The second is a backup or alternate for logging. You may have to use some trial and error
to find what works best on your system configuration. If you specify too few sessions it may impair
performance and increase the time it takes to complete load jobs. On the other hand, too many
sessions will reduce the resources available for other important database activities.
Third, the required support tables are created. They are the following:
Type of Table Table Details
ERRORTABLES MultiLoad requires two error tables per target table. The first error
table contains constraint violations, while the second error table
stores Unique Primary Index violations.
WORKTABLES Work Tables hold two things: the DML tasks requested and the
input data that is ready to APPLY to the AMPs.
LOGTABLE The LOGTABLE keeps a record of the results from each phase of
the load so that MultiLoad knows the proper point from which to
RESTART.
Figure 5-2
The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access
locks are placed on all target tables, allowing other users to read or write to the table for the time
being. However, this lock does prevent the opportunity for a user to request an exclusive lock.
Although these locks will still allow the MultiLoad user to drop the table, no one else may DROP or
ALTER a target table while it is locked for loading. This leads us to Phase 2.
Later, during the Acquisition phase the actual input data will also be stored in the worktable so that it
may be applied in Phase 4, the Application Phase. Next, a match tag is assigned to each DML
request that will match it with the appropriate rows of input data. The match tags will not actually be
used until the data has already been acquired and is about to be applied to the worktable. This is
somewhat like a student who receives a letter from the university in the summer that lists his
courses, professor's names, and classroom locations for the upcoming semester. The letter is a
"match tag" for the student to his school schedule, although it will not be used for several months.
This matching tag for SQL and data is the reason that the data is replicated for each SQL statement
using the same data record.
At this point, Teradata does not care about which AMP receives the data block. The blocks are
simply sent, one after the other, to the next AMP in line. For their part, each AMP begins to deal
with the blocks that they have been dealt. It is like a game of cards - you take the cards that you
have received and then play the game. You want to keep some and give some away.
Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP
hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will
ultimately be used. But the row does not get inserted into its target table, just yet. The receiving
AMP must first do some preparation before that happens. Don't you have to get ready before
company arrives at your house? The AMP puts all of the hashed rows it has received from other
AMPs into the worktables where it assembles them into the SQL. Why? Because once the rows are
reblocked, they can be sorted into the proper order for storage in the target table. Now the utility
places a load lock on each target table in preparation for the Application Phase. Of course, there is
no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
Figure 5-3
Remember, MultiLoad allows for the existence of NUSI processing during a load. Every
hash-sequence sorted block from Phase 3 and each block of the base table is read only once to
reduce I/O operations to gain speed. Then, all matching rows in the base block are inserted,
updated or deleted before the entire block is written back to disk, one time. This is why the match
tags are so important. Changes are made based upon corresponding data and DML (SQL) based
on the match tags. They guarantee that the correct operation is performed for the rows and blocks
with no duplicate operations, a block at a time. And each time a table block is written to disk
successfully, a record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again
from the very beginning if a RESTART is needed.
What happens when several tables are being updated simultaneously? In this case, all of the
updates are scripted as a multi-statement request. That means that Teradata views them as a
single transaction. If there is a failure at any point of the load process, MultiLoad will merely need to
be RESTARTed from the point where it failed. No rollback is required. Any errors will be written to
the proper error table.
Phase 5 examines the final Error Code (&SYSRC). MultiLoad believes the adage, "All is well that ends
well." If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has
certainly ended well). This being
the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata
and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the
system count variables are set. After this, each MultiLoad session is logged off. So what happens if
the final error code is not zero? Stay tuned. Restarting MultiLoad is a topic that will be covered later
in this chapter.
MultiLoad Commands
Two Types of Commands
You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad
tasks are commands that are used by the MultiLoad utility for specific individual steps as it
processes a load. Support functions are those commands that involve the Teradata utility Support
Environment (covered in Chapter 9), are used to set parameters, or are helpful for monitoring a
load.
The chart below lists the key commands, their type, and what they do.
.BEGIN [IMPORT] MLOAD / .BEGIN DELETE MLOAD (Support): This command communicates
directly with Teradata to specify if the MultiLoad mode is going to be IMPORT or DELETE. Note
that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is
required. We recommend using the word IMPORT to make the coding consistent and easier for
others to read. Any parameters for the load, such as error limits or checkpoints, will be included
under the .BEGIN command, too. It is important to know which commands or parameters are
optional since, if you do not include them, MultiLoad may supply defaults that may impact your
load.

.DML LABEL (Task): The DML LABEL defines treatment options and labels for the application
(APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply
a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the
APPLY clause.

.END MLOAD (Task): This instructs MultiLoad to finish the APPLY operations with the changes to
the designated databases and tables.

.FIELD (Task): This defines a column of the data source record that will be sent to the Teradata
database via SQL. When writing the script, you must include a FIELD for each data field you need
in SQL. This command is used with the LAYOUT command.

.FILLER (Task): Do not assume that MultiLoad has somehow uncovered much of what you used
in your term papers at the university! FILLER defines a field that is accounted for as part of the
data source's row format, but is not sent to the Teradata DBS. It is used with the LAYOUT
command.

.LAYOUT (Task): LAYOUT defines the format of the INPUT DATA record so Teradata knows what
to expect. If one record is not large enough, you can concatenate
Reprinted for ibmkvskumar@in.ibm.com, IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited
Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 9
Figure 5-4
PARAMETER (REQUIRED OR NOT): WHAT IT DOES

AMPCHECK {NONE|APPLY|ALL} (Optional): NONE specifies that MLOAD starts even with one
down AMP per cluster if all tables are Fallback. APPLY (the DEFAULT) specifies MLOAD will not
start or finish Phase 4 with a down AMP. ALL specifies MLOAD will not allow a down AMP during
any processing of a load job.

ERRORTABLES ET_ERR UV_ERR (Optional): Names the two error tables, two per target table.
Note there is no comma separator.

NOTIFY {LOW|MEDIUM|HIGH|OFF} (Optional): If you opt to use NOTIFY for any event during a
load, you may designate the priority of that notification: MEDIUM for important events, HIGH for
events at operational decision points, and OFF to eliminate any notification at all for a given
phase.

SESSIONS <MAX> <MIN> (Optional): This refers to the number of SESSIONS that should be
established with Teradata. For MultiLoad, the optimal number of sessions is the number of AMPs
in the system, plus two more.
Figure 5-5
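Putting several of the parameters above into one statement, a .BEGIN might look like the following sketch (the table names, session counts and checkpoint value are hypothetical):

```
.BEGIN IMPORT MLOAD
   TABLES      SQL01.Employee_Dept
   WORKTABLES  SQL01.WT_Employee_Dept
   ERRORTABLES SQL01.ET_Employee_Dept SQL01.UV_Employee_Dept  /* no comma */
   AMPCHECK    APPLY                  /* the default                      */
   SESSIONS    12 4                   /* max 12, min 4                    */
   CHECKPOINT  100000;
```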
Ask not what your MultiLoad can do for you; ask what you can do for your MultiLoad. MultiLoad is a
great tool when you're short on time. MultiLoad can update, insert, delete or upsert on Teradata
tables that are already populated. It can even do all four in one script. Our flat file will contain
Employee_numbers and Salaries * 2. We are giving a big raise. We're going to create a flat file to
use with MultiLoad, as shown below:
Remember, we'll still use the BTEQ utility to create our flat file.
Getting these scripts down is a very hard process, so don't be discouraged if you make a couple of
mistakes. The next two slides will show you a blank copy of the basic Multiload script, as well as a
marked slide illustrating the important parts of the script:
"If you don't know where you're going, any road will take you there."
- Lewis Carroll
Executing Multiload
You will feel like the King after executing your first MultiLoad script. MultiLoad is the Elvis Presley of
data warehousing because nobody knows how to make more records than MultiLoad. If you have the
ambition to learn, this book will give you what it takes to steer through these utilities. We initialize
the MultiLoad utility like we do with BTEQ, except that the keyword with MultiLoad is mload.
Remember that this MultiLoad is going to double the salaries of our employees.
"Those who dance are considered insane by those who cannot hear the music."
- George Carlin
MultiLoad can be somewhat intimidating to the new user because there are many commands and
phases. In reality, the load scripts are understandable when you think through what the IMPORT
mode does:
• Setting up a Logtable
This first script example is designed to show MultiLoad IMPORT in its simplest form. It depicts the
loading of a three-column Employee table. The actual script is in the left column and our comments
are on the right. Below the script is a step-by-step description of how this script works.
Step One: Setting up a Logtable and Logging onto Teradata — MultiLoad requires that you specify a
log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once you
name the Logtable, it will be automatically created for you. The Logtable may be placed in the same
database as the target table, or it may be placed in another database. Immediately after this you log
onto Teradata using the .LOGON command. The order of these two commands is interchangeable,
but it is recommended to define the Logtable first and then to Log on, second. If you reverse the
order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot
in front of the command key word.
Step Two: Identifying the Target, Work and Error tables — In this step of the script you must tell
Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then
you will preface the names of these tables with the sub-commands TABLES, WORKTABLES and
ERRORTABLES. All you must do is name the tables and specify what database they are in. Work
tables and error tables are created automatically for you. Keep in mind that you get to name and
locate these tables. If you do not do this, Teradata might supply some defaults of its own!
At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not
specifically been named, the script would still execute and build these tables. They would have
been built in the default database for the user. The name of the worktable would be
WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and
UV_EMPLOYEE_DEPT1, respectively.
Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One
customer calls this database CORP_WORK. This is where all of the logtables and worktables are
normally created. You can use a DATABASE command to point all table creations to it or qualify the
names of these tables individually.
Step Three: Defining the INPUT flat file record structure — MultiLoad is going to need to know
the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields
and their data types used in your SQL as a .FIELD. Did you notice that an asterisk is placed
between the column name and its data type? This means to automatically calculate the next byte in
the record. It is used to designate the starting location for this data based on the previous field's
length. If you are listing fields in order and need to skip a few bytes in the record, you can either use
the .FILLER (like above) to position the cursor at the next field, or the "*" on the Dept_No field
could have been replaced with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then, the
.FILLER is not needed. Also, if the input record fields are exactly the same as the table, the .TABLE
can be used to automatically define all the .FIELDS for you. The LAYOUT name will be referenced
later in the .IMPORT command. If the input file is created with INDICATORS, it is specified in the
LAYOUT.
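The positioning rules in Step Three can be sketched as follows. The field names are reconstructed from the walkthrough, and the lengths match the example's arithmetic (CHAR(11)+CHAR(20)+CHAR(100)+1 = 132):

```
.LAYOUT FILEIN;
.FIELD  Employee_No *  CHAR(11);   /* starts at byte 1                       */
.FIELD  Dept_Name   *  CHAR(20);   /* '*' = next byte after the prior field  */
.FILLER Comments    *  CHAR(100);  /* read from the file, never sent to AMPs */
.FIELD  Dept_No     *  CHAR(6);    /* '*' here could be replaced with 132    */
```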
Step Four: Defining the DML activities to occur — The .DML LABEL names and defines the SQL that is to execute. It is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept table. The VALUES come from the data in each FIELD; the colon (:) prefix marks each name as a field reference. Are you allowed to use multiple labels in a script? Sure! But remember this: every label must be referenced in an APPLY clause of the .IMPORT command.
Step Five: Naming the INPUT file and its format type — This step is vital! Using the .IMPORT command, we identify the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as TEXT. Next, we reference the LAYOUT named FILEIN to describe the fields in the record. Finally, we tell MultiLoad to APPLY the DML LABEL called INSERTS — that is, to INSERT the data rows into the target table. This is still a sub-component of the .IMPORT MLOAD command. If the script is to run on a mainframe, the INFILE name is actually the name of a JCL Data Definition (DD) statement that contains the real name of the file.
Notice that the .IMPORT goes on for 4 lines of information. This is possible because the command continues until it finds the semi-colon that defines its end; that is how MultiLoad separates one operation from another. The semi-colon is therefore very important: without it, MultiLoad would have attempted to process the .END MLOAD as part of the IMPORT, and that wouldn't work.
Reprinted for ibmkvskumar@in.ibm.com, IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited
Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 17
Step Six: Finishing loading and logging off of Teradata — This is the closing ceremonies for the load. MultiLoad wraps things up, closes the curtains, and logs off of the Teradata system.
Important note: Since the script above in Figure 5-6 does not DROP any tables, it is
completely capable of being restarted if an error occurs. Compare this to the next
script in Figure 5-7. Do you think it is restartable? If you said no, pat yourself on the
back.
PARAMETER                     REQUIRED OR NOT   WHAT IT DOES
TABLES tablename1             Required          Names the target table.
WORKTABLES tablename1         Optional          Names the worktables, one per target table.
ERRORTABLES ET_ERR UV_ERR     Optional          Names the error tables, two per target
                                                table; there is no comma separator
                                                between them.
TENACITY                      Optional          Tells MultiLoad how many hours to try
                                                establishing sessions when its initial
                                                effort to do so is rebuffed.
Figure 5-6
Figure 5-7
Figure 5-8
Figure 5-9
In IMPORT mode, you may specify as many as five distinct error-treatment options for one
.DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to
IGNORE the duplicate row, or to MARK it (list it) in an error table?
If you do not specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. The syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table. By ignoring those errors, the error table does not fill up needlessly, and MultiLoad runs much faster since it is not conducting a duplicate row check.
• For an UPSERT, the default is IGNORE MISSING UPDATE ROWS (MARK is the default for all other operations). When doing an UPSERT, you anticipate that some rows are missing; otherwise, why do an UPSERT? So this keeps those rows out of your error table.
• The DO INSERT FOR MISSING UPDATE ROWS option is mandatory for an UPSERT. This tells MultiLoad to insert a row from the data source if that row does not exist in the target table because the update didn't find it.
The table that follows shows you, in more detail, how flexible your options are:
ERROR TREATMENT OPTIONS IN DETAIL

.DML LABEL OPTION                WHAT IT DOES
MARK DUPLICATE INSERT ROWS       Logs an entry for every duplicate INSERT row
                                 in the UV_ERR table. Use this when you want
                                 to know about the duplicates.
IGNORE DUPLICATE INSERT ROWS     Tells MultiLoad to IGNORE duplicate INSERT
                                 rows because you do not want to see them.
MARK DUPLICATE UPDATE ROWS       Logs the existence of every duplicate UPDATE
                                 row.
IGNORE DUPLICATE UPDATE ROWS     Eliminates the listing of duplicate UPDATE
                                 row errors.
MARK MISSING UPDATE ROWS         Ensures a listing of data rows that had to
                                 be INSERTed since there was no row to
                                 UPDATE.
IGNORE MISSING UPDATE ROWS       Tells MultiLoad NOT to list missing UPDATE
                                 rows as errors. This is a good option when
                                 doing an UPSERT, since the UPSERT will
                                 INSERT a new row.
MARK MISSING DELETE ROWS         Makes a note in the ET error table that a
                                 row to be deleted is missing.
IGNORE MISSING DELETE ROWS       Says, "Do not tell me that a row to be
                                 deleted is missing."
DO INSERT FOR MISSING            Required to accomplish an UPSERT. Tells
UPDATE ROWS                      MultiLoad that if the row to be updated does
                                 not exist in the target table, then INSERT
                                 the entire row from the data source.
Figure 5-10
Figure 5-11
.LOGON TDATA/SQL01,SQL01;

/* Drop the worktables and error tables, in case they existed from a
   prior load. NOTE: do NOT include these DROPs IF you want to RESTART
   using CHECKPOINT. */
DROP TABLE SQL01.EMP_WT;
DROP TABLE SQL01.DEPT_WT;
DROP TABLE SQL01.EMP_ET;
DROP TABLE SQL01.EMP_UV;
DROP TABLE SQL01.DEPT_ET;
DROP TABLE SQL01.DEPT_UV;

/* The following identifies the 2 target tables for loading, with a
   comma between them, then names the worktable and error tables for
   each target table. Note there are NO commas between a pair of error
   table names, but there is a comma between one pair and the next. */
.BEGIN IMPORT MLOAD
TABLES
   SQL01.Employee_Table,
   SQL01.Department_Table
WORKTABLES
   SQL01.EMP_WT,
   SQL01.DEPT_WT
ERRORTABLES
   SQL01.EMP_ET
   SQL01.EMP_UV,
   SQL01.DEPT_ET
   SQL01.DEPT_UV;

/* These next 2 LAYOUTs name and define the records of the 2 different
   INPUT files. */
.LAYOUT FILEIN1;
.FIELD Emp_No   * INTEGER;
.FIELD LName    * CHAR(20);
.FIELD FName    * VARCHAR(20);
.FIELD Sal      * DECIMAL(10,2);
.FIELD Dept_Num * INTEGER;

.LAYOUT FILEIN2;
.FIELD DeptNo   * CHAR(6);
.FIELD DeptName * CHAR(20);

/* Names the 1st DML Label. IGNORE tells MultiLoad to discard duplicate
   INSERT rows because you do not want to see them. The INSERT puts a
   row into the table but does NOT name the columns. */
.DML LABEL EMP_INS;
IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Table
VALUES (:Emp_No
       ,:FName
       ,:LName
       ,:Sal
       ,:Dept_Num);
Figure 5-12
Figure 5-13
/* Identify the Logtable and log onto Teradata with a valid logon
   string. */
.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;

/* Begin MultiLoad in DELETE mode and name the target table. */
.BEGIN DELETE MLOAD
TABLES Order_Table;

/* The SQL DELETE statement does a massive delete of order data for
   orders placed prior to the hard-coded date in the WHERE clause.
   Notice that this is not the Primary Index: you CANNOT DELETE in
   DELETE MLOAD mode based upon the Primary Index. */
DELETE FROM Order_Table
WHERE Order_Date < '99/12/31';

/* End loading and log off of Teradata. */
.END MLOAD;
.LOGOFF;
Figure 5-14
How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here
are a few that we saw:
• At the beginning, you must specify the word "DELETE" in the .BEGIN MLOAD command.
You need not specify it in the .END MLOAD command.
• You will readily notice that this mode has no .DML LABEL command. Since it is focused on
just one absolute function, no APPLY clause is required so you see no .DML LABEL.
• Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad
command, so it has no dot prefix.
• Since default names are available for worktables (WT_<target_tablename>) and error tables
(ET_<target_tablename> and UV_<target_tablename>), they need not be specifically
named, but be sure to define the Logtable.
Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a MultiLoad
IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the
Primary Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing
the Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job
fails for any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad
DELETE task is never based upon the Primary Index.
Because we are not importing any data rows, there is neither a need for worktables nor an
Acquisition Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That
statement will be applied to every table row. If the condition is met, then the row is deleted. Using
the match tags, each target block is read once and the appropriate rows are deleted.
/* Identify the Logtable and log onto Teradata with a valid logon
   string. */
.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;

/* Begin the DELETE task; it names only one table, but still uses the
   TABLES option. */
.BEGIN DELETE MLOAD
TABLES Order_Table;

/* Name the LAYOUT and define the column whose value will be passed as
   a single row to MultiLoad. In this case, all of the order dates in
   the Order_Table will be tested against this OrdDate value. */
.LAYOUT OldMonth;
.FIELD OrdDate * DATE;

/* Data rows with orders placed prior to the date value (:OrdDate)
   passed from the LAYOUT OldMonth will be DELETEd from the
   Order_Table. */
DELETE FROM Order_Table
WHERE Order_Date < :OrdDate;

/* Note that this time there is no dot in front of LAYOUT in the
   .IMPORT INFILE clause, since the layout is only being referenced. */
.IMPORT INFILE
LAYOUT OldMonth;

/* End loading and log off of Teradata. */
.END MLOAD;
.LOGOFF;
Figure 5-15
Figure 5-16
Figure 5-17
For the most part, the Acquisition error table logs errors that occur during that processing phase. The Application error table lists Unique Primary Index violations, field overflow errors on non-PI columns, and constraint errors that occur in the Application Phase. MultiLoad error tables not only list the errors they encounter, they also have the capability to STORE those errors. Do you remember the MARK and IGNORE parameters? This is where they come into play. MARK will ensure that the error rows, along with some details about the errors, are stored in the error table. IGNORE does neither; it is as if the error never occurred.
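A toy Python model of that difference (illustrative only; this is not how MultiLoad is implemented internally):

```python
def insert_rows(rows, on_duplicate="MARK"):
    """Toy model of MARK vs IGNORE for duplicate INSERT rows:
    MARK stores the offending rows in an error table, while IGNORE
    drops them silently, as if the error never occurred."""
    target, error_table, seen = [], [], set()
    for row in rows:
        if row in seen:
            if on_duplicate == "MARK":
                error_table.append(row)   # record the duplicate
            # IGNORE: do nothing at all
        else:
            seen.add(row)
            target.append(row)
    return target, error_table

print(insert_rows([1, 2, 2, 3], "MARK"))    # ([1, 2, 3], [2])
print(insert_rows([1, 2, 2, 3], "IGNORE"))  # ([1, 2, 3], [])
```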
Figure 5-19
Figure 5-20
RESTARTing MultiLoad
Who hasn't experienced a failure at some time when attempting a load? Don't take it personally!
Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the
impressive ability to RESTART from failures in either environment. In fact, it requires almost no
effort to continue or resubmit the load job. Here are the factors that determine how it works:
First, MultiLoad will check the Restart Logtable and automatically resume the load process from the
last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for
restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you
must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits
for the user to resubmit the job. Then MultiLoad takes over right where it left off.
Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host
program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing!
Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may
simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and start
again from that very spot.
Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run
until complete.
Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD
clause will be enacted. The results are stored in the Logtable. During the Application Phase,
CHECKPOINTs are logged each time a data block is successfully written to its target table.
HINT: The default for CHECKPOINT is 15 minutes. If you specify CHECKPOINT as 60 or less, minutes are assumed; if you specify it as 61 or above, that number of records is assumed.
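That rule of thumb can be captured in a couple of lines of Python (a sketch of the documented convention, not a Teradata API):

```python
def checkpoint_unit(n):
    """Interpret a CHECKPOINT value per the rule above: 60 or less
    means minutes between checkpoints; 61 or more means a record
    count."""
    return "minutes" if n <= 60 else "records"

print(checkpoint_unit(15))    # minutes (the default interval)
print(checkpoint_unit(1000))  # records
```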
Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one
of the following three conditions was true:
• In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you could
not use RELEASE MLOAD. This is sometimes referred to as the "point of no return."
• In DELETE mode, the point of no return was when Teradata received the DELETE
statement.
• If the job halted in the Apply Phase, you had to RESTART the job.
With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the RELEASE MLOAD command: it can NOW be used even in the Apply Phase, under certain conditions. Even so, you should be very cautious using the RELEASE command. It could potentially leave your table half updated. It is handy for a test environment, but please don't become too reliant on it for production runs; those should be allowed to finish to guarantee data integrity.
You will find a more detailed discussion on how to write INMODs for MultiLoad in the chapter of this
book titled, "INMOD Processing".
Figure 5-21