

Teradata Utilities
Breaking the Barriers
First Edition, October 2002
Written by Tom Coffing, Michael J. Larkins, Randy Volters, Morgan
Jones, Steve Wilmes

Web Page: www.CoffingDW.com


E-Mail address:
Tom.Coffing@CoffingDW.Com

Published by

Teradata, NCR , and BYNET are registered trademarks of NCR Corporation, Dayton, Ohio, U.S.A.,
IBM and DB2 are registered trademarks of IBM Corporation, ANSI is a registered trademark of
the American National Standards Institute. In addition to these products names, all brands and
product names in this document are registered names or trademarks of their respective holders.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with
respect to any loss or damages arising from the information contained in this book or from the use of
programs or program segments that are included. This manual is not a publication of NCR
Corporation, nor was it produced in conjunction with NCR Corporation.

Copyright 2002 by Coffing Publishing

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or
transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without
written permission from the publisher. No patent liability is assumed with respect to the use of
information contained herein. Although every precaution has been taken in the preparation of this
book, the publisher and author assume no responsibility for errors or omissions, neither is any
liability assumed for damages resulting from the use of information contained herein. For
information, address:

Coffing Publishing
7810 Kiester Rd.
Middletown, OH 45042

International Standard Book Number: ISBN 0-9704980-7-1


Acknowledgements and Special Thanks
I dedicate this book to my parents, Tom and Sandy, who have spent much of their lives being great
people and who have worked hard to raise great people.

Tom Coffing

I dedicate this book to my parents, Steve and Joanne, and my grandmother who inspired me to
write and publish my first book.

Steve Wilmes

We are all grateful to God for the knowledge to complete this book, the perseverance to see it
through, the dedication of all the team members, and the drive to carry it to completion.
Most of all, we have Tom Coffing to thank for his tireless leadership and coordination of all the
resources involved in this effort.

Mike Larkins

I dedicate this book to my wife Anna, the love of my life.

Randy Volters

I dedicate this book to my wife, Janie, and my children, Kara and David, who are always an inspiration
to me.

Morgan Jones

About the Author Tom Coffing


Tom is President, CEO, and Founder of Coffing Data Warehousing. He is an internationally known
consultant, facilitator, speaker, trainer, and executive coach with an extensive background in data
warehousing. Tom has helped implement data warehousing in over 40 major data warehouse
accounts, spoken in over 20 countries, and has provided consulting and Teradata training to over
8,000 individuals involved in data warehousing globally.

Tom has co-authored the following eight books on Data Warehousing:


• Secrets of the Best Data Warehouses in the World
• Teradata SQL — Unleash the Power
• Tera-Tom on Teradata Basics
• Tera-Tom on Teradata E-business
• Teradata SQL Quick Reference Guide — Simplicity by Design
• Teradata Database Design — Giving Detailed Data Flight
• Teradata Users Guide — The Ultimate Companion
• Teradata Utilities — Breaking the Barriers

Mr. Coffing has also published over 20 data warehousing articles and has been a contributing
columnist to DM Review on the subject of data warehousing. He wrote a monthly column for DM
Review entitled, “Teradata Territory”. He is a nationally known speaker and gives frequent seminars
on Data Warehousing. He is also known as “The Speech Doctor” because of his presentation skills
and sales seminars.

Tom Coffing has taken his expert speaking and data warehouse knowledge and revolutionized the
way technical training and consultant services are delivered. He founded CoffingDW with the same
philosophy more than a decade ago. Centered around 10 Teradata Certified Masters, this dynamic
and growing company teaches every Teradata class, provides world class Teradata consultants,
offers a suite of software products to enhance Teradata data warehouses, and has eight books
published on Teradata.

Tom has a bachelor’s degree in Speech Communications and over 25 years of business and technical
computer experience. Tom is considered by many to be the best technical and business speaker in
the United States. He has trained and consulted at so many Teradata sites that students
affectionately call him Tera-Tom.

Teradata Certified Master

• Teradata Certified Professional
• Teradata Certified Designer
• Teradata Certified Administrator
• Teradata Certified SQL Specialist
• Teradata Certified Developer
• Teradata Certified Implementation Specialist



An Introduction to the Teradata Utilities


“It’s not the data load that breaks us down, it’s the way you carry it.”

– Tom Coffing

Teradata has been doing data transfers to and from the largest data warehouses in the world for
close to two decades. While other databases have allowed the loads to break them down, Teradata
has continued to set the standards and break new barriers. The brilliance behind the Teradata load
utilities is in their power and flexibility. With five great utilities, Teradata allows you to pick the utility
for the task at hand. This book is dedicated to explaining these utilities in a complete and easy
manner. This book has been written by five Teradata Certified Masters with experience at over 125
Teradata sites worldwide. Let our experience be your guide.

The intent of this book is twofold. The first is to help you write and use the various utilities. A
large part of this is taken up with showing the commands and their functionality. In addition, you
will come to appreciate the examples that use the various utility commands and SQL in conjunction
with each other.

The second intention is to help you know which utility to use under a variety of conditions. You will
learn that some of the utilities use very large blocks to transfer the data either to or from the
Teradata Relational Database Management System (RDBMS). From this perspective, they provide a
high degree of efficiency using a communications path of either the mainframe channel or network.

The other approach to transferring data rows either to or from the Teradata RDBMS is a single row at
a time. The following sections provide a high level introduction to the capabilities and considerations
for both approaches. You can use this information to help decide which utilities are appropriate for
your specific need.

Considerations for using Block at a Time Utilities

As mentioned above, there are efficiencies associated with using large blocks of data when
transferring data between computers. So, logic might suggest that it is always the best
approach. However, there is never one best approach.

You will learn that efficiency comes at the price of other database capabilities. For instance, when
using large blocks to transfer and incorporate data into Teradata the following are not allowed:
• Secondary indices
• Triggers
• Referential integrity
• More than 15 concurrent utilities running at the same time

Therefore, it is important to understand when and where these considerations are present. So, as
important as it is to know the language of the utility and database, it is also important to understand
when to use the appropriate utility. The capabilities and considerations are covered in conjunction
with the commands.

Considerations for using Row at a Time Utilities

The opposite of sending a large block of rows at the same time is sending a single row at a time. The
primary difference in these approaches is speed. It is always faster to send multiple rows in one
operation than to send them one row at a time.

If it is slower, why would anyone ever use this approach?

The reason is that it provides more flexibility with fewer considerations. By this, we mean that the
row at a time utilities allow the following:
• Secondary indices
• Triggers
• Referential integrity
• More than 15 concurrent utilities running at the same time

As you can see, they allow all the things that the block utilities do not. With that in mind and for
more information, continue reading about the individual utilities and open up a new world of
capabilities in working with the Teradata RDBMS. Welcome to the world of the Teradata Utilities.

An Introduction to BTEQ
“It’s not the data load that breaks us down, it’s the way you carry it.”

– Tom Coffing

Why is it called BTEQ?

Why is BTEQ available on every Teradata system ever built? Because the Batch TEradata Query
(BTEQ) tool was the original way that SQL was submitted to Teradata as a means of getting an
answer set in a desired format. This is the utility that I used for training at Wal*Mart, AT&T, Anthem
Blue Cross and Blue Shield, and SouthWestern Bell back in the early 1990s. BTEQ is often referred
to as the Basic TEradata Query and is still used today and continues to be an effective tool.

Here is what is excellent about BTEQ:


• BTEQ can be used to submit SQL in either a batch or interactive environment. Interactive
users can submit SQL and receive an answer set on the screen. Users can also submit BTEQ
jobs from batch scripts, which can include error checking and conditional logic and allow the
work to be done in the background.
• BTEQ outputs a report format, whereas Queryman outputs data in a format more like a
spreadsheet. This gives BTEQ a great deal of flexibility in formatting data, creating headings,
and utilizing Teradata extensions, such as WITH and WITH BY, which Queryman has problems
handling.
• BTEQ is often used to submit SQL, but is also an excellent tool for importing and exporting
data.
° Importing Data: Data can be read from a file on either a mainframe or LAN
attached computer and used for substitution directly into any Teradata SQL using
the INSERT, UPDATE or DELETE statements.
° Exporting Data: Data can be written to either a mainframe or LAN attached
computer using a SELECT from Teradata. You can also pick the format you desire
ranging from data files to printed reports to Excel formats.

There are other utilities that are faster than BTEQ for importing or exporting data. We will talk about
these in future chapters, but BTEQ is still used for smaller jobs.

Logging on to BTEQ

Before you can use BTEQ, you must have user access rights to the client system and privileges to
the Teradata DBS. Normal system access privileges include a userid and a password. Some systems
may also require additional user identification codes depending on company standards and
operational procedures. Depending on the configuration of your Teradata DBS, you may need to
include an account identifier (acctid) and/or a Teradata Director Program Identifier (TDPID).

Using BTEQ to Submit Queries


Submitting SQL in BTEQ’s Interactive Mode

Once you log on to Teradata through BTEQ, you are ready to run your queries. Teradata knows the
SQL is finished when it finds a semicolon, so don’t forget to put one at the end of your query. Below
is an example of a Teradata table to demonstrate BTEQ operations.
Employee_Table

Employee_No  Last_Name   First_Name  Salary    Dept_No
2000000      Jones       Squiggy     32800.50  ?
1256349      Harrison    Herbert     54500.00  400
1333454      Smith       John        48000.00  200
1121334      Strickling  Cletus      54500.00  400

Figure 2-1

BTEQ execution

.LOGON cdw/sql01;                                        Type at command prompt: Logon with TDPID and USERNAME.
Password: XXXXX                                          Then enter PASSWORD at the second prompt.
Enter your BTEQ/SQL Request or BTEQ Command.             BTEQ will respond and is waiting for a command.
SELECT * FROM Employee_Table                             An SQL Statement
WHERE Dept_No = 400;
*** Query Completed. 2 rows found. 5 Columns returned.   BTEQ displays information about the answer set.
*** Total elapsed time was 1 second.

The result set

Employee_No  Last_name   First_name  Salary    Dept_No
1256349      Harrison    Herbert     54500.00  400
1121334      Strickling  Cletus      54500.00  400

Figure 2-2

Submitting SQL in BTEQ’s Batch Mode

On network-attached systems, BTEQ can also run in batch mode under UNIX (IBM AIX, Hewlett-
Packard HP-UX, NCR MP-RAS, Sun Solaris), DOS, Macintosh, Microsoft Windows and OS/2 operating
systems. To submit a job in batch mode, do the following:

1. Invoke BTEQ
2. Type in the input file name
3. Type in the location and output file name.

The following example shows how to invoke BTEQ from a DOS command prompt. In order for this to work,
the directory called Program Files\NCR\Teradata Client\bin must be included in the search path.

C:\> BTEQ < BatchScript.txt > Output.txt     BTEQ is invoked and takes instructions from a file called
                                             BatchScript.txt. The output file is called Output.txt.

Figure 2-3

Notice that the BTEQ command is immediately followed by ‘<BatchScript.txt’ to tell BTEQ which
file contains the commands to execute. Then, ‘>Output.txt’ names the file where the output
messages are written. Here is an example of the contents of the BatchScript.txt file.

BatchScript.txt File

.LOGON CDW/sql00,whynot          Logon statement onto Teradata. Notice the “,” before the password.
                                 This is in batch script format.
SELECT * FROM Employee_Table     The actual SQL
WHERE Dept_No = 400;
.LOGOFF                          Logging off of Teradata

Figure 2-4

The above illustration shows how BTEQ can be manually invoked from a command prompt and
displays how to specify the name and location of the batch script file to be executed.

The previous examples show that when logging onto BTEQ in interactive mode, the user actually
types in a logon string and then Teradata will prompt for a password. However, in batch mode,
there is no one to answer a prompt, so both the logon and the password must be supplied as part of the script.

Because anyone who can read a script can also read a password stored in it, inserting the
password directly into a script that is to be processed in batch mode may not be a good idea. It is
generally recommended, and common practice, to store the logon and password in a separate file
that can be secured. That way, it is not in the script for anyone to see.

For example, the contents of a file called “mylogon.txt” might be:

.LOGON cdw/sql00,whynot

Then, instead of a .LOGON command, the script should contain the following command (it also
appears in the next script):

.RUN FILE=mylogon.txt

This command opens and reads the file. It then executes every record in the file.
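The separation described above can be automated when batch jobs are generated. The Python sketch below is a minimal example under stated assumptions. The helper write_batch_files and its parameters are hypothetical and not part of BTEQ. The TDPID, user, and password are the sample values from this section. The helper writes the logon string to mylogon.txt and restricts that file to its owner where the operating system supports it. It then writes a batch script that pulls the logon in with .RUN FILE, so the password never appears in the script.

```python
import os
import stat
from pathlib import Path


def write_batch_files(directory: Path, tdpid: str, user: str, password: str,
                      sql: str) -> tuple[Path, Path]:
    """Hypothetical helper (not part of BTEQ): write a logon file and a
    BTEQ batch script that reads the logon with .RUN FILE."""
    logon_file = directory / "mylogon.txt"
    script_file = directory / "BatchScript.txt"

    # The logon file is the only place the password is written.
    logon_file.write_text(f".LOGON {tdpid}/{user},{password}\n")
    # Owner read/write only; on Windows, chmod only controls the read-only flag.
    os.chmod(logon_file, stat.S_IRUSR | stat.S_IWUSR)

    # The batch script refers to the logon file by name, never to the password.
    script_file.write_text(
        f".RUN FILE={logon_file.name}\n"
        f"{sql}\n"
        ".LOGOFF\n"
        ".QUIT\n"
    )
    return logon_file, script_file
```

For example, write_batch_files(Path("."), "cdw", "sql00", "whynot", "SELECT * FROM Employee_Table WHERE Dept_No = 400;") produces both files. The script names mylogon.txt without a directory, so BTEQ would need to be started from the directory that holds it.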

Using BTEQ Conditional Logic


Below is a BTEQ batch script example. The initial steps of the script will establish the logon and the database, and
then delete all the rows from the Employee_Table. If the table does not exist, the BTEQ conditional logic will
instruct Teradata to create it. However, if the table already exists, then Teradata will move forward and insert
data.

Note: In script examples, the left panel contains BTEQ base commands and the right panel provides a brief
description of each command.

.RUN FILE = mylogon.txt                  Logon to Teradata
DATABASE SQL_Class;                      Make the default database SQL_Class
DELETE FROM Employee_Table;              Deletes all the records from the Employee_Table.
.IF ERRORCODE = 0 THEN .GOTO INSEMPS     BTEQ conditional logic that will check to ensure that the delete
                                         worked or if the table even existed. If the table did not exist,
                                         then BTEQ will create it. If the table does exist, the Create table
                                         step will be skipped and directly GOTO INSEMPS.
/* ERRORCODE is a reserved word that contains the outcome status for
   every SQL statement executed in BTEQ. A zero (0) indicates that the
   statement worked. */

.LABEL INSEMPS                           The Label INSEMPS provides code so the BTEQ Logic can go directly
                                         to inserting records into the Employee_Table.
INSERT INTO Employee_Table (1232578, 'Chambers'
,'Mandee', 48850.00, 100);
INSERT INTO Employee_Table (1256349, 'Harrison'
,'Herbert', 54500.00, 400);

.QUIT

Figure 2-5

Using BTEQ to Export Data


BTEQ allows data to be exported directly from Teradata to a file on a mainframe or network-attached
computer. In addition, the BTEQ export function offers several export formats, so a user can choose
the one that fits the desired output. Generally, users export data to a flat file in one of the following
modes: record (DATA) mode, field (REPORT) mode, indicator (INDICDATA) mode, or DIF mode. Below is
an expanded explanation of the different mode options.

Format of the EXPORT command:

.EXPORT <mode> {FILE | DDNAME } = <filename> [, LIMIT=n]

Record Mode (also called DATA mode): This is set by .EXPORT DATA. This will bring data back as
a flat file. Each parcel will contain a complete record. Since it is not a report, there are no headers or
white space between the data contained in each column, and the data is written to the file (e.g., disk
drive file) in native format. For example, this means that INTEGER data is written as a 4-byte binary
field. Therefore, it cannot be read and understood using a normal text editor.
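The Python sketch below uses the struct module to show why a DATA-mode value is unreadable in an editor. It packs one sample Employee_No as a 4-byte INTEGER would be stored. It assumes a little-endian client such as an Intel PC; a mainframe would use big-endian byte order. The choice of column is only for illustration, and the sketch does not model how BTEQ frames each record in the file.

```python
import struct

employee_no = 1256349  # Herbert Harrison's Employee_No from Employee_Table

# Record (DATA) mode: a 4-byte signed binary integer (little-endian assumed).
data_mode = struct.pack("<i", employee_no)

# Field (REPORT) mode: the same value written as readable characters.
report_mode = str(employee_no).encode("ascii")

print(data_mode.hex(" "))  # 9d 2b 13 00
print(report_mode)         # b'1256349'
```

A text editor shows the four bytes 9D 2B 13 00 as control or accented characters, while the seven REPORT-mode digits read normally.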

Field Mode (also called REPORT mode): This is set by .EXPORT REPORT. This is the default mode
for BTEQ and brings the data back as if it were a standard SQL SELECT statement. The output of this
BTEQ export would return the column headers for the fields, white space, expanded packed or binary
data (for humans to read) and can be understood using a text editor.
Indicator Mode: This is set by .EXPORT INDICDATA. This mode writes the data in data mode, but
also provides host operating systems with the means of recognizing missing or unknown data (NULL)
fields. This is important if the data is to be loaded into another Relational Database System
(RDBMS).

The issue is that there is no standard character defined to represent either a numeric or character
NULL. So, a numeric NULL is typically written as a zero and a character NULL as a space or blank. If
this data is simply loaded into another RDBMS, it is no longer a NULL, but a zero or space.

To remedy this situation, INDICDATA puts a bitmap at the front of every record written to the disk.
This bitmap contains one bit per field/column. When a Teradata column contains a NULL, the bit for
that field is turned on by setting it to a “1”. Likewise, if the data is not NULL, the bit remains a zero.
Therefore, the loading utility reads these bits as indicators of NULL data and identifies the column(s)
as NULL when data is loaded back into the table, where appropriate.
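The bitmap rule described above can be sketched in Python. This is an illustration, not BTEQ's code. It assumes the first column maps to the high-order bit of the first indicator byte and that later columns continue left to right. Unused trailing bits stay zero, and None stands for a NULL column.

```python
def indicator_bytes(row: list) -> bytes:
    """Build the NULL-indicator prefix for one record (None means NULL)."""
    nbytes = (len(row) + 7) // 8          # one bit per column, rounded up to whole bytes
    bits = bytearray(nbytes)
    for i, value in enumerate(row):
        if value is None:
            # Column i sets bit (7 - i % 8) of byte i // 8: high-order bit first.
            bits[i // 8] |= 0x80 >> (i % 8)
    return bytes(bits)


# Squiggy Jones has a NULL Dept_No, the fifth of five columns.
row = [2000000, "Jones", "Squiggy", 32800.50, None]
print(indicator_bytes(row).hex())  # 08
```

Five columns fit in one byte, and only the fifth bit (0x08) is on. A ninth column would spill into a second byte.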

Since both DATA and INDICDATA store each column on disk in native format with known lengths and
characteristics, they are the fastest method of transferring data. However, it becomes imperative
that you be consistent. When it is exported as DATA, it must be imported as DATA and the same is
true for INDICDATA.

This internal processing is automatic and potentially important. On a network-attached system,
being consistent is your only responsibility. However, on a mainframe system, you must
account for these bits when defining the LRECL in the Job Control Language (JCL). Otherwise, the record
length is too short and the job will end with an error.

To determine the correct length, the following information is important. As mentioned earlier, one bit
is needed per field output onto disk. However, computers allocate data in bytes, not bits. Therefore,
even if only one bit is needed, a minimum of eight (8 bits per byte) are allocated. As a result, for every
eight fields, the LRECL becomes 1 byte longer. In other words, for nine columns selected, 2 bytes are
added even though only nine bits are needed.

To summarize, there is one indicator bit per field selected, and INDICDATA mode allocates those bits in
whole bytes. This means that from one to eight columns referenced in the SELECT will add one byte to
the length of the record, and selecting nine to sixteen columns makes the output record two bytes longer.

When executing on non-mainframe systems, the record length is automatically maintained.
However, when exporting to a mainframe, the JCL (LRECL) must account for this additional length.

DIF Mode: Known as Data Interchange Format, this mode allows users to export data from Teradata to
be used directly by applications like Excel, FoxPro and Lotus.

The optional LIMIT tells BTEQ to stop returning rows after a specific number (n) of rows. This
might be handy in a test environment to stop BTEQ before it has transferred all of the rows to the file.

BTEQ EXPORT Example Using Record (DATA) Mode

The following is an example that displays how to utilize the export Record (DATA) option. Notice the
periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command.
If there is no period, then the command is an SQL command.

When doing an export on a Mainframe or a network-attached (e.g., LAN) computer, there is one
primary difference in the .EXPORT command. The difference is the following:

Mainframe syntax: .EXPORT DATA DDNAME = data definition (DD) statement name in the JCL

LAN syntax: .EXPORT DATA FILE = actual file name
The following example uses a Record (DATA) Mode format. The output of the exported data will
be a flat file.

Employee_Table

Employee_No  Last_Name   First_Name  Salary    Dept_No
2000000      Jones       Squiggy     32800.50  ?
1256349      Harrison    Herbert     54500.00  400
1333454      Smith       John        48000.00  200
1121334      Strickling  Cletus      54500.00  400

.LOGON CDW/sql01,whynot;                     Logon to TERADATA
.EXPORT DATA FILE = C:\EMPS.TXT              This Export statement will be in record (DATA) mode.
SELECT * FROM SQL_Class.Employee_Table;      The EMPS.TXT file will be created as a flat file.
.QUIT                                        Finish the execution.

Figure 2-6

BTEQ EXPORT Example Using Field (Report) Mode

The following is an example that displays how to utilize the export Field (Report) option. Notice the
periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command
and needs no semicolon. Likewise, if there is no period, then the command is an SQL command and
requires a semicolon.

.LOGON CDW/sql01,whynot;                     Logon to TERADATA
DATABASE SQL_Class;
.EXPORT REPORT FILE = C:\EMPS.TXT            This Export statement will be in field (REPORT) mode.
SELECT * FROM Employee_Table;                The EMPS.TXT file will be created as a report.
.IF ERRORCODE > 0 THEN .GOTO Done            BTEQ checks to ensure no errors occurred and selects
SELECT * FROM Department_Table;              more rows; otherwise, GOTO Done.
.EXPORT RESET                                Reverse previous export command and fall through to
.LABEL Done                                  Done.
.QUIT

Figure 2-7

After this script has completed, the following report will be generated on disk.

Employee_No  Last_name   First_name  Salary    Dept_No
2000000      Jones       Squiggy     32800.50  ?
1256349      Harrison    Herbert     54500.00  400
1333454      Smith       John        48000.00  200
1121334      Strickling  Cletus      54500.00  400
1324657      Coffing     Billy       41888.88  200
2341218      Reilly      William     36000.00  400
1232578      Chambers    Mandee      56177.50  100
1000234      Smythe      Richard     64300.00  10
2312225      Larkins     Loraine     40200.00  300

I remember when my mom and dad purchased my first Lego set. I was so excited about building my
first space station that I ripped the box open, and proceeded to follow the instructions to complete
the station. However, when I was done, I was not satisfied with the design and decided to make
changes. So I built another space ship and constructed another launching station. BTEQ export
works in the same manner: the more basic EXPORT knowledge we acquire, the more we can build on
that foundation.

With that being said, the following is a more robust example of utilizing the
Field (Report) option. This example will export data in Field (Report) Mode format. The output of
the exported data will appear like the standard output of an SQL SELECT statement. In addition, aliases
and a title have been added to the script.

.LOGON CDW/sql01,whynot;                     Logon to TERADATA
.SET WIDTH 90                                Set the format parameters for the final report
.SET FORMAT ON
.SET HEADING 'Employee Profiles'
.EXPORT REPORT FILE = C:\EMP_REPORT.TXT      This Export statement will be in field (REPORT) mode.
                                             The EMP_REPORT.TXT file will be created as a report.
                                             Specifies the columns that are being selected.
                                             Notice that the columns have an alias.
.EXPORT RESET                                Reverse previous export command effects.
.QUIT

Figure 2-8

After this script has been completed, the following report will be generated on disk.

Employee Profiles

Employee Number  Last Name   First Name  Salary    Department Number
2000000          Jones       Squiggy     32800.50  ?
1256349          Harrison    Herbert     54500.00  400
1333454          Smith       John        48000.00  200
1121334          Strickling  Cletus      54500.00  400
1324657          Coffing     Billy       41888.88  200
2341218          Reilly      William     36000.00  400
1232578          Chambers    Mandee      56177.50  100
1000234          Smythe      Richard     64300.00  10
2312225          Larkins     Loraine     40200.00  300

In this example, a number of BTEQ commands were added to the export script. Below is a review
of those commands.
• The WIDTH command specifies the width of screen displays and printed reports, based on characters
per line.
• The FORMAT command allows the ability to enable/inhibit the page-oriented format option.
• The HEADING command specifies a header that will appear at the top of every page of a report.

BTEQ IMPORT Example

BTEQ can also read a file from the hard disk and incorporate the data into SQL to modify the
contents of one or more tables. In order to do this processing, the name and record description of
the file must be known ahead of time. These will be defined within the script file.

Format of the IMPORT command:

.IMPORT { FILE | DDNAME } = <filename> [,SKIP=n]

The script below introduces the IMPORT command with the Record (DATA) option. Notice the
periods (.) at the beginning of some of the script lines. A period starting a line indicates a BTEQ command.
If there is no period, then the command is an SQL command.

The SKIP option is used when you wish to bypass the first records in a file. For example, a
mainframe tape may have header records that should not be processed. Other times, the job may have
started and loaded a few rows into a table that has a unique primary index (UPI) defined. Loading them
again will cause an error. So, you can skip over them using this option.

The following example will use a Record (DATA) Mode format. The input of the imported data will
populate the Employee_Table.

.SESSIONS 4                                  Specify the number of SESSIONS to establish with Teradata
.LOGON CDW/sql01,whynot;                     Logon to TERADATA
.IMPORT DATA FILE = C:\EMPS.TXT, SKIP = 2    Specify DATA mode, name the file to read “EMPS.TXT”, but
                                             skip the first 2 records.
.QUIET ON                                    Limit messages out.
.REPEAT *                                    Loop in this script until end of records in file.
                                             The USING specifies the fields in the input file and names them.
                                             Specify the insert parameters for the employee_table
                                             Substitutes data from the fields into the SQL command.
.QUIT                                        Exit the script after all data read and rows inserted.

Figure 2-9

In the above example, a number of BTEQ commands were added to the import script. Below is a
review of those commands.
• .QUIET ON limits BTEQ output to reporting only errors and request processing statistics.
Note: Be careful how you spell .QUIET. If you forget the E, it becomes .QUIT, and BTEQ will quit.
• .REPEAT causes BTEQ to read a specified number of records or, with *, to read until EOF. The
default is one record. Using REPEAT 10 would perform the loop 10 times.
• The USING defines the input data fields and their associated data types coming from the
host.
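The interplay of SKIP and REPEAT can be modeled in a few lines of Python. This is a conceptual sketch, not how BTEQ is implemented, and the function name and sample records are hypothetical. The records list stands in for the input file. SKIP drops the first n records. REPEAT either stops after a count or, given "*", runs until the end of the file. The default of one record matches the behavior described above.

```python
from typing import Sequence, Union


def records_to_insert(records: Sequence[str], skip: int = 0,
                      repeat: Union[int, str] = 1) -> list:
    """Conceptual model (not BTEQ code) of which input records reach the INSERT.

    skip: like SKIP=n on .IMPORT, bypasses the first n records.
    repeat: like .REPEAT n; "*" keeps reading until end of file.
    """
    remaining = list(records[skip:])
    if repeat == "*":
        return remaining
    # A count larger than the remaining records simply stops at end of file.
    return remaining[:int(repeat)]


file_records = ["header-1", "header-2", "rec-1", "rec-2", "rec-3"]
print(records_to_insert(file_records, skip=2, repeat="*"))  # ['rec-1', 'rec-2', 'rec-3']
print(records_to_insert(file_records, skip=2, repeat=2))    # ['rec-1', 'rec-2']
```

In this model, SKIP = 2 combined with REPEAT 120 would process records 3 through 122 of the file.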

The following builds upon the IMPORT Record (DATA) example above. The example below will still
utilize the Record (DATA) Mode format. However, this script will add a CREATE TABLE statement. In
addition, the imported data will populate the newly created Employee_Profile table.

.SESSIONS 2                                  Specify the number of SESSIONS to establish with Teradata
.LOGON CDW/sql01,whynot;                     Logon to TERADATA
DATABASE SQL_Class;                          Make the default database SQL_Class
                                             This statement will create the Employee_Profile table.
.IMPORT INDICDATA FILE = C:\IND-EMPS.TXT     This import statement specifies INDICDATA mode. The input
                                             file is from a LAN file called IND-EMPS.TXT.
.QUIET ON                                    Quiet on limits the output to reporting only errors and
                                             processing statistics.
.REPEAT 120                                  This causes BTEQ to read the first 120 records from the file.
                                             The USING specifies the parameters of the input file.
                                             Specify the insert parameters for the employee_profile.
                                             Substitute the values to be input into the SQL command.
.LOGOFF
.QUIT

Figure 2-10

Notice that some of the scripts have a .LOGOFF and .QUIT. The .LOGOFF is optional because when
BTEQ quits, the session is terminated. A logoff makes it a friendly departure and also allows you to
logon with a different user name and password.

Determining Output Record Lengths

Some hosts, such as IBM mainframes, require the correct LRECL (Logical Record Length) parameter
in the JCL, and will abort if the value is incorrect. The following discussion explains how to figure out
the record lengths.

There are three issues involving record lengths and they are:
• Fixed columns
• Variable columns
• NULL indicators

Fixed Length Columns: For fixed length columns, you merely count the length of the column. The
lengths are:

INTEGER        4 bytes
SMALLINT       2 bytes
BYTEINT        1 byte
CHAR(10)       10 bytes
CHAR(4)        4 bytes
DATE           4 bytes
DECIMAL(7,2)   4 bytes (DECIMAL uses 1, 2, 4, or 8 bytes for 1 to 2, 3 to 4, 5 to 9, or 10 to 18 total digits)
DECIMAL(12,2)  8 bytes

Variable columns: Variable length columns should be calculated as the maximum length plus two.
The two extra bytes hold the binary length of the field. In reality, you can save much space because
trailing blanks are not kept. The logical record will assume the maximum and add two bytes as a
length field per column.

VARCHAR(8)     10 bytes
VARCHAR(10)    12 bytes

Indicator columns: As explained earlier, the indicators use a single bit for each field. If your record
has 1 to 8 fields (which require up to 8 bits), then add one extra byte to the total length of all the
fields. If your record has 9 to 16 fields, then add two bytes.
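The three rules can be combined into a small LRECL calculator. This Python sketch makes several assumptions. It uses the sizes listed above, with DECIMAL following the 1, 2, 4, or 8 byte rule for up to 18 digits. It describes columns with a simple (type, n) tuple format invented for this example. The Employee_Table layout at the end is hypothetical, because the DDL is not given here. Check any result against your own table definition before putting it in JCL.

```python
import math

# Fixed sizes from the table above.
FIXED_SIZES = {"INTEGER": 4, "SMALLINT": 2, "BYTEINT": 1, "DATE": 4}


def decimal_size(precision: int) -> int:
    """Bytes for DECIMAL(precision, scale): 1, 2, 4 or 8 by total digits."""
    for max_digits, size in ((2, 1), (4, 2), (9, 4), (18, 8)):
        if precision <= max_digits:
            return size
    raise ValueError(f"precision {precision} is outside the 1-18 range covered here")


def column_size(col_type: str, n: int = 0) -> int:
    """n is the length for CHAR/VARCHAR and the precision for DECIMAL."""
    if col_type in FIXED_SIZES:
        return FIXED_SIZES[col_type]
    if col_type == "CHAR":
        return n                      # fixed length: count the bytes
    if col_type == "VARCHAR":
        return n + 2                  # maximum length plus a 2-byte length field
    if col_type == "DECIMAL":
        return decimal_size(n)
    raise ValueError(f"type {col_type} is not covered by this sketch")


def record_length(columns: list, indicators: bool = False) -> int:
    """LRECL for one exported record; indicators=True models INDICDATA."""
    data_bytes = sum(column_size(t, n) for t, n in columns)
    indicator_bytes = math.ceil(len(columns) / 8) if indicators else 0
    return data_bytes + indicator_bytes


# Hypothetical layout: Employee_No, Last_Name, First_Name, Salary, Dept_No.
layout = [("INTEGER", 0), ("CHAR", 20), ("VARCHAR", 12), ("DECIMAL", 8), ("INTEGER", 0)]
print(record_length(layout))                   # 46
print(record_length(layout, indicators=True))  # 47
```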

BTEQ Return Codes

Return codes are two-digit values that BTEQ returns to the user after completing each job or task.
The value of the return code indicates the completion status of the job or task as follows:

Return Code  Description
00           Job completed with no errors.
02           User alert to log on to the Teradata DBS.
04           Warning error.
08           User error.
12           Severe internal error.

You can override the standard error codes at the time you terminate BTEQ. This might be handy for
debugging purposes. The error code or “return code” can be any number you specify using one of the
following override commands:

.QUIT 15
.EXIT 15
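In a batch environment, the program that launched BTEQ usually checks the return code. The Python sketch below rests on several assumptions. The bteq executable must be on the search path, as in the DOS example earlier. The script is fed on standard input and the output captured to a file, the same as the < and > redirection. The function names are hypothetical, and only the codes from the table above are mapped. Any other value, such as one set with .QUIT 15, is reported as non-standard.

```python
import subprocess
from pathlib import Path

# Standard BTEQ return codes from the table above.
RETURN_CODES = {
    0: "Job completed with no errors.",
    2: "User alert to log on to the Teradata DBS.",
    4: "Warning error.",
    8: "User error.",
    12: "Severe internal error.",
}


def describe_return_code(code: int) -> str:
    """Map a BTEQ exit status to its description; other values are overrides."""
    return RETURN_CODES.get(
        code, f"Non-standard code {code} (for example, set by .QUIT {code}).")


def run_bteq(script: Path, output: Path) -> int:
    """Equivalent of: bteq < script > output. Returns BTEQ's exit status."""
    with script.open("rb") as stdin, output.open("wb") as stdout:
        result = subprocess.run(["bteq"], stdin=stdin, stdout=stdout)
    return result.returncode
```

A scheduler could call code = run_bteq(Path("BatchScript.txt"), Path("Output.txt")) and then log describe_return_code(code). Branching on the number rather than parsing Output.txt keeps the wrapper independent of BTEQ's message text.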

BTEQ Commands

The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly
on the data inside the tables. Instead, these 60 different BTEQ commands are utilized in four areas.
• Session Control Commands
• File Control Commands
• Sequence Control Commands
• Format Control Commands

Session Control Commands


ABORT                 Abort any and all active running requests and transactions, but do not exit BTEQ.
DEFAULTS              Reset all BTEQ Format command options to their defaults. This will utilize the
                      default configurations.
EXIT                  Immediately end the current session or sessions and exit BTEQ.
HALT EXECUTION        Abort any and all active running requests and transactions and EXIT BTEQ.
LOGOFF                End the current session or sessions, but do not exit BTEQ.
LOGON                 Starts a BTEQ Session. Every user, application, or utility must LOGON to Teradata
                      to establish a session.
QUIT                  End the current session or sessions and exit BTEQ.
SECURITY              Specifies the security level of messages between a network-attached system and
                      the Teradata Database.
SESSIONS              Specifies the number of sessions to use with the next LOGON command.
SESSION CHARSET       Specifies the name of a character set for the current session or sessions.
SESSION SQLFLAG       Specifies a disposition of warnings issued in response to violations of ANSI syntax.
                      The SQL will still run, but a warning message will be provided. The four settings
                      are FULL, INTERMEDIATE, ENTRY, and NONE.
SESSION TRANSACTION   Specifies whether transaction boundaries are determined by Teradata SQL or ANSI
                      SQL semantics.
SHOW CONTROLS         Displays all of the BTEQ control command options currently configured.
SHOW VERSIONS         Displays the BTEQ software release versions.
TDP                   Used to specify the correct Teradata server for logons for a particular session.

Figure 2-11

File Control Commands

These BTEQ commands are used to specify the formatting parameters of incoming and outgoing
information. This includes identifying sources and determining I/O streams.

CMS              Execute a VM CMS command inside the BTEQ environment.

ERROROUT         Write error messages to a specific output file.

EXPORT           Open a file with a specific format to transfer information directly
                 from the Teradata database.

HALT EXECUTION   Abort any and all active running requests and transactions and
                 EXIT BTEQ.

FORMAT           Enable/inhibit the page-oriented format command options.

IMPORT           Open a file with a specific format to import information into
                 Teradata.

INDICDATA        One of multiple data mode options for data selected from
                 Teradata. The modes are INDICDATA, FIELD, or RECORD MODE.

OS               Execute an MS-DOS, PC-DOS, or UNIX command from inside BTEQ.

QUIET            Limit BTEQ output displays to all error messages and request
                 processing statistics.

REPEAT           Submit the next request a specified number of times.

RUN              Execute Teradata SQL requests and BTEQ commands directly
                 from a specified run file.

TSO              Execute an MVS TSO command from inside the BTEQ environment.

Figure 2-12

Sequence Control Commands

These commands control the sequence in which Teradata commands operate.

ABORT        Abort any active transactions and requests.

ERRORLEVEL   Assign severity levels to particular error numbers.

EXIT         End the current session or sessions and exit BTEQ.

GOTO         Skip all intervening commands and resume after branching
             forward to the specified label.

HANG         Pause BTEQ processing for a specific amount of time.

IF…THEN      Test a stated condition, and then resume processing based on the
             test results.

LABEL        Identify a line of code as a branch target; the GOTO command
             always goes directly to the line with the matching label.

MAXERROR     Specifies a maximum allowable error severity level.

QUIT         End the current session or sessions and exit BTEQ.

REMARK       Place a comment on the standard output stream.

REPEAT       Submit the next request a specified number of times.

Figure 2-13

Format Control Commands

These commands control the formatting for Teradata and present the data in a report mode to the
screen or printer.

DEFAULTS       Reset all BTEQ Format command options to their defaults. This will
               utilize the default configurations.

ECHOREQ        Enable the Echo required function in BTEQ, returning a copy of
               each Teradata SQL request and BTEQ command to the standard
               output stream.

EXPORT         Open a file with a specific format to transfer information directly
               from the Teradata database.

FOLDLINE       Split or fold each line of a report into multiple lines.

FOOTING        Specify a footer to appear at the bottom of every report page.

FORMAT         Enable/inhibit the page-oriented format command options.

IMPORT         Open a file with a specific format to transfer or IMPORT
               information directly to Teradata.

INDICDATA      One of multiple data mode options for data selected from
               Teradata. The modes are INDICDATA, FIELD, or RECORD MODE.

NULL           Specifies a character or string of characters to represent null
               values returned from Teradata.

OMIT           Omit specific columns from a report.

PAGEBREAK      Ejects a page whenever a specified column changes values.

PAGELENGTH     Specifies the page length of printed reports based on lines per
               page.

QUIET          Limit BTEQ output displays to all error messages and request
               processing statistics.

RECORDMODE     One of multiple data mode options for data selected from
               Teradata (INDICDATA, FIELD, or RECORD).

RETCANCEL      Cancel a request when the specified value of the RETLIMIT
               command option is exceeded.

RETLIMIT       Specifies the maximum number of rows to be displayed or
               written from a Teradata SQL request.

RETRY          Retry requests that fail under specific error conditions.

RTITLE         Specify a header appearing at the top of all pages of a report.

SEPARATOR      Specifies a character string or specific width of blank characters
               separating columns of a report.

SHOWCONTROLS   Displays all of the BTEQ control command options currently
               configured.

SIDETITLES     Place titles to the left or side of the report instead of on top.

SKIPLINE       Inserts blank lines in a report when the value of a specified
               column changes.

SUPPRESS       Replace each and every consecutively repeated value with
               completely-blank character strings.

TITLEDASHES    Display dash characters before each report line summarized by a
               WITH clause.

UNDERLINE      Display a row of dash characters when the specified column
               changes values.

WIDTH          Specifies the width of screen displays and printed reports, based on
               characters per line.

Figure 2-14

An Introduction to FastExport

Why it is called “FAST”Export

FastExport is known for its lightning speed when it comes to exporting vast amounts of data from
Teradata and transferring the data into flat files on either a mainframe or network-attached
computer. In addition, FastExport can accept OUTMOD routines, which give the user the capability
to write, select, validate, and preprocess the exported data. Part of this speed is achieved because
FastExport takes full advantage of Teradata’s parallelism.

In this book, we have already discovered how BTEQ can be utilized to export data from Teradata in a
variety of formats. As the demand to store data increases, so does the requirement for tools that can
export massive amounts of data.

This is the reason why FastExport (FEXP) is brilliant by design. A good rule of thumb is that if you
have more than half a million rows of data to export to either a flat file format or with NULL
indicators, then FastExport is the best choice to accomplish this task.

Keep in mind that FastExport is designed as a one-way utility; that is, the sole purpose of
FastExport is to move data out of Teradata. It does this by harnessing the parallelism that Teradata
provides.

FastExport is extremely attractive for exporting data because it takes full advantage of multiple
sessions, which leverages Teradata parallelism. FastExport can also export from multiple tables
during a single operation. In addition, FastExport utilizes the Support Environment, which provides a
job restart capability from a checkpoint if an error occurs during the process of executing an export
job.

How FastExport Works

When FastExport is invoked, the utility logs onto the Teradata database and retrieves the rows that
are specified in the SELECT statement and puts them into SPOOL. From there, it must build blocks to
send back to the client. In comparison, BTEQ starts sending rows immediately for storage into a file.

If the output data is sorted, FastExport may be required to redistribute the selected data two times
across the AMP processors in order to build the blocks in the correct sequence. Remember, a lot of
rows fit into a 64K block, and both the rows and the blocks must be sequenced. While all of this
redistribution is occurring, BTEQ continues to send rows, so FastExport falls behind at first.
However, once FastExport starts sending the rows back a block at a time, it quickly overtakes and
passes BTEQ’s row-at-a-time processing.

Another advantage is that if BTEQ terminates abnormally, all of your rows (which are in SPOOL)
are discarded, and you must rerun the BTEQ script from the beginning. However, if FastExport
terminates abnormally, all the selected rows are in worktables and it can continue sending them
where it left off. Pretty smart and very fast!
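The restart behavior can be pictured with a small simulation. This is not how FastExport is implemented: a dict stands in for the restart log table, a list stands in for the output file, and checkpointing after every block is an assumption made for clarity (the text does not say how often FastExport checkpoints). All names are hypothetical.

```python
class SimulatedAbort(Exception):
    """Raised to mimic FastExport terminating abnormally partway through."""


def export_blocks(blocks, restart_log, output, fail_before=None):
    """Send answer-set blocks to the client, checkpointing after each one.

    blocks      -- stands in for the selected rows held in worktables, cut into blocks
    restart_log -- a dict standing in for the restart log table
    output      -- a list standing in for the client output file
    fail_before -- optional block index at which to simulate an abort
    """
    start = restart_log.get("next_block", 0)  # resume where the last run stopped
    for i in range(start, len(blocks)):
        if i == fail_before:
            raise SimulatedAbort(f"terminated before sending block {i}")
        output.append(blocks[i])
        restart_log["next_block"] = i + 1  # checkpoint: block i is done


blocks = [f"block-{n}" for n in range(5)]
log, out = {}, []
try:
    export_blocks(blocks, log, out, fail_before=3)  # first run dies after 3 blocks
except SimulatedAbort:
    pass
export_blocks(blocks, log, out)  # the restart sends only blocks 3 and 4
print(out == blocks, log["next_block"])  # True 5
```

The contrast with BTEQ is that BTEQ has no equivalent of `restart_log`, so a failed run must start again from block 0.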

Also, if there is a requirement to manipulate the data before storing it on the computer’s hard drive,
an OUTMOD routine can be written to modify the result set after it is sent back to the client on either
the mainframe or LAN. Just like the BASF commercial states, “We don’t make the products you buy,
we make the products you buy better.” FastExport is designed on the same premise: it does not
make the SQL SELECT statement faster, but it does take the SQL SELECT statement and process
the request with lightning-fast parallel processing!

FastExport Fundamentals

#1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is that it
takes data off of Teradata (Exports Data). FastExport does not import data into Teradata.
Additionally, like BTEQ, it can output multiple files in a single run.

#2: FastExport only supports the SELECT statement. The only DML statement that FastExport
understands is SELECT. You SELECT the data you want exported and FastExport will take care of the
rest.

#3: Choose FastExport over BTEQ when exporting more than half a million rows.
When a large amount of data is being exported, FastExport is recommended over BTEQ Export. The
only drawback is the total number of FastLoads, FastExports, and MultiLoads that can run at the
same time, which is limited to 15. BTEQ Export does not have this restriction. Of course, FastExport
will work with smaller amounts of data, but it may not be much faster than BTEQ.

#4: FastExport supports multiple SELECT statements and multiple tables in a single run.
You can have multiple SELECT statements with FastExport, and each SELECT can join up to 64
tables.

#5: FastExport supports conditional logic, conditional expressions, arithmetic calculations,
and data conversions. FastExport is flexible and supports the above conditions, calculations, and
conversions.

#6: FastExport does NOT support error files or error limits. FastExport does not record
particular error types in a table. The FastExport utility will terminate after a certain number of errors
have been encountered.

#7: FastExport supports user-written routines INMODs and OUTMODs. FastExport allows you to
write INMOD and OUTMOD routines so you can select, validate, and preprocess the exported data.
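Fundamental #3 and the utility limit can be combined into a small decision helper. This is a sketch only: the 500,000-row threshold is the chapter's rule of thumb rather than a hard cutoff, and the function name and the `utility_limit` parameter (standing in for the DBS Control setting) are hypothetical.

```python
HALF_A_MILLION_ROWS = 500_000


def choose_export_tool(row_count, active_utility_jobs, utility_limit=15):
    """Rule-of-thumb choice between FastExport and BTEQ Export.

    utility_limit is the DBS Control maximum shared by FastLoad, MultiLoad,
    and FastExport (at most 15; the actual value is site-specific).
    """
    wants_fastexport = row_count > HALF_A_MILLION_ROWS
    slot_available = active_utility_jobs < utility_limit
    if wants_fastexport and slot_available:
        return "FastExport"
    # Small exports gain little from FastExport, and BTEQ has no utility limit.
    return "BTEQ"
```

For example, `choose_export_tool(1_000_000, 3)` picks FastExport, while the same export with all 15 slots busy falls back to BTEQ.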

FastExport Supported Operating Systems

The FastExport utility is supported on either the mainframe or on LAN. The information below
illustrates which operating systems are supported for each environment:

The LAN environment supports the following Operating Systems:


• UNIX MP-RAS
• Windows 2000
• Windows 95
• Windows NT
• UNIX HP-UX
• AIX
• Solaris SPARC
• Solaris Intel

The Mainframe (Channel Attached) environment supports the following Operating Systems:

• MVS
• VM

Maximum of 15 Loads

The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or
FastExport utility jobs. This maximum value is determined and configured by the DBS Control record.
This value can be set from 0 to 15. When Teradata is initially installed, this value is set at 5.

The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to
transfer data. If more than 15 simultaneous jobs were supported, a saturation point could be
reached on the availability of resources. In this case, Teradata does an excellent job of protecting
system resources by queuing up additional FastLoad, MultiLoad, and FastExport jobs that are
attempting to connect.

For example, if the maximum number of utility jobs is already running on the Teradata system and
another job attempts to run, that job does not start. This limitation should be viewed as a safety
control feature.

A tip for remembering how the load limit applies is this: “If the name of the load utility contains
either the word ‘Fast’ or the word ‘Load’, then there can be only a total of fifteen of them running
at any one time.”

BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting
large amounts of data. However, if too many load jobs are running, BTEQ is an alternate choice for
exporting data.
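The limit behaves like a counter of free slots. The sketch below models it under simple assumptions: jobs are identified by name, a job that finds no free slot simply does not start (the queuing mentioned above is not modeled), and the class name is hypothetical. The default of 5 mirrors the initial installed value cited in the text, and the name test encodes the "Fast" or "Load" tip.

```python
class UtilitySlots:
    """Model of the DBS Control limit on concurrent FastLoad, MultiLoad, and FastExport jobs."""

    MAX_LIMIT = 15

    def __init__(self, limit: int = 5):  # 5: the initial value cited in the text
        if not 0 <= limit <= self.MAX_LIMIT:
            raise ValueError("limit must be between 0 and 15")
        self.limit = limit
        self.running: set[str] = set()

    def try_start(self, job: str) -> bool:
        """Start the job if a slot is free; otherwise the job does not start."""
        if len(self.running) >= self.limit:
            return False
        self.running.add(job)
        return True

    def finish(self, job: str) -> None:
        """Release the job's slot."""
        self.running.discard(job)

    @staticmethod
    def counts_toward_limit(utility_name: str) -> bool:
        """The chapter's tip: names containing "Fast" or "Load" share the limit."""
        name = utility_name.lower()
        return "fast" in name or "load" in name


slots = UtilitySlots(limit=2)
print(slots.try_start("fexp_1"), slots.try_start("mload_1"), slots.try_start("fload_1"))
# True True False
```

BTEQ never calls `try_start` in this model, which is why it remains available as a fallback when every slot is taken.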

FastExport Support and Task Commands

FastExport accepts both FastExport commands and a subset of SQL statements. The FastExport
commands can be broken down into support and task activities. The table below highlights the key
FastExport commands and their definitions. These commands provide flexibility and control during
the export process.

Support Environment Commands

ACCEPT           Allows the value of utility variables to be accepted directly from a
                 file or from environmental variables.

DATEFORM         Specifies the style of the DATE data types for FastExport.

DISPLAY          Writes messages to the specified location.

ELSE             Used in conjunction with the IF statement. ELSE commands and
                 statements will execute when a preceding IF condition is false.

ENDIF            Used in conjunction with the IF or ELSE statements. Delimits the
                 commands that were subject to previous IF or ELSE conditions.

IF               Introduces a conditional expression. If it is true, the subsequent
                 commands are executed.

LOGOFF           Disconnects all active FastExport sessions and terminates
                 FastExport.

LOGON            LOGON command or string used to connect sessions established
                 through the FastExport utility.

LOGTABLE         FastExport utilizes this to specify a restart log table. The purpose
                 is for FastExport checkpoint information.

ROUTE MESSAGES   Will route FastExport messages to an alternate destination.

RUN FILE         Used to point to a file that FastExport is to use as standard input.
                 This will invoke the specified external file as the current source of
                 utility and Teradata SQL commands.

SET              Assigns a data type and value to a variable.

SYSTEM           Suspends the FastExport utility temporarily and executes any valid
                 local operating system command before returning.

Figure 3-1

Task Commands

BEGIN EXPORT   Begins the export task and sets the specifications for the
               number of sessions with Teradata.

END EXPORT     Ends the export task and initiates processing by Teradata.

EXPORT         Provides two things: the client destination and file format
               specifications for the export data retrieved from Teradata,
               and a generated MultiLoad script file that can be used later
               to reload the export data back into Teradata.

FIELD          Constitutes a field in the input record section that provides
               data values for the SELECT statement.

FILLER         Specifies a field in the input record that will not be sent to
               Teradata for processing. It is part of the input record to
               provide data values for the SELECT statement.

IMPORT         Defines the file that provides the USING data values for the
               SELECT.

LAYOUT         Specifies the data layout for a file. It contains a sequence of
               FIELD and FILLER commands. This is used to describe the
               import file that can optionally provide data values for the
               SELECT.

Figure 3-2
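To picture how LAYOUT, FIELD, and FILLER cooperate, the sketch below walks a fixed-width input record and keeps only the FIELD values as USING data for the SELECT. It is an illustration under stated assumptions: every entry has a fixed byte length, and the class and function names are hypothetical rather than part of FastExport.

```python
from typing import NamedTuple


class LayoutItem(NamedTuple):
    name: str
    length: int    # fixed width in bytes (an assumption of this sketch)
    filler: bool   # True for a FILLER entry, False for a FIELD entry


def using_values(record: bytes, layout: list[LayoutItem]) -> dict[str, bytes]:
    """Slice a record in layout order, keeping FIELD values and skipping FILLER bytes."""
    values: dict[str, bytes] = {}
    offset = 0
    for item in layout:
        chunk = record[offset:offset + item.length]
        offset += item.length
        if not item.filler:            # FILLER bytes are never sent to Teradata
            values[item.name] = chunk
    if offset != len(record):
        raise ValueError("record length does not match the layout")
    return values


# Hypothetical layout: a 3-byte FIELD, 2 FILLER bytes, then a 2-byte FIELD.
layout = [
    LayoutItem("dept", 3, False),
    LayoutItem("unused", 2, True),
    LayoutItem("region", 2, False),
]
print(using_values(b"100XXNE", layout))  # {'dept': b'100', 'region': b'NE'}
```

The FILLER bytes still consume space in the record, which is why the offset advances for them even though their values are dropped.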

FastExport Supported SQL Commands

FastExport accepts the following Teradata SQL statements. Each has been placed in alphabetic order
for your convenience.

SQL Commands

ALTER TABLE          Change column or table options of a table.

CHECKPOINT           Add a checkpoint entry in the journal table.

COLLECT STATISTICS   Collect statistics for one or more columns or indexes in a
                     table.

COMMENT              Store or retrieve a comment string for a particular object.

CREATE DATABASE      Creates a new database.

CREATE MACRO         Creates a new macro.

CREATE TABLE         Creates a new table.

CREATE VIEW          Creates a new view.

DATABASE             Specify a default database for the session.

DELETE               Delete rows from a table.

DELETE DATABASE      Removes all tables, views, macros, and stored procedures
                     from a database.

DROP DATABASE        Drops a database.

GIVE                 Transfer ownership of a database or user to another user.

GRANT                Grant access privileges to an object.

MODIFY DATABASE      Change the options for a database.

RENAME               Change the name of a table, view, or macro.

REPLACE MACRO        Change a macro.

REPLACE VIEW         Change a view.

REVOKE               Revoke privileges on an object.

SET SESSION          Override the collation specification during the current
COLLATION            session.

UPDATE               Change a column value of an existing row or rows in a
                     table.

Figure 3-3

A FastExport in its Simplest Form

The hobby of racecar driving can be extremely frustrating, challenging, and rewarding all at the
same time. I always remember my driving instructor coaching me during a practice session in a new
car around a road course racetrack. He said to me, “Before you can learn to run, you need to learn
how to walk.” This same philosophy can be applied when working with FastExport. If FastExport is
broken into steps, then several things that appear to be complicated are really very simple. With
this in mind, FastExport can be broken into the following steps:
• Logs onto Teradata
• Retrieves the rows you specify in your SELECT statement
• Exports the data to the specified file or OUTMOD routine
• Logs off of Teradata

/* Created by CoffingDW */
/* Setup the Fast Export Parameters */

/* Creates the logtable (required) */
.LOGTABLE sql01.SWA_Log;

/* Logon to Teradata */
.LOGON CDW/sql01,whynot;

/* Begin the Export and set the number of sessions on Teradata */
.BEGIN EXPORT SESSIONS 12;

/* Defines the output file name. In addition, specifies the output mode
   and format (LAN only) */
.EXPORT OUTFILE Student.txt
 MODE RECORD FORMAT TEXT;

/* The SELECT defines the columns used to create the export file.
   NOTE: The selected columns for the export are being converted to
   character types. This will simplify the importing process into a
   different database. */

/* Finish the Export Job and Write to File: end the Export and logoff Teradata */
.END EXPORT;
.LOGOFF;

Figure 3-4
Sample FastExport Script

Now that the first steps have been taken to understand FastExport, the next step is to journey
forward and review another example that builds upon what we have learned. In the script
below, Teradata comment lines have been placed inside the script [/*. . . . */]. In addition,
FastExport and SQL commands are written in upper case in order to highlight them. Another note is
that the column names are listed vertically. The recommendation is to place the comma separator in
front of the following column. Coding this way makes reading or debugging the script easier to
accomplish.

/* ---------------------------------------------------------------*/
/* @(#) FASTEXPORT SCRIPT                                         */
/* @(#) Version 1.1                                               */
/* @(#) Created by CoffingDW                                      */
/* ---------------------------------------------------------------*/
/* ALWAYS GOOD TO IDENTIFY THE SCRIPT AND AUTHOR IN COMMENTS */

/* Setup the Fast Export Parameters */
/* CREATE LOGTABLE AND LOGON */
.LOGTABLE SQL01.CDW_Log;
.LOGON CDW/SQL01,whynot;

/* BEGIN EXPORT STATEMENT */
.BEGIN EXPORT
 SESSIONS 12;

/* DEFINES THE OUTPUT FILE NAME. IN ADDITION, SPECIFIES THE OUTPUT MODE
   AND FORMAT (LAN ONLY) */
.EXPORT OUTFILE Join_Export.txt
 MODE RECORD FORMAT TEXT;

/* THE SELECT PULLS DATA FROM TWO TABLES. IT IS GOOD TO QUALIFY WHEN
   DOING A TWO-TABLE JOIN. */

/* Finish the Export Job and Write to File */
/* END THE JOB AND LOGOFF TERADATA */
.END EXPORT;
.LOGOFF;

Figure 3-5

FastExport Modes and Formats

FastExport Modes

FastExport has two modes: RECORD or INDICATOR. In the mainframe world, only use RECORD
mode. In the UNIX or LAN environment, RECORD mode is the default, but you can use INDICATOR
mode if desired. The difference between the two modes is INDICATOR mode will set the indicator
bits to 1 for column values containing NULLS.

Both modes return data in a client internal format with variable-length records. Each individual
record has a value for all of the columns specified by the SELECT statement. All variable-length
columns are preceded by a two-byte control value indicating the length of the column data. NULL
columns have a value that is appropriate for the column data type. Remember, INDICATOR mode
will set bit flags that identify the columns that have a null value.
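The INDICATOR-mode bit flags can be illustrated in Python. The sketch assumes one bit per selected column, with the first column mapped to the high-order bit of the first indicator byte and Python's None standing for NULL. The bit ordering is an assumption to verify against the FastExport reference for your release, and the function name is hypothetical.

```python
def null_indicator_bytes(row: list) -> bytes:
    """Return indicator bytes with a 1 bit for every NULL (None) column.

    Assumes column 0 maps to the high-order bit of the first byte.
    """
    bits = bytearray((len(row) + 7) // 8)   # one bit per column, whole bytes
    for i, value in enumerate(row):
        if value is None:
            bits[i // 8] |= 0x80 >> (i % 8)
    return bytes(bits)


print(null_indicator_bytes([101, None, "Smith"]).hex())  # 40
```

In RECORD mode no such bytes are written, so a reader of the file cannot tell a NULL apart from the type-appropriate placeholder value.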

FastExport Formats

FastExport has several possible formats in the UNIX or LAN environment. The FORMAT option
specifies the format for each record being exported; the choices are:
• FASTLOAD
• BINARY
• TEXT
• UNFORMAT

The default FORMAT is FASTLOAD in a UNIX or LAN environment.

FASTLOAD Format is a two-byte integer, followed by the data, followed by an end-of-record marker.
It is called FASTLOAD because the data is exported in a format ready for FASTLOAD.

BINARY Format is a two-byte integer, followed by data.

TEXT is an arbitrary number of bytes followed by an end-of-record marker.

UNFORMAT is exported as it is received from CLIv2 without any client modifications.
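The four layouts can be compared side by side with Python's `struct` module. This sketch assumes a little-endian 2-byte length (typical of Intel-based LAN clients) and a newline byte as the end-of-record marker; neither detail is specified in the text, so check both against the FastExport reference for your platform. The function names are hypothetical.

```python
import struct

EOR = b"\n"  # assumed end-of-record marker for this sketch


def fastload_record(data: bytes) -> bytes:
    """FASTLOAD: 2-byte length, then the data, then an end-of-record marker."""
    return struct.pack("<H", len(data)) + data + EOR


def binary_record(data: bytes) -> bytes:
    """BINARY: 2-byte length followed by the data."""
    return struct.pack("<H", len(data)) + data


def text_record(data: bytes) -> bytes:
    """TEXT: an arbitrary number of bytes followed by an end-of-record marker."""
    return data + EOR


def unformat_record(data: bytes) -> bytes:
    """UNFORMAT: passed through as received, with no client modifications."""
    return data


print(fastload_record(b"abc"))  # b'\x03\x00abc\n'
```

The length prefix is what lets a FASTLOAD or BINARY reader find record boundaries without scanning, while TEXT relies entirely on the marker.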

A FastExport Script Using Binary Mode

/* --------------------------------------------------------------*/
/* @(#) FASTEXPORT SCRIPT                                        */
/* @(#) Version 1.1                                              */
/* @(#) Created by CoffingDW                                     */
/* --------------------------------------------------------------*/

/* Setup the Fast Export Parameters */
/* CREATE LOGTABLE AND LOGON TO TERADATA */
.LOGTABLE SQL01.SWA_LOG;
.LOGON CDW/Sql101,whynot;

/* BEGIN EXPORT STATEMENT */
.BEGIN EXPORT
 SESSIONS 12;

/* NAME THE OUTPUT FILE AND SET THE FORMAT TO BINARY */
.EXPORT OUTFILE CDW_Export.txt
 MODE RECORD FORMAT BINARY;

/* THE SELECT PULLS DATA FROM TWO TABLES. IT IS GOOD TO QUALIFY WHEN
   DOING A TWO-TABLE JOIN. */

/* Finish the Export Job and Write to File */
/* END THE JOB */
.END EXPORT;
.LOGOFF;

Figure 3-6

An Introduction to FastLoad

Why it is called “FAST”Load

FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a
host into empty tables in Teradata. Part of this speed is achieved because it does not use the
Transient Journal. You will see some more of the reasons enumerated below. But, regardless of the
reasons that it is fast, know that FastLoad was developed to load millions of rows into a table.

The way FastLoad works can be illustrated by home construction, of all things! Let’s look at three
scenarios from the construction industry to provide an amazing picture of how the data gets loaded.

Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the
foundation right on up to the roof. There is no pre-existing construction, just a smooth, graded lot.
The fewer barriers there are to deal with, the quicker the new construction can progress. Building
custom or spec houses this way is the fastest way to build them. Similarly, FastLoad likes to start
with an empty table, like an empty lot, and then populate it with rows of data from another source.
Because the target table is empty, this method is typically the fastest way to load data. FastLoad will
never attempt to insert rows into a table that already holds data.

Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of land
on which to build a home, but the lot already has a house on it. In this case, the person may
determine that it is quicker and more advantageous just to demolish the old house and start fresh
from the ground up, allowing for brand new construction. FastLoad also likes this approach to
loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace its structure,
and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new
rows, this process will run much quicker than using MultiLoad to populate the existing table. Another
option is to DELETE all the data rows from a populated target table and reload it. This requires less
updating of the Data Dictionary than dropping and recreating a table. In either case, the result is the
perfectly empty target table that FastLoad requires!

Scenario Three: Sometimes, a customer has a good house already but wants to remodel a portion
of it or to add an additional room. This kind of work takes more time than the work described in
Scenario One. Such work requires some tearing out of existing construction in order to build the new
section. Besides, the builder never knows what he will encounter beneath the surface of the existing
home. So you can easily see that remodeling or additions can take more time than new construction.
In the same way, existing tables with data may need to be updated by adding new rows of data. To
load populated tables quickly with large amounts of data while maintaining the data currently held in
those tables, you would choose MultiLoad instead of FastLoad. MultiLoad is designed for this task
but, like renovating or adding onto an existing house, it may take more time.

How FastLoad Works

What makes FastLoad perform so well when it is loading millions or even billions of rows? It is
because FastLoad assembles data into 64K blocks (64,000 bytes) to load it and can use multiple
sessions simultaneously, taking further advantage of Teradata’s parallel processing.
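The blocking idea can be sketched as greedy packing of rows into blocks of at most 64,000 bytes, the figure used in the text. The packing strategy and names are assumptions for illustration; FastLoad's actual block construction is internal to the utility.

```python
BLOCK_SIZE = 64_000  # the text's figure for a 64K block


def pack_blocks(rows: list[bytes], block_size: int = BLOCK_SIZE) -> list[list[bytes]]:
    """Greedily pack rows, in order, into blocks of at most block_size bytes."""
    blocks: list[list[bytes]] = []
    current: list[bytes] = []
    used = 0
    for row in rows:
        if len(row) > block_size:
            raise ValueError("row is larger than a block")
        if used + len(row) > block_size and current:
            blocks.append(current)          # current block is full; ship it
            current, used = [], 0
        current.append(row)
        used += len(row)
    if current:
        blocks.append(current)              # final, partially filled block
    return blocks


print([len(b) for b in pack_blocks([b"x" * 100] * 1000)])  # [640, 360]
```

With 1,000 rows of 100 bytes each, the sketch produces two blocks, so the client sends 2 large messages instead of 1,000 row-level ones.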

This is different from BTEQ and TPump, which load data at the row level. It has been said, “If you
have it, flaunt it!” FastLoad does not like to brag, but it takes full advantage of Teradata’s parallel
architecture. In fact, FastLoad will create a Teradata session for each AMP (Access Module Processor,
the software processor in Teradata responsible for reading and writing data to the disks) in order
to maximize parallel processing. This advantage is passed along to the FastLoad user in terms of
awesome performance. Teradata is the only data warehouse product in the world that loads data,
processes data and backs up data in parallel.

FastLoad Has Some Limits

There are more reasons why FastLoad is so fast. Many of them take the form of restrictions that
keep anything from slowing it down. For instance, can you imagine a sprinter wearing cowboy boots
in a race? Of course not! Because of its speed, FastLoad, too, must travel light! This means that it
will have limitations that may or may not apply to other load utilities. Remembering this short list will
save you much frustration from failed loads and angry colleagues. It may even foster your reputation
as a smooth operator!

Rule #1: No Secondary Indexes are allowed on the Target Table. High performance will only
allow FastLoad to utilize Primary Indexes when loading. The reason for this is that Primary (UPI and
NUPI) indexes are used in Teradata to distribute the rows evenly across the AMPs and build only data
rows. A secondary index is stored in a subtable block and many times on a different AMP from the
data row. This would slow FastLoad down and they would have to call it: get ready now,
HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already, just
drop them. You may easily recreate them after completing the load.

Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are
defined with Referential Integrity (RI). This would require too much system checking to enforce
referential constraints against a different table, and FastLoad only does one table. In short, RI
constraints will need to be dropped from the target table prior to the use of FastLoad.

Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay
attention to the needs of other tables, which is what Triggers are all about. Additionally, these
require more than one AMP and more than one table. FastLoad does one table only. Simply ALTER
the Triggers to the DISABLED status prior to using FastLoad.

Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multi-set tables are tables
that allow duplicate rows, that is, rows in which the values in every column are identical. When
FastLoad finds duplicate rows, they are discarded. While FastLoad can load data into a multi-set
table, FastLoad will not load duplicate rows into a multi-set table because FastLoad discards
duplicate rows!
Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down
AMP must be repaired before the load process can be restarted. Other than this, FastLoad can
recover from system glitches and perform restarts. We will discuss Restarts later in this chapter.

Rule #6: No more than one data type conversion is allowed per column during a FastLoad.
Why just one? Data type conversion is a highly resource-intensive job on the system, which requires a
“search and replace” effort. And that takes more time. Enough said!

Three Key Requirements for FastLoad to Run

FastLoad can be run from either an MVS/Channel (mainframe) or a Network (LAN) host. In either case,
FastLoad requires three key components: a log table, an empty target table, and two error
tables. The user must name these at the beginning of each script.

Log Table: FastLoad needs a place to record information on its progress during a load. It uses the
table called Fastlog in the SYSADMIN database. This table contains one row for every FastLoad
running on the system. In order for your FastLoad to use this table, you need INSERT, UPDATE and
DELETE privileges on that table.

Empty Target Table: We have already mentioned the absolute need for the target table to be
empty. FastLoad does not care how this is accomplished. After an initial load of an empty target
table, you are now looking at a populated table that will likely need to be maintained.

If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed
and for less interaction with the Data Dictionary, just to delete all the rows from that table and then
reload it with fresh data. The syntax DELETE <databasename>.<tablename> should be used for
this. But sometimes, as in some of our FastLoad sample scripts (see Figure 4-2), you want to drop
that table and recreate it versus using the DELETE option. To do this, FastLoad has the ability to run
the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in the script is
that the script is no longer restartable, and you are required to rerun the FastLoad from the
beginning. For this reason, we recommend that you have one script for an initial run and a different
script for a restart.

Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only be
populated should errors occur during the load process. These are required by the FastLoad utility,
which will automatically create them for you; all you must do is to name them. The first error table is
for any translation errors or constraint violations. For example, a row with a column containing a
wrong data type would be reported to the first error table. The second error table is for errors caused
by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence for
every UPI. The other occurrences will be stored in this table. However, if the entire row is a
duplicate, FastLoad counts it but does not store the row. These tables may be analyzed later for
troubleshooting should errors occur during the load. For specifics on how you can troubleshoot, see
the section below titled, “What Happens When FastLoad Finishes.”
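The routing of rows into the target table and the two error tables can be modeled in Python. This is a simplified sketch: `validate` is a hypothetical stand-in for data-type and constraint checks, rows are tuples, and the first occurrence of each UPI value is the one kept (the text only says FastLoad loads one occurrence, not which one).

```python
def apply_rows(rows, upi_position, validate):
    """Route rows to the target table or one of the two FastLoad error tables.

    rows          -- tuples representing incoming data rows
    upi_position  -- index of the Unique Primary Index column in each tuple
    validate      -- hypothetical check; returns False for translation or
                     constraint errors
    """
    target = {}            # UPI value -> the one row that gets loaded
    error_table_1 = []     # translation errors and constraint violations
    error_table_2 = []     # extra occurrences of a UPI value
    duplicate_rows = 0     # whole-row duplicates are counted, not stored
    for row in rows:
        if not validate(row):
            error_table_1.append(row)
            continue
        key = row[upi_position]
        if key not in target:
            target[key] = row
        elif target[key] == row:
            duplicate_rows += 1
        else:
            error_table_2.append(row)
    return list(target.values()), error_table_1, error_table_2, duplicate_rows


incoming = [(1, "a"), (2, "b"), (1, "a"), (1, "z"), ("x", "bad")]
loaded, et1, et2, dups = apply_rows(incoming, 0, lambda r: isinstance(r[0], int))
print(loaded, et1, et2, dups)
# [(1, 'a'), (2, 'b')] [('x', 'bad')] [(1, 'z')] 1
```

Note that the exact duplicate `(1, "a")` appears in neither error table; only the counter records it, matching the behavior described above.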

Maximum of 15 Loads

The Teradata RDBMS will only run a maximum number of fifteen FastLoads, MultiLoads, or
FastExports at the same time. This maximum is determined by a value stored in the DBS Control
record. It can be any value from 0 to 15. When Teradata is first installed, this value is set to 5
concurrent jobs.

Because these utilities all use large blocks of rows, the system can hit a saturation point, so Teradata
protects the amount of system resources available by queuing up the extra load. For example, if the
maximum number of jobs is currently running on the system and you attempt to run one more,
that job will not be started. You should view this limit as a safety control. Here is a tip for
remembering how the load limit applies: If the name of the load utility contains either the word
“Fast” or the word “Load”, then there can be only a total of fifteen of them running at any one time.
FastLoad Has Two Phases

Teradata is famous for its end-to-end use of parallel processing. Both the data and the tasks are
divided up among the AMPs. Then each AMP tackles its own portion of the task with regard to its
portion of the data. This same “divide and conquer” mentality also expedites the load process.
FastLoad divides its job into two phases, both designed for speed. They have no fancy names but are
typically known simply as Phase 1 and Phase 2. Sometimes they are referred to as Acquisition Phase
and Application Phase.

PHASE 1: Acquisition

The primary function of Phase 1 is to transfer data from the host computer to the Access Module
Processors (AMPs) as quickly as possible. For the sake of speed, the Parsing Engine of Teradata does
not take the time to hash each row of data based on the Primary Index. That will be done
later. Instead, it does the following:

The Parsing Engine (PE) is the Teradata software processor responsible for parsing syntax and
generating a plan to execute the request. When the PE receives the INSERT command, it uses one
session to parse the SQL just once. FastLoad then opens Teradata sessions from the FastLoad client
directly to the AMPs. By default, one session is created for each AMP. Therefore, on large systems, it
is normally a good idea to limit the number of sessions using the SESSIONS command. This capability
is shown below.

Simultaneously, all but one of the client sessions begin loading raw data in 64K blocks for transfer
to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To
accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any
concern for which AMP gets the block. As a result, most rows initially arrive on AMPs other than the
ones where they would live had they been hashed.

So how do the rows get to the correct AMPs where they will permanently reside? Following the
receipt of every data block, each AMP hashes its rows based on the Primary Index, and redistributes
them to the proper AMP. At this point, the rows are written to a worktable on the AMP but remain
unsorted until Phase 1 is complete.
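To make the two-step journey concrete, here is a Python sketch of Phase 1 under simplifying assumptions. Blocks are dealt to AMPs round-robin, and `zlib.crc32` on the primary index value stands in for Teradata's actual row-hash function, which is different. The function name `phase1` and the `(pi_value, data)` row shape are illustrative only.

```python
import zlib


def phase1(rows, num_amps, block_size):
    """Conceptual model of FastLoad Phase 1 (not Teradata's implementation).

    rows: list of (pi_value, data) tuples in input-file order.
    Returns one unsorted worktable per AMP.
    """
    # Step 1: pack rows, unhashed, into blocks and deal them out to any AMP.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    received = [[] for _ in range(num_amps)]
    for n, block in enumerate(blocks):
        received[n % num_amps].extend(block)

    # Step 2: each receiving AMP hashes every row on its primary index value
    # and forwards it to the owning AMP's worktable, which stays unsorted.
    worktables = [[] for _ in range(num_amps)]
    for amp_rows in received:
        for pi_value, data in amp_rows:
            owner = zlib.crc32(str(pi_value).encode()) % num_amps  # stand-in hash
            worktables[owner].append((pi_value, data))
    return worktables
```

Step 1 corresponds to the parcel reaching a hub, and step 2 to the hub forwarding it to the right city.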

Phase 1 can be compared loosely to the preferred method of transfer used in the parcel shipping
industry today. How do the key players in this industry handle a parcel? When the shipping company
receives a parcel, that parcel is not immediately sent to its final destination. Instead, for the sake of
speed, it is often sent to a shipping hub in a seemingly unrelated city. Then, from that hub it is sent
to the destination city. FastLoad’s Phase 1 uses the AMPs in much the same way that the shipper
uses its hubs. First, all the data blocks in the load get rushed randomly to any AMP. This just gets
them to a “hub” somewhere in Teradata country. Second, each AMP forwards them to their true
destination. This is like the shipping parcel being sent from a hub city to its destination city!

PHASE 2: Application

Following the scenario described above, the shipping vendor must do more than get a parcel to the
destination city. Once the packages arrive at the destination city, they must then be sorted by street
and zip code, placed onto local trucks and be driven to their final, local destinations.

Similarly, FastLoad’s Phase 2 is mission critical for getting every row of data to its final address (i.e.,
where it will be stored on disk). In this phase, each AMP sorts the rows in its worktable. Then it
writes the rows into the table space on disks where they will permanently reside. Rows of a table are
stored on the disks in data blocks. The AMP uses the block size as defined when the target table was
created. If the table is Fallback protected, then the Fallback will be loaded after the Primary table has
finished loading. This enables the Primary table to become accessible as soon as possible. FastLoad
is so ingenious, no wonder it is the darling of the Teradata load utilities!
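Here is a matching sketch of Phase 2 on a single AMP. It again uses `zlib.crc32` as an assumed stand-in for the real row hash, and a hypothetical `rows_per_block` parameter in place of the table's actual block size. The AMP sorts its worktable by hash and then cuts the sorted rows into data blocks.

```python
import zlib


def phase2(worktable, rows_per_block):
    """Conceptual model of FastLoad Phase 2 on one AMP.

    Sorts the AMP's worktable by a stand-in row hash (ties broken by the row
    itself) and cuts the sorted rows into fixed-size data blocks.
    """
    def row_hash(row):
        return zlib.crc32(str(row[0]).encode())  # not Teradata's real row hash

    ordered = sorted(worktable, key=lambda row: (row_hash(row), row))
    return [ordered[i:i + rows_per_block]
            for i in range(0, len(ordered), rows_per_block)]
```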
FastLoad Commands

Here is a table of some key FastLoad commands and their definitions. They are used to provide
flexibility in control of the load process. Consider this your personal ready-reference guide! You will
notice that there are only a few SQL commands that may be used with this utility (Create Table,
Drop Table, Delete and Insert). This keeps FastLoad from becoming encumbered with additional
functions that would slow it down.

AXSMOD Short for Access Module, this command specifies an input protocol
such as OLE-DB, or reading a tape from REEL Librarian. This
parameter is for network-attached systems only. When used, it
must precede the DEFINE command in the script.

BEGIN LOADING This identifies and locks the FastLoad target table for the
duration of the load. It also identifies the two error tables to be
used for the load. CHECKPOINT and INDICATORS are
subordinate commands in the BEGIN LOADING clause of the
script. CHECKPOINT, which will be discussed below in detail, is
not the default for FastLoad. It must be specified in the script.
INDICATORS is a keyword related to how FastLoad handles
nulls in the input file. It identifies columns with nulls and uses
a bitmap at the beginning of each row to show which fields
contain a null instead of data. When the INDICATORS option is
on, FastLoad looks at each bit to identify the null column. The
INDICATORS option does not work with VARTEXT.

CREATE TABLE This defines the target table and follows normal syntax. If
used, this should only be in the initial script. If the table is
being loaded, it cannot be created a second time.

DEFINE This names the Input file and describes the columns in that file
and the data types for those columns.

DELETE Deletes all the rows of a table. This will only work in the initial
run of the script. Upon restart, it will fail because the table is
locked.

DROP TABLE Drops a table and its data. It is used in FastLoad to drop
previous Target and error tables. However, this is not
a good thing to do within a FastLoad script since it cancels the
ability to restart.

END LOADING Success! This command indicates the point at which all the
data has been transmitted. It tells FastLoad to proceed to
Phase II. As mentioned earlier, it can be used as a way to
partition data loads to the same table. This is true because the
table remains empty until after Phase II.

ERRLIMIT Specifies the maximum number of rejected ROWS allowed in
error table 1 (Phase I). This handy command can be a lifesaver
when you are not sure how corrupt the data in the Input file is.
The more corrupt it is, the greater the cleanup effort required
after the load finishes. ERRLIMIT provides you with a safety
valve. You may specify a particular number of error rows
beyond which FastLoad will immediately proceed to the abort.
This provides the option to restart the FastLoad or to scrub the
input data more before loading it. Remember, none of the rows in
the error table are in the data table. Getting them there becomes
your responsibility.
HELP Designed for online use, the Help command provides a list of
all possible FastLoad commands along with brief, but pertinent
tips for using them.

HELP TABLE Builds the table columns list for use in the FastLoad DEFINE
statement when the data matches the Create Table statement
exactly. In real life this does not happen very often.

INSERT This is FastLoad’s favorite command! It inserts rows into the
target table.

LOGON/LOGOFF or QUIT
No, this is not the WAX ON / WAX OFF from the movie, The
Karate Kid! LOGON simply begins a session. LOGOFF ends a
session. QUIT is the same as LOGOFF.

NOTIFY Just like it sounds, the NOTIFY command is used to inform the
job that follows that some event has occurred. It calls a user
exit or predetermined activity when such events occur. NOTIFY
is often used for detailed reporting on the FastLoad job’s
success.

RECORD Specifies the beginning record number (or with THRU, the
ending record number) of the Input data source to be read by
FastLoad. Syntactically, this command is placed before the
INSERT keyword. Why would it be used? Well, it enables
FastLoad to bypass input records that are not needed, such as
tape headers, and it supports a manual restart. When doing a
partitioned data load, RECORD is used to override the checkpoint,
that is, to tell FastLoad explicitly which input record to start
from instead of resuming from the last recorded checkpoint.

SET RECORD Used only in the LAN environment, this command states in
what format the data from the Input file is coming: FastLoad,
Unformatted, Binary, Text, or Variable Text. The default is the
Teradata RDBMS standard, FastLoad.

SESSIONS This command specifies the number of FastLoad sessions to
establish with Teradata. It is written in the script just before
the logon. The default is 1 session per available AMP. The
purpose of multiple sessions is to enhance throughput when
loading large volumes of data. Too few sessions will stifle
throughput. Too many will preclude availability of system
resources to other users. You will need to find the proper
balance for your configuration.

SLEEP Working in conjunction with TENACITY, the SLEEP command
specifies the number of minutes to wait before retrying to logon
and establish all sessions. This situation can occur if all of the
loader slots are used or if the requested number of sessions is
not available. The default is 6 minutes. For example,
suppose that Teradata sessions are already maxed-out when
your job is set to run. If TENACITY were set at 4 and SLEEP at
10, then FastLoad would attempt to logon every 10 minutes for
up to 4 hours. If there were no success by that time, all efforts
to logon would cease.

TENACITY Sometimes there are too many sessions already established
with Teradata for a FastLoad to obtain the number of sessions
it requested to perform its task, or all of the loader slots are
currently used. TENACITY specifies the amount of time, in
hours, to keep retrying to obtain a loader slot or to establish
all requested sessions. The default for FastLoad is “no
tenacity”, meaning that it will not retry at all. If several
FastLoad jobs are executed at the same time, we recommend
setting the TENACITY to 4, meaning that the system will
continue trying to logon for the number of sessions requested
for up to four hours.

Figure 4-1
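The interplay of TENACITY and SLEEP is easiest to see as a schedule. The hypothetical `logon_attempt_times` function below lists the minutes after job start at which a logon would be attempted. It assumes one initial try followed by a retry every SLEEP minutes until the TENACITY window closes. FastLoad's exact timing may differ.

```python
def logon_attempt_times(tenacity_hours, sleep_minutes=6):
    """Minutes after job start at which a logon would be attempted (model).

    One initial attempt, then a retry every sleep_minutes for as long as
    the TENACITY window (in hours) allows. tenacity_hours of 0 models the
    FastLoad default of "no tenacity": a single attempt and no retries.
    """
    if sleep_minutes <= 0:
        raise ValueError("sleep_minutes must be positive")
    attempts = [0]
    if tenacity_hours <= 0:
        return attempts
    window = tenacity_hours * 60
    t = sleep_minutes
    while t <= window:
        attempts.append(t)
        t += sleep_minutes
    return attempts


schedule = logon_attempt_times(4, 10)
print(len(schedule), schedule[-1])  # 25 240
```

With TENACITY 4 and SLEEP 10 this model gives 25 attempts: the first try plus 24 retries, the last one at the four-hour mark.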

A FastLoad Example in its Simplest Form

The load utilities often scare people because there are many things that appear complicated. In
actuality, the load scripts are very simple. Think of FastLoad as:
• Logging onto Teradata
• Defining the Teradata table that you want to load (target table)
• Defining the INPUT data file
• Telling the system to start loading

This first script example is designed to show FastLoad in its simplest form. The actual script is in the
left column and our comments are on the right.

Logon CDW/jones, cowboys;                                 LOGON TO TERADATA

                                                          Creates the department target table
                                                          in the sql101 database in Teradata

/* in this sample script, the create shows what the       Defines the fields in the record for
table looks like, however, this is not a good practice    the flat file being read, and FILE=
in a production script */                                 provides the name of the input file

                                                          Specifies the table to load for locking
                                                          purposes, names the error tables, and
                                                          sets the checkpoint for restart
                                                          processing to 15000.
                                                          The INSERT used to populate the table

END LOADING;                                              Tells FastLoad to begin Phase 2
LOGOFF;                                                   Disconnects the Teradata sessions

Figure 4-2
Sample FastLoad Script

Let’s look at an actual FastLoad script that you might see in the real world. In the script below, every
comment line is placed inside the normal Teradata comment syntax, [/*. . . . */]. FastLoad and SQL
commands are written in upper case in order to make them stand out. In reality, Teradata utilities,
like Teradata itself, are by default not case sensitive. You will also note that when column names are
listed vertically we recommend placing the comma separator in front of the following column. Coding
this way makes reading or debugging the script easier for everyone. The purpose of this script is to
load the Employee_Profile table in the SQL01 database. The input file used for the load is named
EMPS.TXT. Below the sample script each step will be described in detail.

Normally it is not a good idea to put the DROP and CREATE statements in a FastLoad script. The
reason is that when any of the tables that FastLoad is using are dropped, the script cannot be
restarted. It can only be rerun from the beginning. Since FastLoad has restart logic built into it, a
restart is normally the better solution if the initial load attempt should fail. However, for purposes of
this example, it shows the table structure and the description of the data being read.

/* !/bin/ksh* */                                              Runs from a shell script.

/* ++++++++++++++++++++++++++++*/                             Always good to identify the script
/* FASTLOAD SCRIPT TO LOAD THE */                             and author in comments.
/* Employee_Profile TABLE */                                  Since this script does not drop the
/* Version 1.1 */                                             target or error tables, it is
/* Created by Coffing Data Warehousing */                     restartable. This is a good thing
/* ++++++++++++++++++++++++++++*/                             for production jobs.

/* Setup the FastLoad Parameters */

SESSIONS 100; /*or, the number of sessions supportable*/      Specify the number of sessions
                                                              to logon.
TENACITY 4; /* the default is no tenacity, means no retry */  Tenacity is set to 4 hours;
SLEEP 10; /* the default is 6, means retry in 6 minutes */    wait 10 minutes between retries.
LOGON CW/SQL01,SQL01;
SHOW VERSIONS; /* Shows the Utility's release number */       Display the version of FastLoad.

/* Set the Record type to a comma delimited for FastLoad */
RECORD 2;                                                     Starts with the second record.
SET RECORD VARTEXT ',';                                       Specifies that the record layout
                                                              is vartext with a comma delimiter.
                                                              Notice that all fields are defined
                                                              as VARCHAR. When using VARTEXT,
                                                              the fields do not contain the
                                                              length field found in these
                                                              formats: text, FastLoad, or
                                                              unformatted.
FILE= EMPS.TXT;                                               Defines the flat file name.
/* Optional to show the layout of the input */
SHOW;

/* Begin the Load and Insert Process into the */
/* Employee_Profile Table */
BEGIN LOADING SQL01.Employee_Profile                          Specifies table to load and lock.
ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2                     Names the error tables.
CHECKPOINT 100000;                                            Sets the number of rows at which
                                                              to pause and record progress in the
                                                              restart log before loading further.
                                                              Defines the insert statement to use
                                                              for loading the rows.

END LOADING;                                                  Continues the loading process
                                                              with Phase 2.
LOGOFF;                                                       Logs off of Teradata.

Figure 4-3

Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The
syntax is [SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands
in FastLoad are similar to those in BTEQ. FastLoad commands were designed from the underlying
commands in BTEQ. However, unlike BTEQ, most of the FastLoad commands do not allow a dot [“.”]
in front of them and therefore need a semi-colon. At this point we chose to have Teradata tell us
which version of FastLoad is being used for the load. Why would we recommend this? We do because
as FastLoad’s capabilities get enhanced with newer versions, the syntax of the scripts may have to
be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE
structure in the DEFINE statement, you must first set the RECORD layout type for the file being
passed to FastLoad. We have used VARTEXT in our example with a comma delimiter. The available
options are FASTLOAD, BINARY, TEXT, UNFORMATTED, and VARTEXT. You need to know this about your
input file ahead of time.
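Here is a small Python sketch of what the VARTEXT settings in the sample script ask FastLoad to do. It skips to the record named by RECORD 2 and splits each line on the comma, returning every field as text to match the all-VARCHAR DEFINE. The `read_vartext` function and the sample data are hypothetical, and the sketch deliberately ignores quoting, escaping, and other details of the real VARTEXT format.

```python
def read_vartext(lines, delimiter=",", start_record=1):
    """Split delimited text records into string fields (simplified VARTEXT).

    start_record plays the role of RECORD n (here, 2 skips a header line).
    Every field is returned as a string, like an all-VARCHAR DEFINE.
    Quoting and escaping are not handled.
    """
    records = []
    for record_number, line in enumerate(lines, start=1):
        if record_number < start_record:
            continue  # skipped, as RECORD would skip it
        records.append(line.rstrip("\r\n").split(delimiter))
    return records


emps = ["Employee_No,Last_Name,First_Name\n", "1001,Jones,Ann\n", "1002,Smith,Bob\n"]
print(read_vartext(emps, ",", start_record=2))
```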

Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of
the flat file to be used as the input FILE, or source file for the load.

Step Five: FastLoad makes no assumptions from the DROP TABLE statements with regard to what
you want loaded. In the BEGIN LOADING statement, the script must name the target table and the
two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error
tables in this script? FastLoad will automatically create them for you once you name them in the
script. In this instance, they are named “Emp_Err1” and “Emp_Err2”. Phase 1 uses “Emp_Err1”
because it comes first and Phase 2 uses “Emp_Err2”. The names are arbitrary, of course. You may
call them whatever you like. At the same time, they must be unique within a database, so using a
combination of your userid and target table name helps ensure this uniqueness between multiple
FastLoad jobs occurring in the same database.

In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter. We
included [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task
with regard to the load. In the old days, children were always told to focus on the three “R’s” in
grade school (“reading, ‘riting, and ‘rithmetic”). There are two very different, yet equally important,
R’s to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN means that the
job is capable of running all the processing again from the beginning of the load. RESTART means
that the job is capable of running the processing again from the point where it left off when the job
was interrupted, causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume
loading from the first row following the last successful CHECKPOINT. We will learn more about
CHECKPOINT in the section on Restarting FastLoad.

Step Six: FastLoad focuses on its task of loading data blocks to AMPs the way little Yorkshire terriers
focus on a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to
Phase 2 without the END LOADING command.

In reality, this provides a very valuable capability for FastLoad. Because the table must be empty at
the start of the job, it might seem impossible to load rows as they arrive from different time zones.
However, to accomplish this processing, simply omit the END LOADING from the load job. Then, you can
run the same FastLoad multiple times and continue loading the worktables until the last file is
received. Then run the last FastLoad job with an END LOADING and you have partitioned your load jobs
into smaller segments instead of one huge job. This makes FastLoad even faster!

Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or
CREATE commands within the script. Additionally, every script is exactly the same with the exception
of the last one, which contains the END LOADING causing FastLoad to proceed to Phase 2. That’s a
pretty clever way to do a partitioned type of data load.
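The partitioned-load trick can be modeled as a tiny state machine. The `PartitionedFastLoad` class below is a hypothetical illustration, not a FastLoad interface. Each `run` without END LOADING only appends to the worktable. The run that sets `end_loading=True` performs Phase 2, after which the target is no longer empty and FastLoad cannot load it again.

```python
class PartitionedFastLoad:
    """Toy model of one FastLoad split across several runs (not a real API)."""

    def __init__(self):
        self.worktable = []      # rows acquired in Phase 1, not yet applied
        self.target = []         # stays empty until Phase 2 runs
        self.phase2_done = False

    def run(self, rows, end_loading=False):
        if self.phase2_done:
            raise RuntimeError("target table is no longer empty; FastLoad cannot load it")
        self.worktable.extend(rows)                 # Phase 1 for this file
        if end_loading:                             # only the last script has END LOADING
            self.target = sorted(self.worktable)    # Phase 2: sort and write
            self.worktable = []
            self.phase2_done = True


job = PartitionedFastLoad()
job.run([30, 10])                 # first file, no END LOADING
job.run([20])                     # next file, still no END LOADING
print(job.target)                 # []
job.run([50, 40], end_loading=True)
print(job.target)                 # [10, 20, 30, 40, 50]
```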

Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the
last utility command in your script. At this point the table lock is released and if there are no rows in
the error tables, they are dropped automatically. However, if even a single row is in one of them, you
are responsible for checking it, taking the appropriate action, and dropping the table manually.

Converting Data Types with FastLoad

Converting data is easy. Just define the data types of the input file in the DEFINE statement. Then,
FastLoad will compare them to the column definitions in the Data Dictionary and convert the data for
you! But the cardinal rule is that only one data type conversion is allowed per column. In the example
below, notice how the columns in the input file are converted from one data type to another simply by
redefining the data type in the CREATE TABLE statement.

FastLoad allows seven kinds of data conversions. Here is a chart that displays them:

IN FASTLOAD YOU MAY CONVERT

CHARACTER DATA TO NUMERIC DATA
FIXED LENGTH DATA TO VARIABLE LENGTH DATA
CHARACTER DATA TO DATE
INTEGERS TO DECIMALS
DECIMALS TO INTEGERS
DATE TO CHARACTER DATA
NUMERIC DATA TO CHARACTER DATA

Figure 4-4

When we said that converting data is easy, we meant that it is easy for the user. It is actually quite
resource intensive, thus increasing the amount of time needed for the load. Therefore, if speed is
important, keep the number of columns being converted to a minimum!
A FastLoad Conversion Example

This next script example is designed to show how FastLoad converts data automatically when the
INPUT data type differs from the Target Teradata Table data type. The actual script is in the left
column and our comments are on the right.

LOGON CDW/jones, cowboys;               LOGON TO TERADATA

                                        NOTICE THAT DEPT_NO IS AN INTEGER HERE
                                        IN THE TARGET TABLE, BUT A CHAR(4) IN
                                        THE FLAT FILE DEFINITION BELOW - CHAR(4)
                                        will convert to integer
                                        These date columns are DATE data type
                                        and will be converted from CHAR(10)

                                        CHAR(4) converts to INTEGER

                                        Character dates in different styles in
                                        the file: one CHAR(10) comes in as
                                        YYYY-MM-DD, the other CHAR(10) comes
                                        in as MM/DD/YYYY

FILE= Dept_Flat.txt;                    DEFINES THE FLAT FILE AND NAMES THE
                                        INPUT FILE

                                        Names the target table and error
                                        tables; don’t let the word “errorfiles”
                                        fool you, they are tables.

                                        Will checkpoint every 15000 rows

                                        The INSERT does automatic conversion:
                                        Converts character to integer
                                        Converts character from ANSI date to DATE
                                        Converts character in the other date style
                                        to DATE by describing the input format in
                                        the file. Without the format, this row
                                        goes into the error table.

Figure 4-5
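The conversions in Figure 4-5 can be mimicked in Python to show which rows succeed and which end up in the error table. This is a stand-in, not Teradata's conversion code. The function `convert_dept_row`, its parameter names, and the use of `strptime` format strings in place of a Teradata FORMAT clause are assumptions made for illustration.

```python
from datetime import date, datetime


def convert_dept_row(dept_no_text, start_text, end_text, end_format=None):
    """Mimic the conversions in Figure 4-5 (Python stand-in, not Teradata).

    dept_no_text: CHAR(4) text converted to an integer.
    start_text:   CHAR(10) in ANSI form YYYY-MM-DD, converted to a date.
    end_text:     CHAR(10) in another style such as MM/DD/YYYY; it converts
                  only when end_format describes it (like a FORMAT clause).
    Returns the converted tuple, or None for a row bound for error table 1.
    """
    try:
        dept_no = int(dept_no_text)
        start = date.fromisoformat(start_text)
        if end_format is None:
            end = date.fromisoformat(end_text)  # no format: only ANSI dates work
        else:
            end = datetime.strptime(end_text, end_format).date()
    except ValueError:
        return None
    return dept_no, start, end


print(convert_dept_row("0100", "2002-10-01", "10/31/2002", "%m/%d/%Y"))
print(convert_dept_row("0100", "2002-10-01", "10/31/2002"))  # None
```

The second call fails only because the input format was not described. This mirrors the comment in the figure about rows without a format going to the error table.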

When You Cannot RESTART FastLoad


There are two types of FastLoad scripts: those that you can restart and those that you cannot
without modifying the script. If any of the following conditions are true of the FastLoad script that
you are dealing with, it is NOT restartable:
• The Error Tables are DROPPED
• The Target Table is DROPPED
• The Target Table is CREATED
Why might you have to RESTART a FastLoad job, anyway? Perhaps a system reset or some other glitch
stops the job halfway through. Maybe the mainframe went down.
Well, it is not really a big deal because FastLoad is so lightning-fast that you could probably just
RERUN the job for small data loads.

However, when you are loading a billion rows, this is not a good idea because it wastes time. So the
most common way to deal with these situations is simply to RESTART the job. But what if the normal
load takes 4 hours, and the glitch occurs when you already have two thirds of the data rows loaded?
In that case, you might want to make sure that the job is totally restartable. Let’s see how this is
done.

What Happens When FastLoad Finishes


You Receive an Outcome Status

The most important thing to do is verify that FastLoad completed successfully. This is accomplished
by looking at the last output in the report and making sure that it is a return code or status code of
zero (0). Any other value indicates that something wasn’t perfect and needs to be fixed.

Without a successful completion, the error tables will not be dropped and the lock on the target table
will not be released. This is because FastLoad assumes that it will need them for its restart.
Realistically, once FastLoad has started you have two choices: get it to run to a successful
completion, or rerun it from the beginning. As you can imagine, the best course of action is normally
to get it to finish successfully via a restart.

You Receive a Status Report

What happens when FastLoad finishes running? Well, you can expect to see a summary report on the
success of the load. Following is an example of such a report.

Line 1: TOTAL RECORDS READ = 1000000
Line 2: TOTAL ERRORFILE1 = 50
Line 3: TOTAL ERRORFILE2 = 0
Line 4: TOTAL INSERTS APPLIED = 999950
Line 5: TOTAL DUPLICATE ROWS = 0

Figure 4-7

The first line displays the total number of records read from the input file. Were all of them loaded?
Not really. The second line tells us that there were fifty rows with constraint violations, so they were
not loaded. Corresponding to this, fifty entries were made in the first error table. Line 3 shows that
there were zero entries into the second error table, indicating that there were no duplicate Unique
Primary Index violations. Line 4 shows that there were 999950 rows successfully loaded into the
empty target table. Finally, there were no duplicate rows. Had there been any duplicate rows, the
duplicates would only have been counted. They are not stored in the error tables anywhere. When
FastLoad reports on its efforts, the number of rows in lines 2 through 5 should always total the
number of records read in line 1.
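That bookkeeping check is easy to automate. The sketch below assumes the report lines look exactly like Figure 4-7 (label, equals sign, count). Real FastLoad output may be formatted differently, so treat `parse_report` and `unaccounted_records` as hypothetical helpers.

```python
def parse_report(text):
    """Turn 'LABEL = count' lines into a {label: count} dictionary."""
    totals = {}
    for line in text.splitlines():
        if "=" in line:
            label, value = line.split("=", 1)
            totals[label.strip()] = int(value)
    return totals


def unaccounted_records(totals):
    """Records read minus everything lines 2 through 5 account for (0 is good)."""
    accounted = (totals["TOTAL ERRORFILE1"] + totals["TOTAL ERRORFILE2"]
                 + totals["TOTAL INSERTS APPLIED"] + totals["TOTAL DUPLICATE ROWS"])
    return totals["TOTAL RECORDS READ"] - accounted


report = """TOTAL RECORDS READ = 1000000
TOTAL ERRORFILE1 = 50
TOTAL ERRORFILE2 = 0
TOTAL INSERTS APPLIED = 999950
TOTAL DUPLICATE ROWS = 0"""
print(unaccounted_records(parse_report(report)))  # 0
```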

Note on duplicate rows: Whenever FastLoad experiences a restart, there will normally be duplicate
rows that are counted. This is because an error seldom occurs exactly at a checkpoint (a quiet, or
quiescent, point when nothing is happening within FastLoad). The restart begins with the record after
the one stored in the checkpoint, so the first row after the checkpoint and some number of the rows
that follow it are sent to the AMPs a second time. These are caught as duplicate rows after the sort.
This restart logic is the reason that FastLoad will not load duplicate rows into a MULTISET table: it
assumes they are duplicates created by a restart.
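The following Python sketch models why a restart produces duplicates. It assumes a simple picture: the first run sends records up to the failure point, the restarted run resends everything after the last checkpoint, and a sort then exposes the twins. Both function names are hypothetical.

```python
def rows_sent_with_restart(total_rows, last_checkpoint, failed_after):
    """Record numbers sent to the AMPs when a job fails and is restarted.

    The first run sends records 1..failed_after and then fails. The restarted
    run resumes at last_checkpoint + 1, so records between the checkpoint and
    the failure point are sent a second time.
    """
    first_run = list(range(1, failed_after + 1))
    restarted_run = list(range(last_checkpoint + 1, total_rows + 1))
    return first_run + restarted_run


def sort_and_count_duplicates(records):
    """After sorting, resent copies sit next to the originals and are easy to drop."""
    kept, duplicates = [], 0
    for rec in sorted(records):
        if kept and kept[-1] == rec:
            duplicates += 1
        else:
            kept.append(rec)
    return kept, duplicates


sent = rows_sent_with_restart(total_rows=1000, last_checkpoint=200, failed_after=250)
kept, duplicates = sort_and_count_duplicates(sent)
print(len(sent), len(kept), duplicates)  # 1050 1000 50
```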

You Can Troubleshoot

In the example above, we know that the load was not entirely successful. But that is not enough.
Now we need to troubleshoot in order to identify the errors and correct them. FastLoad generates two
error tables that will enable us to find the culprits. The first error table, which we named Errorfile1,
contains just three columns. The column ErrorCode contains the Teradata FastLoad code number for
the corresponding translation or constraint error. The second column, named ErrorField, specifies
which column in the table contained the error. The third column, DataParcel, contains the row with
the problem. The second error table tracks a different type of error and is built differently, as
described below.

As a user, you can select from either error table. To check errors in Errorfile1 you would use this
syntax:

Corrected rows may be inserted into the target table using another utility that does not require an
empty table.

To check errors in Errorfile2 you would use the following syntax:

The definition of the second error table is exactly the same as the target table with all the same
columns and data types.

Restarting FastLoad — A More In-Depth Look


How the CHECKPOINT Option Works

The CHECKPOINT option defines the points in a load job where the FastLoad utility pauses to record
that Teradata has processed a specified number of rows. When the parameter “CHECKPOINT [n]” is
included in the BEGIN LOADING clause, the system will stop loading momentarily at increments of [n]
rows.

At each CHECKPOINT, the AMPs will all pause and make sure that everything is loading smoothly.
Then FastLoad sends a checkpoint report (entry) to the SYSADMIN.Fastlog table. This log contains a
list of all currently running FastLoad jobs and the last successfully reached checkpoint for each job.
Should an error occur that requires the load to restart, FastLoad will merely go back to the last
successfully reported checkpoint prior to the error. It will then restart from the record immediately
following that checkpoint and start building the next block of data to load. If such an error occurs in
Phase 1, with CHECKPOINT 0, FastLoad will always restart from the very first row.
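Under the rules just described, the restart point is simple arithmetic. The hypothetical `restart_record` function below computes it. It treats CHECKPOINT 0 as "no checkpoints taken", so the job starts over at record 1.

```python
def restart_record(records_sent, checkpoint_every):
    """First input record a Phase 1 restart resumes from (simplified model).

    A checkpoint is taken after every checkpoint_every records; a value of
    0 means no checkpoints were taken, so the job starts over at record 1.
    """
    if checkpoint_every <= 0:
        return 1
    last_checkpoint = (records_sent // checkpoint_every) * checkpoint_every
    return last_checkpoint + 1


print(restart_record(250_000, 100_000))  # 200001
print(restart_record(99_999, 100_000))   # 1
```

The same arithmetic gives the value you would pass to the RECORD command if you chose to restart manually.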

Restarting with CHECKPOINT

Sometimes you may need to restart FastLoad. If the FastLoad script requests a CHECKPOINT (other
than 0), then it is restartable from the last successful checkpoint. Therefore, if the job fails, simply
resubmit the job. Here are the two options: Suppose Phase 1 halts prematurely; the Data Acquisition
phase is incomplete. Resubmit the FastLoad script. FastLoad will begin from RECORD 1 or the first
record past the last checkpoint. If you wish to manually specify where FastLoad should restart, locate
the last successful checkpoint record by referring to the SYSADMIN.FASTLOG table. To specify
where a restart will start from, use the RECORD command. Normally, it is not necessary to use the
RECORD command — let FastLoad automatically determine where to restart from.

If the interruption occurs in Phase 2, the Data Acquisition phase has already completed. We know
that the error is in the Application Phase. In this case, resubmit the FastLoad script with only the
BEGIN and END LOADING Statements. This will restart in Phase 2 with the sort and building of the
target table.

Restarting without CHECKPOINT (i.e., CHECKPOINT 0)

When a failure occurs and the FastLoad Script did not utilize the CHECKPOINT (i.e., CHECKPOINT 0),
one procedure is to DROP the target table and error tables and rerun the job. Here are some other
options available to you:

Resubmit the job and hope there is enough PERM space for all the rows already sent to the
unsorted target table plus all the rows that are going to be sent again to the same target table.
Apart from using space, the resent rows will be rejected as duplicates. As you can imagine, this is not
the most efficient way since it processes many of the same rows twice.

If CHECKPOINT wasn’t specified, then CHECKPOINT defaults to 100,000. You can perform a manual
restart using the RECORD statement. If the output print file shows that checkpoint 100000 occurred,
use something like the following command: [RECORD 100001;]. This statement will skip records 1
through 100000 and resume on record 100001.

Using INMODs with FastLoad


When you find that FastLoad does not read the file type you have or you wish to control the access
for any reason, then it might be desirable to use an INMOD. An INMOD (Input Module) is fully
compatible with FastLoad in either mainframe or LAN environments, provided that the appropriate
programming languages are used. With an INMOD, the normal mainframe DDNAME or LAN-defined FILE
name is replaced by the following statement: DEFINE INMOD=<INMOD-name>. For a more in-depth
discussion of INMODs, see the chapter of this book titled “INMOD Processing”.

An Introduction to MultiLoad
Why it is called “Multi”Load

If we were going to be stranded on an island with a Teradata Data Warehouse and we could only
take along one Teradata load utility, clearly, MultiLoad would be our choice. MultiLoad has the
capability to load multiple tables at one time from either a LAN or Channel environment. This is in
stark contrast to its fleet-footed cousin, FastLoad, which can only load one table at a time. And it
gets better, yet!

This feature rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE
and UPSERT on up to five (5) empty or populated target tables at a time. These DML functions may
be run either solo or in combinations, against one or more tables. For these reasons, MultiLoad is the
utility of choice when it comes to loading populated tables in the batch environment. As the volume
of data loaded or updated in a single block increases, the performance of MultiLoad improves. MultiLoad
shines when it can impact more than one row in every data block. In other words, MultiLoad looks at
massive amounts of data and says, “Bring it on!”

Leo Tolstoy once said, “All happy families resemble each other.” Like happy families, the Teradata
load utilities resemble each other, although they may have some differences. You are going to be
pleased to find that you do not have to learn all new commands and concepts for each load utility.
MultiLoad has many similarities to FastLoad. It has even more commands in common with TPump.
The similarities will be evident as you work with them. Where there are some quirky differences,
we will point them out for you.

Two MultiLoad Modes: IMPORT and DELETE

MultiLoad provides two types of operations via modes: IMPORT and DELETE. In MultiLoad IMPORT
mode, you have the freedom to “mix and match” up to twenty (20) INSERTs, UPDATEs or DELETEs
on up to five target tables. The execution of the DML statements is not mandatory for all rows in a
table. Instead, their execution hinges upon the conditions contained in the APPLY clause of the script.
Once again, MultiLoad demonstrates its user-friendly flexibility. For UPDATEs or DELETEs to be
successful in IMPORT mode, they must reference the Primary Index in the WHERE clause.

The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table. The
reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be
RESTARTed if an error causes it to terminate prior to finishing. When performing in DELETE mode,
the DELETE SQL statement cannot reference the Primary Index in the WHERE clause. This is because
a primary index access goes to one specific AMP, whereas DELETE mode is a global (all-AMP) operation.

The other factor that makes a DELETE mode operation so good is that it examines an entire block of
rows at a time. Once all the eligible rows have been removed, the block is written one time and a
checkpoint is written. So, if a restart is necessary, it simply resumes deleting rows from the block
after the last checkpoint. This is a smart way to continue. Remember, when using the TJ, all deleted
rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish than
the delete. MultiLoad does not do a rollback; it does a restart.
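The restart-instead-of-rollback idea above can be modeled in a few lines. This is a minimal Python sketch and only an assumption-laden illustration. Blocks are plain lists and the "checkpoint" is just the index of the last block written. Neither reflects how Teradata actually stores data or checkpoints a DELETE task. The date cutoff reuses the example from the text.

```python
from typing import Callable, List, Optional

Row = dict
Block = List[Row]


def delete_by_block(
    blocks: List[Block],
    eligible: Callable[[Row], bool],
    checkpoint: int = -1,
    crash_after: Optional[int] = None,
) -> int:
    """Delete eligible rows one block at a time; return the last checkpoint.

    Work starts at the block after `checkpoint`. Each block is rewritten once,
    then its index becomes the new checkpoint. `crash_after` simulates a
    failure immediately after that block has been checkpointed.
    """
    for i in range(checkpoint + 1, len(blocks)):
        blocks[i] = [row for row in blocks[i] if not eligible(row)]  # one write per block
        checkpoint = i  # checkpoint recorded after the block is written
        if crash_after is not None and i == crash_after:
            return checkpoint  # job halts; completed blocks are NOT rolled back
    return checkpoint


def too_old(row: Row) -> bool:
    # ISO-format date strings compare correctly as plain strings.
    return row["d"] < "2002-02-01"


blocks = [
    [{"d": "2002-01-15"}, {"d": "2002-03-01"}],
    [{"d": "2002-01-31"}],
    [{"d": "2002-04-02"}, {"d": "2002-01-02"}],
]
cp = delete_by_block(blocks, too_old, crash_after=0)  # fails after block 0
cp = delete_by_block(blocks, too_old, checkpoint=cp)  # restart resumes at block 1
```

The second call never revisits block 0. That is the point of the text's comparison: the work already done is kept, not undone.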

The Purpose of DELETE MLOAD

In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited
to four months, monthly data is rotated in and out. At the end of every month, the oldest month of
data is removed and the new month is added. The cycle is “add a month, delete a month, add a
month, delete a month.” In our illustration, that means that January data must be deleted to make
room for May’s data.

Here is a question for you: What if there was another way to accomplish this same goal without
consuming all of these extra resources? To illustrate, let’s consider the following scenario: Suppose
you have Table A that contains 12 billion rows. You want to delete a range of rows based on a date
and then load in fresh data to replace these rows. Normally, the process is to perform a MultiLoad
DELETE to DELETE FROM Table A WHERE <date-column> < ‘2002-02-01’. The final step would be
to INSERT the new rows for May using MultiLoad IMPORT.

Block and Tackle Approach

MultiLoad never loses sight of the fact that it is designed for functionality, speed, and the ability to
restart. It tackles the proverbial I/O bottleneck problem like FastLoad by assembling data rows into
64K blocks and writing them to disk on the AMPs. This is much faster than writing data one row at a
time like BTEQ. Fallback table rows are written after the base table has been loaded. This allows
users to access the base table immediately upon completion of the MultiLoad while fallback rows are
being loaded in the background. The benefit is reduced time to access the data.
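To put rough numbers on "much faster than writing data one row at a time," here is a small Python sketch. It counts write operations when rows are packed greedily into 64K blocks, versus one write per row. The 200-byte row size and the packing rule are assumptions for illustration. Real Teradata block sizing and row overhead differ.

```python
BLOCK_SIZE = 64 * 1024  # the 64K block size cited in the text


def count_block_writes(row_sizes, block_size=BLOCK_SIZE):
    """Greedily pack rows into blocks and return how many blocks get written."""
    blocks, used = 0, 0
    for size in row_sizes:
        if size > block_size:
            raise ValueError("row larger than a block")
        if blocks == 0 or used + size > block_size:
            blocks += 1  # start a new block
            used = 0
        used += size
    return blocks


rows = [200] * 1_000_000          # hypothetical: one million 200-byte rows
row_at_a_time = len(rows)         # one write per row, BTEQ-style
blocked = count_block_writes(rows)
```

With these assumed sizes, 327 rows fit in each block. About 3,000 block writes therefore replace a million single-row writes.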

Amazingly, MultiLoad has full RESTART capability in all of its five phases of operation. Once again,
this demonstrates its tremendous flexibility as a load utility. Is it pure magic? No, but it almost
seems so. MultiLoad makes effective use of two error tables to save different types of errors and a
LOGTABLE that stores built-in checkpoint information for restarting. Because of this checkpoint
information, MultiLoad does not need the Transient Journal, thus averting time-consuming rollbacks
when a job halts prematurely.

Here is a key difference to note between MultiLoad and FastLoad. Sometimes an AMP (Access Module
Processor) fails and the system administrators say that the AMP is “down” or “offline.” When using
FastLoad, you must restart the AMP to restart the job. MultiLoad, however, can RESTART when an
AMP fails, if the table is fallback protected. At the same time, you can use the AMPCHECK option to
make it work like FastLoad if you want.

MultiLoad Imposes Limits

Rule #1: Unique Secondary Indexes are not supported on a Target Table. Like FastLoad,
MultiLoad does not support Unique Secondary Indexes (USIs). But unlike FastLoad, it does support
the use of Non-Unique Secondary Indexes (NUSIs) because the index subtable row is on the same
AMP as the data row. MultiLoad uses every AMP independently and in parallel. If two AMPs must
communicate, they are not independent. Therefore, a NUSI (same AMP) is fine, but a USI (different
AMP) is not.
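Why a NUSI stays AMP-local while a USI does not can be shown with a toy hash placement. This Python sketch is illustrative only. `zlib.crc32` modulo an assumed 8 AMPs stands in for Teradata's real row hash and hash map, which work differently. The "SSN-…" secondary-index values are hypothetical.

```python
import zlib

NUM_AMPS = 8  # hypothetical system size


def amp_for(value: str, num_amps: int = NUM_AMPS) -> int:
    """Stand-in for Teradata's row hash + hash map (crc32 is NOT what Teradata uses)."""
    return zlib.crc32(value.encode("utf-8")) % num_amps


def index_row_amps(pi_value: str, secondary_value: str) -> dict:
    data_amp = amp_for(pi_value)  # base row lives where its primary index hashes
    return {
        "data": data_amp,
        "nusi": data_amp,                 # NUSI subtable row stays on the data row's AMP
        "usi": amp_for(secondary_value),  # USI row goes where the USI value hashes
    }


# Hypothetical employees keyed by number, with a hypothetical unique secondary value.
placements = [index_row_amps(str(emp), f"SSN-{emp}") for emp in range(100)]
cross_amp_usi = sum(p["usi"] != p["data"] for p in placements)
```

Every NUSI row lands on its data row's AMP. Most USI rows land somewhere else, and that cross-AMP traffic is what MultiLoad's independent, per-AMP processing avoids.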

Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are
defined with Referential Integrity (RI). Like a USI, this requires the AMPs to communicate with each
other. So, RI constraints must be dropped from the target table prior to using MultiLoad.

Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables
based upon what happens in a target table. Again, this is a multi-AMP operation and to a different
table. To keep MultiLoad running smoothly, disable all Triggers prior to using it.

Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this
because it could impact a restart if the files were concatenated in a different sequence or data was
deleted between runs.

Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If
you need data conversions or math, you might be better off using an INMOD to prepare the data
prior to loading it.

Error Tables, Work Tables and Log Tables

Besides target table(s), MultiLoad requires the use of four special tables in order to function. They
consist of two error tables (per target table), one worktable (per target table), and one log table. In
essence, the Error Tables will be used to store any conversion, constraint or uniqueness violations
during a load. Work Tables are used to receive and sort data and SQL on each AMP prior to storing
them permanently to disk. A Log Table (also called, “Logtable”) is used to store successful
checkpoints during load processing in case a RESTART is needed.
HINT: Sometimes a company wants all of these load support tables to be housed in a particular
database. When these tables are to be stored in any database other than the user’s own default
database, then you must give them a qualified name (<databasename>.<tablename>) in the script
or use the DATABASE command to change the current database.

Where will you find these tables in the load script? The Logtable is generally identified immediately
prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD
statement. Do not underestimate the value of these tables. They are vital to the operation of
MultiLoad. Without them a MultiLoad job can not run. Now that you have had the “executive
summary”, let’s look at each type of table individually.

Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require the
use of two error tables per target table. MultiLoad will automatically create these tables. Rows are
inserted into these tables only when errors occur during the load process. The first error table is the
acquisition Error Table (ET). It contains all translation and constraint errors that may occur while
the data is being acquired from the source(s).

The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for
Unique Primary Indexes (UPI). Since a UPI must be unique, MultiLoad can only load one occurrence
into a table. Any duplicate value will be stored in the UV error table. For example, you might see a
UPI error that shows a second employee number “99.” In this case, if the name for employee “99” is
Kara Morgan, you will be glad that the row did not load since Kara Morgan is already in the Employee
table. However, if the name showed up as David Jackson, then you know that further investigation is
needed, because employee numbers must be unique.
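The employee "99" scenario can be sketched in Python as a conceptual model, not MultiLoad's implementation. In this sketch, the first occurrence of a UPI value wins (whether already in the table or earlier in the input), and later ones are diverted to a list that plays the role of the UV table. Which duplicate actually "wins" in a real load depends on MultiLoad's processing order. That ordering is an assumption here.

```python
def load_upi(existing: dict, incoming: list, upi: str):
    """Insert rows keyed on a Unique Primary Index column.

    Returns (table, uv_rows). A row whose UPI value is already present, either
    from the existing table or from an earlier input row, goes to uv_rows.
    """
    table = dict(existing)
    uv_rows = []
    for row in incoming:
        key = row[upi]
        if key in table:
            uv_rows.append(row)  # uniqueness violation: row is not loaded
        else:
            table[key] = row
    return table, uv_rows


employee = {99: {"emp_no": 99, "name": "Kara Morgan"}}
incoming = [
    {"emp_no": 99, "name": "Kara Morgan"},    # harmless repeat
    {"emp_no": 99, "name": "David Jackson"},  # needs investigation
]
table, uv = load_upi(employee, incoming, "emp_no")
# UV rows that differ from the loaded row are the ones worth a closer look.
suspicious = [r for r in uv if r != table[r["emp_no"]]]
```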

Each error table does the following:


• Identifies errors
• Provides some detail about the errors
• Stores the actual offending row for debugging

You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you
do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In
either case, MultiLoad will not accept error table names that are the same as target table names.
Beyond that rule, it does not matter what you name them, but we recommend that you standardize on a
naming convention to make it easier for everyone on your team. For more details on how these error tables
can help you, see the subsection in this chapter titled, “Troubleshooting MultiLoad Errors.”
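The naming rules can be captured in a small helper. This Python sketch is hypothetical and is not part of any Teradata tool. It builds the ET_/UV_ defaults described above, plus the WT_ worktable default the chapter mentions later. It rejects an error table that reuses the target's name. As a simplification, it compares only the unqualified table names and does not model which database the defaults land in.

```python
def support_table_names(target, error_tables=None, work_table=None):
    """Resolve error/work table names for one target table.

    Defaults: ET_<table>, UV_<table> and WT_<table>, built from the table name
    without its database qualifier.
    """
    table = target.split(".")[-1]
    et, uv = error_tables or (f"ET_{table}", f"UV_{table}")
    wt = work_table or f"WT_{table}"
    for name in (et, uv):
        # Simplification: compare unqualified names, case-insensitively.
        if name.split(".")[-1].upper() == table.upper():
            raise ValueError(f"error table {name} cannot share the target's name")
    return {"ET": et, "UV": uv, "WT": wt}


defaults = support_table_names("SQL01.Employee_Dept1")
```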

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase
of the load so that MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE
for each run. Since MultiLoad will not resubmit a command that has been run previously, it will use
the LOGTABLE to determine the last successfully completed step.
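The "never resubmit a completed step" behavior can be sketched in Python with a set standing in for the LOGTABLE. The step names below simply reuse the phase names from this chapter and are hypothetical. The real LOGTABLE records finer-grained restart information than whole phases.

```python
def run_job(steps, log):
    """Run (name, action) steps in order, skipping names already in `log`."""
    ran = []
    for name, action in steps:
        if name in log:
            continue       # completed in an earlier run: never resubmitted
        action()           # may raise, which ends this run
        log.add(name)      # recorded only after the step succeeds
        ran.append(name)
    return ran


attempts = {"apply": 0}


def make_step(name):
    def action():
        if name == "apply":
            attempts["apply"] += 1
            if attempts["apply"] == 1:
                raise RuntimeError("AMP down during apply")  # first try fails
    return action


steps = [(n, make_step(n)) for n in ("preliminary", "dml", "acquisition", "apply", "cleanup")]
log = set()
try:
    run_job(steps, log)
except RuntimeError:
    pass
second_run = run_job(steps, log)  # picks up at the failed step
```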

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means
that in IMPORT mode you could have one or more worktables. In the DELETE mode, you will only
have one worktable since that mode only works on one target table. The purpose of worktables is to
hold two things:
• The Data Manipulation Language (DML) tasks
• The input data that is ready to APPLY to the AMPs

The worktables are created in a database using PERM space. They can become very large. If the
script uses multiple SQL statements for a single data record, the data is sent to the AMP once for
each SQL statement. This replication guarantees fast performance and ensures that no SQL statement
will ever be done more than once, so it is very important. However, there is no such thing as a free
lunch; the cost is space. Later, you will see that using a FILLER field can help reduce this disk space
by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in
your hands.
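A back-of-the-envelope estimate makes the space trade-off concrete. This Python sketch assumes the input data volume is roughly records × DML statements × bytes actually sent, and it ignores row headers, match tags and block overhead. The record count and field sizes are hypothetical.

```python
def worktable_bytes(num_records: int, num_statements: int, sent_field_sizes) -> int:
    """Rough input-data volume held in the worktable.

    Every record is stored once per DML statement that uses it, and only
    FIELDs (not FILLERs) are sent. Row headers, match tags and block
    overhead are ignored in this estimate.
    """
    return num_records * num_statements * sum(sent_field_sizes)


# Hypothetical 1,000,000-record file applied by two DML statements.
all_fields = worktable_bytes(1_000_000, 2, [11, 20, 100, 4])  # 100-byte field sent
with_filler = worktable_bytes(1_000_000, 2, [11, 20, 4])      # same field as FILLER
```

In this made-up case, redefining one unused 100-byte field as FILLER cuts the estimate from 270 MB to 70 MB.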

Supported Input Formats

Data input files come in a variety of formats but MultiLoad is flexible enough to handle many of
them. MultiLoad supports the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT
and VARTEXT.

BINARY      Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the
            smallest unit of storage for Teradata.

FASTLOAD    This format is the same as BINARY, plus a marker (X'0A' or X'0D') that specifies the
            end of the record.

TEXT        Each record has a variable number of bytes and is followed by an end-of-record marker.

UNFORMAT    The format for these input records is defined in the LAYOUT statement of the
            MultiLoad script using the components FIELD, FILLER and TABLE.

VARTEXT     This is variable-length text RECORD format separated by delimiters such as a comma.
            For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats
            in your MultiLoad LAYOUT. Note that two delimiter characters in a row will result in
            a null value between them.

Figure 5-1
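Two of the formats in Figure 5-1 are easy to model. This Python sketch is only a reading aid, not a MultiLoad reader. It splits BINARY/FASTLOAD records using the 2-byte length prefix. It also shows how VARTEXT turns an empty field between delimiters into a null. The little-endian length prefix is an assumption (typical of Intel-based clients), and the sample data is made up.

```python
import struct


def read_binary(data: bytes, fastload: bool = False) -> list:
    """Split BINARY input (2-byte length n, then n data bytes) into records.

    With fastload=True each record must also end with an X'0A' or X'0D'
    marker. Assumption: the length prefix is little-endian.
    """
    records, pos = [], 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated length prefix")
        (n,) = struct.unpack_from("<H", data, pos)
        pos += 2
        if pos + n > len(data):
            raise ValueError("truncated record")
        records.append(data[pos:pos + n])
        pos += n
        if fastload:
            if data[pos:pos + 1] not in (b"\x0a", b"\x0d"):
                raise ValueError("missing end-of-record marker")
            pos += 1
    return records


def split_vartext(line: str, delimiter: str = ",") -> list:
    """Split one VARTEXT record; nothing between two delimiters means NULL (None)."""
    return [field if field != "" else None for field in line.rstrip("\r\n").split(delimiter)]


binary = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 2) + b"de"
records = read_binary(binary)
fields = split_vartext("10,,Sales")
```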

MultiLoad Has Five IMPORT Phases


MultiLoad IMPORT has five phases, but don’t be fazed by this! Here is the short list:
• Phase 1: Preliminary Phase
• Phase 2: DML Transaction Phase
• Phase 3: Acquisition Phase
• Phase 4: Application Phase
• Phase 5: Cleanup Phase

Let’s take a look at each phase and see what it contributes to the overall load process of this
magnificent utility. Should you memorize every detail about each phase? Probably not. But it is
important to know the essence of each phase because sometimes a load fails. When it does, you
need to know in which phase it broke down since the method for fixing the error to RESTART may
vary depending on the phase. And if you can picture what MultiLoad actually does in each phase, you
will likely write better scripts that run more efficiently.

Phase 1: Preliminary Phase

The ancient oriental proverb says, “Measure one thousand times; Cut once.” MultiLoad uses Phase 1
to conduct several preliminary set-up activities whose goal is to provide a smooth and successful
climate for running your load. The first task is to be sure that the SQL syntax and MultiLoad
commands are valid. After all, why try to run a script when the system will just find out during the
load process that the statements are not useable? MultiLoad knows that it is much better to
identify any syntax errors, right up front. All the preliminary steps are automated. No user
intervention is required in this phase.

Second, all MultiLoad sessions with Teradata need to be established. The default is the number of
available AMPs. Teradata will quickly establish this number as a factor of 16 for the basis regarding
the number of sessions to create. The general rule of thumb for the number of sessions to use for
smaller systems is the following: use the number of AMPs plus two more. For larger systems with
hundreds of AMP processors, the SESSIONS option is available to lower the default. Remember,
these sessions are running on your poor little computer as well as on Teradata.

Each session loads the data to Teradata across the network or channel. Every AMP plays an essential
role in the MultiLoad process. They receive the data blocks, hash each row and send the rows to the
correct AMP. When the rows come to an AMP, it stores them in worktable blocks on disk. But, lest we
get ahead of ourselves, suffice it to say that there is ample reason for multiple sessions to be
established.

What about the extra two sessions? Well, the first one is a control session to handle the SQL and
logging. The second is a backup or alternate for logging. You may have to use some trial and error
to find what works best on your system configuration. If you specify too few sessions it may impair
performance and increase the time it takes to complete load jobs. On the other hand, too many
sessions will reduce the resources available for other important database activities.
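The rule of thumb above reduces to simple arithmetic. In this Python sketch, `limit` models lowering the count with the SESSIONS option on a large system. The specific numbers in the example are hypothetical, and the right value for any real system still comes from trial and error.

```python
from typing import Optional


def suggested_sessions(num_amps: int, limit: Optional[int] = None) -> int:
    """One session per AMP, plus a control session and a backup logging session.

    `limit` models capping the count with the SESSIONS option.
    """
    if num_amps < 1:
        raise ValueError("need at least one AMP")
    sessions = num_amps + 2
    return sessions if limit is None else min(sessions, limit)


small = suggested_sessions(20)              # smaller system: AMPs plus two
large = suggested_sessions(400, limit=64)   # hundreds of AMPs, capped (hypothetical cap)
```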

Third, the required support tables are created. They are the following:

Type of Table   Table Details

ERRORTABLES     MultiLoad requires two error tables per target table. The first error table
                contains constraint violations, while the second error table stores Unique
                Primary Index violations.
WORKTABLES      Work Tables hold two things: the DML tasks requested and the input data that is
                ready to APPLY to the AMPs.
LOGTABLE        The LOGTABLE keeps a record of the results from each phase of the load so that
                MultiLoad knows the proper point from which to RESTART.

Figure 5-2

The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access
locks are placed on all target tables, allowing other users to read or write to the table for the time
being. However, this lock does prevent other users from obtaining an exclusive lock. Although these
locks still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table
while it is locked for loading. This leads us to Phase 2.

Phase 2: DML Transaction Phase

In Phase 2, all of the SQL Data Manipulation Language (DML) statements are sent ahead to
Teradata. MultiLoad allows the use of multiple DML functions. Teradata’s Parsing Engine (PE) parses
the DML and generates a step-by-step plan to execute the request. This execution plan is then
communicated to each AMP and stored in the appropriate worktable for each target table. In other
words, each AMP is going to work off the same page.

Later, during the Acquisition phase the actual input data will also be stored in the worktable so that it
may be applied in Phase 4, the Application Phase. Next, a match tag is assigned to each DML request
that will match it with the appropriate rows of input data. The match tags will not actually be used
until the data has already been acquired and is about to be applied to the target table. This is
somewhat like a student who receives a letter from the university in the summer that lists his
courses, professors’ names, and classroom locations for the upcoming semester. The letter is a
“match tag” for the student to his school schedule, although it will not be used for several months.
This matching tag for SQL and data is the reason that the data is replicated for each SQL statement
using the same data record.

Phase 3: Acquisition Phase

With the proper set-up complete and the PE’s plan stored on each AMP, MultiLoad is now ready to
receive the INPUT data. This is where it gets interesting! MultiLoad now acquires the data in large,
unsorted 64K blocks from the host and sends it to the AMPs.

At this point, Teradata does not care about which AMP receives the data block. The blocks are simply
sent, one after the other, to the next AMP in line. For their part, each AMP begins to deal with the
blocks that they have been dealt. It is like a game of cards — you take the cards that you have
received and then play the game. You want to keep some and give some away.

Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP hashes
each row on the primary index and sends it over the BYNET to the proper AMP where it will
ultimately be used. But the row does not get inserted into its target table, just yet. The receiving
AMP must first do some preparation before that happens. Don’t you have to get ready before
company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs
into the worktables where it assembles them into the SQL. Why? Because once the rows are
reblocked, they can be sorted into the proper order for storage in the target table. Now the utility
places a load lock on each target table in preparation for the Application Phase. Of course, there is
no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
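The "deal the cards, then pass rows to their owners" flow of the Acquisition Phase can be simulated in Python. This is a model under stated assumptions. Blocks are dealt round-robin, `zlib.crc32` stands in for Teradata's row hash, and plain dictionaries stand in for the AMPs' worktables. None of these names are Teradata APIs.

```python
import zlib
from collections import defaultdict


def row_hash(value) -> int:
    """Stand-in for Teradata's row hash (crc32 is only an illustration)."""
    return zlib.crc32(str(value).encode("utf-8"))


def acquire(blocks, num_amps: int, pi: str) -> dict:
    # 1. Blocks are dealt round-robin: any AMP may receive any block.
    received = defaultdict(list)
    for i, block in enumerate(blocks):
        received[i % num_amps].extend(block)

    # 2. Each receiving AMP hashes every row on the primary index and
    #    forwards it (over the BYNET, in the real system) to the owning AMP.
    worktables = defaultdict(list)
    for rows in received.values():
        for row in rows:
            h = row_hash(row[pi])
            worktables[h % num_amps].append((h, row))

    # 3. Each owning AMP sorts its worktable rows into hash order.
    return {
        amp: [row for _, row in sorted(pairs, key=lambda pair: pair[0])]
        for amp, pairs in worktables.items()
    }


# Ten hypothetical input blocks of five rows each, keyed on emp_no.
blocks = [[{"emp_no": n} for n in range(start, start + 5)] for start in range(0, 50, 5)]
worktables = acquire(blocks, num_amps=4, pi="emp_no")
```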

Phase 4: Application Phase

The purpose of this phase is to write, or APPLY, the specified changes to both the target tables and
NUSI subtables. Once the data is on the AMPs, it is married up to the SQL for execution. To
accomplish this substitution of data into SQL, when sending the data, the host has already attached
some sequence information and five (5) match tags to each data row. Those match tags are used to
join the data with the proper SQL statement based on the SQL statement within a DML label. In
addition to associating each row with the correct DML statement, match tags also guarantee that no
row will be updated more than once, even when a RESTART occurs.

The following five columns are the matching tags:

MATCHING TAGS

ImportSeq   Sequence number that identifies the IMPORT command where the error occurred
DMLSeq      Sequence number for the DML statement involved with the error
SMTSeq      Sequence number of the DML statement being carried out when the error was discovered
ApplySeq    Sequence number that tells which APPLY clause was running when the error occurred
SourceSeq   The number of the data row in the client file that was being built when the error
            took place

Figure 5-3
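The claim that match tags keep a row from being applied twice, even across a restart, can be illustrated with a Python model. Here the five tags from Figure 5-3 form a hashable key, and a set records which tagged changes have been applied. That bookkeeping is an assumption made for illustration. The real utility tracks restart points through the LOGTABLE, not a Python set. "Dept 10" is a hypothetical key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchTag:
    import_seq: int  # ImportSeq: which IMPORT command
    dml_seq: int     # DMLSeq: which DML statement
    smt_seq: int     # SMTSeq: which statement is being carried out
    apply_seq: int   # ApplySeq: which APPLY clause
    source_seq: int  # SourceSeq: which record in the client file


def apply_tagged(changes, applied: set, target: dict) -> dict:
    """Apply (tag, key, amount) increments, skipping tags already applied.

    An increment is deliberately not idempotent: applying it twice would
    give a wrong total, which is exactly what the tag check prevents.
    """
    for tag, key, amount in changes:
        if tag in applied:
            continue
        target[key] = target.get(key, 0) + amount
        applied.add(tag)
    return target


changes = [(MatchTag(1, 1, 1, 1, seq), "Dept 10", 5) for seq in (1, 2, 3)]
applied, target = set(), {}
apply_tagged(changes[:2], applied, target)  # job halts after two records
apply_tagged(changes, applied, target)      # restart replays the whole list
```

After the restart, the total is 15 rather than 25, because the two records already applied are recognized by their tags.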

Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-
sequence sorted block from Phase 3 and each block of the base table is read only once to reduce I/O
operations to gain speed. Then, all matching rows in the base block are inserted, updated or
deleted before the entire block is written back to disk, one time. This is why the match tags are so
important. Changes are made based upon corresponding data and DML (SQL) based on the match
tags. They guarantee that the correct operation is performed for the rows and blocks with no
duplicate operations, a block at a time. And each time a table block is written to disk successfully, a
record is inserted into the LOGTABLE. This permits MultiLoad to avoid starting again from the very
beginning if a RESTART is needed.
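The "read each block once, change everything in it, write it once, then log it" pattern from the paragraph above is sketched below in Python. Blocks are dictionaries, `block_of` is a hypothetical function mapping a key to its block, and a set plays the LOGTABLE. This is a model of the idea, not of Teradata's file system.

```python
from collections import defaultdict


def apply_by_block(table, changes, block_of, logtable: set) -> int:
    """Apply (key, value) changes a block at a time; value None means DELETE.

    `table` maps block id -> {key: value}. Each affected block is read once,
    all of its changes are made, it is written back once, and only then is
    the block id recorded in `logtable`. Returns the number of block writes.
    """
    pending = defaultdict(list)
    for key, value in changes:
        pending[block_of(key)].append((key, value))

    writes = 0
    for block_id in sorted(pending):
        if block_id in logtable:
            continue  # already written before a restart; do not redo it
        block = dict(table.get(block_id, {}))  # one read
        for key, value in pending[block_id]:
            if value is None:
                block.pop(key, None)  # DELETE
            else:
                block[key] = value    # INSERT or UPDATE
        table[block_id] = block       # one write
        logtable.add(block_id)
        writes += 1
    return writes


table = {0: {1: "a", 2: "b"}, 1: {101: "c"}}
changes = [(1, "A"), (2, None), (3, "new"), (101, "C"), (102, "d")]
log = set()
first = apply_by_block(table, changes, lambda k: k // 100, log)
again = apply_by_block(table, changes, lambda k: k // 100, log)  # simulated restart
```

Five changes cost only two block writes, and replaying them after a "restart" writes nothing.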

What happens when several tables are being updated simultaneously? In this case, all of the updates
are scripted as a multi-statement request. That means that Teradata views them as a single
transaction. If there is a failure at any point of the load process, MultiLoad will merely need to be
RESTARTed from the point where it failed. No rollback is required. Any errors will be written to the
proper error table.

Phase 5: Clean Up Phase

Those of you reading these paragraphs that have young children or teenagers will certainly
appreciate this final phase! MultiLoad actually cleans up after itself. The utility looks at the final Error
Code (&SYSRC). MultiLoad believes the adage, “All is well that ends well.” If the last error code is
zero (0), all of the job steps have ended successfully (i.e., all has certainly ended well). This being
the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata
and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the
system count variables are set. After this, each MultiLoad session is logged off. So what happens if
the final error code is not zero? Stay tuned. Restarting MultiLoad is a topic that will be covered later
in this chapter.
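The Clean Up Phase decision can be summarized in a few lines of Python. On a zero return code, empty error tables, worktables and the log table are dropped, while error tables that captured rows are kept for troubleshooting. What happens on a non-zero code is covered later. This sketch simply drops nothing in that case, which is an assumption. The table names in the example are hypothetical.

```python
def cleanup_plan(final_rc: int, error_table_rows: dict, worktables: list, logtable: str) -> list:
    """List the support tables Phase 5 would drop.

    On a zero return code: empty error tables, all worktables and the log
    table. Error tables that captured rows are kept. Non-zero: nothing
    (an assumption of this sketch).
    """
    if final_rc != 0:
        return []
    drops = [name for name, rows in error_table_rows.items() if rows == 0]
    drops.extend(worktables)
    drops.append(logtable)
    return drops


errors = {"ET_Employee_Dept1": 0, "UV_Employee_Dept1": 3}
plan = cleanup_plan(0, errors, ["WT_Employee_Dept1"], "SQL01.CDW_Log")
```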

MultiLoad Commands
Two Types of Commands

You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad
tasks are commands that are used by the MultiLoad utility for specific individual steps as it processes
a load. Support functions are those commands that involve the Teradata utility Support Environment
(covered in Chapter 9), are used to set parameters, or are helpful for monitoring a load.

The chart below lists the key commands, their type, and what they do.

MLOAD Command          Type     What does the MLOAD Command do?

.BEGIN [IMPORT] MLOAD  Task     This command communicates directly with Teradata to specify if the
.BEGIN DELETE MLOAD             MultiLoad mode is going to be IMPORT or DELETE. Note that the word
                                IMPORT is optional in the syntax because it is the DEFAULT, but
                                DELETE is required. We recommend using the word IMPORT to make the
                                coding consistent and easier for others to read. Any parameters for
                                the load, such as error limits or checkpoints, will be included under
                                the .BEGIN command, too. It is important to know which commands or
                                parameters are optional since, if you do not include them, MultiLoad
                                may supply defaults that may impact your load.
.DML LABEL             Task     The DML LABEL defines treatment options and labels for the
                                application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE
                                operations. A LABEL is simply a name for a requested SQL activity.
                                The LABEL is defined first, and then referenced later in the APPLY
                                clause.
.END MLOAD             Task     This instructs MultiLoad to finish the APPLY operations with the
                                changes to the designated databases and tables.
.FIELD                 Task     This defines a column of the data source record that will be sent to
                                the Teradata database via SQL. When writing the script, you must
                                include a FIELD for each data field you need in SQL. This command is
                                used with the LAYOUT command.
.FILLER                Task     Do not assume that MultiLoad has somehow uncovered much of what you
                                used in your term papers at the university! FILLER defines a field
                                that is accounted for as part of the data source’s row format, but is
                                not sent to the Teradata DBS. It is used with the LAYOUT command.
.LAYOUT                Task     LAYOUT defines the format of the INPUT DATA record so Teradata knows
                                what to expect. If one record is not large enough, you can
                                concatenate multiple records by using the LAYOUT parameter
                                CONTINUEIF to tell which value signals the concatenation. Another
                                option is INDICATORS, which is used to represent nulls by using the
                                bitmap (1 bit per field) at the front of the data record.
.LOGON                 Support  This specifies the username or LOGON string that will establish
                                sessions for MultiLoad with Teradata.
.LOGTABLE              Support  This support command names the Restart Log that will be used for
                                storing CHECKPOINT data pertaining to a load. The LOGTABLE is then
                                used to tell MultiLoad where to RESTART, should that be necessary. It
                                is recommended that this command be placed before the .LOGON command.
.LOGOFF                Support  This command terminates any sessions established by the LOGON
                                command.
.IMPORT                Task     This command defines the INPUT DATA FILE, file type, file usage, the
                                LAYOUT to use and where to APPLY the data to SQL.
.SET                   Support  Optionally, you can SET utility variables. An example would be
                                {.SET DBName TO 'CDW_Test'}.
.SYSTEM                Support  This interrupts the operation of MultiLoad in order to issue
                                commands to the local operating system.
.TABLE                 Task     This is a command that may be used with the .LAYOUT command. It
                                identifies a table whose columns (both their order and data types)
                                are to be used as the field names and data descriptions of the data
                                source records.

Figure 5-4

Parameters for .BEGIN IMPORT MLOAD


Here is a list of components or parameters that may be used in the .BEGIN IMPORT command. Note:
The parameters do not require the usual dot prior to the command since they are actually sub-
commands.

PARAMETER                           REQUIRED OR NOT   WHAT IT DOES

AMPCHECK {NONE|APPLY|ALL}           Optional          NONE specifies that MLOAD starts even with one
                                                      down AMP per cluster if all tables are Fallback.
                                                      APPLY (DEFAULT) specifies MLOAD will not start or
                                                      finish Phase 4 with a down AMP.
                                                      ALL specifies not to proceed if any AMPs are
                                                      down, just like FastLoad.
AXSMOD                              Optional          Short for Access Module, this parameter specifies
                                                      an input protocol such as OLE-DB or reading a tape
                                                      from REEL Librarian. This parameter is for
                                                      network-attached systems only. When used, it must
                                                      precede the DEFINE command in the script.
CHECKPOINT                          Optional          You have two options: if the number is 60 or
                                                      less, CHECKPOINT refers to the number of minutes,
                                                      or frequency, at which you wish a CHECKPOINT to
                                                      occur. If the number is greater than 60, it
                                                      designates the number of rows at which you want
                                                      the CHECKPOINT to occur. This parameter is NOT
                                                      valid in DELETE mode.
ERRLIMIT errcount [errpercent]      Optional          You may specify the maximum number of errors, or
                                                      the percentage, that you will tolerate during the
                                                      processing of a load job.
ERRORTABLES ET_ERR UV_ERR           Optional          Names the two error tables, two per target table.
                                                      Note there is no comma separator.
NOTIFY {LOW|MEDIUM|HIGH|OFF}        Optional          If you opt to use NOTIFY for any event during a
                                                      load, you may designate the priority of that
                                                      notification: LOW for low-level events, MEDIUM
                                                      for important events, HIGH for events at
                                                      operational decision points, and OFF to eliminate
                                                      any notification at all for a given phase.
SESSIONS                            Optional          This refers to the number of SESSIONS that should
                                                      be established with Teradata. For MultiLoad, the
                                                      optimal number of sessions is the number of AMPs
                                                      in the system, plus two more. You can also use MAX
                                                      or MIN, which automatically use the maximum or
                                                      minimum number of sessions to complete the job.
                                                      If you specify nothing, it will default to MAX.
SLEEP                               Optional          Tells MultiLoad how frequently, in minutes, to try
                                                      logging on to the system.
TABLES Tablename1, Tablename2…,     Required          Names up to 5 target tables.
  Tablename5
TENACITY                            Optional          Tells MultiLoad how many hours to try logging on
                                                      when its initial effort to do so is rebuffed.
WORKTABLES Tablename1,              Optional          Names the worktable(s), one per target table.
  Tablename2…,Tablename5

Figure 5-5
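The dual meaning of CHECKPOINT in Figure 5-5 is easy to get wrong, so here is a tiny Python helper that applies the rule as described: 60 or less means minutes, above 60 means rows, and the parameter is rejected in DELETE mode. The helper is hypothetical. Its rejection of zero or negative values is this sketch's own guard, not documented MultiLoad behavior.

```python
def checkpoint_setting(value: int, mode: str = "IMPORT") -> tuple:
    """Interpret a CHECKPOINT value per Figure 5-5.

    60 or less: a checkpoint every `value` minutes.
    Greater than 60: a checkpoint every `value` rows.
    """
    if mode.upper() == "DELETE":
        raise ValueError("CHECKPOINT is not valid in DELETE mode")
    if value <= 0:
        raise ValueError("CHECKPOINT must be positive")  # this sketch's own guard
    return ("minutes", value) if value <= 60 else ("rows", value)


every_15_minutes = checkpoint_setting(15)
every_100k_rows = checkpoint_setting(100_000)
```

Note the boundary: a value of 60 means "every 60 minutes," while 61 already means "every 61 rows."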

Parameters for .BEGIN DELETE MLOAD


Here is a list of components or parameters that may be used in the BEGIN DELETE command. Note:
The parameters do not require the usual dot prior to the command since parameters are actually
sub-commands.

PARAMETER                    REQUIRED OR NOT   WHAT IT DOES

TABLES Tablename1            Required          Names the Target table.
WORKTABLES Tablename1        Optional          Names the worktable, one per target table.
ERRORTABLES ET_ERR UV_ERR    Optional          Names the two error tables, two per target table;
                                               there is no comma separator between them.
TENACITY                     Optional          Tells MultiLoad how many hours to try establishing
                                               sessions when its initial effort to do so is rebuffed.
SLEEP                        Optional          Tells MultiLoad how frequently, in minutes, to try
                                               logging on to the system.

Figure 5-6

A Simple MultiLoad IMPORT Script


MultiLoad can be somewhat intimidating to the new user because there are many commands and
phases. In reality, the load scripts are understandable when you think through what the IMPORT
mode does:
• Setting up a Logtable
• Logging onto Teradata
• Identifying the Target, Work and Error tables
• Defining the INPUT flat file
• Defining the DML activities to occur
• Naming the IMPORT file
• Telling MultiLoad to use a particular LAYOUT
• Telling the system to start loading
• Finishing loading and logging off of Teradata

This first script example is designed to show MultiLoad IMPORT in its simplest form. It depicts the
loading of a three-column Employee table. The actual script is in the left column and our comments
are on the right. Below the script is a step-by-step description of how this script works.

/* Simple Mload script */          Sets up a Logtable and
.LOGTABLE SQL01.CDW_Log;           logs on to Teradata
.LOGON TDATA/SQL01,SQL0;

                                   Begins the load process by naming the Target table,
                                   Work table and Error tables; notice NO comma
                                   between the error tables

                                   Names the LAYOUT of the INPUT record and defines
                                   its structure; notice the dots before FIELD and
                                   FILLER and the semicolons after each definition

.DML LABEL INSERTS;                Names the DML label

                                   Tells MultiLoad to INSERT a row into the target
                                   table and defines the row format

                                   Lists, in order, the VALUES (each one preceded by
                                   a colon) to be INSERTed

                                   Names the import file and its format type; cites
                                   the LAYOUT to use; tells MultiLoad to APPLY the
                                   INSERTs

.END MLOAD;                        Ends MultiLoad and logs
.LOGOFF;                           off all MultiLoad sessions

Figure 5-7

Step One: Setting up a Logtable and Logging onto Teradata — MultiLoad requires you specify a
log table right at the outset with the .LOGTABLE command. We have called it CDW_Log. Once you
name the Logtable, it will be automatically created for you. The Logtable may be placed in the same
database as the target table, or it may be placed in another database. Immediately after this you log
onto Teradata using the .LOGON command. The order of these two commands is interchangeable,
but it is recommended to define the Logtable first and then to Log on, second. If you reverse the
order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot in
front of the command key word.

Step Two: Identifying the Target, Work and Error tables — In this step of the script you must
tell Teradata which tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then
you will preface the names of these tables with the sub-commands TABLES, WORKTABLES and
ERRORTABLES. All you must do is name the tables and specify what database they are in. Work
tables and error tables are created automatically for you. Keep in mind that you get to name and
locate these tables. If you do not do this, Teradata might supply some defaults of its own!

At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not
specifically been named, the script would still execute and build these tables. They would have been
built in the default database for the user. The name of the worktable would be
WT_EMPLOYEE_DEPT1 and the two error tables would be called ET_EMPLOYEE_DEPT1 and
UV_EMPLOYEE_DEPT1, respectively.

Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One
customer calls this database CORP_WORK. This is where all of the logtables and worktables are
normally created. You can use a DATABASE command to point all table creations to it or qualify the
names of these tables individually.

Step Three: Defining the INPUT flat file record structure. MultiLoad needs to know
the structure of the INPUT flat file. Use the .LAYOUT command to name the layout. Then list each field
used in your SQL, with its data type, as a .FIELD. Did you notice that an asterisk is placed between
the column name and its data type? It tells MultiLoad to calculate the starting byte automatically:
the field begins in the byte immediately after the end of the previous field.
If you are listing fields in order and need to skip a few bytes in the record, you can either use
.FILLER (as above) to move the cursor to the next field, or the “*” on the Dept_No field could
have been replaced with the number 132 ( CHAR(11)+CHAR(20)+CHAR(100)+1 ). Then the .FILLER
is not needed. Also, if the input record fields are exactly the same as the table columns, .TABLE can be
used to define all the .FIELDs for you automatically. The LAYOUT name will be referenced later in
the .IMPORT command. If the input file was created with INDICATORS, that is specified in the LAYOUT.
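The byte arithmetic behind the asterisk and an explicit start position can be checked with a small model. This is a Python sketch, not part of MultiLoad: the Field class and resolve_positions function are hypothetical, positions are 1-based and inclusive, the three fields ahead of Dept_No carry made-up names with the CHAR(11), CHAR(20) and CHAR(100) widths used above, and Dept_No's own width of 6 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """One .FIELD or .FILLER entry in a hypothetical layout model."""
    name: str
    length: int                  # width in bytes, e.g. 11 for CHAR(11)
    start: Optional[int] = None  # None stands for "*"; an int is an explicit 1-based start byte


def resolve_positions(fields):
    """Return {field name: (first byte, last byte)} using 1-based, inclusive positions."""
    positions = {}
    next_byte = 1  # the first field of a record begins at byte 1
    for field in fields:
        start = next_byte if field.start is None else field.start
        end = start + field.length - 1
        positions[field.name] = (start, end)
        next_byte = end + 1  # a following "*" field begins right after this one
    return positions


# Option 1: a filler skips the 100 unused bytes, and Dept_No uses "*".
LAYOUT_WITH_FILLER = [
    Field("Emp_Id", 11),
    Field("Emp_Name", 20),
    Field("Skipped_Bytes", 100),  # plays the role of .FILLER
    Field("Dept_No", 6),
]

# Option 2: no filler; Dept_No names its start byte, 11 + 20 + 100 + 1 = 132.
LAYOUT_WITH_START = [
    Field("Emp_Id", 11),
    Field("Emp_Name", 20),
    Field("Dept_No", 6, start=132),
]

print(resolve_positions(LAYOUT_WITH_FILLER)["Dept_No"])  # (132, 137)
print(resolve_positions(LAYOUT_WITH_START)["Dept_No"])   # (132, 137)
```

Both layouts place Dept_No on the same bytes; the explicit start simply makes the filler unnecessary.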

Step Four: Defining the DML activities to occur. The .DML LABEL command names and defines the SQL
that is to execute. It is like setting up executable code in a programming language, but using SQL. In
our example, MultiLoad is being told to INSERT a row into the SQL01.Employee_Dept table. The
VALUES come from the data in each FIELD, because each field name is preceded by a colon (:). Are you
allowed to use multiple labels in a script? Sure! But remember this: every label must be referenced in
an APPLY clause of the .IMPORT command.
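Because every label must be referenced in an APPLY clause, a quick check before submitting a script can catch a forgotten one. The following Python sketch is a hypothetical helper, not a Teradata tool; it assumes labels are defined as ".DML LABEL name" and referenced as "APPLY name", strips /* */ comments first, and compares names case-insensitively. The sample script fragment and its UPDATES label are made up for illustration.

```python
import re


def unreferenced_labels(script: str) -> set:
    """Return the .DML LABEL names that no APPLY clause references (hypothetical lint)."""
    # Drop /* ... */ comments so words inside them are not mistaken for commands.
    code = re.sub(r"/\*.*?\*/", " ", script, flags=re.DOTALL)
    defined = re.findall(r"\.DML\s+LABEL\s+(\w+)", code, flags=re.IGNORECASE)
    applied = {name.upper() for name in re.findall(r"\bAPPLY\s+(\w+)", code, flags=re.IGNORECASE)}
    return {label for label in defined if label.upper() not in applied}


SAMPLE = """
.DML LABEL INSERTS
IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Dept (Employee_No, Dept_No) VALUES (:Employee_No, :Dept_No);
.DML LABEL UPDATES;
UPDATE SQL01.Employee_Dept SET Dept_No = :Dept_No WHERE Employee_No = :Employee_No;
.IMPORT INFILE CDW_Join_Export.txt
FORMAT TEXT
LAYOUT FILEIN
APPLY INSERTS;
"""

print(unreferenced_labels(SAMPLE))  # {'UPDATES'}
```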

Step Five: Naming the INPUT file and its format type. This step is vital! Using the .IMPORT
command, we have identified the INFILE data as being contained in a file called
“CDW_Join_Export.txt”. Then we list the FORMAT type as TEXT. Next, we referenced the LAYOUT
named FILEIN to describe the fields in the record. Finally, we told MultiLoad to APPLY the DML LABEL
called INSERTS, that is, to INSERT the data rows into the target table. APPLY is still a
sub-component of the .IMPORT command. If the script is to run on a mainframe, the INFILE
name is actually the name of a JCL Data Definition (DD) statement that contains the real name of
the file.

Notice that the .IMPORT goes on for 4 lines of information. This is possible because the command
continues until MultiLoad finds the semicolon that marks its end. This is how MultiLoad tells one
command from the next. Therefore, the semicolon is very important; without it, MultiLoad would have
attempted to process the .END MLOAD as part of the .IMPORT, and it wouldn’t work.
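A toy splitter shows why the semicolon matters. This Python sketch is not how MultiLoad parses scripts internally; it only assumes, as the text describes, that a command runs until the next semicolon, and it ignores the complication of semicolons inside comments or quoted strings.

```python
def split_commands(script: str) -> list:
    """Split script text into commands, each ending at a semicolon; whitespace is collapsed."""
    commands, buffer = [], []
    for ch in script:
        buffer.append(ch)
        if ch == ";":
            commands.append(" ".join("".join(buffer).split()))
            buffer = []
    leftover = " ".join("".join(buffer).split())
    if leftover:  # trailing text that never reached a semicolon
        commands.append(leftover)
    return commands


GOOD = """.IMPORT INFILE CDW_Join_Export.txt
FORMAT TEXT
LAYOUT FILEIN
APPLY INSERTS;
.END MLOAD;
"""
BAD = GOOD.replace("APPLY INSERTS;", "APPLY INSERTS")  # the semicolon is forgotten

for command in split_commands(GOOD):
    print(command)
# .IMPORT INFILE CDW_Join_Export.txt FORMAT TEXT LAYOUT FILEIN APPLY INSERTS;
# .END MLOAD;

print(len(split_commands(BAD)))  # 1: .END MLOAD has been swallowed into the .IMPORT
```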

Step Six: Finishing loading and logging off of Teradata. These are the closing ceremonies for the
load. MultiLoad wraps things up, closes the curtains, and logs off of the Teradata system.

Important note: Since the script above in Figure 5-7 does not DROP any tables, it is completely
capable of being restarted if an error occurs. Compare this to the next script in Figure 5-8. Do you
think it is restartable? If you said no, pat yourself on the back.

MultiLoad IMPORT Script


Let’s take a look at a MultiLoad IMPORT script that comes from real life. This sample script will look
much more like what you might encounter at your workplace. It is more detailed. The notes that
accompany it are brief and to the point. They will help you grasp the essence of what is happening
in the script.

• Load runs from a shell script. Any words between /* … */ are comments only and are not processed by Teradata. The opening comments name and describe the purpose of the script and name the author.

.LOGTABLE SQL01.CDW_Log;
.RUN FILE LOGON.TXT;
• Secures the logon by storing the userid and password in a separate file, then reads it.

/* Drop Error Tables: caution, this script cannot be
restarted because these tables would be needed */
DROP TABLE SQL01.CDW_ET;
DROP TABLE SQL01.CDW_UV;
• Drops the existing error tables and cancels the ability for the script to restart. DON’T ATTEMPT THIS AT HOME! Also, SQL statements do not use a dot (.).

• Begins the load process by naming first the target table, then the work table and error tables; note there is NO comma between the names of the error tables.
• Names the LAYOUT of the INPUT file and defines its structure. Notice the dots before the FIELD command and the semicolons after each FIELD definition.
• Names the DML Label and tells MultiLoad to INSERT a row into the target table, defining the row format. Note that we place comma separators in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed.
• Names the Import File and states its Format type; names the Layout to use and tells MultiLoad to APPLY the INSERTs.

.END MLOAD;
.LOGOFF;
• Ends MultiLoad and logs off of Teradata.

Figure 5-8

Error Treatment Options for the .DML LABEL Command

MultiLoad allows you to tailor how it deals with different types of errors that it encounters during the
load process, to fit your needs. Here is a summary of the options available to you:

ERROR TREATMENT OPTIONS FOR .DML LABEL



Figure 5-9

In IMPORT mode, you may specify as many as five distinct error-treatment options for one
.DML statement. For example, if there is more than one instance of a row, do you want MultiLoad to
IGNORE the duplicate row, or to MARK it (list it) in an error table? If you do not specify IGNORE,
then MultiLoad will MARK, or record, all of the errors. Imagine you have a standard INSERT load that
you know will end up recording about 20,000 duplicate row errors. Adding the option
“IGNORE DUPLICATE INSERT ROWS;” to the .DML LABEL will keep them out of the error table. By
ignoring those errors, you gain three benefits:

1. You do not need to see all the errors.
2. The error table is not filled up needlessly.
3. MultiLoad runs much faster since it is not conducting a duplicate row check.

When doing an UPSERT, there are two rules to remember:

• The default is IGNORE MISSING UPDATE ROWS, whereas MARK is the default for all other
operations. When doing an UPSERT, you anticipate that some rows are missing; otherwise, why do
an UPSERT? So, this keeps these rows out of your error table.
• The DO INSERT FOR MISSING UPDATE ROWS is mandatory. This tells MultiLoad to insert a
row from the data source if that row does not exist in the target table because the update
didn’t find it.

The table that follows shows you, in more detail, how flexible your options are:

ERROR TREATMENT OPTIONS IN DETAIL

DML LABEL OPTION                     WHAT IT DOES
MARK DUPLICATE INSERT ROWS           This option logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates.
IGNORE DUPLICATE INSERT ROWS         This tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.
MARK DUPLICATE UPDATE ROWS           This logs the existence of every duplicate UPDATE row.
IGNORE DUPLICATE UPDATE ROWS         This eliminates the listing of duplicate update row errors.
MARK MISSING UPDATE ROWS             This option ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE.
IGNORE MISSING UPDATE ROWS           This tells MultiLoad NOT to list UPDATE rows as an error. This is a good option when doing an UPSERT since UPSERT will INSERT a new row.
MARK MISSING DELETE ROWS             This option makes a note in the ET_Error Table that a row to be deleted is missing.
IGNORE MISSING DELETE ROWS           This option says, “Do not tell me that a row to be deleted is missing.”
DO INSERT for MISSING UPDATE ROWS    This is required to accomplish an UPSERT. It tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.

Figure 5-10
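The defaults described above can be summarized as a small decision function. This Python model is illustrative only: parse_options and treatment are hypothetical names, the model covers only the option phrases listed in Figure 5-10, and it encodes the rules as this chapter states them (MARK by default, IGNORE MISSING UPDATE ROWS by default for an UPSERT, and an explicit MARK or IGNORE overriding either default).

```python
ERROR_KINDS = ("DUPLICATE INSERT", "DUPLICATE UPDATE", "MISSING UPDATE", "MISSING DELETE")


def parse_options(option_text: str) -> dict:
    """Read the error-treatment phrases that follow a .DML LABEL name."""
    text = " ".join(option_text.upper().split())  # normalize case and spacing
    options = {"upsert": "DO INSERT FOR MISSING UPDATE ROWS" in text}
    for kind in ERROR_KINDS:
        for action in ("MARK", "IGNORE"):
            if f"{action} {kind} ROWS" in text:
                options[kind] = action
    return options


def treatment(kind: str, options: dict) -> str:
    """Return 'MARK' (record the row in an error table) or 'IGNORE' (skip it silently)."""
    if kind in options:
        return options[kind]  # an explicit MARK or IGNORE wins
    if kind == "MISSING UPDATE" and options["upsert"]:
        return "IGNORE"       # UPSERT default: missing update rows are expected
    return "MARK"             # MARK is the default for everything else


inserts = parse_options("IGNORE DUPLICATE INSERT ROWS")
print(treatment("DUPLICATE INSERT", inserts))  # IGNORE
print(treatment("MISSING DELETE", inserts))    # MARK

upsert = parse_options("DO INSERT FOR MISSING UPDATE ROWS")
print(treatment("MISSING UPDATE", upsert))     # IGNORE
```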

An IMPORT Script with Error Treatment Options

The command .DML LABEL names any DML statements (INSERT, UPDATE or DELETE) that immediately
follow it in the script. Each label must be given a name. In IMPORT mode, the label is referenced in
an APPLY clause of the .IMPORT command, and its statements are used in the Application Phase when
certain conditions are met. The following script provides an example of just one such possibility:

/* !/bin/ksh* */
/* +++++++++++++++++++++++++++++*/
/* MultiLoad SCRIPT */
/*This script is designed to change the */
/*EMPLOYEE_DEPT table using the data from */
/* the IMPORT INFILE CDW_Join_Export.txt */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* ++++++++++++++++++++++++++++ */
• Load runs from a shell script. Any words between /* … */ are COMMENTS ONLY and are not processed by Teradata. The opening comments name and describe the purpose of the script and name the author.

/* Setup the MultiLoad Logtables, Logon Statements*/
.LOGTABLE SQL01.CDW_Log;
.LOGON TDATA/SQL01,SQL01;
DATABASE SQL01;
• Sets up a Logtable and then logs on to Teradata. DATABASE specifies the database in which to find the target table.

/*Drop Error Tables */
DROP TABLE WORKDB.CDW_ET;
DROP TABLE WORKDB.CDW_UV;
• Drops existing error tables in the work database.

/* Begin Import and Define Work and Error Tables */
.BEGIN IMPORT MLOAD TABLES
Employee_Dept
WORKTABLES
WORKDB.CDW_WT
ERRORTABLES
WORKDB.CDW_ET
WORKDB.CDW_UV;
• Begins the load process by naming first the target table; the work table and error tables are in a work database. Note there is no comma between the names of the error tables (pair).

/* Define Layout of Input File */
.LAYOUT FILEIN;
.FIELD Employee_No * CHAR(11);
.FIELD First_Name * CHAR(14);
.FIELD Last_Name * CHAR(20);
.FIELD Dept_No * CHAR(6);
.FIELD Dept_Name * CHAR(20);
• Names the LAYOUT of the INPUT file and defines its structure. Notice the dots before the FIELD command and the semicolons after each FIELD definition.

/* Begin INSERT Process on Table */
.DML LABEL INSERTS
IGNORE DUPLICATE INSERT ROWS;
INSERT INTO SQL01.Employee_Dept
( Employee_No
,First_Name
,Last_Name
,Dept_No
,Dept_Name )
VALUES
( :Employee_No
,:First_Name
,:Last_Name
,:Dept_No
,:Dept_Name );
• Names the DML Label and tells MultiLoad NOT TO LIST duplicate INSERT rows in the error table; notice the option is placed AFTER the LABEL identification and immediately BEFORE the DML function. The VALUES to be INSERTed are listed in order, each with a single leading comma.

/* Specify IMPORT File and Apply Parameters */
.IMPORT INFILE CDW_Join_Export.txt
FORMAT TEXT
LAYOUT FILEIN
APPLY INSERTS;
• Names the Import File and states its Format type; names the Layout to use and tells MultiLoad to APPLY the INSERTs.

.END MLOAD;
.LOGOFF;
• Ends MultiLoad and logs off of Teradata.

Figure 5-11

An IMPORT Script that Uses Two Input Data Files

/* !/bin/ksh* */
• Load runs from a shell script. Any words between /* … */ are comments only and are not processed by Teradata.

.LOGTABLE SQL01.EMPDEPT_LOG;
.RUN FILE c:\mydir\logon.txt;
• Sets up a Logtable and logs on with .RUN. The logon.txt file contains:
.logon TDATA/SQL01,SQL01;

DROP TABLE SQL01.EMP_WT;
DROP TABLE SQL01.DEPT_WT;
DROP TABLE SQL01.EMP_ET;
DROP TABLE SQL01.EMP_UV;
DROP TABLE SQL01.DEPT_ET;
DROP TABLE SQL01.DEPT_UV;
• Drops the worktables and error tables, in case they existed from a prior load. NOTE: Do NOT include these DROPs if you want to RESTART using CHECKPOINT.

• Identifies the 2 target tables, with a comma between them, and names the worktable and error tables for each target table. Note there are NO commas between the pair of error table names, but there is a comma between this pair and the next pair.
• Names and defines the LAYOUT of the 1st INPUT file.
• Names and defines the LAYOUT of the 2nd INPUT file.
• Names the 1st DML Label and tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them. INSERTs a row into the table, but does NOT name the columns, so all VALUES are passed IN THE ORDER they are defined in the Employee table.
• Names the 2nd DML Label and tells MultiLoad to UPDATE: when it finds a Deptno (record) equal to the Dept_No in the Department_table, it changes the Dept_name column to the DeptName from the INPUT file.
• Names the TWO Import Files and the TWO Layouts that define the structure of the INPUT DATA files, and tells MultiLoad to APPLY the INSERTs to target table 1 and the UPDATEs to target table 2.

.END MLOAD;
.LOGOFF;
• Ends MultiLoad and logs off of Teradata.

Figure 5-12

Redefining the INPUT

Sometimes, instead of using two different INPUT DATA files, which require two separate LAYOUTs,
you can combine them into one INPUT DATA file. And you can use that one file, with just one
LAYOUT, to load more than one table! You see, a flat file may contain more than one type of data
record. As long as each record has a unique code to identify it, MultiLoad can check this code and
know which fields to use, even though different field names describe the same bytes in the one
layout. To do this you will need to REDEFINE the INPUT. You do this by redefining a field’s position
in the .FIELD or .FILLER section of the LAYOUT. Unlike the asterisk (*), which means that a field
simply follows the previous one, redefining cites a number that tells MultiLoad to jump back to that
byte position, toward the beginning of the record, and read the field from there.
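The effect of a redefinition can be modeled by slicing the same bytes two ways and letting the code in byte 1 decide which slicing applies. In the script that follows, the APPLY conditions make that choice by testing the Trans code. In this Python sketch the names Emp_Num, Dept_Num and DeptName echo the script notes, but every width, the Last_Name field, and the sample records are assumptions made up for illustration; only "both fields start at byte 2" comes from the text.

```python
# Hypothetical layout: (name, 1-based start byte, length).
EMPLOYEE_FIELDS = [("Emp_Num", 2, 11), ("Last_Name", 13, 20)]
DEPARTMENT_FIELDS = [("Dept_Num", 2, 6), ("DeptName", 8, 20)]  # Dept_Num redefines byte 2


def slice_field(record: str, start: int, length: int) -> str:
    """Cut one fixed-width field out of a record, trimming trailing blanks."""
    return record[start - 1:start - 1 + length].rstrip()


def parse_record(record: str):
    """Use the record-type code in byte 1 (the Trans filler) to pick a field set."""
    trans = record[0]
    if trans == "E":
        fields = EMPLOYEE_FIELDS
    elif trans == "D":
        fields = DEPARTMENT_FIELDS
    else:
        raise ValueError(f"unknown record type {trans!r}")
    return trans, {name: slice_field(record, start, length) for name, start, length in fields}


employee_record = "E" + "12345678901" + "Smith".ljust(20)
department_record = "D" + "100".ljust(6) + "Sales".ljust(20)
print(parse_record(employee_record))    # ('E', {'Emp_Num': '12345678901', 'Last_Name': 'Smith'})
print(parse_record(department_record))  # ('D', {'Dept_Num': '100', 'DeptName': 'Sales'})
```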

A Script that Uses Redefining the Input

The following script uses the ability to define two record types in the same input data file. It uses
a .FILLER to define the code since it is never used in the SQL, only to determine which SQL to run.

/* !/bin/ksh* */
• Load runs from a shell script. Any words between /* … */ are comments only and are not processed by Teradata.

.LOGTABLE SQL01.EmpDept_Log;
.LOGON TDATA/SQL01,SQL01;
• Sets up a Logtable and logs on to Teradata; optionally, specifies the database to work in.

• Identifies the 2 target tables and names the worktable and error tables for each target table. Note there is no comma between the names of the two error tables for one target, but there is a comma between that pair and the next pair.
• Names and defines the LAYOUT of the INPUT record. The FILLER is for a field that tells what type of record has been read. Here that field contains an “E” or a “D”. The “E” tells MLOAD to use the Employee data and the “D” is for department data. The definition for Dept_Num tells MLOAD to jump backward to byte 2, whereas the * for Emp_Num defaulted to byte 2. So, Emp_Num and Dept_Num both start at byte 2, but in different types of records. When Trans (byte position 1) contains a “D”, the APPLY uses the department data, and for an “E” the APPLY uses the employee data.
• Names the 1st DML Label and tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them. Tells MultiLoad to INSERT a row into the 1st target table but, optionally, does NOT define the target table row format. All the VALUES are passed to the columns of the Employee table IN THE ORDER of that table’s row format.
• Names the 2nd DML Label and tells MultiLoad to UPDATE the 2nd target table, again optionally without defining that table’s row format. When the VALUE of DeptNo equals that of the Dept_No column of the Department table, it updates the Dept_Name column with the DeptName from the INPUT file.

.END MLOAD;
.LOGOFF;
• Ends MultiLoad and logs off of Teradata.

Figure 5-13

A DELETE MLOAD Script Using a Hard Coded Value

The next script demonstrates how to use the MultiLoad DELETE task. In this example, old orders are
being removed from the Order_Table, based upon the order date. Any order placed prior to the hard
coded date will be removed.

.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;
• Identifies the Logtable and logs onto Teradata with a valid logon string.

.BEGIN DELETE MLOAD
TABLES Order_Table;
• Begins MultiLoad in DELETE mode and names the target table.

• The SQL DELETE statement does a massive delete of order data for orders placed prior to the hard coded date in the WHERE clause. Notice that this is not the Primary Index. You CANNOT DELETE in DELETE MLOAD mode based upon the Primary Index.

.END MLOAD;
.LOGOFF;
• Ends loading and logs off of Teradata.

Figure 5-14

How many differences from a MultiLoad IMPORT script readily jump off of the page at you? Here are
a few that we saw:
• At the beginning, you must specify the word “DELETE” in the .BEGIN MLOAD command. You
need not specify it in the .END MLOAD command.
• You will readily notice that this mode has no .DML LABEL command. Since it is focused on
just one absolute function, no APPLY clause is required so you see no .DML LABEL.
• Notice that the DELETE with a WHERE clause is an SQL function, not a MultiLoad command,
so it has no dot prefix.
• Since default names are available for worktables (WT_<target_tablename>) and error tables
(ET_<target_tablename> and UV_<target_tablename>), they need not be specifically
named, but be sure to define the Logtable.
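The naming rule is simple enough to state as code. This Python function is a hypothetical illustration of the WT_/ET_/UV_ convention described above, not a Teradata API; it drops any database qualifier because, as noted earlier in this chapter, the default tables are built in the user's default database.

```python
def default_support_tables(target_table: str) -> dict:
    """Default worktable and error table names for one target table."""
    table = target_table.rpartition(".")[2]  # "SQL01.Order_Table" -> "Order_Table"
    return {
        "worktable": f"WT_{table}",
        "acquisition_error_table": f"ET_{table}",
        "application_error_table": f"UV_{table}",
    }


print(default_support_tables("SQL01.Order_Table"))
# {'worktable': 'WT_Order_Table', 'acquisition_error_table': 'ET_Order_Table', 'application_error_table': 'UV_Order_Table'}
```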

Do not confuse the DELETE MLOAD task with the SQL delete task that may be part of a MultiLoad
IMPORT. The IMPORT delete is used to remove small volumes of data rows based upon the Primary
Index. On the other hand, the MultiLoad DELETE does global deletes on tables, bypassing the
Transient Journal. Because there is no Transient Journal, there are no rollbacks when the job fails for
any reason. Instead, it may be RESTARTed from a CHECKPOINT. Also, the MultiLoad DELETE task is
never based upon the Primary Index.

Because we are not importing any data rows, there is no need for worktables or an Acquisition
Phase. One DELETE statement is sent to all the AMPs with a match tag parcel. That statement will be
applied to every table row. If the condition is met, then the row is deleted. Using the match tags,
each target block is read once and the appropriate rows are deleted.
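The block-at-a-time behavior can be pictured with a toy model. In this Python sketch a table is a list of blocks, each block a list of rows, and one condition (an order date before a cutoff) is evaluated against every row while each block is visited exactly once. The column name Order_Date and the dates are hypothetical, and nothing here reflects how the AMPs actually store or match blocks.

```python
from datetime import date


def delete_task(blocks, cutoff):
    """Apply one DELETE condition to every block, visiting each block exactly once.

    Returns (surviving blocks, rows deleted, blocks read).
    """
    surviving, deleted, blocks_read = [], 0, 0
    for block in blocks:
        blocks_read += 1
        kept = [row for row in block if row["Order_Date"] >= cutoff]  # rows that fail the WHERE survive
        deleted += len(block) - len(kept)
        surviving.append(kept)
    return surviving, deleted, blocks_read


order_blocks = [
    [{"Order_Date": date(2001, 1, 5)}, {"Order_Date": date(2002, 6, 1)}],
    [{"Order_Date": date(2000, 3, 3)}],
]
surviving, deleted, blocks_read = delete_task(order_blocks, cutoff=date(2002, 1, 1))
print(deleted, blocks_read)  # 2 2
```

In the variable-driven version described next, the cutoff would come from the single imported record (:OrdDate) rather than being hard coded.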

A DELETE MLOAD Script Using a Variable

This illustration demonstrates how passing the values of a data row rather than a hard coded value
may be used to help meet the conditions stated in the WHERE clause. When you are passing values,
you must add some additional commands that were not used in the DELETE example with hard
coded values. You will see .LAYOUT and .IMPORT INFILE in this script.

.LOGTABLE RemoveLog;
.LOGON TDATA/SQL01,SQL01;
• Identifies the Logtable and logs onto Teradata with a valid logon string.

.BEGIN DELETE MLOAD
TABLES Order_Table;
• Begins the DELETE task and names only one table, but still uses the TABLES option.

• Names the LAYOUT and defines the column whose value will be passed as a single row to MultiLoad. In this case, all of the order dates in the Order_Table will be tested against this OrdDate value.
• The condition in the WHERE clause is that the data rows with orders placed prior to the date value (:OrdDate) passed from the LAYOUT OldMonth will be DELETEd from the Order_Table.

.IMPORT INFILE
LAYOUT OldMonth ;
• Note that this time there is no dot in front of LAYOUT in this clause, since the layout is only being referenced.

.END MLOAD;
.LOGOFF;
• Ends loading and logs off of Teradata.

Figure 5-15

An UPSERT Sample Script

The following sample script is provided to demonstrate how to do an UPSERT, that is, to update a
table and, if a row from the data source does not exist in the target table, insert a new
row. In this instance we are loading the Student_Profile table with new data for the next semester.
The clause “DO INSERT FOR MISSING UPDATE ROWS” indicates an UPSERT. The DML
statements that follow this option must be in the order of a single UPDATE statement followed by a
single INSERT statement.
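The UPSERT logic (UPDATE first, INSERT only when the UPDATE finds no row) can be sketched with a Python dictionary standing in for Student_Profile, keyed on Student_ID. This is a conceptual model under that assumption, not how MultiLoad applies rows; the sample students and grades are made up.

```python
def upsert(table: dict, row: dict, key: str = "Student_ID") -> str:
    """UPDATE the row whose key matches; if there is none, INSERT the whole input row."""
    existing = table.get(row[key])
    if existing is not None:
        existing.update(row)     # the UPDATE found its target row
        return "UPDATE"
    table[row[key]] = dict(row)  # missing update row: DO INSERT instead
    return "INSERT"


student_profile = {101: {"Student_ID": 101, "Last_Name": "Smith", "Grade_Pt": 3.0}}
print(upsert(student_profile, {"Student_ID": 101, "Last_Name": "Smith", "Grade_Pt": 3.5}))  # UPDATE
print(upsert(student_profile, {"Student_ID": 202, "Last_Name": "Jones", "Grade_Pt": 2.8}))  # INSERT
print(len(student_profile))  # 2
```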

• Load runs from a shell script. Any words between /* … */ are comments only and are not processed by Teradata. The opening comments name and describe the purpose of the script and name the author.

/* Setup Logtable, Logon Statements*/
.LOGTABLE SQL01.CDW_Log;
.LOGON CDW/SQL01,SQL01;
DATABASE SQL01;
• Sets up a Logtable and then logs on to Teradata; DATABASE specifies the database to work in (optional).

• Begins the load process by naming first the target table, then the work table and error tables.
• Names the LAYOUT of the INPUT file, an ALL CHARACTER based flat file, and defines its structure. Notice the dots before the FIELD command and the semicolons after each FIELD definition.

/* Begin INSERT and UPDATE Process on Table */
.DML LABEL UPSERTER
DO INSERT FOR MISSING UPDATE ROWS;
• Names the DML Label and tells MultiLoad to INSERT a row if there is not one to be UPDATED, i.e., UPSERT.

/* Without the above DO, one of these is guaranteed to
fail on this same table. If the UPDATE fails because the row
is missing, it corrects by doing the INSERT */

UPDATE SQL01.Student_Profile
SET Last_Name = :Last_Name
,First_Name = :First_Name
,Class_Code = :Class_Code
,Grade_Pt = :Grade_Pt
WHERE Student_ID = :Student_ID;
• Defines the UPDATE; the WHERE clause qualifies it.

INSERT INTO SQL01.Student_Profile
VALUES ( :Student_ID
,:Last_Name
,:First_Name
,:Class_Code
,:Grade_Pt );
• Defines the INSERT. We recommend placing comma separators in front of the following column or value for easier debugging.

/* Specify IMPORT File and Apply Parameters */
.IMPORT INFILE CDW_EXPORT.DAT
LAYOUT FILEIN
APPLY UPSERTER;
• Names the Import File, names the Layout to use, and tells MultiLoad to APPLY the UPSERTs.

.END MLOAD;
.LOGOFF;
• Ends MultiLoad and logs off of Teradata.

Figure 5-16

What Happens when MultiLoad Finishes

MultiLoad Statistics

Figure 5-17

MultiLoad Output From an UPSERT



Figure 5-18

Troubleshooting MultiLoad Errors — More on the Error Tables

The output statistics in the above example indicate that the load was entirely successful. But that is
not always the case. Now we need to troubleshoot in order to identify the errors and correct them, if
desired. Earlier on, we noted that MultiLoad generates two error tables, the Acquisition error table and
the Application error table. You may select from these tables to discover the problem and research the
issues.

For the most part, the Acquisition error table logs errors that occur during that processing phase.
The Application error table lists Unique Primary Index violations, field overflow errors on non-PI
columns, and constraint errors that occur in the APPLY phase. MultiLoad error tables not only list the
errors they encounter, they also have the capability to STORE those errors. Do you remember the
MARK and IGNORE parameters? This is where they come into play. MARK will ensure that the error
rows, along with some details about the errors, are stored in the error table. IGNORE does neither;
it is as if the error never occurred.

THREE COLUMNS SPECIFIC TO THE ACQUISITION ERROR TABLE

ErrorCode       System code that identifies the error.
ErrorField      Name of the column in the target table where the error happened; left blank if the offending column cannot be identified.
HostData        The data row that contains the error.

Figure 5-19

THREE COLUMNS SPECIFIC TO THE APPLICATION ERROR TABLE

Uniqueness      Contains a certain value that disallows duplicate row errors in this table; can be ignored, if desired.
DBCErrorCode    System code that identifies the error.
DBCErrorField   Name of the column in the target table where the error happened; left blank if the offending column cannot be identified. NOTE: A copy of the target table column immediately follows this column.

Figure 5-20

RESTARTing MultiLoad

Who hasn’t experienced a failure at some time when attempting a load? Don’t take it personally!
Failures can and do occur on the host or Teradata (DBC) for many reasons. MultiLoad has the
impressive ability to RESTART from failures in either environment. In fact, it requires almost no
effort to continue or resubmit the load job. Here are the factors that determine how it works:

First, MultiLoad will check the Restart Logtable and automatically resume the load process from the
last successful CHECKPOINT before the failure occurred. Remember, the Logtable is essential for
restarts. MultiLoad uses neither the Transient Journal nor rollbacks during a failure. That is why you
must designate a Logtable at the beginning of your script. MultiLoad either restarts by itself or waits
for the user to resubmit the job. Then MultiLoad takes over right where it left off.

Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host
program will restart MultiLoad after Teradata is back up and running. You do not have to do a thing!

Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may
simply resubmit the script without changing a thing. MultiLoad will find out where it stopped and
start again from that very spot.

Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run
until complete.

Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause
will be enacted. The results are stored in the Logtable. During the Application Phase, CHECKPOINTs
are logged each time a data block is successfully written to its target table.

HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT as 60
or less, minutes are assumed. If you specify the checkpoint at 61 or above, the number of records is
assumed.
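The rule in the HINT amounts to a simple threshold. This Python function merely restates it for clarity; checkpoint_interval is a hypothetical name and the function is not part of MultiLoad.

```python
def checkpoint_interval(value=None):
    """Interpret a CHECKPOINT value as the HINT describes: (amount, unit)."""
    if value is None:
        return 15, "minutes"     # default when no CHECKPOINT is given
    if value <= 60:
        return value, "minutes"  # 60 or less: a time interval
    return value, "records"      # 61 or more: a record count


print(checkpoint_interval())        # (15, 'minutes')
print(checkpoint_interval(60))      # (60, 'minutes')
print(checkpoint_interval(100000))  # (100000, 'records')
```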
RELEASE MLOAD — When You DON'T Want to Restart MultiLoad

What if a failure occurs but you do not want to RESTART MultiLoad? Since MultiLoad has already
updated the table headers, it assumes that it still “owns” them. Therefore, it limits access to the
table(s). So what is a user to do? Well, there is good news and bad news. The good news is that
you may use the RELEASE MLOAD command to release the locks and roll back the job. The
bad news is that if you have been loading multiple millions of rows, the rollback may take a lot of
time. For this reason, most customers would rather just go ahead and RESTART.

Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one of
the following three conditions was true:
• In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you could not
use RELEASE MLOAD. This is sometimes referred to as the “point of no return.”
• In DELETE mode, the point of no return was when Teradata received the DELETE statement.
• If the job halted in the Apply Phase, you had to RESTART the job.

With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the
RELEASE MLOAD command. It can NOW be used in the APPLY Phase, if:
• You are running a Teradata V2R3 or later version
• You use the correct syntax:
RELEASE MLOAD <target-table> IN APPLY
• The load script has NOT been modified in any way
• The target tables either:
° Must be empty, or
° Must have no Fallback, no NUSIs, no Permanent Journals
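The V2R3 conditions can be read as a single checklist. This Python sketch is a hypothetical pre-flight check, not a Teradata facility: TargetTable and can_release_in_apply are made-up names, the (2, 3) tuple stands for V2R3, and the syntax condition is left out because it concerns how the command is typed rather than the state of the tables.

```python
from dataclasses import dataclass


@dataclass
class TargetTable:
    name: str
    is_empty: bool
    has_fallback: bool
    has_nusi: bool
    has_permanent_journal: bool


def table_qualifies(table: TargetTable) -> bool:
    """A target must be empty, or have no Fallback, no NUSIs and no Permanent Journal."""
    return table.is_empty or not (table.has_fallback or table.has_nusi or table.has_permanent_journal)


def can_release_in_apply(version: tuple, script_modified: bool, targets: list) -> bool:
    """version is (major, release); (2, 3) stands for V2R3."""
    return version >= (2, 3) and not script_modified and all(table_qualifies(t) for t in targets)


emp = TargetTable("Employee", is_empty=True, has_fallback=True, has_nusi=True, has_permanent_journal=False)
dept = TargetTable("Department", is_empty=False, has_fallback=True, has_nusi=False, has_permanent_journal=False)
print(can_release_in_apply((2, 3), False, [emp]))        # True: an empty table qualifies
print(can_release_in_apply((2, 3), False, [emp, dept]))  # False: Department has rows and Fallback
```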

You should be very cautious using the RELEASE command. It could potentially leave your table half
updated. Therefore, it is handy for a test environment, but please don’t get too reliant on it for
production runs. They should be allowed to finish to guarantee data integrity.

MultiLoad and INMODs

INMODs, or Input Modules, may be called by MultiLoad in either mainframe or LAN environments,
providing the appropriate programming languages are used. INMODs are user written routines whose
purpose is to read data from one or more sources and then convey it to a load utility, here
MultiLoad, for loading into Teradata. They allow MultiLoad to focus solely on loading data by doing
data validation or data conversion before the data is ever touched by MultiLoad. INMODs replace the
normal MVS DDNAME or LAN file name with the following statement:

.IMPORT INMOD=<INMOD-name>

You will find a more detailed discussion on how to write INMODs for MultiLoad in “Teradata Utilities:
Breaking The Barriers”, Chapter 7.

How MultiLoad Compares with FastLoad

Function                                        FastLoad                  MultiLoad
Error Tables must be defined                    Yes                       Optional. 2 Error Tables have to exist for each target table and will automatically be assigned.
Work Tables must be defined                     No                        Optional. 1 Work Table has to exist for each target table and will automatically be assigned.
Logtable must be defined                        No                        Yes
Allows Referential Integrity                    No                        No
Allows Unique Secondary Indexes                 No                        No
Allows Non-Unique Secondary Indexes             No                        Yes
Allows Triggers                                 No                        No
Loads a maximum of n number of tables           One                       Five
DML Statements Supported                        INSERT                    INSERT, UPDATE, DELETE, and “UPSERT”
DDL Statements Supported                        CREATE and DROP TABLE     DROP TABLE
Transfers data in 64K blocks                    Yes                       Yes
Number of Phases                                Two                       Five
Is RESTARTable                                  Yes                       Yes, in all 5 phases (auto CHECKPOINT)
Stores UPI Violation Rows                       Yes                       Yes
Allows use of Aggregated, Arithmetic            No                        Yes
  calculations or Conditional Exponentiation
Allows Data Conversion                          Yes, 1 per column         Yes
NULLIF function                                 Yes                       Yes

Figure 5-21

An Introduction to TPump

The chemistry of relationships is very interesting. Frederick Buechner once stated, “My assumption is
that the story of any one of us is in some measure the story of us all.” In this chapter, you will find
that TPump has similarities with the rest of the family of Teradata utilities. But this newer utility has
been designed with fewer limitations and many distinguishing abilities that the other load utilities do
not have.

Do you remember the first Swiss Army™ knife you ever owned? Aside from its original intent as a
compact survival tool, this knife has thrilled generations with its multiple capabilities. TPump is the
Swiss Army™ knife of the Teradata load utilities. Just as this knife was designed for small tasks,
TPump was developed to handle batch loads with low volumes. And, just as the Swiss Army™ knife
easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit when you have
a large, busy system with few resources to spare. Let’s look in more detail at the many facets of
this amazing load tool.

Why It Is Called “TPump”

TPump is the shortened name for the load utility Teradata Parallel Data Pump. To understand this,
you must know how the load utilities move the data. Both FastLoad and MultiLoad assemble massive
volumes of data rows into 64K blocks and then move those blocks. Picture in your mind the way
that huge ice blocks used to be floated down long rivers to large cities prior to the advent of
refrigeration. There they were cut up and distributed to the people. TPump does NOT move data in
the large blocks. Instead, it loads data one row at a time, using row hash locks. Because it locks
at this level, and not at the table level like MultiLoad, TPump can make many simultaneous, or
concurrent, updates on a table.

Envision TPump as the water pump on a well. Pumping in a very slow, gentle manner results in a
steady trickle of water that could be pumped into a cup. But strong and steady pumping results in a
powerful stream of water that would require a larger container. TPump is a data pump which, like
the water pump, may allow either a trickle-feed of data to flow into the warehouse or a strong and
steady stream. In essence, you may “throttle” the flow of data based upon your system and business
user requirements. Remember, TPump is THE PUMP!

TPump Has Many Unbelievable Abilities

Just in Time: Transactional systems, such as those implemented for ATM machines or Point-of-Sale
terminals, are known for their tremendous speed in executing transactions. But how soon can you
get the information pertaining to that transaction into the data warehouse? Can you afford to wait
until a nightly batch load? If not, then TPump may be the utility that you are looking for! TPump
allows the user to accomplish near real-time updates from source systems into the Teradata data
warehouse.

Throttle-switch Capability: What about the throttle capability that was mentioned above? With
TPump you may stipulate how many updates may occur per minute. This is also called the statement
rate. In fact, you may change the statement rate during the job, “throttling up” the rate with a
higher number, or “throttling down” the number of updates with a lower one. An example:
Having this capability, you might want to throttle up the rate during the period from 12:00 noon to
1:30 PM when most of the users have gone to lunch. You could then lower the rate when they return
and begin running their business queries. This way, you need not have such clearly defined load
windows, as the other utilities require. You can have TPump running in the background all the time,
and just control its flow rate.
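The throttle idea reduces to simple arithmetic on a statement rate. This Python sketch is a back-of-the-envelope model, not TPump's scheduler; the function names are hypothetical and the rates and time windows are made up to mirror the lunch-hour example above.

```python
def seconds_between_statements(rate_per_minute: float) -> float:
    """Average gap between statements at a given statement rate."""
    if rate_per_minute <= 0:
        raise ValueError("statement rate must be positive")
    return 60.0 / rate_per_minute


def statements_sent(schedule):
    """schedule: list of (minutes, statements per minute) windows; returns total statements."""
    return sum(minutes * rate for minutes, rate in schedule)


# Hypothetical day: a low rate in the morning, throttled up from 12:00 to 1:30 PM, then down again.
day = [(120, 50), (90, 400), (120, 50)]
print(statements_sent(day))             # 48000
print(seconds_between_statements(400))  # 0.15
```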

DML Functions: Like MultiLoad, TPump performs the DML functions INSERT, UPDATE and DELETE.
These can be run solo or in combination with one another. Like MultiLoad, it also supports
UPSERTs. But here is one place where TPump differs vastly from the other utilities: FastLoad can
load only one table and MultiLoad can load up to five tables, but when it pulls data from a single
source, TPump can load up to 60 tables at a time! And the number of concurrent TPump instances is
unlimited. That’s right: not 15, as with MultiLoad, but unlimited as far as Teradata is concerned.
Well, OK, your computer may impose a limit. I cannot imagine my laptop running 20 TPump jobs, but
Teradata does not care.

How could you use this ability? Well, imagine partitioning a huge table horizontally into multiple
smaller tables and then performing various DML functions on all of them in parallel. Keep in mind
that TPump places no limit on the number of sessions that may be established. Now, think of ways
you might use this ability in your data warehouse environment. The possibilities are endless.
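One way to picture horizontal partitioning is a routing rule that decides which of several smaller target tables an input row belongs to. The C sketch below is hypothetical: the modulo-on-key rule and the name target_table_for are assumptions for illustration, not something TPump computes for you. In a real script each smaller table would have its own DML label, and the routing condition would come from your own data design.

```c
#include <assert.h>

/* Hypothetical routing rule: spread rows across n_tables smaller
   tables by taking the row's numeric key modulo the table count.
   Returns a table number in the range 0 .. n_tables-1. */
static int target_table_for(long key, int n_tables)
{
    assert(n_tables > 0);
    long slot = key % n_tables;
    if (slot < 0)              /* C's % can be negative for negative keys */
        slot += n_tables;
    return (int)slot;
}
```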

More benefits: Just when you think you have pulled out all of the options on a Swiss Army™ knife,
there always seems to be just one more blade or tool you had not noticed. Similar to the knife,
TPump always seems to have another advantage in its list of capabilities. Here are several that relate
to TPump requirements for target tables. TPump allows both Unique and Non-Unique Secondary
Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad, which allows just
NUSIs. Like MultiLoad, TPump allows the target tables either to be empty or to be populated with
data rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential
Integrity is allowed and need not be dropped. As to the existence of Triggers, TPump says, “No
problem!”

Support Environment compatibility: The Support Environment (SE) works in tandem with TPump
to enable the operator to have even more control in the TPump load environment. The SE
coordinates TPump activities, assists in managing the acquisition of files, and aids in the processing
of conditions for loads. The Support Environment aids in the execution of DML and DDL that occur in
Teradata, outside of the load utility.

Stopping without Repercussions: Finally, this utility can be stopped at any time and all of its locks
may be dropped with no ill consequences. Is this too good to be true? Are there no limits to this load
utility? TPump does not like to steal any thunder from the other load utilities, but it just might
become one of the most valuable survival tools for businesses in today’s data warehouse
environment.

TPump Has Some Limits

TPump has rightfully earned its place as a superstar in the family of Teradata load utilities. But this
does not mean that it has no limits. It has a few that we will list here for you:

Rule #1: No concatenation of input data files is allowed. TPump is not designed to support
this.

Rule #2: TPump will not process aggregates, arithmetic functions or exponentiation. If you
need data conversions or math, you might consider using an INMOD to prepare the data prior to
loading it.

Rule #3: The SELECT statement is not allowed. You may not use SELECT in your SQL
statements.

Rule #4: No more than four IMPORT commands may be used in a single load task. This
means that at most four files can be directly read in a single run.

Rule #5: Dates before 1900 or after 1999 must be represented by the yyyy format for the
year portion of the date, not the default format of yy. This must be specified when you create
the table. Any dates using the default yy format for the year are taken to mean 20th century years.
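The C sketch below shows why a two-digit year is risky under this rule. It assumes, as the text states, that a yy year is always read as 19yy. The function names are hypothetical.

```c
#include <assert.h>

/* Expands a two-digit year the way Rule #5 describes: any yy value is
   taken as a 20th-century year (1900-1999). */
static int expand_yy(int yy)
{
    assert(yy >= 0 && yy <= 99);
    return 1900 + yy;
}

/* Returns 1 if a non-negative four-digit year survives a round trip
   through the yy format unchanged, 0 if it would silently be moved
   into the 1900s. */
static int year_safe_as_yy(int yyyy)
{
    assert(yyyy >= 0);
    return expand_yy(yyyy % 100) == yyyy;
}
```

So a 2002 transaction date stored with a yy year comes back as 1902, which is why such tables need the yyyy format.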

Rule #6: On some network attached systems, the maximum file size when using TPump is
2GB. This is true for a computer running under a 32-bit operating system.
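The 2GB figure matches the largest byte offset a signed 32-bit integer can hold, 2^31 - 1 bytes. A small C check, with a hypothetical function name, makes the arithmetic explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte offset a signed 32-bit integer can represent:
   2^31 - 1 = 2,147,483,647 bytes, just under 2GB. */
#define MAX_32BIT_FILE_SIZE ((int64_t)INT32_MAX)

/* Returns 1 if a file of the given size (in bytes) would exceed the
   limit described in Rule #6 on a 32-bit system, 0 otherwise. */
static int too_big_for_32bit(int64_t file_size_bytes)
{
    assert(file_size_bytes >= 0);
    return file_size_bytes > MAX_32BIT_FILE_SIZE;
}
```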

Rule #7: TPump performance will be diminished if Access Logging is used. The reason for
this is that TPump uses normal SQL to accomplish its tasks. Besides the extra overhead incurred, if
you use Access Logging for successful table updates, then Teradata will make an entry in the Access
Log table for each operation. This creates the potential for row hash conflicts between the Access
Log and the target tables.

Supported Input Formats

TPump, like MultiLoad, supports the following five format options: BINARY, FASTLOAD, TEXT,
UNFORMAT and VARTEXT. But TPump is quite finicky when it comes to data format errors. Such
errors will generally cause TPump to terminate, so you have to be careful! To keep TPump from
terminating prematurely when faced with a data format error, you may specify an error limit: a
number (n) of errors that are to be tolerated before TPump will halt. Here is a data format chart for
your reference:

BINARY
    Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest
    unit of data you can have in Teradata.

FASTLOAD
    This format is the same as BINARY, plus a marker (X‘0A’ or X‘0D’) that specifies the end of
    the record.

TEXT
    Each record has a variable number of bytes and is followed by an end-of-record marker.

UNFORMAT
    The format for these input records is defined in the LAYOUT statement of the TPump script
    using the components FIELD, FILLER and TABLE.

VARTEXT
    This is a variable-length text record format with fields separated by delimiters such as a
    comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data
    formats in your TPump LAYOUT. Note that two delimiter characters in a row will result in a
    null value between them.

Figure 6-1
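To make the BINARY and FASTLOAD layouts concrete, here is a minimal C sketch that reads one record from an in-memory buffer. It assumes the 2-byte length prefix is little-endian, as on Intel-based network-attached clients; mainframe-created files would differ, so check your own client’s byte order. The function read_record is hypothetical and is not part of any Teradata library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reads one BINARY or FASTLOAD record from buf (len bytes available).
   Layout: 2-byte length n (assumed little-endian), then n data bytes;
   the FASTLOAD format adds one end-of-record byte, X'0A' or X'0D'.
   Copies the data into out (capacity out_cap), stores n in *data_len,
   and returns the bytes consumed, or 0 if the record is malformed. */
static size_t read_record(const unsigned char *buf, size_t len,
                          int fastload_format,
                          unsigned char *out, size_t out_cap,
                          size_t *data_len)
{
    assert(buf != NULL && out != NULL && data_len != NULL);
    if (len < 2)
        return 0;                               /* no room for length  */
    size_t n = (size_t)buf[0] | ((size_t)buf[1] << 8);
    size_t need = 2 + n + (fastload_format ? 1 : 0);
    if (len < need || n > out_cap)
        return 0;                               /* truncated or too big */
    if (fastload_format) {
        unsigned char marker = buf[2 + n];
        if (marker != 0x0A && marker != 0x0D)
            return 0;                           /* missing end marker  */
    }
    memcpy(out, buf + 2, n);
    *data_len = n;
    return need;
}
```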

TPump Commands and Parameters

Each command in TPump must begin on a new line, preceded by a dot. It may utilize several lines,
but must always end in a semi-colon. Like MultiLoad, TPump makes use of several optional
parameters in the .BEGIN LOAD command. Some are the same ones used by MultiLoad. However,
TPump has other parameters. Let’s look at each group.

LOAD Parameters IN COMMON with MultiLoad

ERRLIMIT errcount [errpercent]
    You may specify the maximum number of errors, or the percentage, that you will tolerate
    during the processing of a load job. The key point here is that you should set the ERRLIMIT
    to a number greater than the PACK number. The reason is that if the PACK factor is larger
    than the ERRLIMIT, the job may terminate, telling you that you have gone over the ERRLIMIT.
    When this happens, there will be no entries in the error tables.

CHECKPOINT (n)
    In TPump, the CHECKPOINT refers to the number of minutes, or frequency, at which you wish a
    checkpoint to occur. This is unlike MultiLoad, which allows either minutes or the number of
    rows.

SESSIONS (n)
    This refers to the number of SESSIONS that should be established with Teradata. TPump places
    no limit on the number of SESSIONS you may have. For TPump, the optimal number of sessions
    depends on your needs and on your host computer (a laptop, for instance).

TENACITY
    Tells TPump how many hours to keep trying to log on when fewer than the requested number of
    sessions are available.

SLEEP
    Tells TPump how frequently, in minutes, to retry logging on to the system.

Figure 6-2
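TENACITY and SLEEP work together: TPump keeps retrying the logon every SLEEP minutes until TENACITY hours have passed. The C sketch below models that schedule under a simplifying assumption (one attempt at time zero, then one every SLEEP minutes up to and including the TENACITY deadline). The function name is hypothetical, and the exact timing of TPump’s final retry may differ.

```c
#include <assert.h>

/* Simplified model of the TENACITY/SLEEP retry window: one logon
   attempt at minute 0, then another every sleep_minutes, stopping
   once tenacity_hours have elapsed. Returns how many attempts fit. */
static int logon_attempts(int tenacity_hours, int sleep_minutes)
{
    assert(tenacity_hours >= 0 && sleep_minutes > 0);
    int window = tenacity_hours * 60;   /* total minutes to keep trying */
    return window / sleep_minutes + 1;  /* +1 for the attempt at t = 0  */
}
```

Under this model, TENACITY 2 with a SLEEP of 6 allows 21 logon attempts.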

.BEGIN LOAD Parameters UNIQUE to TPump


MACRODB <databasename>
    This parameter identifies a database that will contain any macros utilized by TPump.
    Remember, TPump does not run the SQL statements by itself. It places them into macros and
    executes those macros for efficiency.

NOMONITOR
    Use this parameter when you wish to keep TPump from checking either statement rates or
    update status information for the TPump Monitor application.

PACK (n)
    Use this to state the number of statements TPump will “pack” into a multiple-statement
    request. Multi-statement requests improve efficiency in either a network or channel
    environment because they use fewer sends and receives between the application and Teradata.

RATE
    This refers to the Statement Rate: the initial maximum number of statements that will be
    sent per minute. A zero or no number at all means that the rate is unlimited. If the
    Statement Rate specified is less than the PACK number, then TPump will send requests that
    are smaller than the PACK number.

ROBUST ON/OFF
    ROBUST defines how TPump will conduct a RESTART. ROBUST ON means that one row is written to
    the Logtable for every SQL transaction. The downside of running TPump in ROBUST mode is that
    it incurs additional, and possibly unneeded, overhead. ON is the default. If you specify
    ROBUST OFF, you are telling TPump to use “simple” RESTART logic: just start from the last
    successful CHECKPOINT. Be aware that if some statements are reprocessed, such as those
    processed after the last CHECKPOINT, you may end up with extra rows in your error tables.
    Why? Because some of the statements in the original run may already have found errors, in
    which case they would have recorded those errors in an error table.

SERIALIZE OFF/ON
    You only use the SERIALIZE parameter when you are going to specify a key with the KEY option
    of the .FIELD command, for example “.FIELD Salaryrate * DECIMAL KEY;”. If you specify
    SERIALIZE, TPump will ensure that all operations on a row occur serially. If you code
    “SERIALIZE” but do not specify ON or OFF, the default is ON. Otherwise, the default is OFF
    unless you are doing an UPSERT.

Figure 6-3
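PACK, RATE and SERIALIZE together shape how statements leave the client. The C sketch below is a simplified model, not TPump’s internal code. Within one minute, plan_minute fills requests of up to PACK statements until RATE statements have been sent (a RATE of 0 meaning no limit), so a RATE smaller than PACK yields a single, smaller request, as the chart above describes. session_for_key shows one way to honor SERIALIZE: every operation on the same KEY value goes through the same session, so those operations stay in order. Both function names and the multiplicative hash are assumptions; Teradata uses its own row hash.

```c
#include <assert.h>

/* Simplified model of one minute of TPump traffic. Writes the size of
   each multi-statement request into sizes[] (room for max_requests
   entries) and returns how many requests were built.
   pending = statements waiting; rate = RATE (0 = unlimited);
   pack = PACK. */
static int plan_minute(int pending, int rate, int pack,
                       int sizes[], int max_requests)
{
    assert(pending >= 0 && rate >= 0 && pack > 0 && max_requests >= 0);
    int budget = (rate == 0) ? pending : rate;  /* statements allowed */
    int count = 0;
    while (pending > 0 && budget > 0 && count < max_requests) {
        int n = pack;
        if (n > pending) n = pending;           /* last partial pack  */
        if (n > budget)  n = budget;            /* RATE < PACK case   */
        sizes[count++] = n;
        pending -= n;
        budget  -= n;
    }
    return count;
}

/* SERIALIZE idea: the same key always maps to the same session, so all
   operations on one row run in order. The hash is illustrative only
   (an assumption, not Teradata's row hash). */
static int session_for_key(unsigned long key, int sessions)
{
    assert(sessions > 0);
    unsigned long h = key * 2654435761UL;   /* unsigned wrap is defined */
    return (int)(h % (unsigned long)sessions);
}
```

With the RATE 1000 and PACK 40 used in the scripts that follow, the model sends 25 full requests per minute.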

A Simple TPump Script — A Look at the Basics


• Setting up a Logtable and Logging onto Teradata
• Begin load process, add Parameters, naming the error table
• Defining the INPUT flat file
• Defining the DML activities to occur
• Naming the IMPORT file and defining its FORMAT
• Telling TPump to use a particular LAYOUT
• Telling the system to start loading data rows
• Finishing loading and logging off of Teradata

The following script assumes the existence of a Student_Names table in the SQL01 database. You
may use pre-existing target tables when running TPump or TPump may create the tables for you. In
most instances you will use existing tables. The CREATE TABLE statement for this table is listed for
your convenience.

Much of the TPump command structure should look quite familiar to you. It is quite similar to
MultiLoad. In this example, the Student_Names table is being loaded with new data from the
university’s registrar. It will be used as an associative table for linking various tables in the data
warehouse.

/* This script inserts rows into a table called
   student_names from a single file */
.LOGTABLE WORK_DB.LOG_PUMP;
.RUN FILE C:\mydir\logon.txt;
DATABASE SQL01;

Sets up a Logtable and then logs on with .RUN. The logon.txt file contains: .logon
TDATA/SQL01,SQL01; Also specifies the database in which to find the necessary tables.

.BEGIN LOAD
   ERRLIMIT 5
   CHECKPOINT 1
   SESSIONS 64
   TENACITY 2
   PACK 40
   RATE 1000
   ERRORTABLE SQL01.ERR_PUMP;

Begins the load process and specifies optional parameters. ERRORTABLE names the error table for
this run.

Names the LAYOUT of the INPUT record. Notice the dots before the .FIELD and .FILLER commands and
the semicolons after each FIELD definition. Also, the more_junk field moves the field pointer to the
start of the First_Name data. Notice the comment in the script.

Names the DML label. Tells TPump to INSERT a row into the target table and defines the row format.
Comma separators are placed in front of the following column or value for easier debugging. Lists,
in order, the VALUES to be INSERTed. Colons precede VALUEs.

Names the IMPORT file; names the LAYOUT to be called from above; tells TPump which DML label to
APPLY.

.END LOAD;
.LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-4

Step One: Setting up a Logtable and Logging onto Teradata
First, you define the Logtable using the .LOGTABLE command. We have named it LOG_PUMP in the
WORK_DB database. The Logtable is automatically created for you. It may be placed in any database
by qualifying the table name with the name of the database, using the syntax
<databasename>.<tablename>.

Next, the connection is made to Teradata. Notice that the commands in TPump, like those in
MultiLoad, require a dot in front of the command key word.

Step Two: Begin load process, add Parameters, naming the Error Table
Here, the script reveals the parameters requested by the user to assist in managing the load for
smooth operation. It also names the one error table, calling it SQL01.ERR_PUMP. Now let’s look at
each parameter:
• ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set
whatever limit is tolerable for the load.
• CHECKPOINT 1 tells TPump to take a checkpoint, recording the progress of the load, every minute.
In TPump this value is always a number of minutes, unlike MultiLoad, which can interpret it as a
number of rows.
• SESSIONS 64 tells TPump to establish 64 sessions with Teradata.
• TENACITY 2 says that if there is any problem establishing sessions, TPump should keep on trying
for a period of two hours.
• PACK 40 tells TPump to “pack” 40 statements (here, one per data row) into each multi-statement
request.
• RATE 1000 means that at most 1,000 statements will be sent per minute.

Step Three: Defining the INPUT flat file structure
TPump, like MultiLoad, needs to know the structure of the INPUT flat file record. You use the
.LAYOUT command to name the layout. Following that, you list the columns and data types of the
INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed
between the column name and its data type? This means to automatically calculate the next byte in
the record: it designates the starting location for this data based on the previous field’s length. If
you are listing fields in order and need to skip a few bytes in the record, you can either use a
.FILLER with the correct number of bytes as character to position the cursor at the next field, or
replace the “*” with a number that equals the lengths of all previous fields added together plus 1.
When you use this technique, the .FILLER is not needed. In our example, this says to begin with
Student_ID, continue on to load Last_Name, and finish when First_Name is loaded.
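The arithmetic behind replacing the asterisk with an explicit position is simple: each field starts one byte after the sum of the lengths of all the fields before it. The C helper below is a hypothetical sketch that assumes fixed-length fields and 1-based positions, as in the .FIELD command.

```c
#include <assert.h>
#include <stddef.h>

/* Given the byte lengths of fields listed in layout order, fill
   start[] with each field's 1-based starting position: what "*"
   computes automatically, or the explicit number you could code
   instead (sum of all previous lengths + 1). */
static void field_starts(const int lengths[], size_t count, int start[])
{
    assert(lengths != NULL && start != NULL);
    int pos = 1;                    /* first field starts at byte 1   */
    for (size_t i = 0; i < count; i++) {
        start[i] = pos;
        pos += lengths[i];          /* next field begins right after  */
    }
}
```

For example, with hypothetical fixed lengths of 11, 20 and 14 bytes for Student_ID, Last_Name and First_Name, the starting positions are 1, 12 and 32.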
Step Four: Defining the DML activities to occur
At this point, the .DML LABEL command names and defines the SQL that is to execute. It also names
the columns receiving data and defines the sequence in which the VALUES are to be arranged. In our
example, TPump is to INSERT a row into the SQL01.Student_Names table. The data values coming in
from the record are named in the VALUES with a colon prior to the name. This tells the PE what
substitution is to take place in the SQL. Each LABEL used must also be referenced in an APPLY clause
of the .IMPORT command.

Step Five: Naming the INPUT file and defining its FORMAT
Using the .IMPORT INFILE command, we have identified the INPUT data file as “CDW_Export.txt”. The
file was created using the TEXT format.

Step Six: Associate the data with the description
Next, we told the IMPORT command to use the LAYOUT called “FILELAYOUT”.

Step Seven: Telling TPump to start loading
Finally, we told TPump to APPLY the DML LABEL called INSREC, that is, to INSERT the data rows into
the target table.

Step Eight: Finishing loading and logging off of Teradata
The .END LOAD command tells TPump to finish the load process. Finally, TPump logs off of the
Teradata system.

TPump Script with Error Treatment Options

/* !/bin/ksh* */

Load with a shell script.

/* ++++++++++++++++++++++++++++++*/
/* TPUMP SCRIPT - CDW */
/*This script loads SQL01.Student_Profile4 */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/*+++++++++++++++++++++++++++++++*/

Names and describes the purpose of the script; names the author.

/* Setup the TPUMP Logtables, Logon Statements and
   Database Default */
.LOGTABLE SQL01.LOG_PUMP;
.LOGON CDW/SQL01,SQL01;
DATABASE SQL01;

Sets up a Logtable and then logs on to Teradata. Specifies the database containing the table.

/* Begin Load and Define TPUMP Parameters and Error
   Tables */
.BEGIN LOAD
   ERRLIMIT 5
   CHECKPOINT 1
   SESSIONS 1
   TENACITY 2
   PACK 40
   RATE 1000
   ERRORTABLE SQL01.ERR_PUMP;

Begins the load process and specifies multiple parameters to aid in process control. Names the
error table; TPump has only one error table.

.LAYOUT FILELAYOUT;
.FIELD Student_ID  * VARCHAR (11);
.FIELD Last_Name   * VARCHAR (20);
.FIELD First_Name  * VARCHAR (14);
.FIELD Class_Code  * VARCHAR (2);
.FIELD Grade_Pt    * VARCHAR (8);

Names the LAYOUT of the INPUT file and defines its structure; here, all fields are variable
character data and the file has a comma delimiter. See .IMPORT below for the file type and the
declaration of the delimiter.

.DML LABEL INSREC
   IGNORE DUPLICATE ROWS
   IGNORE MISSING ROWS
   IGNORE EXTRA ROWS;
INSERT INTO Student_Profile4
   ( Student_ID
    ,Last_Name
    ,First_Name
    ,Class_Code
    ,Grade_Pt )
VALUES
   ( :Student_ID
    ,:Last_Name
    ,:First_Name
    ,:Class_Code
    ,:Grade_Pt );

Names the DML label and specifies three error treatment options, with the semicolon after the last
option. Tells TPump to INSERT a row into the target table and defines the row format. Note that we
place comma separators in front of the following column or value for easier debugging. Lists, in
order, the VALUES to be INSERTed. A colon always precedes values.

.IMPORT INFILE CDW_Export.txt
   FORMAT VARTEXT ','
   LAYOUT FILELAYOUT
   APPLY INSREC;

Names the IMPORT file, names the LAYOUT to be called from above, and tells TPump which DML label
to APPLY. Notice the FORMAT with a comma in the quotes to define the delimiter between fields in
the input record.

.END LOAD;
.LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-5
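With FORMAT VARTEXT ',' each input record is split on the delimiter, and, as the format chart notes, two delimiters in a row produce a null field. The minimal C sketch below splits one record that way. The function name, the fixed field limit and the use of NULL pointers to stand for null fields are assumptions for illustration, not TPump’s own parser.

```c
#include <assert.h>
#include <stddef.h>

/* Splits a VARTEXT record in place on delim. Each field pointer is
   stored in fields[]; an empty field (two delimiters in a row, or a
   delimiter at either end) is stored as NULL to stand for a null
   value. Returns the number of fields, or -1 if there are more than
   max_fields. The record must be a writable, NUL-terminated string. */
static int split_vartext(char *record, char delim,
                         char *fields[], int max_fields)
{
    assert(record != NULL && fields != NULL && max_fields > 0);
    int count = 0;
    char *start = record;
    for (char *p = record; ; p++) {
        if (*p == delim || *p == '\0') {
            int at_end = (*p == '\0');   /* check before overwriting  */
            if (count == max_fields)
                return -1;
            *p = '\0';                   /* terminate this field      */
            fields[count++] = (p == start) ? NULL : start;
            if (at_end)
                break;
            start = p + 1;               /* next field begins here    */
        }
    }
    return count;
}
```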

TPump Output Statistics

This illustration shows the actual TPump statistics for the sample script above. Notice how well
TPump breaks out what happened during each part of the load process.

Figure 6-6

A TPump Script that Uses Two Input Data Files

Load runs from a shell script.

Names and describes the purpose of the script; names the author.

.LOGTABLE SQL01.LOG_TPMP;
.LOGON CDW/SQL01,SQL01;
DATABASE SQL01;

Sets up a Logtable and then logs on to Teradata. Specifies the database to work in (optional).

Begins the load process and specifies multiple parameters to aid in load management. Names the
error table; TPump has only one error table per target table.

Defines the LAYOUT for the 1st INPUT file, which also has the indicators for NULL data.

Defines the LAYOUT for the 2nd INPUT file, with a different arrangement of fields.

Names the 1st DML label and specifies 2 error treatment options. Tells TPump to INSERT a row into
the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon
always precedes values.

Names the 2nd DML label and specifies 1 error treatment option. Tells TPump to INSERT a row into
the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon
always precedes values.

Names the TWO import files as FILE-REC1.DAT and FILE-REC2.DAT. The file names are under Windows,
so the “-” is fine. Names the TWO layouts that define the structure of the INPUT data files.

.END LOAD;
.LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-7

A TPump UPSERT Sample Script

Sets up a Logtable and then logs on to Teradata.

Begins the load process and specifies multiple parameters to aid in load management. Names the
error table; TPump has only one error table per target table.

Defines the LAYOUT for the 1st INPUT file; also has the indicators for NULL data.

Names the 1st DML label and specifies 2 error treatment options. Tells TPump to INSERT a row into
the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon
always precedes values.

Names the import file as UPSERT-FILE.DAT. The file name is under Windows, so the “-” is fine. The
file type is FASTLOAD.

.END LOAD;
.LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-8

The following is the output from the above UPSERT:



Figure 6-9

NOTE: The above UPSERT uses the same syntax as MultiLoad. This continues to work. However,
there might soon be another way to accomplish this task. NCR has built an UPSERT and we have
tested the following statement, without success:

We are not sure if this will be a future technique for coding a TPump UPSERT, or if it is handled
internally. For now, use the original coding technique.

Monitoring TPump

TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the
status of TPump jobs as they run and to change (remember “throttle up” and “throttle down?”) the
statement rate on the fly. Key to this monitor is the “SysAdmin.TpumpStatusTbl” table in the Data
Dictionary Directory. If your Database Administrator creates this table, TPump will update it on a
minute-by-minute basis when it is running. You may update the table to change the statement rate
for an IMPORT. If you want TPump to run unmonitored, then the table is not needed.

You can start a monitor program under UNIX with the following command:

Below is a chart that shows the Views and Macros used to access the “SysAdmin.TpumpStatusTbl”
table. Queries may be written against the Views. The macros may be executed.

Views and Macros to access the table SysAdmin.TpumpStatusTbl

View SysAdmin.TPumpStatus
View SysAdmin.TPumpStatusX
Macro Sysadmin.TPumpUpdateSelect
Macro TPumpMacro.UserUpdateSelect

Figure 6-10

Handling Errors in TPump Using the Error Table

One Error Table

Unlike FastLoad and MultiLoad, TPump uses only ONE Error Table per target table, not two. If
you name the table, TPump will create it automatically. Entries are made to this table whenever
errors occur during the load process. Like MultiLoad, TPump offers the option to either MARK errors
(include them in the error table) or IGNORE errors (pay no attention to them whatsoever). These
options are listed in the .DML LABEL sections of the script and apply ONLY to the DML functions in
that LABEL. The general default is to MARK. If you specify nothing, TPump will assume the default.
When doing an UPSERT, this default does not apply.
The error table does the following:
• Identifies errors
• Provides some detail about the errors
• Stores a portion of the actual offending row for debugging

When compared to the error tables in MultiLoad, the TPump error table is most similar to the
MultiLoad Acquisition error table. Like that table, it stores information about errors that take place
while TPump is trying to acquire data. TPump reports the errors that occur while the data is being
moved, such as data translation problems, as well as any difficulties building valid Primary Indexes.
Remember, TPump has less tolerance for errors than FastLoad or MultiLoad.

COLUMNS IN THE TPUMP ERROR TABLE

ImportSeq
    Sequence number that identifies the IMPORT command where the error occurred
DMLSeq
    Sequence number of the .DML LABEL command involved with the error
SMTSeq
    Sequence number of the DML statement, within that label, being carried out when the error
    was discovered
ApplySeq
    Sequence number that tells which APPLY clause was running when the error occurred
SourceSeq
    The number of the data row in the client file that was being built when the error took place
DataSeq
    Identifies the INPUT data source where the error row came from
ErrorCode
    System code that identifies the error
ErrorMsg
    Generic description of the error
ErrorField
    Number of the column in the target table where the error happened; left blank if the
    offending column cannot be identified. This is different from MultiLoad, which supplies the
    column name.
HostData
    The data row that contains the error, limited to the first 63,728 bytes related to the error

Figure 6-11

Common Error Codes and What They Mean

TPump users often encounter three error codes that pertain to:
• Missing data rows
• Duplicate data rows
• Extra data rows

Become familiar with these error codes and what they mean. This could save you time getting to the
root of some common errors you could see in your future!

#1: Error 2816: Failed to insert duplicate row into TPump Target Table.
Nothing is wrong when you see this error. In fact, it can be a very good thing. It means that
TPump is notifying you that it discovered a DUPLICATE row. This error jumps to life when one of the
following options has been stipulated in the .DML LABEL:
• MARK DUPLICATE INSERT ROWS
• MARK DUPLICATE UPDATE ROWS

Note that the original row will be inserted into the target table, but the duplicate row will not.

#2: Error 2817: Activity count greater than ONE for TPump UPDATE/DELETE.

Sometimes you want to know if there were too many “successes.” This is the case when there are
EXTRA rows when TPump is attempting an UPDATE or DELETE.

TPump will log an error whenever it sees an activity count greater than one, that is, whenever extra
rows are affected by an UPDATE or DELETE, if you have specified either of these options in a .DML
LABEL:
• MARK EXTRA UPDATE ROWS
• MARK EXTRA DELETE ROWS

At the same time, the associated UPDATE or DELETE will be performed.

#3: Error 2818: Activity count zero for TPump UPDATE or DELETE.

Sometimes, you want to know if a data row that was supposed to be updated or deleted wasn’t! That
is when you want to know that the activity count was zero, indicating that the UPDATE or DELETE
did not occur. To see this error, you must have used one of the following parameters:
• MARK MISSING UPDATE ROWS
• MARK MISSING DELETE ROWS
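The three conditions can be summarized as a function of the operation and its outcome. This C sketch is a hypothetical helper for reasoning about which code you should expect in the error table, assuming the matching MARK option is in effect; it is not part of TPump.

```c
#include <assert.h>

enum dml_op { OP_INSERT, OP_UPDATE, OP_DELETE };

/* Returns the error code you would expect TPump to log when the
   matching MARK option is specified, or 0 if none applies.
   duplicate: whether the INSERT/UPDATE would create a duplicate row.
   activity:  rows affected by an UPDATE or DELETE. */
static int expected_error_code(enum dml_op op, int duplicate, int activity)
{
    assert(activity >= 0);
    if (duplicate && op != OP_DELETE)
        return 2816;          /* MARK DUPLICATE INSERT/UPDATE ROWS */
    if (op == OP_INSERT)
        return 0;             /* insert succeeded                  */
    if (activity == 0)
        return 2818;          /* MARK MISSING UPDATE/DELETE ROWS   */
    if (activity > 1)
        return 2817;          /* MARK EXTRA UPDATE/DELETE ROWS     */
    return 0;                 /* exactly one row: success          */
}
```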

RESTARTing TPump

Like the other utilities, a TPump script is fully restartable as long as the log table and error tables are
not dropped. As mentioned earlier, you can set ROBUST either ON (the default) or OFF. ROBUST ON
incurs more overhead and therefore lowers performance, but it provides a higher degree of data
integrity on restart.
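The difference between the two settings can be modeled as how much work is repeated after a failure. In this simplified C sketch (an assumption for illustration, not TPump’s actual restart code), ROBUST OFF resumes from the last checkpoint and so replays every statement completed after it, while ROBUST ON relies on the per-transaction rows in the Logtable and replays nothing that already finished. The function name is hypothetical.

```c
#include <assert.h>

/* Simplified restart model. completed = statements applied before the
   failure; last_checkpoint = statements covered by the last
   checkpoint. Returns how many statements would be sent again. */
static int statements_replayed(int completed, int last_checkpoint,
                               int robust_on)
{
    assert(last_checkpoint >= 0 && completed >= last_checkpoint);
    if (robust_on)
        return 0;                        /* Logtable marks each request */
    return completed - last_checkpoint;  /* "simple" restart logic      */
}
```

Replayed statements that failed the first time are what produce the extra error-table rows described under ROBUST in Figure 6-3.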

TPump and MultiLoad Comparison Chart

Error Tables must be defined
    MultiLoad: Optional, 2 per target table
    TPump:     Optional, 1 per target table
Work Tables must be defined
    MultiLoad: Optional, 1 per target table
    TPump:     No
Logtable must be defined
    MultiLoad: Yes
    TPump:     Yes
Allows Referential Integrity
    MultiLoad: No
    TPump:     Yes
Allows Unique Secondary Indexes
    MultiLoad: No
    TPump:     Yes
Allows Non-Unique Secondary Indexes
    MultiLoad: Yes
    TPump:     Yes
Allows Triggers
    MultiLoad: No
    TPump:     Yes
Loads a maximum of n number of tables
    MultiLoad: Five
    TPump:     60
Maximum Concurrent Load Instances
    MultiLoad: 15
    TPump:     Unlimited
Locks at this level
    MultiLoad: Table
    TPump:     Row Hash
DML Statements Supported
    MultiLoad: INSERT, UPDATE, DELETE, “UPSERT”
    TPump:     INSERT, UPDATE, DELETE, “UPSERT”
How DML Statements are Performed
    MultiLoad: Runs actual DML commands
    TPump:     Compiles DML into MACROS and executes
DDL Statements Supported
    MultiLoad: All
    TPump:     All
Transfers data in 64K blocks
    MultiLoad: Yes
    TPump:     No, moves data at row level
RESTARTable
    MultiLoad: Yes
    TPump:     Yes
Stores UPI Violation Rows
    MultiLoad: Yes, with MARK option
    TPump:     Yes, with MARK option
Allows use of Aggregates, Arithmetic calculations or Conditional Exponentiation
    MultiLoad: No
    TPump:     No
Allows Data Conversion
    MultiLoad: Yes
    TPump:     Yes
Performance Improvement
    MultiLoad: As data volumes increase
    TPump:     By using multi-statement requests
Table Access During Load
    MultiLoad: Uses WRITE lock on tables in Application Phase
    TPump:     Allows simultaneous READ and WRITE access due to Row Hash Locking
Effects of Stopping the Load
    MultiLoad: Consequences
    TPump:     No repercussions
Resource Consumption
    MultiLoad: Hogs available resources
    TPump:     Allows consumption management via Parameters

Figure 6-12

What is an INMOD?

When data is being loaded or incorporated into the Teradata Relational Database Management
System (RDBMS), the processing of the data is performed by the utility. All of the NCR Teradata
RDBMS utilities are able to read files that contain a variety of formatted and unformatted data. They
are able to read from disk and from tape. These files and devices must support a sequential access
method. Then, the utility is responsible for incorporating the data into SQL for use by Teradata.
However, there are times when it is advantageous or even necessary to use a different access
technique or a special device.

When special input processing is desired, then an INMOD (acronym for INput MODule) is a potential
approach to solving the problem. An INMOD is written to perform the input of the data from a data
source. It takes over from the utility the responsibility of reading the input data. Many times an
INMOD is written because the utility is not capable of performing the particular input processing.
Other times, it is written for convenience.

The INMOD is a user-written routine to do the specialized access from the file system, device or
database. The INMOD does not replace the utility; it becomes a part of and an extension of the
utility. The major difference is that instead of the utility receiving the data directly, it receives the
data from the INMOD. An INMOD can be written to work with FastLoad, MultiLoad, TPump and
FastExport.
As an example, an INMOD might be written to access the data directly from another RDBMS
besides Teradata. It would be written to do the following steps:

1. Connect to the RDBMS
2. Retrieve a row using a SELECT or DECLARE CURSOR
3. Pass the row to the utility
4. Loop back and do steps 2 & 3 until there is no more data
5. When there is no more data, disconnect from the RDBMS
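The five steps map onto a simple loop. In this C sketch the “other RDBMS” is replaced by an in-memory array so the code runs on its own; source_connect, source_fetch, source_disconnect and pass_to_utility are hypothetical stand-ins for your database client calls and for handing a row back to the utility. In a real INMOD the utility drives the loop by calling the INMOD once per record, as the following sections describe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data source standing in for another RDBMS. */
static const char *demo_rows[] = { "row-1", "row-2", "row-3" };
static size_t next_row;
static int connected;

static int  source_connect(void)    { connected = 1; next_row = 0; return 0; }
static void source_disconnect(void) { connected = 0; }

/* Fetches one row; returns NULL when there is no more data. */
static const char *source_fetch(void)
{
    assert(connected);
    if (next_row < sizeof demo_rows / sizeof demo_rows[0])
        return demo_rows[next_row++];
    return NULL;
}

/* Stand-in for step 3: a real INMOD would copy the row into the
   buffer shared with the utility; here we only count rows. */
static size_t rows_passed;
static void pass_to_utility(const char *row) { (void)row; rows_passed++; }

/* Steps 1-5 in order. Returns rows passed, or -1 if connect fails. */
static long run_inmod(void)
{
    if (source_connect() != 0)               /* 1. connect           */
        return -1;
    rows_passed = 0;
    const char *row;
    while ((row = source_fetch()) != NULL)   /* 2. retrieve, 4. loop */
        pass_to_utility(row);                /* 3. pass to utility   */
    source_disconnect();                     /* 5. disconnect        */
    return (long)rows_passed;
}
```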

How an INMOD Works

An INMOD is sometimes called an exit routine. This is because the utility exits to it: the utility calls
the INMOD and passes control to it. The INMOD performs its processing and then returns, which is
how it passes the data back to the utility.

The following diagram illustrates the normal logic flow when using the utility:

The following diagram illustrates the logic flow when using an INMOD with the utility:

As seen in the above diagrams, there is an extra step involved with the processing of an INMOD. On
the other hand, it can eliminate the need to create an intermediate file by using another RDBMS
directly as its data source. However, the user still scripts and executes the utility just as when
using a file; that portion does not change.

The following chart shows the languages in which an INMOD may be written for mainframe and
network-attached systems:

Operating System     Programming Language
VM or MVS            Assembler, COBOL, SAS/C or IBM PL/I
UNIX or Windows      C (although not supported, MicroFocus COBOL can be used)
Figure 7-1

Calling an INMOD from FastLoad

As shown in the diagrams above, the user still executes the utility and the utility is responsible for
calling the INMOD. Therefore, the utility needs an indication from the user that it is supposed to call
the INMOD instead of reading a file.

Normally the utility script contains the name of the file or JCL statement (DDNAME). When using an
INMOD, the file designation is no longer specified. Instead, the name of the program to call is
defined in the script.

The following chart indicates the appropriate statement to define the INMOD:

Utility Name                       Statement (replaces FILE or DDNAME)
FastLoad                           DEFINE INMOD=<INMOD-name>
MultiLoad, TPump and FastExport    .IMPORT INMOD=<INMOD-name>

Figure 7-2

Writing an INMOD

The writing of an INMOD is primarily concerned with processing an input data source. However, it
cannot do the processing haphazardly. It must wait for the utility to tell it what and when to perform
every operation.

It has been previously stated that the INMOD returns data to the utility. At the same time, the utility
needs to know when to expect the data. Therefore, a high degree of handshake processing is
necessary for the two components (INMOD and utility) to know what is expected.

As well as passing the data, a status code is sent back and forth between the utility and the INMOD.
As with all processing, we hope for a successful completion. Earlier in this book, it was shown that a
zero status code indicates a successful completion. That same situation is true for communications
between the utility and the INMOD.

Therefore, a memory area must be allocated that is shared between the INMOD and the utility. The
area contains the following elements:

1. The return or status code
2. The length of the data that follows
3. The data area
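The three elements can be pictured as one C structure. This is a sketch with a hypothetical buffer size, field names and helper function (put_record); it uses 32-bit integer fields to match the 4-byte COBOL and PL/I definitions shown next, and the exact layout each utility expects is the one given in its parameter-definition chart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_LENGTH 100   /* hypothetical maximum record length */

/* Memory area shared by the utility and the INMOD. */
struct inmod_parm {
    int32_t retcode;             /* 1. return or status code          */
    int32_t retlength;           /* 2. length of the data that follows */
    char    buffer[DATA_LENGTH]; /* 3. the data area                  */
};

/* Places one record in the shared area and signals success (status
   0). Returns 0 on success, -1 if the record does not fit. */
static int put_record(struct inmod_parm *p, const char *data, int len)
{
    assert(p != NULL && data != NULL && len >= 0);
    if (len > DATA_LENGTH)
        return -1;
    memcpy(p->buffer, data, (size_t)len);
    p->retlength = len;
    p->retcode = 0;              /* zero status = successful completion */
    return 0;
}
```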

Writing for FastLoad

The following charts show the various programming statements to define the data elements, status
codes and other considerations for the various programming languages.

Parameter definition for FastLoad

Assembler

C          struct {
               long retcode;
               long retlength;
               char buffer[<data-length>];
           };

COBOL      01 PARM-REC.
               03 RETCODE    PIC S9(9) COMP.
               03 RETLENGTH  PIC 9(9) COMP.
               03 RETDATA    PIC X(<data-length>).

PL/I       DCL 1 PARM_REC,
               10 RETCODE    FIXED BINARY(31,0),
               10 RETLENGTH  FIXED BINARY(31,0),
               10 RETDATA    CHAR(<data-length>);

Figure 7-3

Return/status codes from FastLoad to the INMOD

Value Indicates that . . .

0 FastLoad is calling the INMOD for the first time. The INMOD should
open/connect to the data source, read the first record and return it to
FastLoad.
1 FastLoad is calling for the next record. The INMOD should read the next record
and return it to FastLoad.
2 FastLoad and the INMOD failed and have been restarted. The INMOD should
use the saved record count to reposition in the input data source to where it
left off. Since checkpoint is optional in FastLoad, it must be requested in the
script. This also means that for values 0 and 1, the INMOD must count each
record and save the record count for use if needed. Do not return a record to
FastLoad.
3 FastLoad has written a checkpoint. The INMOD should guarantee that the
record count has been written to disk. Do not return a record to FastLoad.
4 The Teradata RDBMS failed. The INMOD should use the saved record count to
reposition in the input data source to where it left off. Do not return a record
to FastLoad.
5 FastLoad has finished loading the data to Teradata. The INMOD should clean up
and end.

Figure 7-4

Return/status codes for the INMOD to FastLoad

Value Indicates that . . .

0 The INMOD is returning data to the utility.
Not 0 The INMOD has reached end of file; there is no more data.

Figure 7-5
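The codes in Figures 7-4 and 7-5 map naturally onto a `switch`. The following is a hedged sketch of a FastLoad-style INMOD, not a copy of NCR's BLKEXIT.C or BLKEXITR.C: `fastload_inmod`, `saved_count` and the in-memory source are hypothetical, and a static variable stands in for the record count that a real INMOD must write to disk at a checkpoint (a static would not survive a real restart). Setting the return code to 0 after codes 2 through 5, and to 16 for an unknown code (as the COBOL sample later in this chapter does), are assumptions.

```c
#include <assert.h>
#include <string.h>

#define DATA_LEN 80

typedef struct {
    long retcode;
    long retlength;
    char buffer[DATA_LEN];
} ParmArea;

/* Hypothetical in-memory input source standing in for a file. */
static const char *source[] = { "A", "B", "C", "D" };
#define NREC (sizeof source / sizeof source[0])

static long position;      /* index of the next record to read */
static long record_count;  /* records returned on codes 0 and 1 */
static long saved_count;   /* stand-in for the count written to disk */

static void return_next(ParmArea *parm)
{
    if (position >= (long)NREC) {      /* end of file: non-zero */
        parm->retlength = 0;
        parm->retcode = 1;
        return;
    }
    size_t n = strlen(source[position]);
    memcpy(parm->buffer, source[position], n);
    parm->retlength = (long)n;
    parm->retcode = 0;
    position++;
    record_count++;                    /* count every record returned */
}

void fastload_inmod(ParmArea *parm)
{
    assert(parm != NULL);
    switch (parm->retcode) {
    case 0:                            /* first call: open, read first record */
        position = 0;
        record_count = 0;
        return_next(parm);
        break;
    case 1:                            /* next record */
        return_next(parm);
        break;
    case 2:                            /* FastLoad and INMOD restarted */
    case 4:                            /* Teradata RDBMS failed */
        position = saved_count;        /* reposition from the saved count */
        record_count = saved_count;
        parm->retlength = 0;           /* do not return a record */
        parm->retcode = 0;
        break;
    case 3:                            /* checkpoint: make the count durable */
        saved_count = record_count;
        parm->retlength = 0;
        parm->retcode = 0;
        break;
    case 5:                            /* loading finished: clean up */
        parm->retlength = 0;
        parm->retcode = 0;
        break;
    default:                           /* unexpected code */
        parm->retlength = 0;
        parm->retcode = 16;
        break;
    }
}
```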

Entry point for FastLoad used in the DEFINE:


SAS/C          <dynamic-name-by-user>
All others     BLKEXIT

Figure 7-6

NCR Corporation provides two sample FastLoad INMODs: BLKEXIT.C, which does not contain checkpoint
and restart logic, and BLKEXITR.C, which contains both checkpoint and restart logic.

Writing for MultiLoad, TPump and FastExport

The following charts show the data statements used to define the two parameter areas for the
various languages.

First Parameter definition for MultiLoad, TPump and FastExport to the INMOD

Assembler
C         struct {
              long retcode;
              long retlength;
              char buffer[<data-length>];
          };
COBOL     01 PARM-REC.
              03 RETCODE   PIC S9(9) COMP.
              03 RETLENGTH PIC 9(9) COMP.
              03 RETDATA   PIC X(<data-length>).
PL/I      DCL 1 PARM_REC,
              10 RETCODE   FIXED BINARY(31,0),
              10 RETLENGTH FIXED BINARY(31,0),
              10 RETDATA   CHAR(<data-length>);

Figure 7-7

Second Parameter definition for INMOD to MultiLoad, TPump and FastExport

Assembler
C         struct {
              long  iseqnum;
              short ilength;
              char  ibuffer[<data-length>];
          };
COBOL     01 PARM-REC.
              03 ISEQNUM PIC 9(9) COMP.
              03 ILENGTH PIC 9(4) COMP.
              03 IDATA   PIC X(<data-length>).
PL/I      DCL 1 PARM_REC,
              10 ISEQNUM FIXED BINARY(31,0),
              10 ILENGTH FIXED BINARY(15,0),
              10 IDATA   CHAR(<data-length>);

Figure 7-8

Return/status codes for MultiLoad, TPump and FastExport to the INMOD

Value Indicates that . . .

0 The utility is calling the INMOD for the first time. The INMOD should
open/connect to the data source, read the first record and return it to the
utility.
1 The utility is calling for the next record. The INMOD should read the next
record and return it to the utility.
2 The utility and the INMOD failed and have been restarted. The INMOD should
use the saved record count to reposition in the input data source to where it
left off. Unlike FastLoad, these utilities take checkpoints by default, so for
values 0 and 1 the INMOD must count each record and save the record count for
use if needed. Do not return a record to the utility.
3 The utility needs to write a checkpoint. The INMOD should guarantee that the
record count has been written to disk and return it to the utility in the second
parameter to be stored in the LOGTABLE. Do not return a record to the utility.
4 The Teradata RDBMS failed. The INMOD should receive the record count from
the utility in the second parameter for use in repositioning in the input data
source to where it left off. Do not return a record to the utility.
5 The utility has finished loading the data to Teradata. The INMOD should
clean up and end.
6 The INMOD should initialize and prepare to receive the first data record from
the utility.
7 The INMOD should receive the next data record from the utility.

Figure 7-9
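For MultiLoad, TPump and FastExport the record count travels through the second parameter instead of being kept privately by the INMOD. The sketch below assumes, as the chart states, that the INMOD hands the count to the utility on code 3 and receives it back on code 4. Copying the count into `ibuffer` and setting `ilength`, and treating code 2 the same way as code 4, are assumptions of this example rather than documented behavior. The names `mload_inmod`, `ParmArea` and `ParmArea2` are hypothetical, and the input source is five generated one-character records.

```c
#include <assert.h>
#include <string.h>

#define DATA_LEN 80
#define NREC 5                 /* hypothetical size of the input source */

/* First parameter (Figure 7-7). */
typedef struct {
    long retcode;
    long retlength;
    char buffer[DATA_LEN];
} ParmArea;

/* Second parameter (Figure 7-8). */
typedef struct {
    long  iseqnum;
    short ilength;
    char  ibuffer[DATA_LEN];
} ParmArea2;

static long position;          /* records returned since the first call */

void mload_inmod(ParmArea *p1, ParmArea2 *p2)
{
    assert(p1 != NULL && p2 != NULL);
    p1->retlength = 0;
    switch (p1->retcode) {
    case 0:                    /* first call: reset, then read */
        position = 0;
        /* fall through */
    case 1:                    /* next record */
        if (position >= NREC) { p1->retcode = 1; return; }   /* EOF */
        p1->buffer[0] = (char)('A' + position);
        p1->retlength = 1;
        position++;
        p1->retcode = 0;
        return;
    case 3:                    /* checkpoint: give the count to the utility */
        p2->ilength = (short)sizeof position;
        memcpy(p2->ibuffer, &position, sizeof position);
        p1->retcode = 0;
        return;
    case 2:                    /* restart (assumed to mirror code 4) */
    case 4:                    /* RDBMS failure: utility returns the count */
        memcpy(&position, p2->ibuffer, sizeof position);
        p1->retcode = 0;
        return;
    case 5:                    /* finished: clean up */
        p1->retcode = 0;
        return;
    default:
        p1->retcode = 16;
        return;
    }
}
```

Because the utility stores the count in its restart log, the INMOD itself needs no private checkpoint file; it only has to accept the count when it is sent back.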

The following diagram shows how to use the return codes of 6 and 7:

Return/status codes for the INMOD to MultiLoad, TPump and FastExport:

Value Indicates that . . .


0 The INMOD is returning data to the utility.
Not 0 The INMOD has reached end of file; there is no more data.

Figure 7-10

Entry point for MultiLoad, TPump and FastExport:

All languages <dynamic-name-by-user>

Figure 7-11

Migrating an INMOD

As seen in Figures 7-4 and 7-9, many of the return codes are the same. Note, however, that a FastLoad
INMOD must remember the record count itself in case a restart is needed, whereas the other
utilities send the record count to the INMOD. If the INMOD fails to accept the record count when it
is sent, the job will abort or hang and never finish successfully.

This means that a FastLoad INMOD used with one of the other utilities will work only as long as the
utility never requests a checkpoint. Remember that, unlike FastLoad, the newer utilities default to a
checkpoint every 15 minutes. The only way to turn this off is to set the CHECKPOINT option of the
.BEGIN to a number that is higher than the number of records being processed.

Therefore, it is not best practice to reuse a FastLoad INMOD as if the two were interchangeable. It
is better to modify the INMOD's checkpoint and restart logic so that it receives the record count and
uses it for the repositioning operation.
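Repositioning is the piece a FastLoad INMOD usually lacks when it is migrated. One hedged way to do it for a sequential file of newline-terminated records is to reread from the start and skip the saved count. `reposition` is a hypothetical helper, and the sketch assumes the input can be rewound.

```c
#include <assert.h>
#include <stdio.h>

/* Skip `count` newline-terminated records from the start of `in`, so
   the next read returns record count+1. Returns how many were skipped
   (fewer than `count` if end of file is reached first). */
long reposition(FILE *in, long count)
{
    assert(in != NULL && count >= 0);
    rewind(in);
    long skipped = 0;
    int ch;
    while (skipped < count && (ch = fgetc(in)) != EOF) {
        if (ch == '\n')
            skipped++;
    }
    return skipped;
}
```

For fixed-length records, such as the 80-byte records in the sample INMOD below, an `fseek` to `count * 80` bytes would avoid reading the skipped data at all.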

Writing a NOTIFY Routine

As seen earlier in this book, there is a NOTIFY statement. If its standard values are acceptable, you
should use them. If they are not, you may write your own NOTIFY routine.

If you choose to do this, refer to the NCR Utilities manual for guidance on writing this processing.
We just want you to know here that it is something you can do.

Sample INMOD

Below is an example of the PROCEDURE DIVISION commands that might be used in an INMOD for
MultiLoad, TPump or FastExport.
PROCEDURE DIVISION USING PARM-1, PARM-2.
BEGIN.
MAIN.
*> Specific user processing goes here, followed by:
IF RETCODE = 0 THEN
DISPLAY "INMOD RECEIVED - RETURN CODE 0 - INITIALIZE & READ"
PERFORM 100-OPEN-FILES
PERFORM 200-READ-INPUT
GOBACK
ELSE
IF RETCODE = 1 THEN
DISPLAY "INMOD RECEIVED - RETURN CODE 1 - READ"
PERFORM 200-READ-INPUT
GOBACK
ELSE
IF RETCODE = 2 THEN
DISPLAY "INMOD RECEIVED - RETURN CODE 2 - RESTART"
PERFORM 900-GET-REC-COUNT
PERFORM 950-FAST-FORWARD-INPUT
GOBACK
ELSE
IF RETCODE = 3 THEN
DISPLAY "INMOD RECEIVED - RETURN CODE 3 - CHECKPOINT"
PERFORM 600-SAVE-REC-COUNT
GOBACK
ELSE
IF RETCODE = 4 THEN
DISPLAY "INMOD RECEIVED - RETURN CODE 4 - RDBMS RESTART"
PERFORM 900-GET-REC-COUNT
PERFORM 950-FAST-FORWARD-INPUT
GOBACK
ELSE
IF RETCODE = 5 THEN
DISPLAY "INMOD RECEIVED - RETURN CODE 5 - DONE"
MOVE 0 TO RETLENGTH
MOVE 0 TO RETCODE
GOBACK
ELSE
DISPLAY "INMOD RECEIVED - INVALID RETURN CODE"
MOVE 0 TO RETLENGTH
MOVE 16 TO RETCODE
GOBACK.
100-OPEN-FILES.
OPEN INPUT DATA-FILE.
MOVE 0 TO RETCODE.
200-READ-INPUT.
READ DATA-FILE INTO DATA-AREA1
AT END GO TO END-DATA.
ADD 1 TO NUMIN.
MOVE 80 TO RETLENGTH.
MOVE 0 TO RETCODE.
ADD 1 TO NUMOUT.
END-DATA.
CLOSE DATA-FILE.
DISPLAY "NUMBER OF INPUT RECORDS = " NUMIN.
DISPLAY "NUMBER OF OUTPUT RECORDS = " NUMOUT.
MOVE 0 TO RETLENGTH.
MOVE 0 TO RETCODE.
GOBACK.

What is an OUTMOD?

The FastExport utility is able to write a file that contains a variety of formatted and unformatted
data. It can write the data to disk and to tape. This works because these files and devices all support
a sequential access method. However, there are times when it is necessary or even advantageous to
use some other technique or a special device.

When special output processing is desired, an OUTMOD (acronym for OUTput MODule) is a potential
solution. It is a user-written routine that performs the specialized access to the file system, device
or database. The OUTMOD does not replace the utility. Instead, it becomes like a part of the utility.
An OUTMOD can only be written to work with FastExport.

As an example, an OUTMOD might be written to move the data from Teradata directly into another
RDBMS or a test database. It must therefore be written to do the following steps:

1. Connect to the RDBMS
2. Receive a row from FastExport
3. Send the row to the other database as an INSERT
4. Loop back and repeat steps 2 and 3 until there is no more data
5. When there is no more data, disconnect from the database
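The five steps can be sketched in C to show who owns the loop: the utility drives step 4 and calls the OUTMOD once per event. This is a minimal sketch, not a FastExport interface. `export_rows`, `my_outmod` and the `dest_*` functions are hypothetical, an in-memory table stands in for the other RDBMS, and the event values 1 (open), 3 (next record) and 2 (close) are borrowed from the status-code chart later in this chapter.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical destination standing in for "another RDBMS". */
#define MAX_ROWS 100
#define ROW_LEN  32
static char table[MAX_ROWS][ROW_LEN];
static int  nrows;
static int  connected;

static void dest_connect(void)    { connected = 1; nrows = 0; }
static void dest_disconnect(void) { connected = 0; }
static int  dest_insert(const char *row, int len)   /* the "INSERT" */
{
    assert(row != NULL);
    if (!connected || nrows >= MAX_ROWS || len < 0 || len >= ROW_LEN)
        return -1;
    memcpy(table[nrows], row, (size_t)len);
    table[nrows][len] = '\0';
    nrows++;
    return 0;
}

/* The OUTMOD: one call per event; 0 means success. */
typedef int (*outmod_fn)(int code, const char *row, int len);

static int my_outmod(int code, const char *row, int len)
{
    switch (code) {
    case 1:  dest_connect();    return 0;        /* step 1 */
    case 3:  return dest_insert(row, len);       /* steps 2 and 3 */
    case 2:  dest_disconnect(); return 0;        /* step 5 */
    default: return 0;
    }
}

/* Stand-in for FastExport: it owns the loop (step 4) and passes
   control to the OUTMOD once per event. */
static int export_rows(outmod_fn outmod, const char *rows[], int n)
{
    if (outmod(1, NULL, 0) != 0) return -1;
    for (int i = 0; i < n; i++)
        if (outmod(3, rows[i], (int)strlen(rows[i])) != 0) return -1;
    return outmod(2, NULL, 0);
}
```

Passing the OUTMOD as a function pointer mirrors the idea of an exit routine: the driver never needs to know where the rows end up.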

How an OUTMOD Works

The OUTMOD is written to perform the output of the data to a data destination. It removes the
responsibility of performing output from the utility. Often an OUTMOD is written because the utility
is not capable of performing the particular output processing. Other times, it is written for
convenience.

When data is being unloaded from the Teradata Relational Database Management System (RDBMS),
the processing of the data is performed by the utility. The utility is responsible for retrieving the data
via an SQL SELECT from Teradata. This is still the situation when using an OUTMOD. The major
difference is that instead of the utility writing the data directly, the data is sent to the OUTMOD.

An OUTMOD is sometimes called an exit routine. This is because the utility exits itself by passing
control to the OUTMOD. The OUTMOD performs its processing and exits back to the utility after
storing the data.

The following diagram illustrates the normal logic flow when using the utility:

As seen in the above diagram, processing with an OUTMOD involves an extra step. On the other hand, it
eliminates the need to create an intermediate file, and the data destination can be another RDBMS.
The user still executes the utility; that portion does not change.

The following chart shows the available languages for mainframe and network-attached systems:

Operating System    Programming Language
VM or MVS           Assembler, COBOL, or SAS/C
UNIX or Windows     C (although not supported, MicroFocus COBOL can be used)

Figure 8-1

Calling an OUTMOD from FastExport

As shown in the diagrams above, the user still executes the utility, and the utility is responsible for
calling the OUTMOD. Therefore, the utility needs an indication from the user that it is supposed to
call the OUTMOD instead of writing a file.

Normally the utility script contains the name of the file or JCL statement (DDNAME). When using an
OUTMOD, the FILE designation is no longer specified. Instead, the name of the program to call is
defined in the script.

The following chart indicates the appropriate statement to define the OUTMOD:

Utility Name Statement (replaces FILE or DDNAME)

FastExport .EXPORT OUTMOD=<OUTMOD-name>

Figure 8-2
Writing an OUTMOD

Writing an OUTMOD is primarily concerned with processing the output data destination. However, the
OUTMOD cannot do that processing haphazardly. It must wait for the utility to tell it what operation
to perform and when to perform it.

As stated earlier, the OUTMOD receives data from the utility, and the OUTMOD must know when to expect
that data. Therefore, a degree of handshaking is necessary so that the two components (OUTMOD and
FastExport) each know what is expected.

In addition to the data, a status code is passed back and forth between them. As with all processing,
we hope for successful completion. Earlier in this book, we saw that a zero status code indicates
successful completion.

A memory area must be allocated that is shared between the OUTMOD and the utility. The area
contains the following elements:

1. The return or status code
2. The sequence number of the SELECT within FastExport
3. The length of the response row in bytes
4. The response row from Teradata
5. The length of the output data record
6. The output data record

Chart of the various programming language definitions for the parameters

Assembler  OUTCODE    DS F
           OUTSEQNUM  DS F
           OUTRECLEN  DS F
           OUTRECORD  DS <as-needed>
           OUTLENGTH  DS F
           OUTDATA    DS CL<data-length>
C          int _dynamn(int *OutCode, int *OutSeqNum,
                       int *OutRecLen, struct tranlog *OutRecord,
                       int *OutLength, char *OutData)
COBOL      01 OUTCODE    PIC S9(5) COMP.
           01 OUTSEQNUM  PIC S9(5) COMP.
           01 OUTRECLEN  PIC S9(5) COMP.
           01 OUTRECORD.
              03 <one-or-more-fields>.
           01 OUTLENGTH  PIC S9(5) COMP.
           01 OUTDATA    PIC X(<data-length>).
Figure 8-3

Return/status codes from FastExport to the OUTMOD

Value Indicates that . . .

1 FastExport is calling the OUTMOD for the first time, before sending the SELECT
to Teradata. The OUTMOD should open/connect to the data destination and
wait for the first record.
2 FastExport is calling after the last record has been sent to the OUTMOD. The
OUTMOD should close/disconnect from the data destination.
3 FastExport is calling with the next output record. The OUTMOD should write it
to the data destination.
4 FastExport has written a checkpoint. The OUTMOD should guarantee that it
can handle a restart if needed. The OUTMOD does not receive a record.
5 The Teradata RDBMS has restarted. The OUTMOD should reposition itself to
receive and write the next record when it arrives.
6 FastExport and the OUTMOD failed and have been restarted. The OUTMOD
should use the saved record count to reposition in the output data destination
to where it left off. The OUTMOD does not receive a record.

Figure 8-4

Return/status codes for the OUTMOD to FastExport

Value Indicates that . . .

0 The OUTMOD successfully wrote the output data.
Not 0 The OUTMOD failed to write the output data.

Figure 8-5
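Checkpoint and restart are where an OUTMOD needs the most care. The sketch below keeps a count of rows written, remembers it on code 4, and on codes 5 and 6 discards any rows written after the last checkpoint, on the assumption that FastExport will resend them. That assumption, the hypothetical `outmod` function, and the in-memory destination are illustrations only; a real OUTMOD must store its count somewhere that survives a failure.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROWS 100
#define ROW_LEN  32

static char dest[MAX_ROWS][ROW_LEN];  /* hypothetical output destination */
static int  written;                  /* rows currently in dest */
static int  saved_written;            /* stand-in for a durable checkpoint */

/* Returns 0 on success, non-zero on failure (Figure 8-5). */
int outmod(int code, const char *row, int len)
{
    switch (code) {
    case 1:                           /* open: start with an empty destination */
        written = 0;
        saved_written = 0;
        return 0;
    case 3:                           /* write the next row */
        assert(row != NULL);
        if (written >= MAX_ROWS || len < 0 || len >= ROW_LEN)
            return 1;
        memcpy(dest[written], row, (size_t)len);
        dest[written][len] = '\0';
        written++;
        return 0;
    case 4:                           /* checkpoint: remember the position */
        saved_written = written;
        return 0;
    case 5:                           /* RDBMS restarted */
    case 6:                           /* FastExport and OUTMOD restarted */
        written = saved_written;      /* drop rows after the checkpoint */
        return 0;
    case 2:                           /* close */
        return 0;
    default:
        return 16;
    }
}
```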

Entry point for FastExport

All languages <dynamic-name-by-user>

Figure 8-6

Writing a NOTIFY Routine

As seen earlier in this book, there is a NOTIFY statement. If its standard values are acceptable, you
should use them. If they are not, you may write your own NOTIFY routine.

If you choose to do this, refer to the NCR Utilities manual for guidance on writing this processing.
We just want you to know here that it is something you can do.

Sample OUTMOD

Below is an example of an OUTMOD for FastExport, showing the LINKAGE SECTION and the
PROCEDURE DIVISION commands that might be used.

LINKAGE SECTION.
01 OUTCODE PIC S9(5) COMP.
01 OUTSEQNUM PIC S9(5) COMP.
01 OUTRECLEN PIC S9(5) COMP.

01 OUTRECORD.
05 INDICATORS PIC 9.
05 REGN PIC XXX.
05 PRODUCT PIC X(8).
05 QTY PIC S9(8) COMP.
05 PRICE PIC S9(8) COMP.
01 OUTLENGTH PIC S9(5) COMP.
01 OUTDATA PIC XXXX.
PROCEDURE DIVISION USING
OUTCODE, OUTSEQNUM, OUTRECLEN, OUTRECORD,
OUTLENGTH, OUTDATA.
BEGIN.
MAIN.
IF OUTCODE = 1 THEN
OPEN OUTPUT SALES-DROPPED-FILE
OPEN OUTPUT BAD-REGN-SALES-FILE
GOBACK.
IF OUTCODE = 2 THEN
CLOSE SALES-DROPPED-FILE
CLOSE BAD-REGN-SALES-FILE
GOBACK.
IF OUTCODE = 3 THEN
PERFORM TYPE-3
GOBACK.
IF OUTCODE = 4 THEN GOBACK.
IF OUTCODE = 5 THEN
CLOSE SALES-DROPPED-FILE
OPEN OUTPUT SALES-DROPPED-FILE
CLOSE BAD-REGN-SALES-FILE
OPEN OUTPUT BAD-REGN-SALES-FILE
GOBACK.
IF OUTCODE = 6 THEN
OPEN OUTPUT SALES-DROPPED-FILE
OPEN OUTPUT BAD-REGN-SALES-FILE
GOBACK.
DISPLAY "Invalid entry code = " OUTCODE.
GOBACK.
TYPE-3.
IF QTY IN OUTRECORD * PRICE IN OUTRECORD < 100 THEN
MOVE 0 TO OUTRECLEN
WRITE DROPPED-TRANLOG FROM OUTRECORD
ELSE
PERFORM TEST-NULL-REGN.
TEST-NULL-REGN.
IF REGN IN OUTRECORD = SPACES
MOVE 999 TO REGN IN OUTRECORD
WRITE BAD-REGN-OUTRECORD FROM OU

The Teradata Utilities and the Support Environment

As seen in many of the Teradata Utilities, the capabilities of the Support Environment (SE) are a
valuable asset. The SE is an inherent part of the newer utilities (FastExport, MultiLoad and TPump)
and acts as their front end. The purpose of the SE is to provide a feature-rich scripting tool.
As the newer load and extract functions were being proposed for use with the Teradata RDBMS, it
became obvious that certain capabilities were going to be needed by all the utilities. Rather than
writing these capabilities over and over again into multiple programs, they were written once into a
single module/environment called the SE. This environment/module is included with the newer
utilities.

The Support Environment Commands

Alphabetic Command List

Command Functionality

.ACCEPT     Read an input record that provides one or more parameter values for
            variables
.BEGIN      Invoke one of the utilities
.DATEFORM   Define the acceptable or desired format for a date in this execution as
            either (YY/MM/DD) or (YYYY-MM-DD)
.DISPLAY    Write an output message to a specified file
.ELSE       Optionally, perform an operation when a condition is not true
.END        Exit the utility
.ENDIF      Define the scope of a .IF command; allows multiple operations based
            on a conditional comparison
.IF         Compare variables and values to conditionally perform one or more
            operations
.LOGOFF     Terminate a Teradata session
.LOGON      Establish a Teradata session
.LOGTABLE   Specify the restart log
.ROUTE      Write output to a specified file
.RUN        Read and run commands stored in an external script file
.SET        Establish or change a value stored in a variable
.SYSTEM     Allow the execution of a command at the computer's operating
            system level from within the script

Figure 9-1

The SE allows the writer of the script to perform housekeeping chores prior to calling the desired
utility with a .BEGIN. At a minimum, these chores include the specification of the restart log table
and logging onto Teradata. Yet, it brings to the party the ability to perform any Data Definition
Language (DDL) and Data Control Language (DCL) command available to the user as defined in the
Data Dictionary. In addition, all Data Manipulation Language (DML) commands except a SELECT are
allowed within the SE.

Required Operational Command List

A few of the SE commands are mandatory. The rest of the commands are optional and only used
when they satisfy a need. The following section in this chapter elaborates on the required
commands. The optional commands are covered in later sections. Once the explanation and syntax have
been shown, an example of their use appears in a script at the end of this chapter.
Creating a Restart Log Table

The Restart Log table is a mandatory requirement to run a utility that may need to perform a restart.
The utility uses it to monitor its own progress, and it provides the basis for restarting from the
point of a failure. This restart facility becomes critical when processing millions of data rows. It is
normally better to restart where the error occurred than to rerun the job from the beginning (as
BTEQ requires).

The utilities use the restart log table to ascertain what type of restart, if any, is required as a result
of the type of failure. Failures can occur at the Teradata, network or client system level. The restart
log makes restarting the utility largely automatic once the problem causing the failure has been
corrected.

The syntax for creating a log table:

.LOGTABLE [<database-name>.]<table-name> ;

When the utility completes successfully with a return code of zero, the restart log table is
automatically dropped.

Creating Teradata Sessions

Teradata will not perform any operation for a user who has not logged onto the system. It needs the
user information to determine whether or not the proper privileges exist before allowing the
operation requested. Therefore, it is necessary to require the user to provide authentication via a
LOGON request.

As a matter of performance, the utilities that use the SE look at the number of AMP tasks to
determine the number of sessions to establish. The number of sessions is configurable, but not as
part of the .LOGON. Instead, it is set with the .BEGIN command, covered later in this chapter.

The syntax for logging onto Teradata:

.LOGON [<tdpid>/]<user-name>,<user-password> [,'acct-id'] ;

Notice that we are discussing the .LOGON after the .LOGTABLE command. Although a log table
cannot be created until after a session is established, the .LOGTABLE command is coded first. At the
same time, the order isn’t strictly enforced and the logon can come first. However, you will see a
warning message displayed from the SE if the .LOGON command is issued first. So, it is best to
make a habit of always specifying the .LOGTABLE command first.

Once a session is established, based on privileges, the user can perform any of the following:
• DDL
• DCL
• Any DML (with the exception of SELECT)
• Establish system variables
• Accept parameter values from a file
• Perform dynamic substitution of values including object names

Beginning a Utility

Once the script has connected to Teradata and established all needed environmental conditions, it is
time to run the desired utility. This is accomplished using the .BEGIN command. Beyond running the
utility, it is used to define most of the options used during the execution of the utility. For example,
the number of sessions is requested here. See the chapter on each individual utility for the names,
usage and any recommendations for the options specific to it.

The syntax for writing a .BEGIN command:

.BEGIN <utility-task> [ <utility-options> ] ;

The utility task is defined as one of the following:

FastExport                            .BEGIN EXPORT
MultiLoad to load or modify rows      .BEGIN [ IMPORT ] MLOAD
MultiLoad to delete rows              .BEGIN DELETE MLOAD
TPump                                 .BEGIN LOAD

Figure 9-2

Ending a Utility

Once the utility finishes its task, it needs to be ended. To request the termination, use the .END
command.

The syntax for writing a .END command:

.END <utility-task> ;

When the utility ends, control is returned to the SE. The SE can then check the return code status
(&SYSRC; see Figure 9-4) to verify that the utility finished its task successfully. Based on the value
of the return code, the script can determine what processing should occur next.

Terminating Teradata Sessions

Once the sessions are no longer needed, they should also be ended. To request their termination,
use the .LOGOFF command.

The syntax for logging off Teradata:

.LOGOFF [<return-code>] ;

Optionally, the user may request that a specific return code be sent to the host computer that was
used to start the utility. That might be the job control language (JCL) on a mainframe, a shell script
on a UNIX system, or a batch (.bat) file on DOS. The host can then check this value to perform
conditional processing based on the completion code specified.

Optional Command List

The following commands are available to add functionality to the SE. They allow for additional
processing within the preparation for the utility instead of requiring the user to access BTEQ or other
external tools. As with the required commands above, an example of their use is shown in a script at
the end of this chapter.

Accepting Parameter Values

Allowing the use of parameter values within the SE is a very powerful tool. A parameter can be
substituted into the script much like the substitution of values within a Teradata macro. However, it
is much more elaborate in that the substitution includes the object names used in the SQL, not just
data.
When accepting one or more parameter values, they must be in a single record. If multiple
records are needed, they can be read using multiple .ACCEPT commands from different files. Each
record may contain one or more values delimited by a space. Therefore, character strings must be
enclosed in single quotes. Once accepted by the script, these values are examined and stored
dynamically into the parameters named within the script.

The syntax for writing a .ACCEPT command:

The accepted record is made up of either character or numeric data. Character data must be
enclosed in single quotes (') and numeric data does not need quotes. When multiple values are
specified on a single record, a space is used to delimit them from one another. Values are assigned
to parameters sequentially, in the order the names appear in the .ACCEPT and the data appears on
the record. The first value is assigned to the first parameter, and so forth, until there are no more
parameter names to which values can be assigned.
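The record format rules above can be expressed as a small parser. This is a hypothetical C sketch of how such a record could be split, not the SE's own code: blanks separate values, a value in single quotes is character data (quotes removed; keeping embedded blanks is an assumption), and anything else is treated as numeric text. Values come back in record order, ready to be assigned to parameter names one by one.

```c
#include <assert.h>
#include <string.h>

#define MAX_VALS 8
#define VAL_LEN  64

/* Split a parameter record into values. is_char[i] is 1 for a quoted
   (character) value and 0 otherwise. Returns the number of values
   found, or -1 if a quote is never closed. Long values are truncated. */
int split_parm_record(const char *rec, char vals[][VAL_LEN], int is_char[], int max)
{
    assert(rec != NULL);
    int n = 0;
    const char *p = rec;
    while (*p != '\0' && n < max) {
        while (*p == ' ') p++;                 /* skip delimiting blanks */
        if (*p == '\0') break;
        size_t len;
        if (*p == '\'') {                      /* character data */
            const char *end = strchr(p + 1, '\'');
            if (end == NULL) return -1;
            len = (size_t)(end - (p + 1));
            if (len >= VAL_LEN) len = VAL_LEN - 1;
            memcpy(vals[n], p + 1, len);
            is_char[n] = 1;
            p = end + 1;
        } else {                               /* numeric data */
            const char *start = p;
            while (*p != '\0' && *p != ' ') p++;
            len = (size_t)(p - start);
            if (len >= VAL_LEN) len = VAL_LEN - 1;
            memcpy(vals[n], start, len);
            is_char[n] = 0;
        }
        vals[n][len] = '\0';
        n++;
    }
    return n;
}
```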

The system variables are defined later in this chapter. They are automatically set by the system to
provide information regarding the execution of the utility. For example, they include the date, time
and return code, to name a few. Here they can be used to establish the value for a user parameter
instead of reading the data from a file.

Example of using a .ACCEPT command:

Contents of parm-record:

Once accepted, this data is available for use within the script. Optionally, an IGNORE can be used to
skip one or more of the specified variables in the record. This makes it easy to provide one
parameter record that is used by multiple job scripts, allowing each script to determine which and
how many of the values it needs.

To skip the integer data, the above .ACCEPT would be written as:

Note: if the system is a mainframe, the FILE is used to name the DD statement in the Job Control
Language (JCL). For example, for the above .ACCEPT, the following JCL would be required:

Establishing the Default Date Format

Depending on the mode (Teradata or ANSI) defined within the DBC Control Record, dates are
displayed and read according to that default format. Date data that does not match that format is
rejected and stored in an error table. This includes a valid ANSI date when a Teradata date is
expected.

To ease the writing of the code by eliminating the need to specifically define the format of incoming
dates, the .DATEFORM command is useful. It allows the user to declare that incoming dates use either
the ANSI format (YYYY-MM-DD) or the Teradata format (YY/MM/DD).
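The rejection described above comes down to checking the shape of each incoming date against the expected format. The hypothetical sketch below classifies a string as matching the Teradata form (YY/MM/DD), the ANSI form (YYYY-MM-DD), or neither. It checks layout only, not whether the month and day are valid, and the enum names are illustrative rather than .DATEFORM keywords.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum date_form { FORM_NONE, FORM_TERADATA, FORM_ANSI };

/* Shape check: '9' in the pattern matches any digit, other
   characters must match exactly. */
static int matches(const char *s, const char *pattern)
{
    if (strlen(s) != strlen(pattern)) return 0;
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        if (pattern[i] == '9') {
            if (!isdigit((unsigned char)s[i])) return 0;
        } else if (s[i] != pattern[i]) {
            return 0;
        }
    }
    return 1;
}

enum date_form classify_date(const char *s)
{
    assert(s != NULL);
    if (matches(s, "99/99/99"))   return FORM_TERADATA;  /* YY/MM/DD */
    if (matches(s, "9999-99-99")) return FORM_ANSI;      /* YYYY-MM-DD */
    return FORM_NONE;
}
```

A row would be rejected when `classify_date` does not return the form that was declared, so `2002-08-13` is rejected when the Teradata form is expected, and `08/13/2002` matches neither predefined form.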
The syntax for writing a .DATEFORM command:

Since these are the only two pre-defined formats, any other format must be defined in the INSERT of
the utility, as in the following example for a MM/DD/YYYY date:

Displaying an Output Message

The .DISPLAY command is used to write a text message to a file name specified in the command.
Normally, this technique is used to provide operational information to the user regarding one or
more conditions encountered during the processing of the utility or SE. The default file is the system
printer (SYSPRINT) on a mainframe and standard out (STDOUT) on other platforms.

The message is normally built using a literal character string. However, a user may request that the
output include substituted variable or parameter data. This is accomplished by placing an
ampersand (&) in front of the variable's name. See the section below on using a variable in a script
for more details.

The syntax for writing a .DISPLAY command:

Note: If the system is a mainframe, the FILE portion of the command is used to name the DD
statement in the JCL. The JCL must also contain any names, space requirements, record and block
size, or disposition information needed by the system to create the file.

Comparing Variable Data

The .IF command is used to compare the contents of named variables. Normally, a variable is
compared to a known literal value for control purposes. However, any values can be compared
where it makes sense to do so.

The syntax for writing a .IF command:

The comparison symbols are normally one of the following:

Equal                     =
Less than                 <
Greater than              >
Not equal                 <>
Less than or equal        <=
Greater than or equal     >=

Figure 9-3
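A .IF comparison such as `.IF 'TUE' = 'FRI' THEN;` reduces to comparing two values with one of the six operators in Figure 9-3. The hypothetical C sketch below compares character strings; it is not how the SE is implemented, and a numeric comparison would need the values converted to numbers first.

```c
#include <assert.h>
#include <string.h>

/* Evaluate "left op right" as a character comparison.
   Returns 1 (true), 0 (false), or -1 for an unknown operator. */
int compare_values(const char *left, const char *op, const char *right)
{
    assert(left != NULL && op != NULL && right != NULL);
    int c = strcmp(left, right);
    if (strcmp(op, "=")  == 0) return c == 0;
    if (strcmp(op, "<")  == 0) return c < 0;
    if (strcmp(op, ">")  == 0) return c > 0;
    if (strcmp(op, "<>") == 0) return c != 0;
    if (strcmp(op, "<=") == 0) return c <= 0;
    if (strcmp(op, ">=") == 0) return c >= 0;
    return -1;
}
```

The design choice matters: compared as characters, `'10' < '9'` is true, which is one reason to keep character data in quotes and numeric data unquoted.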

Routing Messages

The .ROUTE command is used to write messages to an output file. This is normally system
information generated by the SE during the execution of a utility. The default file is SYSPRINT on a
mainframe and STDOUT on other platforms.

The syntax for writing a .ROUTE command:


Note: If the system is a mainframe, the FILE is used to name the DD statement in the JCL. The JCL
must also contain any names, space requirements, record and block size, or disposition information
needed by the system.

Running Commands from a File

The .RUN command is used to read and execute other commands from a file. This is a great tool for
using pre-defined and stored command files. It is also an especially good way to keep your user ID
and password from being written into the script.

In other words, you save your .LOGON in a secured file that only you can see. Then, use the .RUN to
access it for processing. In addition, more than one command can be put into the file. Therefore, it
can add flexibility to the utility by building commands into the file instead of into the script.

The syntax for writing a .RUN command:

The IGNORE and THRU options work here the same way as explained for the .ACCEPT above.

Note: If the system is a mainframe, the FILE is used to name the DD statement in the JCL.

Setting Variables to Values

The .SET command is used to assign a new value or change an existing value within a variable. This
is done to make the execution of the script more flexible and to provide the user with more control
of the processing.

The syntax for writing a .SET command:

.SET <variable-name> [ TO ] <expression> ;

Note: The expression can be a literal value based on the data type of the variable or a mathematical
operation for numeric data. The math can use one or more variables and one or more literals.

Running a System Command

The .SYSTEM command is used as a hook to the operating system on which the utility is running.
It is used to communicate with the host computer and request an operation that the SE cannot do
on its own. When using this command, it is important to know which operating system is being used.
This information can be obtained from the &SYSOS system variable, described in the System Variable
section of this chapter.

The syntax for writing a .SYSTEM command:

.SYSTEM '<operating-system-specific-command>' ;

Using a Variable in a Script

The SE dynamically establishes a memory area definition for a variable at the point it is first
referenced. The data used to initialize it also determines its data type. To distinguish the
referenced name as a variable instead of a database object name, a special character is needed.
The character used to identify the substitution of variable data into the SQL is the ampersand (&)
in front of the variable name. However, the ampersand is not used when the value is being set.
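The substitution the SE performs can be illustrated with a short routine that replaces each `&name` with a stored value, much like the "Previous statement modified to:" lines in the sample output at the end of this chapter. This is a hypothetical sketch: `substitute`, `Var` and the name rule (letters, digits and underscores) are assumptions, and the SE's real rules may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct { const char *name; const char *value; } Var;

static const char *lookup(const Var *vars, int nvars, const char *name, size_t len)
{
    for (int i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return NULL;
}

/* Copy `in` to `out`, replacing each &name with its value. Unknown
   names and a lone '&' are copied unchanged. Returns 0, or -1 if
   `out` is too small. */
int substitute(const char *in, const Var *vars, int nvars, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL && outsz > 0);
    size_t o = 0;
    while (*in != '\0') {
        const char *text = in;    /* what to copy this round */
        size_t len = 1;
        if (*in == '&') {
            const char *name = in + 1;
            size_t nlen = 0;
            while (isalnum((unsigned char)name[nlen]) || name[nlen] == '_')
                nlen++;
            const char *val = nlen ? lookup(vars, nvars, name, nlen) : NULL;
            if (val != NULL) {    /* known variable: copy its value */
                text = val;
                len = strlen(val);
                in += 1 + nlen;
            } else {
                in++;             /* copy the '&' itself */
            }
        } else {
            in++;
        }
        if (o + len >= outsz) return -1;
        memcpy(out + o, text, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```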

The Support Environment System Variables

The following variables are available within the SE to help determine conditions and system data for
processing of the script.

&SYSDATE        Provides the system date in YY/MM/DD format
&SYSDATE(4)     Provides the system date in YYYY-MM-DD format
&SYSTIME        Provides the system time in HH:MM:SS format
&SYSDAY         Provides the system day as three capitalized characters; ex: MON
&SYSOS          Provides the host operating system as a maximum of five
                characters; ex: VM/SP
&SYSUSER        Provides the user ID
&SYSRC          Provides the return or status code from the previous operation
&SYSINSCNT[n]   Provides the count of rows inserted into the table, where n (1-5) is
                the relative number identifying the table in the TABLES portion of
                the .BEGIN in MultiLoad
&SYSUPDCNT[n]   Provides the count of rows updated in the table, where n (1-5) is
                the relative number identifying the table in the TABLES portion of
                the .BEGIN in MultiLoad
&SYSDELCNT[n]   Provides the count of rows deleted from the table, where n (1-5) is
                the relative number identifying the table in the TABLES portion of
                the .BEGIN in MultiLoad

Figure 9-4
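The date and time formats in Figure 9-4 correspond to standard C `strftime` conversions. The sketch below only illustrates the formats; `format_sys_vars` is hypothetical, since the SE sets these variables itself. It assumes the default "C" locale, in which `%a` yields an English abbreviation such as `Tue`.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <time.h>

/* Produce the formats described for &SYSDATE, the four-digit-year
   date, &SYSTIME and &SYSDAY from a broken-down time. */
void format_sys_vars(const struct tm *t, char sysdate[11], char sysdate4[11],
                     char systime[9], char sysday[4])
{
    assert(t != NULL);
    strftime(sysdate, 11, "%y/%m/%d", t);    /* YY/MM/DD   */
    strftime(sysdate4, 11, "%Y-%m-%d", t);   /* YYYY-MM-DD */
    strftime(systime, 9, "%H:%M:%S", t);     /* HH:MM:SS   */
    strftime(sysday, 4, "%a", t);            /* e.g. Tue   */
    for (int i = 0; sysday[i] != '\0'; i++)  /* capitalize: TUE */
        sysday[i] = (char)toupper((unsigned char)sysday[i]);
}
```

With the date from the sample output (Tuesday, August 13, 2002 at 13:40:45), it produces `02/08/13`, `2002-08-13`, `13:40:45` and `TUE`.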

Support Environment Example

/* build the restart log called MJL_Util_log in database WORKDB */
.LOGTABLE WORKDB.MJL_Util_log;
.DATEFORM ansidate;
/* get the logon from a file called logon-file */
.RUN FILE logon-file;
/* test the system day to see if it is Friday
   notice that the character string is used in single quotes so that it is
   compared as a character string. Contrast this below for the table name */
.IF '&SYSDAY' = 'FRI' THEN;
.DISPLAY '&SYSDATE(4) is a &SYSDAY' FILE outfl.txt;
.ELSE;
.DISPLAY '&SYSUSER, &SYSDATE(4) is not Friday' FILE outfl.txt;
.LOGOFF 16;
/* notice that the endif allows for more than one operation after the comparison */
.ENDIF;
/* the table name and two values are obtained from a file */
.ACCEPT tablename, parm_data1, parm_data2 FILE myparmfile;
/* establish and store data into a variable; parm_data1 must already
   have been set by the .ACCEPT above */
.SET variable1 TO &parm_data1 + 125;
/* the table name is not in quotes here because it is not character data. But the value in
   parm_data2 is in quotes because it is character data. This is the power of it all! */
INSERT INTO &tablename VALUES (&variable1, '&parm_data2', &parm_data1);
.LOGOFF;
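The script's control flow can be traced with a small Python model. This is a hypothetical sketch with the following assumptions:

* `run_example` is an invented name.
* The parameter file has already been split into the three values that `.ACCEPT` reads.
* `.ACCEPT` runs before `.SET`, as it does in the SYSOUT listings.

The function returns the lines the script would display or submit, together with the final return code.

```python
def run_example(sysday: str, sysuser: str, sysdate: str,
                parm_values: tuple) -> tuple:
    """Trace the example SE script; returns (lines produced, return code)."""
    produced = []

    # .IF '&SYSDAY' = 'FRI' THEN; ... .ELSE; ... .ENDIF;
    if sysday == "FRI":
        produced.append(f"{sysdate} is a {sysday}")             # first .DISPLAY
    else:
        produced.append(f"{sysuser}, {sysdate} is not Friday")  # second .DISPLAY
        return produced, 16                                      # .LOGOFF 16 ends the job

    # .ACCEPT tablename, parm_data1, parm_data2 FILE myparmfile;
    tablename, parm_data1, parm_data2 = parm_values

    # .SET variable1 TO &parm_data1 + 125;  (numeric data, so it is added)
    variable1 = int(parm_data1) + 125

    # Character data keeps its quotes after substitution; numbers and the
    # table name do not.
    produced.append(
        f"INSERT INTO {tablename} VALUES "
        f"({variable1}, '{parm_data2}', {parm_data1});"
    )
    return produced, 0  # .LOGOFF; with no errors


lines, rc = run_example("FRI", "Michael", "02/08/16",
                        ("My_test_table", "123", "some character data"))
print(lines[-1], rc)
```

The Friday run uses the values `My_test_table`, `123` and `some character data`. With these, the last line matches the modified INSERT shown in that SYSOUT listing, and the return code is 0. Passing `"TUE"` instead stops after the second display with return code 16.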

The following SYSOUT file is created from a run of the above script on a day other than
Friday:

**** 13:40:45 UTY2411 Processing start date: TUE AUG 13, 2002

0001 .LOGTABLE WORKDB.MJL_Util_log;
0002 .DATEFORM ansidate;
**** 13:40:45 UTY1200 DATEFORM has been set to ANSIDATE.
0003 .RUN FILE logonfile.txt;
0004 .logon cdw/mikel,;
**** 13:40:48 UTY8400 Maximum supported buffer size: 64K
**** 13:40:50 UTY8400 Default character set: ASCII
**** 13:40:52 UTY6211 A successful connect was made to the RDBMS.
**** 13:40:52 UTY6217 Logtable 'SQL00.MJL_Util_log' has been created.

0005 /* test the system day to see if it is Friday
notice that the character string is used in single quotes so that it is
compared as a character string. Contrast this below for the table name */
.IF '&SYSDAY' = 'FRI' THEN;
**** 13:40:52 UTY2402 Previous statement modified to:
0006 .IF 'TUE' = 'FRI' THEN;
0007 .DISPLAY '&SYSDATE(4) is a &SYSDAYday' FILE outfl.txt;
0008 .ELSE;
0009 .DISPLAY '&SYSUSER, &SYSDATE(4) is not Friday' FILE outfl.txt;
**** 13:40:52 UTY2402 Previous statement modified to:
0010 .DISPLAY 'Michael, 02/08/13(4) is not Friday' FILE outfl.txt;
0011 /* .LOGOFF 16; */

**** 13:40:55 UTY6212 A successful disconnect was made from the RDBMS.
**** 13:40:55 UTY6216 The restart log table has been dropped.
**** 13:40:55 UTY2410 Total processor time used = '10.906 Seconds'
. Start : 13:40:45 - TUE AUG 13, 2002
. End : 13:40:55 - TUE AUG 13, 2002
. Highest return code encountered = '16'.
The following SYSOUT file is created from a run of the above script on a Friday:

**** 13:40:45 UTY2411 Processing start date: FRI AUG 16, 2002

0001 .LOGTABLE WORKDB.MJL_Util_log;
0002 .DATEFORM ansidate;
**** 13:40:45 UTY1200 DATEFORM has been set to ANSIDATE.
0003 .RUN FILE logonfile.txt;
0004 .logon cdw/mikel,;
**** 13:40:48 UTY8400 Maximum supported buffer size: 64K
**** 13:40:50 UTY8400 Default character set: ASCII
**** 13:40:52 UTY6211 A successful connect was made to the RDBMS.
**** 13:40:52 UTY6217 Logtable 'SQL00.MJL_Util_log' has been created.

0005 /* test the system day to see if it is Friday
notice that the character string is used in single quotes so that it is
compared as a character string. Contrast this below for the table name */
.IF '&SYSDAY' = 'FRI' THEN;
**** 13:40:52 UTY2402 Previous statement modified to:
0006 .IF 'FRI' = 'FRI' THEN;
0007 .DISPLAY '&SYSDATE(4) is a &SYSDAYday' FILE outfl.txt;
**** 13:40:52 UTY2402 Previous statement modified to:
0008 .DISPLAY '02/08/13(4) is a FRIday' FILE outfl.txt;
0009 .ELSE;
0010 .DISPLAY '&SYSUSER, &SYSDATE(4) is not Friday' FILE outfl.txt;
0011 /* .LOGOFF 16; */
/* notice that the endif allows for more than one operation after the
comparison */
.ENDIF;
0012 /* establish and store data into a variable */
/* the table name and two values are obtained from a file */
.ACCEPT tablename, parm_data1, parm_data2 FILE myparmfile.txt;
0013 /* the table name is not in quotes here because it is not character data */
.SET variable1 TO &parm_data1 + 125;
**** 13:40:52 UTY2402 Previous statement modified to:
0014 /* the table name is not in quotes here because it is not character data */
.SET variable1 TO 123 + 125;
0015 INSERT INTO &tablename VALUES (&variable1, '&parm_data2'
,&parm_data1);
**** 13:40:52 UTY2402 Previous statement modified to:
0016 INSERT INTO My_test_table VALUES (248, 'some character data', 123);
**** 13:40:54 UTY1016 'INSERT' request successful.
0017 .LOGOFF;

**** 13:40:55 UTY6212 A successful disconnect was made from the RDBMS.
**** 13:40:55 UTY6216 The restart log table has been dropped.
**** 13:40:55 UTY2410 Total processor time used = '10.906 Seconds'
. Start : 13:40:45 - FRI AUG 16, 2002
. End : 13:40:55 - FRI AUG 16, 2002
. Highest return code encountered = '0'.

BTEQ MAINFRAME EXPORT EXAMPLE

BTEQ MAINFRAME EXPORT EXAMPLE – JCL


BTEQ MAINFRAME EXPORT SCRIPT EXAMPLE – DATA MODE

.SESSIONS 1

.RUN FILE=ILOGON; /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */

.RUN FILE=IDBENV; /*JCL IDBENV - DATABASE SQL_CLASS; */

.EXPORT DATA DDNAME=REPORT

SELECT
EMPLOYEE_NO,
LAST_NAME,
FIRST_NAME,
SALARY,
DEPT_NO
FROM EMPLOYEE_TABLE;

.IF ERRORCODE > 0 THEN .GOTO Done

.EXPORT RESET

.LABEL Done

.QUIT
BTEQ MAINFRAME IMPORT EXAMPLE

BTEQ MAINFRAME IMPORT EXAMPLE – JCL

BTEQ MAINFRAME IMPORT SCRIPT EXAMPLE – DATA MODE

.SESSIONS 1

.RUN FILE=ILOGON; /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */

.RUN FILE=IDBENV; /*JCL IDBENV - DATABASE SQL08; */

.IMPORT DATA DDNAME=REPORT



FASTEXPORT MAINFRAME EXAMPLE

FASTEXPORT MAINFRAME EXAMPLE – JCL


FASTEXPORT MAINFRAME SCRIPT EXAMPLE – RECORD MODE

.LOGTABLE SQL08.SQL08_RESTART_LOG;

.RUN FILE ILOGON; /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */

.RUN FILE IDBENV; /*JCL IDBENV - DATABASE SQL_CLASS; */


FASTLOAD MAINFRAME EXAMPLE

FASTLOAD MAINFRAME EXAMPLE – JCL


FASTLOAD MAINFRAME SCRIPT EXAMPLE – TEXT MODE

.SESSIONS 1;

LOGON TDP0/SQL08,SQL08;

DROP TABLE SQL08.ERROR_ET;


DROP TABLE SQL08.ERROR_UV;

DELETE FROM SQL08.EMPLOYEE_PROFILE;

DDNAME=DATAIN;

BEGIN LOADING SQL08.EMPLOYEE_PROFILE
ERRORFILES SQL08.ERROR_ET, SQL08.ERROR_UV
CHECKPOINT 5;

INSERT INTO SQL08.EMPLOYEE_PROFILE VALUES
(:EMPLOYEE_NO,
:LAST_NAME,
:FIRST_NAME,
:SALARY,
:DEPT_NO);

END LOADING;
LOGOFF;

MULTILOAD MAINFRAME EXAMPLE

MULTILOAD MAINFRAME EXAMPLE – JCL


MULTILOAD MAINFRAME SCRIPT EXAMPLE – TEXT MODE

.LOGTABLE SQL08.UTIL_RESTART_LOG;

.RUN FILE ILOGON; /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */

.RUN FILE IDBENV; /*JCL IDBENV - DATABASE SQL08; */

.BEGIN MLOAD

TABLES Student_Profile1
ERRLIMIT 1
SESSIONS 1;

.DML LABEL INPUT_INSERT;

INSERT INTO Student_Profile1

.IMPORT INFILE INPTFILE
LAYOUT INPUT_FILE
APPLY INPUT_INSERT;

.END MLOAD;
.LOGOFF;

TPUMP MAINFRAME EXAMPLE

TPUMP MAINFRAME EXAMPLE – JCL


TPUMP MAINFRAME SCRIPT EXAMPLE – TEXT

.LOGTABLE SQL08.TPUMP_RESTART;

.RUN FILE ILOGON; /*JCL ILOGON - .LOGON CDW/SQL01,WHYNOT; */

.RUN FILE IDBENV; /*JCL IDBENV - DATABASE SQL08; */

DROP TABLE SQL08.TPUMP_UTIL_ET;

.BEGIN LOAD
SESSIONS 1 TENACITY 2
ERRORTABLE TPUMP_UTIL_ET
ERRLIMIT 5
CHECKPOINT 1
PACK 40
RATE 1000
ROBUST OFF;

.DML LABEL INPUT_INSERT IGNORE DUPLICATE ROWS
IGNORE MISSING ROWS;
INSERT INTO Student_Profile4

VALUES
(:STUDENT_ID (INTEGER),
:LAST_NAME (CHAR(20)),
:FIRST_NAME (VARCHAR(12)),
:CLASS_CODE (CHAR(2)),
:GRADE_PT (DECIMAL(5,2))
);

.IMPORT INFILE INPUT

LAYOUT INPUT_LAYOUT
APPLY INPUT_INSERT;

.END LOAD;
.LOGOFF;