
Nzload

IBM PureData System for Analytics


Programming and Usage
© Copyright IBM Corporation 2016
Course materials may not be reproduced in whole or in part without the written permission of IBM.
Unit objectives
• Load data from external sources into the IBM PureData System for
Analytics using the nzload High Performance Bulk Loader
• Grant load permissions to users
• Review the log file associated with the load operation
• Specify fixed length format for a data load operation

Nzload © Copyright IBM Corporation 2016


Data loading
• Within the IBM PureData System for Analytics environment, data
loading simply means transferring data into the IBM PureData System
for Analytics. Several options are available:
  - External tables
    Tables stored as flat files on the host or client systems rather than
    in the IBM PureData System for Analytics database
  - nzload
    A command that provides an easy way to use external tables to get
    data into the IBM PureData System for Analytics
  - Backup and restore
    Several backup and restore methods can be used to transfer data
    between systems



nzload high performance bulk loader
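• Command line interface program; options can be given:
  - On the command line
  - In a control file
• Variable-length, delimited ASCII data records
• Remote client support
• Input from file systems, NFS mounts, and named pipes
• Error handling
  - .nzlog (load log) and .nzbad (rejected records) files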

The goal when loading large volumes of data is to land the data on
disk as few times as possible
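Named-pipe input lets a producer stream records straight into the loader, so the data does not first have to be written to a staging file on disk. The sketch below is a minimal illustration, assuming a POSIX system. It creates a FIFO with os.mkfifo and writes pipe-delimited rows into it from a thread. A plain Python reader stands in for nzload, which in a real load would read the pipe through -df. The helper names are hypothetical.

```python
import os
import tempfile
import threading


def write_rows(fifo_path, rows, delim="|"):
    """Write delimited rows into the pipe; open() blocks until a reader attaches."""
    with open(fifo_path, "w") as pipe:
        for row in rows:
            pipe.write(delim.join(row) + "\n")


def stream_through_fifo(rows):
    """Create a FIFO, feed it from a writer thread, and read it back.

    In a real load the reading side would be nzload, for example
        nzload -db PROD -t CUSTOMER -df <fifo_path> -delim '|'
    Here a plain Python reader stands in for nzload.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "customer.pipe")
    os.mkfifo(fifo_path)  # POSIX only
    try:
        writer = threading.Thread(target=write_rows, args=(fifo_path, rows))
        writer.start()
        with open(fifo_path) as pipe:  # opening for read unblocks the writer
            received = [line.rstrip("\n") for line in pipe]
        writer.join()
        return received
    finally:
        os.remove(fifo_path)
        os.rmdir(workdir)


if __name__ == "__main__":
    for line in stream_through_fifo([("01", "Jon", "Doe", "2537"), ("03", "Jack", "Jones", "654")]):
        print(line)
```

Because the pipe holds only a small kernel buffer, the writer and the reader run concurrently, and the rows never exist as a complete file on disk.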



How the nzload command works



nzload transactions
• An nzload operation is treated as a single transaction
  - All records are loaded under a single transaction ID
  - If the load fails, the records are logically deleted
  - The storage space used by those records should be reclaimed at some
    point, using:
    − GROOM TABLE
    − TRUNCATE TABLE
• Other users can run queries against the tables while they are being
loaded
• New data becomes visible to other users only after the transaction
commits
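The all-or-nothing behavior described above can be illustrated with Python's built-in sqlite3 module as a stand-in. This is not NPS, and it does not reproduce nzload's maxErrors handling. All rows are inserted inside one transaction, so a failure part-way through leaves none of them visible. The load_all_or_nothing helper and the table layout are hypothetical.

```python
import sqlite3


def load_all_or_nothing(conn, rows):
    """Insert all rows in one transaction; roll everything back on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO customer (id, first_name, last_name, amount) VALUES (?, ?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, amount INTEGER NOT NULL)"
)

good = [(1, "Jon", "Doe", 2537), (3, "Jack", "Jones", 654)]
bad = [(4, "Ann", "Lee", 100), (5, "Bob", "Ray", None)]  # second row violates NOT NULL

print(load_all_or_nothing(conn, good))  # True
print(load_all_or_nothing(conn, bad))   # False: row 4 is rolled back too
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 2
```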



nzload example
• Loading data from the file customer.del into the table CUSTOMER in the
database PROD
  - Access information has been defined in environment variables
  - File paths can be given as absolute paths or relative to the
    directory in which nzload is run

nzload -db PROD -t CUSTOMER -df /tmp/customer.del -delim '|' -maxErrors 50

  - -delim: the field delimiter in the customer.del file is "|"
  - -maxErrors: useful if large data files contain a limited number of
    bad rows (float values in integer fields, etc.). Default = 1

customer.del:
01|Jon|Doe|2537
03|Jack|Jones|654
...
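Before a large load, it can help to find the rows that nzload is likely to reject, such as a float value in an integer field. The sketch below is a hypothetical pre-check and is not part of nzload. It splits each record on "|". For illustration only, it assumes that customer.del has four fields and that the first and last fields are integers.

```python
def precheck(lines, delim="|", n_fields=4, int_fields=(0, 3)):
    """Return (good_lines, bad), where bad lists (line_number, reason) tuples.

    The expected layout (4 fields, integers in positions 0 and 3) is an
    assumption about customer.del made for this example.
    """
    good, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delim)
        if len(fields) != n_fields:
            bad.append((lineno, f"expected {n_fields} fields, got {len(fields)}"))
            continue
        non_int = [i for i in int_fields if not fields[i].strip().lstrip("-").isdigit()]
        if non_int:
            bad.append((lineno, f"non-integer value in field(s) {non_int}"))
            continue
        good.append(line)
    return good, bad


sample = ["01|Jon|Doe|2537", "03|Jack|Jones|654", "04|Ann|Lee|12.5", "05|Bob"]
good, bad = precheck(sample)
print(len(good), bad)
```

Counting bad rows up front gives a rough idea of whether a -maxErrors value is large enough. Only nzload's own rejection rules and the .nzbad file are authoritative.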



nzload options and arguments
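• nzload -db database_name -t table_name -delim '|' -maxErrors num_errors
-df source_file_name
  - Example command to run (and time) nzload:
$ time nzload -db dev -t customer -delim '|' -maxErrors 10 -df customer.unl

• nzload accepts many options and arguments, which can be supplied:
  - On the command line
  - In a control file
  - Through environment variables
    − NZ_HOST, NZ_USER, NZ_PASSWORD, NZ_DATABASE
  Options that are not specified take their default values
• Required (host, user, password, and database can instead come from
the environment variables above):
-host <host_name> -u <user_name> -pw <password> -db <database_name> -t <table_name>

• Commonly used options and arguments:
  -df           Data file to load
  -maxErrors    Maximum number of rejected records allowed (default 1)
  -cf           Control file
  -dateDelim    Delimiter used in date values
  -delim        Field delimiter
  -dateStyle    Format of date values (for example, YMD)
  -nullValue    String that represents a NULL value
  -allowReplay  Enables load continuation after an S-Blade reset or failover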
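A wrapper script can assemble the nzload argument list and leave connection details to the environment. The sketch below assumes the standard Netezza client variables NZ_HOST, NZ_USER, NZ_PASSWORD, and NZ_DATABASE. It also assumes that an explicit option takes priority over the matching variable. build_nzload_args is a hypothetical helper that does not handle control files. It only builds the list, which subprocess.run(args, check=True) would execute on a machine with the Netezza client installed.

```python
import os

# nzload connection options and the environment variables that can supply them
# (assumed standard Netezza client variables).
ENV_FALLBACK = {
    "-host": "NZ_HOST",
    "-u": "NZ_USER",
    "-pw": "NZ_PASSWORD",
    "-db": "NZ_DATABASE",
}


def build_nzload_args(table, data_file, options=None, env=None):
    """Assemble an nzload argument list without running it.

    A connection option is emitted only if it is passed explicitly; otherwise
    the matching environment variable must be set, because nzload reads it.
    """
    options = dict(options or {})
    env = os.environ if env is None else env
    args = ["nzload"]
    for flag, var in ENV_FALLBACK.items():
        if flag in options:
            args += [flag, str(options.pop(flag))]
        elif var not in env:
            raise ValueError(f"{flag} not given and {var} is not set")
    args += ["-t", table, "-df", data_file]
    for flag, value in options.items():  # e.g. -delim, -maxErrors
        args += [flag, str(value)]
    return args


print(build_nzload_args(
    "customer", "customer.unl",
    {"-db": "dev", "-delim": "|", "-maxErrors": 10},
    env={"NZ_HOST": "nzhost", "NZ_USER": "lduser", "NZ_PASSWORD": "secret"},
))
```

Leaving the password in NZ_PASSWORD, rather than passing -pw, keeps it out of process listings.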
nzload load continuation
• The -allowReplay option enables load continuation if the system
has been paused due to an S-Blade reset or failover
$ nzload -db dev -t customer -df customer.dat -allowReplay
• Data is buffered in a replay region in host memory before it is sent
to the S-Blades
  - A partial commit (checkpoint) forces all the unwritten data to the
    S-Blades
• If the system pauses because of an S-Blade reset or failover, NPS
rolls back the load to the beginning of the replay region and restarts
the nzload operation from there
• Checkpoint overhead may reduce load performance
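The replay behavior can be pictured as a toy model with three rules. Every row sent is also kept in a replay region. A checkpoint commits what the S-Blades have received and empties the region. A reset discards uncommitted rows and resends the region. This is a conceptual sketch under those assumptions, not the NPS implementation. The class, the checkpoint interval, and the simulated reset are all hypothetical.

```python
class ReplayLoader:
    """Toy model of a load with a replay region and checkpoints."""

    def __init__(self, checkpoint_every):
        self.checkpoint_every = checkpoint_every
        self.replay_region = []  # copy of rows sent since the last checkpoint ("host memory")
        self.blade_pending = []  # rows received by the "S-Blades" but not yet committed
        self.committed = []      # rows made durable by a checkpoint
        self.replayed = 0        # how many rows had to be resent

    def send(self, row):
        self.replay_region.append(row)
        self.blade_pending.append(row)
        if len(self.replay_region) >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        """Partial commit: make pending rows durable and empty the replay region."""
        self.committed.extend(self.blade_pending)
        self.blade_pending.clear()
        self.replay_region.clear()

    def blade_reset(self):
        """Simulated reset: uncommitted rows are lost, so the replay region is resent."""
        self.blade_pending.clear()
        to_replay = list(self.replay_region)
        self.replay_region.clear()
        self.replayed += len(to_replay)
        for row in to_replay:
            self.send(row)


def run_load(rows, checkpoint_every, reset_after=None):
    loader = ReplayLoader(checkpoint_every)
    for i, row in enumerate(rows):
        loader.send(row)
        if i == reset_after:
            loader.blade_reset()
    loader.checkpoint()  # final commit at the end of the load
    return loader


result = run_load(list(range(10)), checkpoint_every=4, reset_after=5)
print(result.committed, result.replayed)
```

Frequent checkpoints keep the replay region short, so less data is resent after a reset. Each checkpoint is extra work, however, and that trade-off is the performance cost mentioned above.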



NPS 7.0.3 loading interval data type
• Background
  - Before NPS 7.0.3, the Netezza loader did not support loading the
    interval data type. However, Netezza SQL singleton inserts and
    Netezza unloads through external tables did support it, for example:
    1 year 1 day, 2 years -2 months, or 3 days -03:03:03.333
  - Before NPS 7.0.3, customers had to work around this nzload limitation
    with a staging table. This prevented data life cycle management
    because rows could not be inserted or updated in place
• Benefits of interval data type support
  - Netezza interval data unloaded from the current or another Netezza
    system can be loaded directly
  - Support is easy to extend to the standard interval data format
    (ISO 8601)



How nzload interval data type works
• Basic operation
  - Exactly the same control flow as an existing load
• System settings and options
  - No new system-level settings
  - No new external table option or nzload switch
  - No impact on upgrade or downgrade
  - No impact on backup and restore (BNR)
• User experience
  - Users see no visible changes from previous releases when using the
    Netezza loading tools (the nzload CLI or NZSQL external tables) or
    the programmatic APIs (ODBC, JDBC, OLE DB)



Coding examples
• External table (SQL):
insert into <table> select * from external '<external-data-source>'
  (c1 interval, c2 interval, c3 interval)
  using (remotesource 'odbc' format 'text' delim '|' timeDelim ':');
• nzload:
nzload -host <hostname> -db <db-name> -t <tbl-name> \
  -df <external-data-source> -delim '|' -timeDelim ':'



Acceptable formats
Acceptable non-standard interval data syntax in EBNF:
' '* 
[ [ '-' ] <digit>+ ' '* 'y'['e'['a'['r'['s']]]] ' '* ]
[ [ '-' ] <digit>+ ' '* 'm'['o'['n'['t'['h'['s']]]]] ' '* ]
[ [ '-' ] <digit>+ ' '* 'd'['a'['y'['s']]] ' '* ]
[ [ '-' ] <time> ] 
Optional elements are enclosed in brackets.
Literal characters are enclosed in single quotes.
* means zero or more
+ means one or more
<digit> is an instance of one of the 10 decimal digits
<time> is an instance of the loader's 24HOUR style time syntax
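The grammar above maps directly onto a regular expression. The sketch below is a hypothetical validator, not the loader's parser. It assumes that <time> (the loader's 24HOUR style) is HH:MM:SS with an optional fractional part and ':' as the time delimiter. It returns the components it recognizes, or None when the text does not fit the grammar.

```python
import re

# Each unit keyword accepts any prefix of its full name, as in the EBNF:
# y, ye, yea, year, years / m, mo, ..., months / d, da, day, days.
# Assumption: <time> is HH:MM:SS[.fraction].
INTERVAL_RE = re.compile(
    r" *"
    r"(?:(?P<years>-?\d+) *y(?:e(?:a(?:r(?:s)?)?)?)? *)?"
    r"(?:(?P<months>-?\d+) *m(?:o(?:n(?:t(?:h(?:s)?)?)?)?)? *)?"
    r"(?:(?P<days>-?\d+) *d(?:a(?:y(?:s)?)?)? *)?"
    r"(?:(?P<neg>-)?(?P<time>\d{1,2}:\d{2}:\d{2}(?:\.\d+)?))?"
)


def parse_interval(text):
    """Return the interval's components as a dict, or None if it does not match."""
    m = INTERVAL_RE.fullmatch(text)
    if m is None:
        return None
    result = {}
    for unit in ("years", "months", "days"):
        if m.group(unit) is not None:
            result[unit] = int(m.group(unit))
    if m.group("time") is not None:
        result["time"] = ("-" if m.group("neg") else "") + m.group("time")
    return result


for sample in ("1 year 1 day", "2 years -2 months", "3 days -03:03:03.333", "1 week"):
    print(repr(sample), parse_interval(sample))
```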



NPS 7.2 _v_load_status (1 of 2)
• Load status information
  - While loads are running on your NPS system, you can query the system
    view _v_load_status to display details about their progress
    − The view shows information about each load operation, such as the
      database, table, schema, and user names, the number of bytes
      processed, and the numbers of inserted and rejected rows
  - The load log file now also includes additional performance-related
    details about the load operation

• Sample output:
SYSTEM.ADMIN(ADMIN)=> select * from _v_load_status;
 PLANID | DATABASENAME | TABLENAME | SCHEMANAME | USERNAME | BYTESPROCESSED | ROWSINSERTED | ROWSREJECTED | BYTESDOWNLOADED
--------+--------------+-----------+------------+----------+----------------+--------------+--------------+-----------------
   2932 | SYSTEM       | LINEITEM  | ADMIN      | ADMIN    |      142606226 |      1136931 |            4 |       131911476
(1 row)



NPS 7.2 _v_load_status (2 of 2)
• View _v_load_status and virtual table _vt_load_status are used to
check the status of active load sessions.
• View attributes:
Attribute        Type                     Description
PLANID           INTEGER                  Plan ID of the running load
DATABASENAME     CHARACTER VARYING(255)   Database name used in the load
TABLENAME        CHARACTER VARYING(255)   Table name used in the load
SCHEMANAME       CHARACTER VARYING(255)   Schema name used in the load
USERNAME         CHARACTER VARYING(255)   User name
BYTESPROCESSED   BIGINT                   Number of bytes processed
ROWSINSERTED     BIGINT                   Number of inserted rows
ROWSREJECTED     BIGINT                   Number of rejected rows
BYTESDOWNLOADED  BIGINT                   Number of bytes downloaded
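Because the view exposes running totals, a monitoring script can derive simple ratios from it. The sketch below is hypothetical. It takes rows in the column order listed above, as a DB-API cursor would return them for select * from _v_load_status, and reports the share of rejected rows for each load. Fetching the rows is left out because it depends on the ODBC or JDBC driver in use.

```python
from collections import namedtuple

# Column order follows the _v_load_status attribute list.
LoadStatus = namedtuple(
    "LoadStatus",
    "planid databasename tablename schemaname username "
    "bytesprocessed rowsinserted rowsrejected bytesdownloaded",
)


def summarize(rows):
    """Return one summary line per active load."""
    lines = []
    for row in map(LoadStatus._make, rows):
        handled = row.rowsinserted + row.rowsrejected
        reject_pct = 100.0 * row.rowsrejected / handled if handled else 0.0
        lines.append(
            f"plan {row.planid} {row.databasename}.{row.schemaname}.{row.tablename}: "
            f"{row.rowsinserted} inserted, {row.rowsrejected} rejected "
            f"({reject_pct:.4f}%), {row.bytesprocessed} bytes processed"
        )
    return lines


sample = [(2932, "SYSTEM", "LINEITEM", "ADMIN", "ADMIN", 142606226, 1136931, 4, 131911476)]
for line in summarize(sample):
    print(line)
```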



Load permissions
• To run nzload, you must be the admin user or have the LIST, SELECT on
EXTERNAL TABLE, SELECT on TABLE, and INSERT privileges
• To create an NPS user that can load all tables in a database:
retail(admin)=>DROP USER lduser;
retail(admin)=>CREATE USER lduser WITH PASSWORD 'loader';
retail(admin)=>GRANT LIST ON DATABASE TO lduser;
retail(admin)=>GRANT SELECT ON EXTERNAL TABLE TO lduser;
retail(admin)=>GRANT ALL ON TABLE TO lduser;
• To create an NPS user that can load a single table in a database:
retail(admin)=>DROP USER lduser;
retail(admin)=>CREATE USER lduser WITH PASSWORD 'loader';
retail(admin)=>GRANT LIST ON DATABASE TO lduser;
retail(admin)=>GRANT SELECT ON EXTERNAL TABLE TO lduser;
retail(admin)=>GRANT SELECT, INSERT ON table_name TO lduser;
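When many loader accounts are needed, a script can generate the statements above. loader_grants is a hypothetical helper that follows the two patterns on this slide. Without a table name it grants ALL on all tables; with a table name it grants SELECT and INSERT on that table only. It omits the DROP USER step, doubles single quotes in the password, and only builds the SQL text. Identifiers are not validated, so use it only with trusted names.

```python
def loader_grants(user, password, table=None):
    """Build the SQL statements for a loader user (not executed here)."""
    quoted_pw = password.replace("'", "''")  # escape single quotes in the literal
    stmts = [
        f"CREATE USER {user} WITH PASSWORD '{quoted_pw}';",
        f"GRANT LIST ON DATABASE TO {user};",
        f"GRANT SELECT ON EXTERNAL TABLE TO {user};",
    ]
    if table is None:
        stmts.append(f"GRANT ALL ON TABLE TO {user};")  # every table in the database
    else:
        stmts.append(f"GRANT SELECT, INSERT ON {table} TO {user};")  # one table only
    return stmts


for stmt in loader_grants("lduser", "loader", "customer"):
    print(stmt)
```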



.nzlog file
• When nzload runs, the system creates an .nzlog file that contains
messages related to the load
  - By default, the .nzlog file is created in your current working directory
  - The file name format is <table_name>.<database>.nzlog
  - Use the -lf <file_name> option to specify a different log file name
  - Use the -outputDir <directory> option to specify the directory for
    the log file
  - Every nzload run that loads the same database table appends to the
    same log file
• Delete log files periodically to free disk space
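The naming rules above can be captured in a small helper, which is useful when a script needs to locate or clean up logs. Both functions are hypothetical sketches. One assumption is that an -lf name is placed inside -outputDir when both are given. count_runs also assumes that every run in an appended log begins with a "Load started at:" line, as in the sample log.

```python
import os


def default_log_path(table, database, output_dir=None, log_file=None):
    """Return where nzload writes its log under the rules above.

    Assumption: a -lf name is placed inside -outputDir when both are given.
    """
    name = log_file or f"{table}.{database}.nzlog"
    return os.path.join(output_dir or os.getcwd(), name)


def count_runs(log_text):
    """Count the runs recorded in an appended log."""
    return sum(1 for line in log_text.splitlines() if line.startswith("Load started at:"))


print(default_log_path("customer", "prod", "/tmp"))
```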



.nzlog file sample
Load started at: 01-Jan-11 12:34:56 EST
Database: labdb
Tablename: listitems
Datafile: listitems.del
Host: netezza
Output Directory: /nz/Demos/load/Scripts
Log file: listitems_table.logfile
Bad record file: listitems_table.badfile

Load Options
Field delimiter: '|' File Buffer Size (MB): 16
NULL value: NULL Quoted data: No
Checkpoint: 0 Max errors: 100
Skip records: 0 Max rows: 0
FillRecord: No Truncate String: No
Escape Char: None Accept Control Chars: No
Distribution stats: No Allow CR in string: No
BoolStyle: ONE_ZERO
Date Style: YMD Date Delim: '/'
Time Style: 24 Hour Time Delim: ':'

Statistics
number of records read: 1234567890
number of bad records: 0
number of discarded records: 0
-------------------------------------------------
number of records loaded: 1234567890

Elapsed Time (sec): 1259.0


Load completed at: 01-Jan-11 12:56:34 EST
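The Statistics block of a log like this can be read by a script, for example to check that the number of records read matches the number loaded. The sketch below is hypothetical and assumes the layout of the sample above. If the file holds several appended runs, the last value of each counter wins.

```python
import re


def parse_statistics(log_text):
    """Extract the 'number of ...' counters and the elapsed time from an nzlog."""
    stats = {}
    for line in log_text.splitlines():
        m = re.match(r"\s*number of (.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
            continue
        m = re.match(r"\s*Elapsed Time \(sec\):\s*([\d.]+)", line)
        if m:
            stats["elapsed seconds"] = float(m.group(1))
    return stats


sample = """Statistics
number of records read: 1234567890
number of bad records: 0
number of discarded records: 0
-------------------------------------------------
number of records loaded: 1234567890

Elapsed Time (sec): 1259.0
"""
stats = parse_statistics(sample)
print(stats)
print(stats["records read"] == stats["records loaded"] + stats["bad records"] + stats["discarded records"])
```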



.nzbad file
• When nzload runs, the system creates an .nzbad file that contains only
the rejected records from the load file
  - By default, the .nzbad file is created in your current working directory
  - The file name format is <table_name>.<database>.nzbad
  - Use the -bf <file_name> option to specify a different bad file name
  - Use the -outputDir <directory> option to specify the directory for
    the bad file
• If the file already exists, it is overwritten
• If there are no rejected records, the file is not created
• The number of rejected records allowed is set with -maxErrors
  - Use the -maxErrors option to specify a different value
  - The default is 1
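Because the next run overwrites an existing .nzbad file, rejected records from an earlier load are lost unless someone saves them first. The sketch below is a hypothetical pre-load step that renames an existing bad file with a timestamp suffix. It uses the default <table_name>.<database>.nzbad name. The suffix format is an arbitrary choice, and two archives made within the same second would collide.

```python
import os
import time


def archive_bad_file(table, database, output_dir="."):
    """Rename an existing <table>.<database>.nzbad so the next load cannot overwrite it.

    Returns the archived path, or None if there is no bad file (a previous
    load with no rejected records does not create one).
    """
    bad_path = os.path.join(output_dir, f"{table}.{database}.nzbad")
    if not os.path.exists(bad_path):
        return None
    archived = f"{bad_path}.{time.strftime('%Y%m%d-%H%M%S')}"
    os.replace(bad_path, archived)
    return archived
```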



Demonstration 1
Loading and unloading data using the nzload utility

• Load the tables with data
• Ensure that the data is loaded and that you can review it using the
NzAdmin tool
• Review the data distribution



Unit summary
• Load data from external sources into the IBM PureData System for
Analytics using the nzload High Performance Bulk Loader
• Grant load permissions to users
• Review the log file associated with the load operation
• Specify fixed length format for a data load operation
