Netezza Fundamentals: Introduction To Netezza For Application Developers

Biju Nair
03/02/2013
Version: Draft 1.4
Document provided for information purposes only.
Preface
As with any subject, one may ask why a new document is needed when so much information is already available online. The question is especially pertinent in a case like this, where the product vendor publishes detailed documentation. I agree, and this document in no way replaces the documentation and knowledge already available on the Netezza appliance. The primary objectives of this document are
- To be a starting guide for anyone who is looking to understand the appliance so that they can be productive in a short time
- To be a transition guide for professionals who are familiar with other database management systems and would like to, or need to, start using the appliance
- To be a quick reference on the fundamentals for professionals who have some experience with the appliance
With these simple objectives in focus, the book covers the Netezza appliance broadly so that the reader can be productive in using the appliance quickly. References to other documents are provided for interested readers who want more thorough knowledge. Joining the Netezza developer community on the web, which is very active, is also highly recommended. If you find any errors or would like to provide feedback, please write to books at sieac dot com with the subject Netezza. Thank you in advance for your feedback.
sieac llc
Table of Contents
1. Netezza Architecture
2. Netezza Objects
3. Netezza Security
4. Netezza Storage
5. Statistics and Query Performance
6. Netezza Transactions
7. Loading Data, Database Back-up and Restores
8. Netezza SQL
9. Stored Procedures
10. Workload Management
11. Best Practices
12. Version 7 Key Features
13. Further Reading
1. Netezza Architecture
A good foundation makes it easier to build better things. Similarly, understanding the Netezza architecture, which is the foundation of the appliance, helps in developing applications that use the appliance efficiently. This section details the architecture at a level sufficient to meet that objective.
Netezza uses a proprietary architecture called Asymmetric Massively Parallel Processing (AMPP), which combines the large-scale data processing efficiency of Massively Parallel Processing (MPP), where nothing (CPU, memory, storage) is shared, with symmetric multiprocessing to coordinate the parallel processing. The MPP is achieved through an array of S-Blades, each of which is a server running its own operating system and connected to disks. While other products may follow a similar architecture, one hardware component unique to Netezza is the Database Accelerator card attached to the S-Blades. These accelerator cards can perform some of the query processing stages while data is being read from the disk, instead of in the CPU. Moving large amounts of data from the disk to the CPU and performing all the stages of query processing in the CPU is one of the major bottlenecks in many of the database management systems used for data warehousing and analytics use cases.
The main hardware components of the Netezza appliance are a host, which is a Linux server, and an array of S-Blades with which the host communicates. Each S-Blade has 8 processor cores and 16 GB of RAM and runs the Linux operating system. Each processor in the S-Blade is connected to disks in a disk array through a Database Accelerator card which uses FPGA technology. The host is also responsible for all client interactions with the appliance, such as handling database queries and sessions, along with managing the metadata about the objects (databases, tables etc.) stored in the appliance. The S-Blades communicate with each other and with the host through a custom-built, IP-based, high-performance network. The following diagram provides a high-level logical schematic which helps to visualize the various components in the appliance.
The S-Blades are also referred to as the Snippet Processing Array, or SPA for short, and each CPU in the S-Blades combined with the Database Accelerator card attached to that CPU is referred to as a Snippet Processor.
Let us use a simple concrete example to understand the architecture. Assume a data warehouse for a large retail firm in which one of the tables stores the details of all of its 10 million customers. Also assume that the table has 25 columns and that the total length of each table row is 250 bytes. In Netezza the 10 million customer records will be stored, in compressed form, fairly equally across all the disks available in the disk arrays connected to the snippet processors in the S-Blades. When a user queries the appliance for, say, the Customer Id, Name and State of customers who joined the organization in a particular period, sorted by state and name, the processing happens in the following high-level steps.
1. The host receives the query, parses and verifies it, creates the code to be executed by the snippet processors in the S-Blades and passes the code to the S-Blades.
2. The snippet processors execute the code. As part of the execution, the data blocks which store the data required to satisfy the query are read in compressed form from the disk attached to the snippet processor into memory. The Database Accelerator card in the snippet processor uncompresses the data, which includes all the columns in the table, removes the unwanted columns (in this case 22 columns, i.e. 220 of the 250 bytes), applies the where clause to remove the unwanted rows and passes the small amount of remaining data to the CPU in the snippet processor. In traditional databases all these steps are performed in the CPU.
3. The CPU in the snippet processor performs tasks like aggregation, sum, sort etc. on the data from the Database Accelerator card and passes the result to the host through the network.
4. The host consolidates the results from all the S-Blades and performs additional steps like sorting or aggregation on the data before communicating the final result back to the client.
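The flow described in the steps above can be mimicked with a small Python sketch. It is only an illustration of the division of labour, not the appliance's actual implementation: the sample rows, the number of data slices, the modulo distribution and the function names `accelerator_stage`, `cpu_stage` and `host_stage` are all assumptions made for this example.

```python
from datetime import date
from heapq import merge

# Hypothetical customer rows: (customer_id, name, state, joined_on).
ROWS = [
    (1, "Ann", "NY", date(2012, 3, 1)),
    (2, "Bob", "CA", date(2011, 7, 9)),
    (3, "Cy", "NY", date(2012, 5, 20)),
    (4, "Di", "CA", date(2012, 1, 15)),
    (5, "Ed", "TX", date(2010, 2, 2)),
    (6, "Flo", "CA", date(2012, 8, 30)),
]
NUM_SLICES = 3  # assumed number of data slices


def distribute(rows, num_slices):
    """Spread rows across data slices by customer id (assumed distribution)."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[row[0] % num_slices].append(row)
    return slices


def accelerator_stage(slice_rows, start, end):
    """Drop unwanted columns and rows before the data reaches the CPU."""
    return [(cid, name, state)
            for cid, name, state, joined in slice_rows
            if start <= joined <= end]


def cpu_stage(projected):
    """Sort the reduced rows of one slice by (state, name)."""
    return sorted(projected, key=lambda r: (r[2], r[1]))


def host_stage(partials):
    """Merge the pre-sorted partial results from all slices."""
    return list(merge(*partials, key=lambda r: (r[2], r[1])))


def run_query(rows, start, end):
    partials = [cpu_stage(accelerator_stage(s, start, end))
                for s in distribute(rows, NUM_SLICES)]
    return host_stage(partials)


result = run_query(ROWS, date(2012, 1, 1), date(2012, 12, 31))
```

Each slice is filtered and sorted independently, so the host only has to merge small, already sorted partial results.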
The key takeaways are
- Netezza has the ability to process large volumes of data in parallel, and the key is to make sure that the data is distributed appropriately to leverage the massively parallel processing.
- Implement designs in such a way that most of the processing happens in the snippet processors; minimize communication between snippet processors and keep data communication to the host minimal.
This simple example helps in understanding the fundamental components of the appliance and how they work together. We will build on this knowledge when looking at how complex query scenarios are handled in the relevant sections.
Terms and Terminology

The following are some of the key terms used in the context of the Netezza appliance.
Host: A Linux server which is used by the client to interact with the appliance, either natively or from remote clients through ODBC, JDBC, OLE-DB etc. The host also stores the catalog of all the databases stored in the appliance along with the metadata of all the objects in the databases. It also parses and verifies the queries from the clients, generates executable snippets, communicates the snippets to the S-Blades, coordinates and consolidates the snippet execution results and communicates them back to the client.
Snippet Processing Array: The SPA is an array of S-Blades, each with 8 processor cores and 16 GB of memory and running the Linux operating system. Each S-Blade is paired with a Database Accelerator card which has 8 FPGA cores and is connected to disk storage.
Snippet Processor: A CPU and FPGA pair in a Snippet Processing Array is called a snippet processor. It runs snippets, which are the smallest code components generated by the host for query execution.
Netezza Objects
The following are the major object groups in Netezza. We will see the details of these objects in the following chapters.
- Users
- Groups
- Tables
- Views
- Materialized Views
- Synonyms
- Databases
- Procedures and User Defined Functions
Anyone who is familiar with other relational database management systems will notice that there are no indexes, bufferpools or tablespaces to deal with in Netezza.
Netezza Failover
As an appliance, Netezza includes the failover components necessary to function seamlessly in the event of hardware issues, so that its availability is more than 99.99%. Every Netezza appliance has two hosts in a cluster so that if one fails the other can take over. Netezza uses Linux-HA (High Availability) and Distributed Replicated Block Device (DRBD) for host cluster management and for mirroring data between the hosts.
As far as data storage is concerned, one third of every disk in the disk array stores the primary copy of user data, one third stores a mirror of the primary copy of data from another disk, and the remaining third is used for temporary storage. In the event of a disk failure the mirror copy will be used, and the SPU to which the failed disk was attached will be updated to use the disk holding the mirror copy. In the event of an error in a disk track, the track will be marked as invalid and valid data will be copied from the mirror copy onto a new track.
If there are any issues with one of the S-Blades, its workload will be assigned to other S-Blades. All failures are notified based on the event monitors that are defined and enabled. Similar to the dual hosts for high availability, the appliance also has dual power systems, and every connection between components, such as host to SPA and SPA to disk array, also has a secondary. Any issues with the hardware components can be viewed through the NzAdmin GUI tool.
Netezza Tools
There are many tools available to perform various functions against Netezza. Here we will look at the tools and utilities used to connect to Netezza; other tools will be detailed in the relevant sections.
For administrators, one of the primary tools for connecting to Netezza is NzAdmin. It is a GUI-based tool which can be installed on a Windows desktop and connected to the Netezza appliance. The tool has a system view which provides a visual snapshot of the state of the appliance, including issues with any hardware components. The second view the tool provides is the database view, which lists all the databases including the objects in them, the users and groups currently defined, active sessions, query history and any backup history. The database view also provides options to perform database administration tasks like the creation and management of databases, database objects, users and groups.
The following is the screen shot of the system view from the NzAdmin tool.
The following is the screen shot of the database view from the NzAdmin tool.
The second tool, which is often used by anyone who has access to the appliance host, is the nzsql command. It is the primary tool used by administrators to create, schedule and execute scripts to perform administration tasks against the appliance. The nzsql command invokes the SQL command interpreter through which all Netezza supported SQL statements can be executed. The command also has some built-in options which can be used to perform quick lookups like listing databases, users etc. The command also has an option to open an operating system shell through which the user can perform OS tasks before exiting back into the nzsql session. As with all the Netezza commands, the nzsql command requires the database name, user name and password to connect to a database.
For example,
nzsql -d testdb -u testuser -pw password
will connect and create an nzsql session with the database testdb as the user testuser, after which the user can execute SQL statements against the database. Also, as with all the Netezza commands, nzsql has the -h help option which displays details about the usage of the command. Once the user is in the nzsql session, the following are some of the options which the user can invoke in addition to executing Netezza SQL statements.
\c dbname user passwd    Connect to a new database
\d tablename             Describe a table, view etc.
\d{t|v|i|s|e|x}          List tables/views/indexes/synonyms/temp tables/external tables
\h command               Help on a particular command
\i file                  Read and execute queries from a file
\l                       List all databases
\!                       Escape to an OS shell
\q                       Quit the nzsql command session
\time                    Print the time taken by queries; switch it off with \time again
One third-party tool which needs to be mentioned is Aginity Workbench for Netezza from Aginity LLC. It is a GUI-based tool which runs on Windows and uses the Netezza ODBC driver to connect to the databases in the appliance. It is a user-friendly tool for development work and ad hoc queries, and it also provides GUI options to perform database management tasks. It is highly recommended for users who don't have access to the appliance host (which will be most users) but need to perform development work.
2. Netezza Objects
The Netezza appliance comes out of the box loaded with some objects which are referred to as system objects. Users can create their own objects to develop applications, and these are referred to as user objects. In this section we will look into the details of the basic Netezza objects which every user of the appliance needs to be aware of.
System Objects:
Users
The appliance comes preconfigured with the following 3 user ids which can't be modified or deleted from the system. They are used to perform all the administration tasks and hence should be used by a restricted number of users.
User id    Description
root       The super user for the host system on the appliance, with all the access of a super user on any Linux system.
nz         The Netezza system administrator Linux account that is used to run the host software on Linux.
admin      The default Netezza SQL database administrator user, which has access to perform all database related tasks against all the databases in the appliance.
Groups
By default Netezza comes with a database group called public. All database users created in the system are automatically added as members of this group and cannot be removed from it. The admin database user owns the public group and this can't be changed. Permissions can be granted to the public group so that all the users added to the system get those permissions by default.
Databases
Netezza comes with two databases, System and a model database, both owned by the Admin user. The system database consists of objects like tables, views, synonyms, functions and procedures. It is primarily used to catalog all user database and user object details, which are used by the host when parsing and validating queries from the users and creating execution code for them.
User Objects:
Database
Users with the required permission, or an admin user, can create databases using the create database SQL statement. The following is a sample SQL statement to create a database called testdb. It can be executed in an nzsql session or another query execution tool.
create database testdb;
Table
The database owner or user with create table privilege can create tables in a database. The following is a
sample table creation statement which can be executed in an nzsql session or other query execution tool.
create table employee (
    emp_id integer not null,
    first_name varchar(25) not null,
    last_name varchar(25) not null,
    sex char(1),
    dept_id integer not null,
    created_dt timestamp not null,
    created_by char(8) not null,
    updated_dt timestamp not null,
    updated_by char(8) not null,
    constraint pk_employee primary key (emp_id),
    constraint fk_employee foreign key (dept_id) references department(dept_id)
        on update restrict on delete restrict
) distribute on random;
To anyone who is familiar with other DBMS systems the statement will look familiar except for the distribute on clause, the details of which we will see in a later section. Also, there are no storage related details like the tablespace on which the table needs to be created or any bufferpool details, since these are handled by the Netezza appliance. The following is the list of the data types supported by the Netezza appliance which can be used in the column definitions of tables.
Data Type | Description/Value | Storage
byteint (int1) | -128 to 127 | 1 byte
smallint (int2) | -32,768 to 32,767 | 2 bytes
integer (int or int4) | -2,147,483,648 to 2,147,483,647 | 4 bytes
bigint (int8) | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 8 bytes
numeric(p,s) | Precision p can range from 1 to 38 and scale s from 0 to p | p of 1 to 9: 4 bytes; p of 10 to 18: 8 bytes; p of 19 to 38: 16 bytes
numeric | Same as numeric(18,0) | 4 to 16 bytes
decimal | Alias for numeric | 4 to 16 bytes
float(p) | Floating point number with precision p between 1 and 15 | p of 1 to 6: 4 bytes; p of 7 to 15: 8 bytes
real | Alias for float(6) | 4 bytes
double precision | Alias for float(15) | 8 bytes
character(n) / char(n) | Fixed length, blank padded to length n. The default value of n is 1. The maximum character string size is 64,000. | If n is 16 or less, n bytes. If n is greater than 16, disk usage is the same as varchar(n).
character varying(n) / varchar(n) | Variable length to a maximum length of n. No blank padding, stored as entered. The maximum character string size is 64,000. | n+2 or fewer bytes depending on the actual data
nchar(n) | Fixed length Unicode, blank padded to length n. The maximum length is 16,000 characters. |
nvarchar(n) | Variable length Unicode to a maximum length of n. The maximum length is 16,000 characters. |
boolean / bool | With value true (t) or false (f) | 1 byte
date | Ranging from January 1, 0001 to December 31, 9999 | 4 bytes
timestamp | Date part and a time part, with seconds stored to 6 decimal positions. Ranging from January 1, 0001 00:00:00.000000 to December 31, 9999 23:59:59.999999. | 8 bytes
Refer to the reference documentation for additional data types and details.
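As a quick aid when estimating row sizes, the precision-to-storage rule for numeric(p,s) listed above can be written as a small Python helper. The function name `numeric_storage_bytes` is made up for this sketch; the thresholds are the ones from the table.

```python
def numeric_storage_bytes(precision):
    """Return the storage in bytes for numeric(p, s) with the given precision p."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 4
    if precision <= 18:
        return 8
    return 16


# Example: compare the storage of a few column definitions.
sizes = {p: numeric_storage_bytes(p) for p in (5, 9, 10, 18, 19, 38)}
```

Choosing the smallest precision that fits the data keeps rows narrow, which matters when tables hold many millions of rows.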
For performance reasons, other than validating that the data is consistent with the column data type and the not null constraint, Netezza doesn't enforce any constraints, such as primary keys or foreign keys, when inserting or loading data into tables. It is up to the application to make sure that these constraints are satisfied by the data being loaded into the tables. Even though the constraints are not enforced by Netezza, defining them provides additional hints to the query optimizer to generate efficient snippet execution code, which in turn helps performance.
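Since the appliance does not enforce primary or foreign keys, the loading application has to check the data itself. The following Python sketch shows one possible pre-load check; the record layout and the function name `find_constraint_violations` are assumptions for illustration, loosely following the employee and department tables used above.

```python
from collections import Counter


def find_constraint_violations(employees, departments):
    """Return duplicate emp_id values and dept_id values with no department."""
    counts = Counter(e["emp_id"] for e in employees)
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    known_depts = {d["dept_id"] for d in departments}
    orphan_depts = sorted({e["dept_id"] for e in employees} - known_depts)
    return duplicate_keys, orphan_depts


# Hypothetical rows about to be loaded.
departments = [{"dept_id": 100}, {"dept_id": 200}]
employees = [
    {"emp_id": 1, "dept_id": 100},
    {"emp_id": 2, "dept_id": 300},  # refers to a department that does not exist
    {"emp_id": 1, "dept_id": 200},  # repeats emp_id 1
]
duplicates, orphans = find_constraint_violations(employees, departments)
```

A load job could refuse to proceed, or route the offending rows to a reject file, whenever either list is non-empty.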
Users can also create temporary tables in Netezza, which get dropped at the end of the transaction or session in which they are created. A temporary table is created by adding the temporary or temp keyword to the create table statement, and the statement can include all the other clauses applicable to the creation of a regular table. The following is an example of creating a temporary table
create temporary table temp_emp(
id integer constraint pk_emp primary key,
first_name varchar(25)
) distribute on hash(id);
Another way a user can create a table is to model the new table on a query result. This can be accomplished in Netezza using the create table as command, and such tables are referred to as CTAS tables for short. The following are some sample CTAS statements
create table ctas_emp as select * from emp where dept_id = 1;
create table ctas_emp_dept as
    select emp.name as emp_name, dept.name as dept_name
    from emp, dept
    where emp.dept_id = dept.id;
Since the new table definition is based on the result of the query executed as part of the CTAS statement, there are many possible uses for it, like redefining the columns of a current table while populating the new table at the same time. If the user doesn't want to populate data but only wants to create the table structure, a limit 0 can be included at the end of the query as in the following example.
create table ctas_emp as select * from emp limit 0;
Tables can be deleted from the database using the DROP sql statement which drops the table and its
content. The following is an example
drop table employee;
Once tables are created they can be modified using the alter statement. Renaming the table, changing the
owner of the table, adding or dropping a new column, renaming a column, adding or dropping a
constraint are some of the modifications which can be made through the alter statement. The following
are sample alter statements which can be executed through nzsql session or a query execution tool.
alter table employee add column education_level int1;
alter table employee modify column (first_name varchar(30));
alter table employee drop column updated_by;
alter table employee owner to hruser;
The following are some of the points to be aware of when using the alter table statement.
- Modifying the column length is only applicable to columns defined as varchar.
- If a table gets renamed, the views attached to the table will stop working.
- Column data types can't be changed through an alter statement.
- If a table is referenced by a stored procedure, adding or dropping a column is not permitted. The stored procedure needs to be dropped before adding or dropping the column and then recreated.
View
The database owner or a user with the create view privilege can create views in a database. Netezza stores the select SQL statement for the view; it doesn't materialize the data and store it on disk. The SQL gets executed, pulling the data from the base table, whenever a user accesses the view, creating a virtual table. The following is a sample view creation statement which can be executed in an nzsql session or another query execution tool.
create or replace view it_employees as select * from employee where dept_id = 100;
Materialized View
Materialized views are created to project a subset of columns from a table, with the data sorted on some of the projected columns. When one is created, the system materializes and stores the sorted projection of the base table's data in a new table on disk. Materialized views can be queried directly or can be used to improve performance against the base table. In the latter case, the materialized view acts like an index, since the system stores an additional column in the view recording where in the base table each row originated.
A database owner or a user with the create materialized view privilege can create materialized views in a database. The following is a sample materialized view creation statement which can be executed in an nzsql session or another query execution tool.
create materialized view employee_mview as
    select emp_id, first_name, last_name, dept_id
    from employee
    order by dept_id, first_name;
This materialized view will improve the performance of the following query against the base table since
the view stores where in the base table the data satisfying the query is stored and in turn acts like an
index in a traditional database.
select * from employee where dept_id = 100 and first_name = 'John';
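The index-like behaviour can be pictured with the following Python sketch. It is an analogy only and not how the appliance stores materialized views internally: the sample rows and the helper names `build_sorted_projection` and `lookup` are assumptions. The sketch keeps a projection sorted on (dept_id, first_name) together with each row's position in the base table and uses a binary search to find matching rows.

```python
from bisect import bisect_left, bisect_right

# Hypothetical base table rows in arrival order:
# (emp_id, first_name, last_name, dept_id)
BASE_TABLE = [
    (1, "Mary", "Lee", 200),
    (2, "John", "Roe", 100),
    (3, "Alan", "Kay", 100),
    (4, "John", "Doe", 300),
    (5, "John", "Poe", 100),
]


def build_sorted_projection(rows):
    """Sort (dept_id, first_name) keys, remembering each row's base position."""
    entries = sorted(((r[3], r[1]), pos) for pos, r in enumerate(rows))
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def lookup(keys, positions, rows, dept_id, first_name):
    """Binary-search the sorted keys, then fetch the matching base rows."""
    wanted = (dept_id, first_name)
    lo = bisect_left(keys, wanted)
    hi = bisect_right(keys, wanted)
    return [rows[pos] for pos in positions[lo:hi]]


mview_keys, mview_positions = build_sorted_projection(BASE_TABLE)
matches = lookup(mview_keys, mview_positions, BASE_TABLE, 100, "John")
```

Because the keys are sorted, only the narrow range of matching entries is touched instead of every row of the base table.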
The following are the restrictions on the creation of materialized views
- Only one table can be specified in the FROM clause of the create statement
- There can be no where clause in the select statement of the create statement
- The columns in the projection list must be columns from the base table; no expressions are allowed
- The columns in the ORDER BY clause must be among the columns in the select statement
- NULLS LAST or DESC cannot be used in the ORDER BY clause
- External, temporary, system or clustered base tables can't be used as the base table for materialized views
As and when records are inserted into the base table, the system also adds data to the materialized view table. The newly inserted data will not be in sorted order, and hence the materialized view will contain some data which is not sorted. To get the materialized view back into sorted order, the view needs to be refreshed periodically, or a threshold of unsorted data as a percentage of total records can be set so that the system performs the refresh when the percentage of unsorted records in the view exceeds the threshold.
Synonym
A synonym is an alternate way of referencing tables and views. It allows users to create easy-to-type names for long table or view names. An admin user or any user with the create synonym privilege can create synonyms in a database. The following is a sample statement to create a synonym through an nzsql session or another query execution tool.
create synonym techies for it_employees;
Synonyms can be created for a non-existent table or view since the target is not verified during creation. But if the table or view referred to by a synonym does not exist at runtime, an error message will be returned. Synonyms can't be created for temporary tables, remote databases or other synonyms.
Sequence
Sequences are database objects through which users can generate unique numbers that can then be used in an application, for example to create unique keys. The following is a sample sequence creation statement which can be used to populate the id column in the employee table.
create sequence seq_emp_id as integer start with 1 increment by 1 minvalue 1 no maxvalue no cycle;
Since no max value is specified, the sequence will be able to hold values up to the largest value of the sequence type, which in this case is 2,147,483,647 for the integer type. Also, in this case, once the sequence reaches its maximum value it doesn't start reusing old values, since the no cycle option is used to prevent reuse.
Values generated by a sequence can be accessed using the next value for clause followed by the sequence name. The following are some examples.
select next value for seq_emp_id;
select *, next value for seq_emp_id from emp;
A few points to note about using sequences
- No two users of a sequence will ever get the same value from the sequence when requesting the next value
- When a transaction using a sequence rolls back, the value read from the sequence is not rolled back, i.e. there can be gaps in the sequence numbers due to transaction rollbacks
- The Netezza appliance assigns sequence number ranges to all the SPUs available in the system, and these ranges are cached for performance reasons. Since the ranges do not overlap, the values generated by the various SPUs will leave gaps in the sequence numbers at any point in time.
- The system is forced to flush cached sequence values in situations like a system stop, a system or SPU crash or some alter sequence statements, which also creates gaps in the numbers generated by a sequence.
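The effect of cached ranges on the generated numbers can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the appliance's algorithm: the class `CachedSequence`, the block size of 5 and the SPU names are invented for the example.

```python
class CachedSequence:
    """Toy sequence handing out non-overlapping blocks of values per SPU."""

    def __init__(self, start=1, cache_size=5):
        self.next_block_start = start
        self.cache_size = cache_size  # assumed block size, for illustration only
        self.caches = {}  # spu name -> (next value, end of block, exclusive)

    def next_value(self, spu):
        nxt, end = self.caches.get(spu, (0, 0))
        if nxt >= end:  # nothing cached or block used up: reserve a new block
            nxt = self.next_block_start
            end = nxt + self.cache_size
            self.next_block_start = end
        self.caches[spu] = (nxt + 1, end)
        return nxt

    def flush(self):
        """Discard cached values, as after a restart; unused values are lost."""
        self.caches.clear()


seq = CachedSequence(start=1, cache_size=5)
issued = [seq.next_value("spu1"), seq.next_value("spu2"), seq.next_value("spu1")]
seq.flush()
issued.append(seq.next_value("spu1"))
```

In this run the values handed out are unique but neither contiguous nor ordered by request time, so applications should rely on sequence values only for uniqueness.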
Existing sequences can be modified using alter sequence statement. Some of the modifications are
changing the owner, renaming the sequence, restarting the sequence with a new start value, change the
increment value or max value and whether the sequence should or should not reuse the old values once
it has reached its max value. The following is a sample alter sequence statement.
alter sequence seq_emp_id increment by 2;
Also an existing sequence can be dropped using a drop sequence statement and the following is an
example.
drop sequence seq_emp_id;
Procedures
We will look into the details of stored procedures when we discuss application development.
Functions
We will look into the details of user defined functions when we discuss application development.
3. Netezza Security
This section deals with security in the context of user access to the appliance. In Netezza there are two
levels of access controls, one at the host OS level and second at the Netezza database level. By setting
the required access restrictions at these two levels, the appliance can be secured effectively.
OS Level Security
The Netezza host uses an industry standard Linux operating system customized for the performance and functionality required by the appliance. As in any Linux installation, user access restrictions need to be put in place using user ids, user groups and passwords. Out of the box, the appliance is configured with the root user id, which is the Linux super user, and the user id nz, which is the Netezza system administrator id used to run Netezza on the host. The root user id can be used to create other user ids for users who need to access the appliance natively through the host command shell. Since host access is required only for a restricted set of tasks, primarily administration, the number of user ids created to access the host should be fairly small. Restrictions on what users can do can be set by creating Linux user groups with different access restrictions and attaching the relevant users to those groups. Setting password selection rules, such as a mix of letters, numbers and special characters and a minimum password length, along with password expiry for users is good practice.
Database Level Security
Access to databases is controlled using user ids and passwords which are separate from the OS level user ids and passwords. If a user needs to access a Netezza database natively through an nzsql session on the host, the user needs to use an OS level user id and password to log in to the host and then invoke the nzsql command using a database level user id and password which has access to the particular database of interest. Access to databases and the objects within them, and the type of activities which can be performed on them, are all controlled by the privileges granted to the user id. As with the Linux operating system, Netezza also supports user groups: privileges can be assigned to a group and similar users can be attached to the group, so that it is easier to manage access to databases. When a user id is attached to more than one group, the user id gets the combination of all the privileges assigned to the groups to which it is attached. The following is a sample create user statement
create user dbdev with password 'zksrfas92834';
The following is a sample create group statement

create group devgrp with user dbdev, scott;
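How privileges combine when a user belongs to several groups can be sketched in a few lines of Python. The group `etlgrp` and the privilege sets are hypothetical; only `devgrp`, `dbdev` and `scott` come from the statements above. The sketch ignores privileges granted directly to a user or through the public group.

```python
# Hypothetical privilege assignments for illustration.
GROUP_PRIVILEGES = {
    "devgrp": {"select", "insert"},
    "etlgrp": {"insert", "delete"},
}
USER_GROUPS = {
    "dbdev": ["devgrp", "etlgrp"],
    "scott": ["devgrp"],
}


def effective_privileges(user):
    """Return the union of the privileges of every group the user belongs to."""
    privileges = set()
    for group in USER_GROUPS.get(user, []):
        privileges |= GROUP_PRIVILEGES.get(group, set())
    return privileges


dbdev_privileges = effective_privileges("dbdev")
scott_privileges = effective_privileges("scott")
```

Because the result is a union, adding a user to an additional group can only widen what that user is allowed to do.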
4. Netezza Storage
As discussed earlier, each disk in the appliance is partitioned into primary, mirror and temp (or swap) partitions. The primary partition of each disk is used to store user data like database tables, the mirror partition stores a copy of the primary partition of another disk so that it can be used in the event of a disk failure, and the temp/swap partition is used to store data temporarily, for example when the appliance redistributes data while processing queries. The logical representation of the data saved in the primary partition of each disk is called a data slice. When users create database tables and load data into them, the data gets distributed across the available data slices. The logical representation of a data slice as seen by an SPU is called a data partition. In TwinFin systems most S-Blades (SPUs) are connected to 8 data partitions, and some to only 6 (since some disks are reserved for failover). There are situations, like SPU failures, in which an SPU can have more than 8 partitions attached to it because it has been assigned some of the data partitions of the failed SPU. The following diagram illustrates this concept
The SPU 1001 is connected to 8 data partitions numbered 0 to 7. Each data partition is connected to one
data slice stored on a different disk. For example, the data partition 0 points to the data slice 17 stored on the
disk with id 1063. The disk 1063 also stores the mirror of the data slice 18 stored on disk 1064. The
following diagram illustrates what happens when the disk 1070 fails.
Immediately after the disk 1070 stops responding, the disk 1069 will be used by the system to satisfy
queries for which data is required from data slices 23 and 24. Disk 1069 will serve the requests using the
data in both its primary and mirror partitions. This creates a bottleneck, which in turn impacts
query performance. In the meantime, the contents of disk 1070 are regenerated on one of the spare
disks in the disk array, which in this case is disk 1100, using the data in disk 1069. Once the regen is
complete, the SPU data partition 7 is updated to point to the data slice 24 on disk 1100. Completing the regen
removes the bottleneck on disk 1069 so that it can perform optimally again.
In the situation where an SPU fails, the appliance reassigns all of its data partitions to other SPUs in the
system. A pair of disks which contain the mirror copy of each other's data slice is assigned together to another
SPU, which results in two additional data partitions to be managed by the target SPU. For example, if
an SPU currently manages data partitions 0 to 7 and the appliance reassigns two data partitions from a
failed SPU, the SPU will have 10 data partitions to manage, numbered 0 to 9.
Data Organization
When users create tables in databases and store data in them, the data gets stored in disk extents, which are the
minimum unit of storage allocated on disks. Netezza distributes the data in extents across
all the available data slices based on the distribution key specified during table creation. A user can
specify up to four columns for data distribution, can specify that the data be distributed randomly, or can
specify nothing at all. If a user provides no distribution specification, Netezza uses
one of the columns to distribute the data, and the choice of column cannot be influenced. When the user
specifies particular columns for distribution, Netezza uses the column data to distribute the records
being inserted across the data slices, using hashing to determine the data slice in which each
record needs to be stored. When the user selects random as the option for data distribution, the
appliance uses a round robin algorithm to distribute the data uniformly across all the available data slices.
The following are some sample table create statements using the distribute clause.
sex char(1),
) distribute on hash(emp_id);
create table nation_state (
ns_id integer not null constraint pk_nation_state primary key,
nation varchar(25),
state_code varchar(5),
state varchar(25)
) distribute on random;
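The difference between the two distribution options can be sketched in a few lines of Python. This is a toy model under stated assumptions: the number of data slices is made up, and crc32 is used only as a stand-in because the hash function Netezza actually applies is not described here.

```python
import zlib
from collections import Counter
from itertools import cycle

NUM_SLICES = 8  # assumed number of data slices


def hash_slice(key, num_slices=NUM_SLICES):
    """Map a distribution key to a data slice; crc32 stands in for Netezza's hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute_hash(keys, num_slices=NUM_SLICES):
    """Count rows per data slice when distributing on hash(key)."""
    return Counter(hash_slice(k, num_slices) for k in keys)


def distribute_random(row_count, num_slices=NUM_SLICES):
    """Count rows per data slice with round robin placement."""
    slices = cycle(range(num_slices))
    return Counter(next(slices) for _ in range(row_count))


emp_ids = range(1, 10001)
hash_counts = distribute_hash(emp_ids)
random_counts = distribute_random(10000)
print("hash:  ", sorted(hash_counts.items()))
print("random:", sorted(random_counts.items()))
```

With hashing, a given key value always lands on the same data slice, which is what later makes local joins possible. Round robin guarantees an even row count but gives no such placement guarantee.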
The key is to make sure that the data for a table is uniformly distributed across all the data slices so that
there is no data skew. By distributing data across the data slices, all the SPUs in the system can be
utilized to process any query, which in turn improves performance. Also, the performance of any query
depends on the slowest SPU handling the query. So if the data is not uniformly distributed then some
of the SPUs will have more data in the data slices attached to them, which is called data skew, and this will impact the
processing time and in turn the overall query performance. Selecting columns with high cardinality for
the distribution is a good practice to follow.
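The effect of cardinality on data skew can be seen with a small experiment. The sketch below is hypothetical: it assumes 8 data slices, uses crc32 as a stand-in hash, and invents 8000 employee rows with a two-valued sex column.

```python
import zlib
from collections import Counter

NUM_SLICES = 8  # assumed number of data slices


def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Rows landing on each data slice; crc32 is a stand-in hash function."""
    counts = Counter(zlib.crc32(str(k).encode("utf-8")) % num_slices for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(per_slice):
    """Largest slice divided by the average slice size (1.0 means perfectly even)."""
    average = sum(per_slice) / len(per_slice)
    return max(per_slice) / average


# Hypothetical employee rows: (emp_id, sex)
rows = [(emp_id, "M" if emp_id % 2 else "F") for emp_id in range(1, 8001)]

by_sex = rows_per_slice(sex for _, sex in rows)            # 2 distinct values
by_emp_id = rows_per_slice(emp_id for emp_id, _ in rows)   # 8000 distinct values

print("on sex:   ", by_sex, "skew", round(skew_ratio(by_sex), 2))
print("on emp_id:", by_emp_id, "skew", round(skew_ratio(by_emp_id), 2))
```

A column with two distinct values can never occupy more than two data slices, so at least six of the eight slices stay empty no matter which hash function is used.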
Even if a column with high cardinality like a date column is chosen to distribute data, there is a
possibility of creating processing skew. For example, using the date column as the distribution key, the data
gets distributed fairly evenly across all the data slices. But if most of the queries look for data for a
particular month, which is fairly common in a data warehousing environment, then only a particular set of
data slices may need to be processed by the appliance, which in turn will utilize only a subset of the SPUs
and cause the query to perform sub-optimally. This is called processing skew and needs to be prevented by
understanding the processing requirements and choosing the correct distribution keys.
When CTAS tables are created without explicit distribution criteria, the data distribution of the
new table follows this pattern:
If a CTAS table is created on a single source table and the source table is randomly distributed,
then the CTAS table will be distributed on the first column in the CTAS table.
If a CTAS table is created on a single source table with defined distribution on certain columns,
the CTAS table will inherit the source table's distribution.
If a CTAS table is created by joining two tables, the distribution key of the resulting CTAS table
will be the join key of the two tables.
If a CTAS table is created by joining multiple tables and a group by clause, the distribution key
of the resulting CTAS table will be the keys in the group by clause.
Zone Maps
When table data gets stored in extents on disk, Netezza keeps track of the minimum value and the
maximum value of columns of certain data types in data structures called zone maps. The zone maps are
created and maintained automatically for all the columns in the table which are of the following data types.
All Integer types (int1, int2, int4, int8)

Date
Timestamp
By keeping track of the min and max values, Netezza is able to avoid reading disk extents which cannot
contain the requested values, and these will be most of the disk extents in a large data warehouse environment.
For example, if one of the fact tables stores ten million records a month for the last 10 years, which is more
than a billion records, and if the queries process only a month's data, Netezza will need to read only about
ten million records. And if there are 96
snippet processors and the data is uniformly distributed across all the data slices, then the amount of
data read into each snippet processor will be a little over 100,000 records, which is small to process.
Enabling the appliance to utilize the zone map feature by selecting the best data types for columns in the
tables is another key point to take into consideration during design.
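The extent elimination described above can be sketched as a range overlap test. The following Python model is a simplification with assumed numbers: ten million rows per month for 120 months, one extent per month (real extents are far smaller), and a month encoded as an integer such as 201206.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """One disk extent with the zone map entry for a single date-like column."""
    min_month: int  # smallest value stored in the extent (e.g. 201301)
    max_month: int  # largest value stored in the extent
    rows: int


def extents_to_scan(extents, low, high):
    """Keep only extents whose [min, max] range can overlap the predicate range."""
    return [e for e in extents if e.max_month >= low and e.min_month <= high]


# Assumed layout: 120 months of data, 10 million rows per month, one extent per month.
extents = [
    Extent(min_month=year * 100 + month, max_month=year * 100 + month, rows=10_000_000)
    for year in range(2003, 2013)
    for month in range(1, 13)
]

total_rows = sum(e.rows for e in extents)
hit = extents_to_scan(extents, 201206, 201206)   # query restricted to one month
rows_read = sum(e.rows for e in hit)
rows_per_snippet_processor = rows_read / 96      # 96 snippet processors, even distribution

print(total_rows, rows_read, round(rows_per_snippet_processor))
```

Only the extents whose min/max range overlaps the predicate are read, so the work per snippet processor depends on the month requested rather than on the ten years stored.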
When two tables are often joined together, like a customer table and an order table, the distribution key
selection of the two tables can play an important role in the performance of the queries. If the
distribution key is on the join column, for example the customer id column in both the customer and order tables,
the data distribution will result in the records with the same customer id values ending up in the same
data slice for both the tables. When a query joining the tables is processed, since the matching data
from both the tables is in the same data slice, the snippet processor is able to perform the join
locally and send the result without performing additional work, which in turn improves the
performance of the query. If the tables are not distributed on the columns often used for the join, matching
data from the two tables will end up in different data slices, which means the snippet processors need to
perform additional work to satisfy the join. The appliance will choose to temporarily redistribute one
of the tables on the join column if the other is already distributed on the join column, and then the
snippet processors can perform the join locally. If neither table is distributed on the join column,
then the appliance may redistribute both the tables before the snippet processors can perform the join. If
a table stores a relatively small number of records, then Netezza can decide to broadcast the whole table to
all the SPUs so that each one has its own copy for processing. This means the host needs to
consolidate the table data from all the data slices and send the complete table data to all the
SPUs.
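Why a shared distribution key makes a join local can be shown with a toy model. In the Python sketch below the tables, their columns, the number of data slices and the crc32 stand-in hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # assumed number of data slices


def slice_for(key):
    """Stand-in hash: the same key always lands on the same data slice."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SLICES


def distribute(rows, key_index):
    """Place each row on a data slice according to its distribution column."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_index])].append(row)
    return slices


# Hypothetical tables: customer(customer_id, name), orders(order_id, customer_id)
customers = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(100, 1), (101, 3), (102, 1), (103, 2)]

customer_slices = distribute(customers, key_index=0)
order_slices = distribute(orders, key_index=1)

# Each slice joins only its own rows; no data moves between slices.
local_join = sorted(
    (order_id, name)
    for s in range(NUM_SLICES)
    for (order_id, o_cust) in order_slices.get(s, [])
    for (c_cust, name) in customer_slices.get(s, [])
    if o_cust == c_cust
)
print(local_join)
```

Because both tables are placed by hashing the customer id, every order sits on the same slice as its customer, and the slice-local joins together produce the complete result.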
Clustered Base Tables (CBT)

Along with the option to distribute data during table definition, Netezza also provides an option to
organize the distributed data within a data slice. For example, we may have distributed the employee table on
employee id but want employee records from the same department to be stored close
together. In that case the dept id column in the table can be specified in the create table statement as in the
following example
sex char(1),
) distribute on hash(emp_id)
organize on(dept_id);
Netezza allows up to four columns to organize on. When data gets stored on the data slice, records with
the same organize on column values will get stored in the same or nearby extents. Organize on
improves queries on fact tables when the frequently joined columns are used to organize the
data. All the columns specified in the organize on clause are zone mapped, and by knowing the range of
values of these columns stored in each physical extent, Netezza can eliminate reading unwanted extents
during a join query, which improves the query performance. Zone mapping of additional data types is
supported when columns are specified in the organize on clause. The following is the list of data types,
including the data types which are zone mapped by default.
Default zone map data types
Integer - 1-byte, 2-byte, 4-byte, and 8-byte

Date
Timestamp
Additional data types which can be zone mapped due to organize on clause
Char - all sizes, but only the first 8 bytes are used in the zone map
Varchar - all sizes, but only the first 8 bytes are used in the zone map
Nchar - all sizes, but only the first 8 bytes are used in the zone map
Nvarchar - all sizes, but only the first 8 bytes are used in the zone map
Numeric - all sizes up to and including numeric(18)
Float
Double
Bool
Time
Time with timezone
Interval
Organize on column definitions can be modified using the alter table statement, but any column
included in the organize on clause cannot be dropped from the table. If a table is altered to be a clustered base
table, any new records inserted into the table will be organized appropriately. In order to organize the
existing records in the table, GROOM TABLE needs to be executed so that queries can take advantage of the data
reorganization. It is a good practice to have fact tables defined as clustered base tables with
data organized on often joined columns to improve multi-dimensional lookup. At the same time, care
needs to be taken in choosing the data organization columns by understanding the often executed queries and
by minimizing the number of columns on which the data is organized. Compared to
traditional indexes or materialized views used to improve performance, CBTs have a major advantage in
that they use no additional space but organize the table data in place. One point to note is that
changing the data organization will impact compression, which may increase or decrease
the size of the table storage after converting a table to a CBT.
5. Statistics and Query Performance

The Netezza query optimizer relies on statistics from the catalog to come up with an optimal query execution
plan. So it is imperative that the statistics are kept current; without that, the execution plan
generated by the optimizer may be sub-optimal, resulting in poor query performance. The following are
the statistics the optimizer relies on
Object Type      Statistics
Table            Number of records in the tables used in the query
Table columns    Minimum value in the column
                 Maximum value in the column
                 Count of null values in the column
                 Count of distinct values in the column (dispersion or cardinality)
Some examples of how the statistics can be used by the optimizer
If a column is nullable, additional code may need to be generated to check whether the value is
null or not
Knowing the number of records in the table and the min and max values, along with the number of
distinct values, can help estimate the number of relevant records which can be returned for the
query, assuming there is uniform distribution
Based on the min and max values, the optimizer can determine the type of math that needs to be
performed, like 64 or 128 bit computation.
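The second example above, estimating row counts from table statistics, can be illustrated with textbook formulas. The Python sketch below shows the general idea of a uniform distribution estimate; it is not a description of the formulas the Netezza optimizer actually uses, and the statistics values are hypothetical.

```python
def estimate_equality_rows(row_count, distinct_values):
    """Rows expected for `col = constant`, assuming values are uniformly spread."""
    if distinct_values <= 0:
        return 0.0
    return row_count / distinct_values


def estimate_range_rows(row_count, col_min, col_max, low, high):
    """Rows expected for `col between low and high`, assuming a uniform spread."""
    if col_max <= col_min:
        return float(row_count)
    overlap = max(0, min(high, col_max) - max(low, col_min))
    return row_count * overlap / (col_max - col_min)


# Hypothetical catalog statistics for emp.dept_id and emp.emp_id
print(estimate_equality_rows(row_count=100_000, distinct_values=50))
print(estimate_range_rows(100_000, col_min=1, col_max=100_001, low=1, high=10_001))
```

If the row count or the distinct count in the catalog is stale, both estimates drift, which is how outdated statistics lead to a poor choice of join order or of the table to broadcast.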
These statistics in the catalog are generated using the GENERATE STATISTICS command, which
collects them and updates the catalog. Admin users, table owners and users who have the GENSTATS
privilege can execute the generate statistics command on tables in databases. The following are some
examples
generate statistics;                          -- Generates statistics on all tables in a database
generate statistics on emp;                   -- Generates statistics on table emp
generate statistics on emp(emp_id, dept_id);  -- Generates statistics on specified columns
Since generate statistics reads every record in the table to generate the statistics, the statistics are of very high
accuracy.
Netezza also maintains statistics automatically, which may be of lesser accuracy compared to what is
generated by the generate statistics command. The following table lists the various commands and the
type of stat automatically maintained by Netezza. Even though the stat may not be very accurate, it is
better than not having any
Command          Row Count  Min/Max vals  Null count  Distinct vals  Zone Map
Create Table As  Y          Y             Y           Y              Y
Insert           Y          Y             N           N              Y
Update           Y          Y             N           N              Y
Delete           N          N             N           N              N
Groom            N          N             N           N              Y
Note that the groom command just removes the rows deleted logically; there is no change in stats except
for the zone map values.
The following are some recommended scenarios in which generate statistics needs to be executed for
performance
Significant changes to the database in terms of data
Static tables intended to be kept for a long time
Objects in queries where there are three or more joins
Slower query response after changes to tables
On columns used in where, order by, group by and having clauses
When a temporary table used in a join contains a large volume of data
Netezza also computes just in time (JIT) statistics using sampling under the following conditions. This is not a
replacement for running generate statistics, and it is also less accurate due to the sampling.
Query has tables that store more than 5 million records
Query has at least one column restriction like a where clause
Only if the query involves user tables
Tables that participate in a query join or if a table has an associated materialized view
Along with sampling, the JIT statistics collection process utilizes zone maps to collect several pieces of
information like
Number of rows to be scanned for the target table
Number of extents that need to be scanned for the target table
Max number of extents that need to be scanned on the data slices with the greatest skew
Number of rows to be scanned for all the tables in each join
Number of unique values in target table columns involved in join or group by processing
Note that when CTAS tables are created, Netezza schedules generate statistics to collect all required
statistics for the new table. This automatic scheduling of generate statistics can be influenced by the
following two postgresql.conf file settings.
enable_small_ctas_autostats enables or disables auto stats generation on small CTAS tables

ctas_auto_min_rows specifies the threshold number of rows above which the stats generation is
scheduled and the default value is 10000.
Groom
When data is updated in Netezza, the current record is marked for deletion and a new record is created
with the updated values instead of updating the current record. Also, when a record in a table is deleted, it is
not physically removed from storage; instead the record is marked as logically deleted and will not be
visible to future transactions. Similarly, when a table is altered to add or drop a column, a new version of the
table is created and queries against the table are serviced by using the different versions. The
logical deletes, which result in additional data in the tables, and joins on versions of altered tables
impact query performance. In order to prevent this query performance degradation, users of the Netezza
appliance need to GROOM the table.
The groom command is used to maintain user tables. Its key functions are
Reclaiming physical space by removing logically deleted rows in tables
Migrating records from previous versions of tables into the current version, leaving only
one version of the altered table
Organizing tables according to the organize on columns defined in an alter table command
By default groom synchronizes with the latest
backup set, which means that logically deleted data will not be removed if that data has not been backed up
in the latest backup set. It is therefore best to schedule grooms after database backups
so that the database objects are kept primed. The following are good practices with groom
Groom tables that often receive large updates and deletes

Schedule Generate Statistics after groom on tables so that the stats are up to date and accurate
If there is a need to remove all the records in a table use truncate instead of delete so that there
will be no need for a groom since truncate will reclaim the space
The following are some example groom commands

groom table emp;                -- to remove logically deleted rows and reclaim space
groom table emp versions;       -- to remove older versions of an altered table and copy rows to the new version
groom table emp records ready;  -- to organize only the records which are ready to be organized
Note that while groom is executing, it does not lock the table, and users are still able to
perform data manipulation on the table data.
Explain
In order to understand the plan the Netezza appliance will use to execute a query, users can use the
explain SQL statement. Understanding the plan will also provide an opportunity for the users to make
any changes to the query or the physical data structure of the tables involved to make the query perform
better. The following are some of the sample explain statements.
explain select * from emp;
explain distribution select emp.name, dept.name from emp, dept where emp.dept_id =
dept.id;
explain plangraph select * from emp;
The output from the explain statement can be generated in verbose format or graphical format which
can be opened in a web browser window. The output details all the snippets which will be executed in
sequence on SPUs and also the code which will be executed on the host. The details include the type of
activity like table scan, aggregation etc., the estimated number of rows, the width of the rows, the cost
involved to execute that snippet since the Netezza query optimizer is a cost based optimizer, the object
involved, the confidence on the estimation etc. The following is a sample explain output
QUERY VERBOSE PLAN:
Node 1.
[SPU Sequential Scan table "OWNER" as "B" {}]
-- Estimated Rows = 16, Width = 11, Cost = 0.0.. 0.0, Conf = 100.0
Projections:
1:B.OWNER_NAME 2:B.OWNER_ID
[SPU Broadcast]
[HashIt for Join]
Node 2.
[SPU Sequential Scan table "ORDER" as "A" {}]
-- Estimated Rows = 454718, Width = 8, Cost = 0.0 .. 8.1, Conf = 80.0
Restrictions:
(A.OWNER_ID NOTNULL)
Projections:
1:A.ORDER_ID 2:A.OWNER_ID
Node 3.
[SPU Hash Join Stream "Node 2" with Temp "Node 1" {}]
-- Estimated Rows = 519678, Width = 23, Cost = 0.0 .. 19.7, Conf = 64.0
Restrictions:
(A.OWNER_ID = B.OWNER_ID)
Projections:
1:A.APPLICATION_ID 2:B.OWNER_NAME
Cardinality:
B.ORDER_ID 9 (Adjusted)
[SPU Return]
[Host Return]
QUERY PLANTEXT:
Hash Join (cost=0.0..19.7 rows=519678 width=23 conf=64) {}
(spu_broadcast, locus=spu subject=rightnode-Hash)
(spu_join, locus=spu subject=self)
(spu_send, locus=host subject=self)
(host_return, locus=host subject=self)
l: Sequential Scan table "A" (cost=0.0..8.1 rows=454718 width=8 conf=80) {}
(xpath_none, locus=spu subject=self)
r: Hash (cost=0.0..0.0 rows=16 width=11 conf=0) {}
l: Sequential Scan table "B" (cost=0.0..0.0 rows=16 width=11 conf=100) {}
6. Netezza Transactions
All database systems which allow concurrent processing should support ACID transactions, i.e. the
transactions should be atomic, consistent, isolated and durable. All Netezza transactions are ACID in
nature, and in this section we will see how ACIDity is maintained by the appliance.
By default Netezza SQL statements are executed in auto-commit mode, i.e. the changes made by a SQL statement
take effect immediately after the completion of the statement, as if the transaction were complete. If
there are multiple related SQL statements which must all fail if any one of them
fails, the user can use the BEGIN, COMMIT and ROLLBACK transaction control statements to control
the transaction involving multiple statements. All SQL statements between a BEGIN statement and a
COMMIT or ROLLBACK statement are treated as part of a single transaction, i.e. if any one of the
SQL statements fails, all the changes made by the statements before the failure are reverted,
leaving the database data in the state it was in before the BEGIN statement was executed.
Some SQL statements, like BEGIN itself, cannot be included inside a BEGIN,
COMMIT/ROLLBACK block, and users need to be aware of them.
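The all-or-nothing behaviour of a BEGIN ... COMMIT/ROLLBACK block can be tried out without an appliance. The Python sketch below uses the standard library sqlite3 module purely as a stand-in database; it is not Netezza, the emp table and the run_in_transaction helper are hypothetical, and the failure is provoked with a duplicate primary key because SQLite enforces that constraint.

```python
import sqlite3

# isolation_level=None puts the connection in auto-commit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below are the only boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table emp (emp_id integer primary key, name text)")


def run_in_transaction(conn, statements):
    """Run all statements as one transaction; undo everything if one fails."""
    conn.execute("BEGIN")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


# The second insert fails on the duplicate key, so the first is rolled back too.
ok = run_in_transaction(conn, [
    ("insert into emp values (?, ?)", (1, "ann")),
    ("insert into emp values (?, ?)", (1, "bob")),
])
row_count = conn.execute("select count(*) from emp").fetchone()[0]
print(ok, row_count)
```

The table is empty after the failed block, which is the state the data was in before BEGIN was executed.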
Traditionally database systems used logs to manage transaction commits and rollbacks. All changes
which are part of a transaction are maintained in a log, and based on whether the user commits or rolls back
the transaction the changes are made durable, i.e. written to the storage. Netezza does not use logs, and all
the changes are made on the storage where user data is stored, which also helps with performance. To
accomplish this for table data changes, Netezza maintains three additional hidden columns (createxid,
deletexid and row id) per table row, which store the id of the transaction which created the row, the id of the
transaction which deleted the row, and a unique row id assigned to the data row by the system. The transaction ids
are unique ids assigned to each transaction executed in the system, and they are assigned in increasing
order. When a row is created the deletexid is set to 0, and when it gets deleted it is set to the
id of the transaction which executed the delete request. When an update is made to a table row, the row data is
not updated in place; rather a new row is created with the updated values and the deletexid of the old row is
updated with the id of the transaction which issued the update command. If a transaction rolls back
a delete operation, the deletexid of the deleted rows is set back to 0, and if it rolls back an
update operation, the deletexid of the new row created is set to 1 and the deletexid of the old row is
set back to 0. One key point to note is that rows are deleted logically and not physically during
delete operations and additional rows are created during update operations. This means that
groom needs to be executed regularly on tables which undergo a large volume of deletes and updates to
reclaim the space and improve performance.
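The bookkeeping with createxid and deletexid can be modelled in a few lines. The Python sketch below is a toy model of the rules described above, not Netezza's storage format: the row id column is left out, visibility is reduced to "deletexid is 0" without considering transactions still in flight, and the class and method names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Row:
    createxid: int
    deletexid: int  # 0 means the row has not been deleted
    data: dict


class Table:
    """Toy model of row versioning; not Netezza's actual storage format."""

    def __init__(self):
        self.rows = []

    def insert(self, xid, data):
        row = Row(createxid=xid, deletexid=0, data=data)
        self.rows.append(row)
        return row

    def delete(self, xid, row):
        row.deletexid = xid  # logical delete: the row stays on storage

    def update(self, xid, row, changes):
        self.delete(xid, row)  # old version is marked deleted
        return self.insert(xid, {**row.data, **changes})  # new version is added

    def rollback_delete(self, row):
        row.deletexid = 0

    def rollback_update(self, old_row, new_row):
        new_row.deletexid = 1  # the new version becomes invisible
        old_row.deletexid = 0  # the old version is visible again

    def visible(self):
        return [r.data for r in self.rows if r.deletexid == 0]

    def groom(self):
        """Physically drop logically deleted rows; return how many were removed."""
        before = len(self.rows)
        self.rows = [r for r in self.rows if r.deletexid == 0]
        return before - len(self.rows)


emp = Table()
ann = emp.insert(xid=10, data={"emp_id": 1, "dept_id": 5})
ann_v2 = emp.update(xid=11, row=ann, changes={"dept_id": 7})
print(len(emp.rows), emp.visible())   # two stored versions, one visible
print(emp.groom(), len(emp.rows))
```

After a single update the table holds two stored versions of the row, and only groom brings the stored row count back in line with the visible row count.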
The transaction isolation is controlled by the transaction isolation levels supported by any database
system and the one used by the individual transaction. By ANSI standards there are four isolation levels
Uncommitted Read
Committed Read
Repeatable Read
Serializable
The following table details the data consistency which can be expected with each isolation level
Isolation Level   Uncommitted Data  Non Repeatable Data  Phantom Data
Uncommitted       Y                 Y                    Y
Committed         N                 Y                    Y
Repeatable read   N                 N                    Y
Serializable      N                 N                    N
Uncommitted Data: A transaction can see data from another transaction which is still not committed.
Non Repeatable Data: When a transaction reads the same data again, it finds that the data was changed by another
transaction which committed after the previous read.
Phantom Data: When a transaction reads the same data again, the data previously read is not changed,
but new rows which satisfy the previous query criteria have been added to
the data rows.
Netezza allows only the serializable transaction isolation level, and if a transaction is found to be not
serializable then it will not get scheduled. This works fine since the appliance is primarily meant for a
data warehousing environment where there are very few or no updates once data is loaded. With data
versioning, which provides a consistent view of data for transactions running concurrently, combined
with support for only serializable transactions, Netezza does not need locks, the synchronization
mechanism traditionally used in database systems to maintain concurrency. For some DDL statements
like dropping a table, the appliance requires exclusive access to objects, which means the
DDL statement will have to wait until any other operation on the object, like a SELECT, is
complete.
7. Loading Data, Database Back-up and Restores

Loading and unloading data, creating back-ups and the process to restore them are vital components of any
database system, and in this section we will look into how these are done in Netezza.
External Tables
Users can define a file as a table and access it like any other table in the database using SQL
statements. Such tables are called external tables. The key difference is that even though data from
external tables can be accessed through SQL queries, the data is stored in files and not on the disks
attached to the SPUs. Netezza allows selects and inserts on external tables but not updates, deletes or
truncation. That means users can insert data into an external table from a normal table in the database,
and the data will get stored in the external file. Data can also be selected from the external table and inserted
into a database table. These two actions are the equivalent of data unload and load. Any user with the LIST
privilege and the CREATE EXTERNAL TABLE privilege can create external tables. The following are
some sample statements to create external tables and how they can be used
-- creates an external table with the same definition as an existing customer table
create external table customer_ext sameas customer
using (dataobject ('/nfs/customer.data') delimiter '|');
-- create statement uses the column definitions provided to create the external table
create external table product_ext (product_id bigint, product_name varchar(25), vendor_id bigint)
using (dataobject ('/tmp/product.out') delimiter ',');
-- create statement uses the definition of the existing account table to create the external table and
-- stores the data in the account.data file in a Netezza internal format
create external table '/tmp/account.data' using (format 'internal' compress true) as
select * from account;
-- read data from the external table and insert into an existing database table
insert into temp_account select * from external '/tmp/account.data' using (format 'internal' compress true);
-- stores data into an external table and in turn into its file; this file is readable and can be used with other DBMS
insert into product_ext select * from product;
Once external tables are created they can be altered or dropped. The statements used to drop and alter a
normal table can be used to perform these actions on external tables. Note that when an external table is
dropped, the file associated with the table is not removed and needs to be removed through OS
commands if that is what is intended. Also, a query cannot reference more than one external table, and unions
cannot be performed on two external tables. When a user performs an insert operation on an external table,
the data in the file associated with the external table (if existing) will be deleted before new data is written,
i.e. a backup of the file may need to be made first if the old data is required.
External tables are one of the options to back up and restore data in a Netezza database, even though they
work at the table level. Since external tables can be used to create data files from table data in a readable
format with user specified delimiters and line escape characters, they can be used to move data between
Netezza and other DBMS systems. Note that the host system is not meant to store data, and hence alternate
arrangements, like using nfs mounts to store the files created from the external tables, need to be made if
the user intends to move large volumes of data.
NZLoad CLI
NZLoad is a Netezza command line interface which can be used to load data into database tables.
Behind the scenes NZLoad creates an external table for the data file to be loaded, selects the data
from the external table and inserts it into the target table, and then drops the external table once
the insert of all the records is complete. All of this is done by executing a sequence of SQL statements,
and the complete load process is considered part of a single transaction, i.e. if there is an error in any of
the steps the whole process is rolled back. The utility can be executed on a Netezza host locally or from
a remote client where the Netezza ODBC client is installed. Even though the utility uses the ODBC
driver, it does not require a Data Source Name on the machine where it is executed, since it uses the ODBC
driver directly, bypassing the ODBC driver manager. The following is a sample NZLoad statement
nzload -u nz -pw nzpass -db hr -t employee -df /tmp/employee.dat -delimiter ',' -lf log.txt -bf bad.txt -outputDir /tmp -fileBufSize 2 -allowReplay 2
A control file can also be used to pass parameters to the nzload utility using the -cf option instead of
passing them on the command line. This also allows more than one data file to be loaded concurrently into
the database with different options, and each load will be considered a different transaction. The control
file option -cf and the data file option -df are mutually exclusive, i.e. they cannot be used at the same time.
The following is a sample nzload command assuming the load parameters are specified in the load.cf file.
nzload -u nz -pw nzpass -host nztest.company.com -cf load.cf
NZLoad exits with a code of 0 for successful execution, 1 for failed execution, i.e. no records
were inserted, and 2 for successful loads where the number of rows which were in error during the
load did not exceed the maximum specified in the NZLoad command request.
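A script which wraps the load would typically branch on this exit code. The following Python sketch only encodes the three codes listed above; the function names and message texts are made up for illustration, and obtaining the code from an actual nzload run (for example through the returncode of subprocess.run) is left out.

```python
NZLOAD_EXIT_CODES = {
    0: "successful execution",
    1: "failed execution: no records were inserted",
    2: "successful load with rejected rows within the allowed maximum",
}


def describe_nzload_exit(code):
    """Translate an nzload exit code, as listed in the text above, into a message."""
    return NZLOAD_EXIT_CODES.get(code, f"unexpected exit code {code}")


def load_succeeded(code):
    """Codes 0 and 2 both mean the load transaction completed."""
    return code in (0, 2)


for code in (0, 1, 2):
    print(code, load_succeeded(code), describe_nzload_exit(code))
```

Treating code 2 as a success but still reviewing the bad records file is a reasonable default, since some rows were rejected even though the load went through.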
While NZLoad is executing against a table, other transactions can still use the table, since the load
command does not commit until the end and hence the data being loaded is not visible to other
transactions executing concurrently. Also, NZLoad sends chunks of data to be loaded, along with the
transaction id, to all the SPUs, which in turn store the data immediately onto storage, thus taking up
space. If for any reason the transaction performing the load is cancelled or rolled back at any point, the
data will still remain in storage even though it will not be visible to future transactions. In order to reclaim
the space taken by such records, users need to schedule a groom against the table. Note that the NZReclaim
utility can also be used on earlier versions of Netezza, but it is deprecated from version 6. The
user id used to run the NZLoad utility should have the privilege to create external tables along with
select, insert and list access on the database table.
Database Backup and Restore
Unlike traditional database systems, Netezza has a host component, where all the database catalogs,
system tables, configuration files etc. are stored, and an SPU component, where user data is stored. Both
components need to be backed up in sync for a successful restore, even though the host component may
not be required in restore situations such as a database or table restore. Netezza provides the
nzhostbackup and nzhostrestore utilities to back up and restore the host, along with the nzbackup and
nzrestore utilities to manage user data backup and restoration. In addition, external tables can be used
to back up and restore individual tables, and hence to back up all the tables in a database, which
amounts to a database backup.
Host Backup and Restore
The Netezza host can be backed up using the nzhostbackup command utility. It backs up the Netezza
data directory, which stores the system tables, database catalogs, configuration files, query plans, cached
executable code for the SPUs etc. that are required for the correct functioning of, and access to, the user
databases. When the command is executed, it pauses the system to make a checkpoint before taking a
backup of the /nz/data directory. Once the backup is complete the system is resumed to its original
state. Because the system has to be paused, it is best to schedule the host backup when there is little or
no activity on the system.
In the event of any issues with the Netezza host, the host can be restored using the nzhostrestore
utility. The restore brings back only the host components, such as the database catalog entries and
system tables, to the state they were in when the host backup was made. This means that if user tables
were dropped, truncated or groomed after the host backup, the host restore cannot revert those actions.
The user objects may then be in an inconsistent state, and corrective action needs to be taken to bring
them in sync with the state at the time of the host backup. For tables created after the host backup, the
restore utility generates SQL scripts to drop them, since they are orphaned: their entries are not in the
restored catalog. To avoid such inconsistencies, it is recommended to take a host backup whenever
there is a major change to the databases. The following are sample nzhostbackup and nzhostrestore
commands.
nzhostbackup /tmp/backup/nz-devhost-bk-01212011.tar.gz
nzhostrestore /tmp/backup/nz-uathost-bk-12212011.tar.gz
During the host restore the utility compares the version of the catalog on the host with the version in
the backup. If they differ, or if the utility is unable to identify the versions, the restore fails. This
verification step can be skipped using the -catverok option of the nzhostrestore utility, but that is not
recommended. Note that the host backup doesn't back up the operating system components or
configurations; they need to be backed up separately.
User Database Backup and Restore
User databases and data, separate from the host components, can be backed up and restored using the
nzbackup and nzrestore utilities.
The nzbackup utility allows the user to make a full, cumulative or differential backup of a database. A
full backup captures the complete database. A differential backup captures the changes made after the
previous backup of any type (full, cumulative or differential). A cumulative backup captures all the
changes made after the previous full backup. Users executing the nzbackup command need the backup
privilege, either on the specific database or at the global level.
The following is an example where a full backup is scheduled every month, say on day 1, a differential
backup is scheduled every day and a cumulative backup is scheduled every week.
[Figure: backup schedule timeline showing the monthly full backup, daily differential backups 1 to 5,
the weekly cumulative backup, and daily differential backups 6 and 7]
When a differential backup is run, it captures all the changes made after the previous backup. If the
database needs to be restored as of day 4 after the monthly backup, the monthly backup needs to be used
along with differential backups 1, 2 and 3. If the database needs to be restored as of day 8, the monthly
backup, the cumulative backup and differential backup 6 need to be used. Unlike traditional databases,
Netezza has no point-in-time recovery, in which a database is recovered to a point in time between
backups by replaying logs, because such logs are not available in Netezza. The following are some
example Netezza backup commands.
-- creates a full backup of the hrdb database
nzbackup -dir /tmp/backups -u nzbackup -pw nzpass -db hrdb
-- creates a differential or incremental backup of the findb database
nzbackup -dir /nfs/backups -db findb -differential
-- creates a cumulative backup of the hrdb database
nzbackup -dir /nfs/backups -db hrdb -cumulative
-- creates a backup of the schema from the hrdb database and not the data
nzbackup -dir /nfs/backups -db hrdb -schema-only
-- creates the backup of the users and permissions at user database and system level
nzbackup -dir /nfs/backups -users -u nzbackup -pw nzpass
As you may have noticed, nzbackup has two uses beyond backing up database data. One is to capture
the schema of a database, which can then be used to create another database containing only the entities
of the source database and not the data. The second is to back up all the users, groups and privileges in
the system, so that the backup can be used to restore the user details or to create the same set of users
and privileges on another system. The history of database backups can be viewed using the -history
option of the nzbackup command.
nzbackup -history
nzbackup -history -db hrdb
A full backup, together with all the differential and cumulative backups taken after it until the next full
backup, is considered part of a backup set. Each backup set is identified by an id, and each backup
within the set is identified by a sequence number. All these details are displayed when a user requests
the backup history for a database.
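Which backups of a backup set are needed for a restore follows directly from the definitions of full, cumulative and differential backups. The following Python sketch models that selection. The Backup class, the restore_chain function and the day numbers (full backup on day 1, cumulative backup on day 7) are assumptions made for illustration and are not part of any Netezza utility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day of the month on which the backup was taken (assumed schedule)
    kind: str  # "full", "cumulative" or "differential"


def restore_chain(backups, target_day):
    """Return the backups needed, oldest first, to restore as of target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    chain = []
    have_cumulative = False
    # Walk backwards from the target day towards the full backup.
    for b in reversed(usable):
        if b.kind == "full":
            chain.append(b)
            break
        if have_cumulative:
            continue  # changes already covered by the cumulative backup
        chain.append(b)
        if b.kind == "cumulative":
            have_cumulative = True
    else:
        raise ValueError("no full backup on or before the target day")
    chain.reverse()
    return chain


# Assumed schedule: full on day 1, differentials 1-5 on days 2-6,
# cumulative on day 7, differentials 6 and 7 on days 8 and 9.
schedule = (
    [Backup(1, "full")]
    + [Backup(d, "differential") for d in range(2, 7)]
    + [Backup(7, "cumulative")]
    + [Backup(d, "differential") for d in (8, 9)]
)
```

With this schedule, restore_chain(schedule, 4) selects the full backup and the first three differentials, while restore_chain(schedule, 8) selects the full backup, the cumulative backup and the differential taken on day 8.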
Databases can be restored using the nzrestore command line utility, and the user needs to have the
restore privilege. Unlike in traditional databases, a full database restore can't be made against an
existing database. Instead, the existing database needs to be dropped or a new database needs to be
specified for the restore. As with the nzbackup utility, the -schema-only option can be used to create
only the objects in the new database without loading the data from the backup. The users and privileges
alone can be restored using the -users option of the nzrestore utility. The following are some example
restore commands.
-- performs a restore of the hrdb database from a full backup
nzrestore -db hrdb -u nzbackup -pw nzpass -dir /tmp/backups -v
-- creates all the objects in findb in the new database testfindb but doesn't load any data
nzrestore -db testfindb -sourcedb findb -schema-only -u nzbackup -pw nzpass -dir /nfs/backups
-- creates all the users and applies all the privileges as in the backup but doesn't drop users or alter existing privileges
nzrestore -users -u nzbackup -pw nzpass -dir /nfs/backups
-- restores the database findb up to and including backup sequence 2 within the backup set 200137274
nzrestore -db findb -u nzbackup -pw nzpass -dir /tmp/backups -backupset 200137274 -increment 2
Database backups can be used to restore specific tables as of the time of a backup using the -tables
option of nzrestore. One or more tables can be restored by listing them after the -tables option. If a
table being restored already exists in the database it is overwritten, and if it doesn't exist it is created.
Table level restore is applicable only to regular tables.
8. Netezza SQL
As with other popular database management systems currently on the market, SQL in Netezza supports
the SQL-92 standard. SQL statements can be executed through the nzsql command line utility or
through any JDBC, ODBC or OLE DB connection.
SQL Statements
The following are the SQL statements available for the various object types in Netezza.
Database
SQL Statement       Description
CREATE DATABASE     To create a new Netezza database
DROP DATABASE       To drop an existing Netezza database
ALTER DATABASE      To rename or change owner of an existing database. Renaming a
                    database will require recompilation of any views in the database.
                    Also all materialized views will be converted to normal views and
                    will need to be replaced.
The following are some limits at the database level

Limit Type          Limit
Name Length         Maximum length of a name: 128 bytes
Connections         Maximum number of connections to the server: 2000 (default: 500)
When connected to one database, objects in another database can be accessed by qualifying the object
name with the database name and schema name, e.g. hrdb.hradmin.emp refers to the emp table in the
hrdb database under the hradmin schema. The schema name (object owner name) is optional, and the
object can be accessed as database name..object name, e.g. hrdb..emp, since object names are unique
within a database. While objects in a second database can be queried or used in query joins, no data
updates are allowed across databases.
Tables
SQL Statement                           Description
CREATE TABLE                            To create a new Netezza table
DROP TABLE                              To drop an existing Netezza table
ALTER TABLE                             To rename or change owner or modify (add, drop) columns or
                                        modify constraints in an existing table
SELECT col_name(s) FROM tbl             To retrieve data from a database table
INSERT INTO tbl                         To insert data into a database table
UPDATE tbl SET col_name = val           To update existing data in a table
DELETE FROM tbl                         To delete data from a table
TRUNCATE TABLE tbl                      To remove all records from the table. Truncate removes the data
                                        and recovers the storage instead of logically deleting the rows
                                        as the delete statement does
UNION, UNION ALL                        To combine data from more than one table or row set. UNION
                                        can be used in conjunction with the SELECT statement
INTERSECT, INTERSECT ALL                To retrieve the common set of rows in more than one table or row
                                        set. INTERSECT can be used in conjunction with the SELECT
                                        statement
EXCEPT [DISTINCT], MINUS [DISTINCT],    To retrieve the set of rows from one table which are not in
EXCEPT ALL, MINUS ALL                   another row set/table. EXCEPT or MINUS can be used in
                                        conjunction with the SELECT statement
ALTER VIEWS ON tbl MATERIALIZE REFRESH  To re-materialize all the materialized views which are based on
                                        a particular table
The following are some limits at the table level
Limit Type                  Limit
Name/Column Name Length     Maximum length of a name: 128 bytes
Max Column Count            Maximum number of columns per table: 1600
Distribution Keys           Maximum number of distribution columns: 4
Char/Varchar Field Length   Maximum number of characters in char/varchar type: 64000
Row Size                    Maximum row size in a table: 65,535 bytes
Synonyms
Synonyms can be used to refer to tables and views in the same database by a different name, or to refer
to objects in a remote database without the database.schema qualifier. The following are the supported
SQL statements for synonyms
SQL Statement                           Description
CREATE SYNONYM name FOR table/view      To create a synonym. The statement doesn't verify whether the
                                        table or view for which the synonym is created exists. If the
                                        underlying object is non-existent, a runtime error is thrown
                                        when a user tries to access the synonym.
DROP SYNONYM name                       To drop an existing synonym
ALTER SYNONYM name                      To rename or change owner of an existing synonym
Views
Views are created to present data differently, for example by exposing only some columns of a table or
by joining two sets of data, and/or to control access to the data in the database. The following are the
supported SQL statements for views
SQL Statement               Description
CREATE OR REPLACE VIEW      To create a new Netezza view
DROP VIEW                   To drop an existing Netezza view
ALTER VIEW                  To rename or change owner of an existing view
Materialized Views
The following are the supported SQL statements on Materialized Views
SQL Statement                           Description
CREATE OR REPLACE MATERIALIZED VIEW     To create a new materialized view, or to replace an existing one
                                        when a change to the base table impacts the materialized view
DROP VIEW                               To drop an existing materialized view (same as for normal views)
ALTER VIEW vw MATERIALIZE               Use the SUSPEND option to suspend a view so that activities like
REFRESH|SUSPEND                         base table refresh or groom can run. Use the REFRESH option to
                                        activate a suspended materialized view. The REFRESH option is
                                        also used to re-materialize the view so that the rows are in the
                                        correct order.
A system default can be set to specify the percentage of rows in a materialized view that can be out of
order; the default value is 20%. Once it is set, the command ALTER VIEWS ON MATERIALIZE
REFRESH re-materializes all the views in the database whose percentage of out-of-order rows has
reached the set value.
Functions
Netezza comes with many built in functions, and the following are some of them under various
categories.
Category                Functions
Aggregates              rollup, cube, grouping sets, count, sum, max, min, avg
Casting                 to_char, to_date, to_number, to_timestamp, cast(value as type), ::
Date Time               extract(field from datetime value) where field = epoch, year, month, day,
                        dow, doy
Checking null value     nvl(), nvl2()
String Functions        trim, position, substring, lower, upper, like, not like, character_length
Fuzzy String Search     le_dst, dle_dst
Value Functions         current_date, current_time, current_timestamp, current_user, current_db
Math Functions          acos, asin, atan, atan2, cos, cot, degrees, pi, radians, sin, tan, random,
                        ceil, abs
This is just a sample list and other built in functions along with the details can be obtained from the
Netezza database user guide.
9. Stored Procedures
Stored procedures are programs written in the NZPLSQL language to perform database operations.
They are stored in Netezza databases and run on the Netezza host when executed. NZPLSQL is an
interpreted language based on the Postgres PL/pgSQL language. The key advantages of using stored
procedures are
- They help avoid unnecessary network traffic between the appliance and the application requesting
  data, which is a major advantage when dealing with large volumes of data in a data warehousing
  environment.
- By controlling access restrictions, users can be allowed to perform database operations through
  stored procedures without being given access to the underlying objects.
- By encapsulating business logic in stored procedures, there is only one place to make
  modifications, which helps maintenance.
NZPLSQL, together with regular SQL statements, provides a complete programming language with
capabilities such as conditional statements, loops, declaring variables and manipulating them through
expressions, and creating and calling subroutines. All regular SQL statements can be used in a stored
procedure, except the ones which are not allowed within a BEGIN/COMMIT block. Stored
procedures can accept multiple input values and return one output, either a scalar value or a result set.
Stored procedures can be created, altered, executed and dropped through the nzsql command utility or
through an ODBC, JDBC or OLE DB connection to a database. The following is a sample set of SQL
statements related to stored procedures
/*
Stored procedure to update employee salary
*/
create or replace procedure update_emp_sal(int4) returns int4 execute as owner
language nzplsql as
begin_proc
declare
  ret_code int4;
  -- Variable to store the first value passed as input to the stored procedure
  inc_percent alias for $1;
begin
  -- inc_percent is a percentage, so divide by 100 to get the increment
  update hr..emp set salary = salary + (salary * inc_percent / 100.0);
  raise notice 'Updated employee salary successfully';
-- Section to handle all unhandled exceptions
exception when others then
  raise notice 'Error in updating employee salary';
end;
end_proc;
-- To display the details about the stored procedure
show procedure update_emp_sal(int4);
-- To execute the stored procedure
exec procedure update_emp_sal(12);
-- Another way to execute the stored procedure
call update_emp_sal(-2);
-- To drop the stored procedure
drop procedure update_emp_sal(int4);
Identifiers in stored procedures are not case sensitive, i.e. they are converted to upper case irrespective
of the case used. So variables which are spelled the same but use different case are all considered
duplicates. The CREATE OR REPLACE PROCEDURE statement replaces a procedure with the same
name if one already exists in the database, or creates a new one if none exists.
The admin user and the database owner have all the permissions on stored procedures by default. Users
with the grant permission option can also give other users or groups permission to alter, create or
replace, drop and execute stored procedures. The following are some sample statements
-- grant permission to create stored procedure to the group devgrp
grant create procedure to group devgrp;
-- grant all permissions regarding stored procedures to user mike
grant all on procedure to mike;
-- revoke permission to create procedure from the group qagrp
revoke create procedure from group qagrp;
-- revoke permission to alter on the procedure update_emp_sal(int4) from user tom
revoke alter on update_emp_sal(int4) from tom;
-- provide execute permission on the procedure update_emp_sal(int4) to user sally
grant execute on update_emp_sal(int4) to sally;
-- grant drop access on the procedure update_emp_sal(int4) to user ted
grant drop on update_emp_sal(int4) to ted;
Note that a procedure is uniquely identified by its name together with its input parameter list.
Procedures with the same name but different input parameter lists are considered different stored
procedures.
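One way to picture this rule is as a catalog keyed by the procedure name plus its argument types. The Python sketch below is only a model of that idea; the ProcedureCatalog class and its methods are hypothetical and are not how Netezza stores procedures. Names are upper-cased in the key to mirror the case-insensitive handling of identifiers.

```python
class ProcedureCatalog:
    """Toy catalog in which a procedure is identified by name plus argument types."""

    def __init__(self):
        self._procs = {}

    def create_or_replace(self, name, arg_types, body):
        """Store a procedure; return True if an existing one was replaced."""
        key = (name.upper(), tuple(t.upper() for t in arg_types))
        replaced = key in self._procs
        self._procs[key] = body
        return replaced

    def count(self):
        """Number of distinct procedures (distinct name and signature pairs)."""
        return len(self._procs)


catalog = ProcedureCatalog()
first = catalog.create_or_replace("update_emp_sal", ["int4"], "version 1")
# Same name and same signature, different case: replaces the first one.
second = catalog.create_or_replace("UPDATE_EMP_SAL", ["INT4"], "version 2")
# Same name, different signature: a separate procedure.
third = catalog.create_or_replace("update_emp_sal", ["int4", "varchar"], "version 1")
```

This is also why statements such as DROP PROCEDURE and GRANT name the procedure together with its parameter types, as in update_emp_sal(int4).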
When a procedure is created, it can be defined whether the procedure should execute with the authority
of the user who created it or with the authority of the user who is executing it. The EXECUTE AS
OWNER or EXECUTE AS CALLER parameter in the CREATE or ALTER procedure statement
determines the authority used during execution. When a procedure is defined to execute as owner, it
executes successfully even if the user calling it doesn't have the necessary authority on the objects
accessed by the procedure, as long as the owner who created the procedure has the necessary privileges
on those objects. For example, if the emp table can be updated only by users in the hrgroup and the
stored procedure update_emp_sal(int4) is created by someone from the hrgroup, the user sally will be
able to execute the procedure successfully even if she is not part of the hrgroup. The reason is that the
procedure was created with the execute as owner option and the owner has the privilege to update the
table. If the standard is that only users with the necessary permissions on the underlying objects should
be able to manipulate data through a procedure, then procedures can be created using the execute as
caller option.
-- alter procedure update_emp_sal(int4) to execute as caller
alter procedure update_emp_sal(int4) execute as caller;
After the execution of this alter statement, if the user sally executes the procedure it will fail, since she
is not part of the hrgroup and doesn't have the update privilege. When a stored procedure is executed,
all the objects referred to in the procedure need to be available in the database where it is executed,
unless the object names are fully qualified. If an object from a remote database is used in the procedure,
data can only be queried from that object; it can't be inserted, updated or deleted.
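The difference between EXECUTE AS OWNER and EXECUTE AS CALLER comes down to whose privileges are checked when the procedure touches the emp table. The following Python sketch models only that decision; the function name and the privilege sets are hypothetical and greatly simplify the actual privilege model.

```python
def can_update_emp(execute_as, owner_privileges, caller_privileges):
    """Return True if the UPDATE inside the procedure would be allowed.

    execute_as is "owner" or "caller"; the privilege sets hold the
    privileges each user has on the emp table.
    """
    if execute_as not in ("owner", "caller"):
        raise ValueError("execute_as must be 'owner' or 'caller'")
    effective = owner_privileges if execute_as == "owner" else caller_privileges
    return "update" in effective


# Hypothetical privileges on the emp table.
hr_owner = {"select", "update"}  # member of hrgroup who created the procedure
sally = {"select"}               # not part of hrgroup
```

Here can_update_emp("owner", hr_owner, sally) is allowed, whereas can_update_emp("caller", hr_owner, sally) is not, which matches the behaviour described for the user sally before and after the alter statement.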
When database backups are restored, the stored procedures are restored as well and can be executed
without errors, unless the version of Netezza where the restore is done doesn't support procedures or
some of the features which came out in a later version. It is good practice to keep the procedure code
in storage other than the database backups, rather than relying on the backups alone for recovery.
NZPLSQL Structure
[<<label>>]
[DECLARE
declarations]
BEGIN
statements
[EXCEPTION WHEN OTHERS THEN
handler statements]
END;
Comments
NZPLSQL supports block comments where a block of comments can be enclosed between /* */ and a
line comment where the comment can be added to a line after --.
/*
Block Comments where
Multiple lines of comments can be enclosed
*/
-- Anything until the end of the line is considered as a comment
Variables and Constants

Variables and constants are defined in the DECLARE section of the procedure, and the syntax used is
var_or_const_name [CONSTANT] type [NOT NULL] [ { DEFAULT | := } value ];
If the variable is defined as a constant, its value can't be changed during the execution of the
procedure. The type defines the data type of the values which can be stored in the variable, and the
default value defines the initial value of the variable. If no value is assigned to a variable, a SQL null
value is stored in it by default. So a variable defined as NOT NULL must be given a default value;
otherwise a null value would be stored and an error would be thrown, since null is not an acceptable
value for a NOT NULL variable.
Variables can be assigned the data type of a table column or of a table row using the %TYPE and
%ROWTYPE attributes. The following are some examples.
user_id users.user_id%TYPE; -- sets the data type to the type of the column user_id
emp_rec hr..emp%ROWTYPE; -- sets the data type to the row of the table emp in the hr db
When a variable is set to a row type, the individual columns can be accessed using the dot (.) operator,
e.g. emp_rec.emp_id where emp_id is one of the columns in the emp table. The table referred to in the
%ROWTYPE definition should be an existing table in the database where the procedure is created.
Array variables are supported in NZPLSQL and can be defined using the VARRAY keyword. The
following is the syntax to declare an array variable
var_name VARRAY(n) OF data_type;
where var_name is the name of the variable, n is the size of the array and data_type is the data type of
the values stored in the array. For example, the following is an array variable of type int and size 10
vId VARRAY(10) OF INT;
The size of an array variable can be found using the COUNT method, e.g. vId.COUNT returns an int
value of 10. If required, the size of the array can be changed using the EXTEND and TRIM methods;
both accept an integer argument, which defaults to 1.
Procedure Input Parameters
Up to 64 parameters can be passed to a procedure, and they are not named. Aliases can be created in
the procedure to access the input parameters; the aliases are constants, i.e. they can't be updated.
CREATE OR REPLACE PROCEDURE update_emp_sal(int, varchar(ANY)) RETURNS int
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
  pId ALIAS FOR $1;   -- Can access the first input value passed
  pName ALIAS FOR $2; -- Can access the second input value passed
  -- both the previous variables which are aliases are constants; can't be updated
BEGIN
  -- copy the matching rows from EXT_EMP into EMP
  INSERT INTO EMP SELECT * FROM EXT_EMP WHERE id = pId;
END;
END_PROC;
If the value passed as an input parameter needs to be changed, copy it into a local variable, make the
required updates there and use the local variable for further processing.
CREATE OR REPLACE PROCEDURE p1 (int, varchar(ANY)) RETURNS int
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
  pId ALIAS FOR $1;   -- Can access the first input value passed
  pName ALIAS FOR $2; -- Can access the second input value passed
  -- both the previous variables which are aliases are constants; can't be updated
  vId int4;           -- local variable which can be updated
BEGIN
  vId := pId + 1;
  -- copy the matching rows from EXT_EMP into EMP
  INSERT INTO EMP SELECT * FROM EXT_EMP WHERE id = vId;
END;
END_PROC;
Procedures can also be passed an arbitrary number of input parameters of different data types. This is
enabled by using VARARGS as the input parameter. During execution, the number of input parameters
and their types can be accessed through the predefined array called PROC_ARGUMENT_TYPES.
The total number of parameters passed is given by PROC_ARGUMENT_TYPES.COUNT, which
returns an int value. The types of the parameters are stored as array elements and can be retrieved
using an index into the array. For example, to retrieve the data type of the second parameter passed,
PROC_ARGUMENT_TYPES(1) can be used, which returns a value of type oid. Note that the first
array element, indexed with 0, corresponds to the first input parameter passed to the stored procedure.
CREATE OR REPLACE PROCEDURE p1 (varargs) RETURNS int
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
  typ oid;
  argCount int;
BEGIN
  argCount := PROC_ARGUMENT_TYPES.count; -- number of input parameters passed
  typ := PROC_ARGUMENT_TYPES(0);         -- type of the first input parameter passed
END;
END_PROC;
Scope of Variables
A DECLARE section, where variables can be defined, together with the BEGIN/END section that
follows it, forms a code block. Each code block in a procedure can be labeled using <<label>>, where
label is a name which can be used to refer to the block. Any variable declared in a code block is local to
that block and can't be referred to outside of it. If the same name is used to define variables in two
blocks of a procedure, the code in each block works with the copy of the variable declared in that
block, even though there are variables with the same name in other blocks.
Besides the DECLARE section, variables are also defined by looping statements such as the FOR loop.
If a variable defined in a loop statement has the same name as another variable in the code block, the
loop variable takes precedence over other variables with the same name while the loop is executing. If
a variable from another code block needs to be referred to, it can be qualified using the label of that
block. The following example helps clarify variable scoping in NZPLSQL.
<<outer>>
DECLARE
val int4;
BEGIN
val := 5;
<<inner>>
DECLARE
val int4;
BEGIN
val := 7;
RAISE NOTICE 'inner val != outer val % %', val, outer.val;-- displays 7,5
FOR val IN 1..10 LOOP
--Note that this is a NEW val variable for the loop.
RAISE NOTICE 'The value of val is %', val; -- displays 1 to 10
END LOOP;
RAISE NOTICE 'inner val is still 7. Value %', inner.val; -- displays 7
END;
RAISE NOTICE 'outer val is still 5. Value %', val; -- displays 5
END;
Expressions and Assignments
Expressions in NZPLSQL are evaluated internally by executing a SELECT expression query using the
query executor. Each expression in a procedure is compiled and cached when it is first encountered,
and all parameter values required by the expression are passed to the executor in a parameter array for
substitution before the expression is executed. The cached copy of an expression is preserved until the
procedure is changed or dropped, or the database is stopped.
While using expressions, it is important to remember the implicit casting of values, in addition to the
casting which happens because of the declared type of the variable. For example
a int;
a := 1+0.5;
assigns a the value 2, since the variable is explicitly defined as an integer and the value 1.5 gets
rounded up. Similarly, the expression
a := (1+0.3)*2;
assigns a the value 3. Netezza implicitly converts the 1 to numeric to match the data type of 0.3, which
gives 1.3 for the addition. The 2 is then implicitly cast to numeric to match 1.3, and the multiplication
produces 2.6 for the expression. Since the variable a is defined as int, the value 2.6 is rounded up to
give the final result of 3.
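The arithmetic in these two examples can be checked outside the database. The Python sketch below uses the decimal module to stand in for the numeric type and assumes that assigning a numeric value to an int variable rounds halves away from zero, as the examples above suggest; assign_to_int is a hypothetical helper and not an NZPLSQL function.

```python
from decimal import Decimal, ROUND_HALF_UP


def assign_to_int(value: Decimal) -> int:
    """Model assigning a numeric value to an int variable (assumed half-up rounding)."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# a := 1 + 0.5  -> the integer 1 is promoted to numeric, giving 1.5
first = Decimal("1") + Decimal("0.5")

# a := (1 + 0.3) * 2  -> 1.3 after the addition, 2.6 after the multiplication
second = (Decimal("1") + Decimal("0.3")) * Decimal("2")

a1 = assign_to_int(first)   # 1.5 is stored as 2
a2 = assign_to_int(second)  # 2.6 is stored as 3
```

Decimal is used instead of float so that intermediate values such as 1.3 and 2.6 are exact.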
If unsure about the implicit casting rules, any expression can be executed as a select statement with
suitable values to see the result and understand the implicit casting applied to it. The select statement
can also be used to create a table; the column definitions of the resulting table then show the data types
produced by the implicit casting. NZPLSQL functions like sqrt() also perform some implicit casting
which users need to be aware of.
Apart from implicit casting, users also need to be aware of the maximum values which can be stored in
the various data types supported by NZSQL, to avoid overflows which generate incorrect results. For
example, adding 1 to an integer variable holding 2147483647, the maximum positive value an integer
can store, results in the incorrect value of -2147483648 due to overflow.
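The overflow result follows from 32-bit two's complement arithmetic. As a worked illustration, the Python sketch below wraps an arbitrary integer into the signed 32-bit range; wrap_int32 and fits_int32 are hypothetical helper names used only for this example.

```python
INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


def fits_int32(value: int) -> bool:
    """True if value can be stored in a signed 32-bit integer without overflow."""
    return INT32_MIN <= value <= INT32_MAX


def wrap_int32(value: int) -> int:
    """Wrap value the way signed 32-bit two's complement arithmetic does."""
    return (value + 2**31) % 2**32 - 2**31


overflowed = wrap_int32(INT32_MAX + 1)
```

Checking fits_int32 before an addition, or declaring the variable with a wider type, avoids the silent wrap-around.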
Executing Queries
Apart from static queries, queries can be built inside a stored procedure and executed dynamically.
Dynamic queries are executed using the EXECUTE IMMEDIATE query statement, where query is a
string. When a query is executed in a procedure, the system reports its status through the
ROW_COUNT and LAST_OID status indicators. ROW_COUNT provides the number of rows
processed by the query. LAST_OID provides the object id of the last row inserted by the SQL
statement, so it is relevant only to INSERT queries.
To check whether a select query returned any rows, the ROW_COUNT status variable can be used.
The reserved variable FOUND, which is equivalent to checking ROW_COUNT >= 1, can also be used
immediately after the select query to find out whether any rows were retrieved.
Similarly, the expression IS NULL can be used to verify whether a variable is null. For example, the
following statement
if dept_id is null then
  dept_id := 999;
end if;
executes the statements in the if block when no value has been assigned to dept_id or when it has been
assigned a NULL value.
Execution Control Statements
NZPLSQL supports multiple control statement structures and the following provides a brief overview
of the supported structures
Conditional Control
IF-THEN, IF-THEN-ELSE, IF-THEN-ELSE-IF, and IF-THEN-ELSIF-ELSE are the
conditional control structures supported.
The following is a sample IF-THEN statement
IF v_sal > 10000 THEN
v_count := v_count + 1;
END IF;
All operators which generate a Boolean result can be used in the IF statement to control the logic flow.
The following is a sample IF-THEN-ELSE control statement
IF v_sal > 10000 THEN
  v_count := v_count + 1;
ELSE
  v_count_min := v_count_min + 1;
END IF;
The following is a sample IF-THEN-ELSE-IF control statement
IF v_sal > 10000 THEN
  v_count := v_count + 1;
ELSE IF v_sal > 5000 THEN
  v_count_min := v_count_min + 1;
END IF;
END IF;
The following is a sample IF-THEN-ELSIF-ELSE control statement

IF location.category = 'H' THEN
  notification := 'Hazardous';
ELSIF location.category = 'R' THEN
  notification := 'Restricted';
ELSE
  notification := 'Standard';
END IF;
Note ELSIF can also be spelled as ELSEIF which produces the same result.
Iterative Control
NZPLSQL supports the LOOP-END LOOP, WHILE loop and FOR loop iterative control statements.
Loops can be terminated using the EXIT statement. The following is an example of a LOOP-END
LOOP statement
<<loop1>>
LOOP
  -- perform tasks
  IF v_count > 1000 THEN
    EXIT loop1;
  END IF;
  -- perform tasks
END LOOP;
In the sample above an IF control statement is used to determine when to exit from the loop. The same
can be accomplished using the EXIT WHEN statement, e.g. EXIT loop1 WHEN v_count > 1000;.
Note that when a label is used in the EXIT statement, the label should belong either to the loop
containing the EXIT statement or to one of the outer loops enclosing that loop, as in the following
sample.
    <<loop1>>
    LOOP
        -- perform tasks
        <<loop2>>
        LOOP
            -- perform tasks
            IF v_count > 1000 THEN
                EXIT loop1;
                -- The code exits both loop 1 and loop 2
            END IF;
            -- perform tasks
        END LOOP; -- This ends loop 2
    END LOOP; -- This ends loop 1
The following is an example of a WHILE loop statement
    WHILE state = 'MA' AND year = 2000 LOOP
        -- perform tasks
    END LOOP;
The following is an example of a FOR loop statement
    FOR count IN 1..100 LOOP -- the REVERSE keyword can be used to count backwards
        -- perform tasks
    END LOOP;
Note that, if required, both WHILE and FOR loops can be exited before they complete by using the EXIT
statement in combination with an IF control statement or a WHEN condition.
Working with table rows
While discussing variable types, we saw the %ROWTYPE data type, which defines a variable to be of the
type of a specific table row. Instead of being tied to a specific table, a variable can also be defined
as type RECORD, which can be used to store data from any table. For example, a variable r_EMP defined
as EMP%ROWTYPE can store data from the EMP table when used in a SELECT statement like
    SELECT * INTO r_EMP FROM EMP WHERE ID = 100;
But using the same variable to select data from the DEPT table will result in an error. If instead a
variable r_REC is defined as RECORD, it can be used to store data from both the EMP table and the
DEPT table with no issues. Another way to retrieve table row data is to define multiple variables
with the same data types as the table columns and use them as a comma separated list in the SELECT
statement, as in the following example
    SELECT * INTO v_ID, v_NAME, v_DEPT FROM EMP WHERE ID = 1000;
The statements so far work well for retrieving one record from a table. As mentioned earlier, the
FOUND or ROW_COUNT variables can be used to verify whether any records were returned by the SELECT
queries executed. But if the query returns a set of rows, a FOR loop is required to traverse the
result set and act on each row. The following is sample code to handle result sets in NZPLSQL
    FOR r_REC IN SELECT * FROM EMP LOOP
        -- perform actions
    END LOOP;
r_REC can be defined as a RECORD or as a ROWTYPE of the EMP table. The column values from the table
row data can be retrieved using the . operator and the table column name, like r_REC.EMP_ID.
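To show the result set loop in context, the following sketch wraps it in a complete procedure. The procedure name sp_count_dept_emps and the EMP_ID and DEPT_ID columns of the EMP table are assumptions made for illustration.

```sql
-- Hypothetical procedure: counts the employees of one department by
-- walking the result set row by row.
CREATE OR REPLACE PROCEDURE sp_count_dept_emps(INTEGER)
RETURNS INTEGER
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
    p_dept  ALIAS FOR $1;   -- department id passed by the caller
    r_rec   RECORD;         -- holds one row of the result set at a time
    v_count INTEGER := 0;
BEGIN
    FOR r_rec IN SELECT emp_id, dept_id FROM emp LOOP
        -- column values are read with the . operator
        IF r_rec.dept_id = p_dept THEN
            v_count := v_count + 1;
        END IF;
    END LOOP;
    RETURN v_count;
END;
END_PROC;
```

The loop is used here only to demonstrate the syntax. A plain SELECT COUNT(*) with a WHERE clause would give the same answer and is usually the better choice, since set-based SQL lets the appliance do the work in parallel.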
Error Handling and Messages

The EXCEPTION statement can be used to handle exceptions during the execution of an NZPLSQL
procedure. When an error occurs, SQLERRM stores the text of the error message. The following is an
example
    BEGIN
        -- perform tasks
    EXCEPTION WHEN OTHERS THEN
        -- perform tasks
    END;
The exception section of the code is placed at the end of the procedure code and gets executed only
when there is an error. The RAISE statement, which can be used to issue a message, complements the
EXCEPTION statement by providing detailed error messages to the user. The following is an example
    BEGIN
        -- perform tasks
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'Exception details %', SQLERRM;
    END;
RAISE accepts different levels. NOTICE is one of them, the others being DEBUG and EXCEPTION. While
RAISE EXCEPTION will abort the transaction, the DEBUG and NOTICE levels are only used to send
messages to logs. An exception which occurs in a procedure will propagate through the call chain
until it reaches a point where it is handled using an EXCEPTION statement or it reaches the main
procedure.
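The pieces described in this section fit together as in the following sketch of a complete procedure. The procedure name sp_default_dept, the EMP table and its DEPT_ID column are assumptions made for illustration.

```sql
-- Hypothetical procedure: assigns a default department and reports
-- the outcome with RAISE NOTICE.
CREATE OR REPLACE PROCEDURE sp_default_dept()
RETURNS BOOLEAN
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
    v_rows INTEGER;
BEGIN
    UPDATE emp SET dept_id = 999 WHERE dept_id IS NULL;
    v_rows := ROW_COUNT;   -- rows affected by the UPDATE above
    -- NOTICE only reports a message; execution continues
    RAISE NOTICE 'Rows updated: %', v_rows;
    RETURN TRUE;
EXCEPTION WHEN OTHERS THEN
    -- SQLERRM holds the text of the error that was caught
    RAISE NOTICE 'Update failed: %', SQLERRM;
    RETURN FALSE;
END;
END_PROC;
```

Replacing the RAISE NOTICE in the handler with RAISE EXCEPTION would instead abort the transaction and pass the error on to the caller.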
Result set as return value
We have seen how procedures can return single values, and we are aware that procedures can also
return a result set. For a procedure to return a result set, it should be defined with the return
type of REFTABLE(table_name). The following is an example
    CREATE OR REPLACE PROCEDURE SP_SELECT_EMP() RETURNS REFTABLE(EMP) LANGUAGE NZPLSQL AS
The table referred to in the REFTABLE clause should exist in the database when the procedure is
created, even if there is no data in it. To return the result set, the RETURN REFTABLE statement
should be used as in the following example. The rows returned to the caller of the procedure, who
needs to handle the result set, are those the procedure has placed in the result table, which has
the same layout as the EMP table.
    CREATE OR REPLACE PROCEDURE SP_SELECT_EMP() RETURNS REFTABLE(EMP) LANGUAGE NZPLSQL AS
    BEGIN_PROC
    BEGIN
        -- perform tasks, including populating the result table
        RETURN REFTABLE;
    END;
    END_PROC;
Tables referred to in the REFTABLE clauses of procedures currently defined in the database cannot be
dropped until the procedure is dropped or the procedure referring to the table is altered to return
the data of another table as its result set.
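The following sketch shows one way a REFTABLE procedure can populate and return its result set. It follows the pattern in the vendor's stored procedures guide, where rows are inserted through dynamic SQL into the table named by the REFTABLENAME variable. The procedure name sp_select_dept10_emp, the DEPT_ID column and the value 10 are assumptions made for illustration.

```sql
-- Hypothetical procedure: returns the EMP rows of department 10.
CREATE OR REPLACE PROCEDURE sp_select_dept10_emp()
RETURNS REFTABLE(emp)
LANGUAGE NZPLSQL AS
BEGIN_PROC
BEGIN
    -- REFTABLENAME holds the name of the result table created for
    -- this call; it has the same layout as EMP.
    EXECUTE IMMEDIATE 'INSERT INTO ' || REFTABLENAME ||
        ' SELECT * FROM emp WHERE dept_id = 10';
    RETURN REFTABLE;
END;
END_PROC;

-- The caller receives the inserted rows as a result set
CALL sp_select_dept10_emp();
```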
10. Workload Management
The Netezza appliance comes with components to configure system resource usage so that the
resources can be utilized efficiently by the various user groups. To configure the system for
optimal usage, the system usage needs to be monitored and understood clearly. In this section we
will look into how to monitor the system usage and how the appliance can be configured for optimal
resource usage.
Query History Collection and Reporting
Knowing which queries are executed and which resources they use provides a good understanding of
how the appliance is being utilized by the users of the installation. Netezza can automatically
collect and store the details about the queries being executed once a query history database is
created and a query history configuration is enabled. The query history provides data like
• Queries executed, their start and end time and the total execution time
• Queries executed by users and user groups
• Tables and the columns in the tables accessed by the queries and the operations performed
Apart from using the historical query statistics to define the workload management, the data will
also help to review and, if required, redefine the distribution and organization of table data.
Enabling query history data collection involves creating a query history database, creating a query
history configuration which defines the type of data which needs to be collected, and enabling the
query history configuration so that the system starts populating the query history database. The
query history database should be secured like any other user defined database so that the required
privileges are granted only to the required users.
The query history database can be created using the nzhistcreatedb command line utility. The utility
creates the history database with the required tables to store data and views to run queries against
them. It is recommended to run queries against the views instead of the underlying tables for future
compatibility purposes. The following is an example which creates the prodhistdb query history
database
    nzhistcreatedb -d prodhistdb -t query -u huser -o hadmin -p hadminpass -v 1
The user hadmin will be the owner of the new query history database and the user huser will be used
to load query statistics data into the tables in the history database. Both user ids should exist in
the system before the nzhistcreatedb command is executed.
To start collecting the query statistics and storing them in the history database, a query history
configuration needs to be defined to collect the required level of query data, and the configuration
needs to be enabled. Multiple configurations can be defined for various levels of data, but only one
configuration can be active at any time. When a configuration is enabled, it collects by default the
data about login failures, session start and end, and query history process startup. In addition to
this default data, users can define a configuration to collect data about queries, plans, tables
and columns. The following is an example of creating a query history configuration and enabling it
    -- To create a history configuration which collects all data regarding queries
    CREATE HISTORY CONFIGURATION prod_hist HISTTYPE QUERY DATABASE prodhistdb USER huser
    PASSWORD 'huserpass' COLLECT PLAN,COLUMN LOADINTERVAL 10 LOADMINTHRESHOLD 4
    LOADMAXTHRESHOLD 20 STORAGELIMIT 25 LOADRETRY 1 VERSION 1;
    -- To set a new history configuration to take effect when the appliance is started next time
    SET HISTORY CONFIGURATION prod_hist;
    -- Stop and start the appliance so that the new history configuration can take effect
    nzstop
    nzstart
The sample history configuration uses the history database created previously and requires the user
id and password of the history database user specified with the -u option when the query history
database was created. Once enabled, the sample history configuration will collect data for all the
areas: specifying PLAN in the COLLECT option also collects QUERY data, and specifying COLUMN
implicitly collects TABLE data as well. As you may have noticed in the sequence of steps to enable a
history configuration, the system requires a stop and a start, i.e. setting a configuration alone
will not make it take effect. Once a configuration is enabled, the system starts collecting data and
stores it in a directory. The stored data gets loaded into the history database at the regular
interval specified in the LOADINTERVAL option of the configuration or when the LOADMAXTHRESHOLD is
reached.
To stop query history data collection, a history configuration with a HISTTYPE of NONE can be created
and enabled so that no query history data is collected and loaded into the query history database.
The following is an example
    -- To create a history configuration which collects no history data
    CREATE HISTORY CONFIGURATION disable_history HISTTYPE NONE;
    -- To set a new history configuration to take effect when the appliance is started next time
    SET HISTORY CONFIGURATION disable_history;
Over time the history database will grow in size as data is loaded, and the data needs to be purged
based on the period for which it is required so that the size can be controlled. The command line
utility nzhistcleanupdb can be used to delete data from the history database up to a certain date and
time. Along with minimizing storage use, purging unwanted data also helps improve the performance of
queries against the history database. The following is an example which purges records created
before 2012 Dec 31 00:00:00 hrs from the prodhistdb database
    nzhistcleanupdb -d prodhistdb -u hadmin -pw hadminpass -t "2012-12-31"
The history database can also be dropped like any user database by using the drop database command.
It is important that no active history configuration is using the database before it is dropped so
that there are no issues with the query history data collection process.
For anyone who is interested in understanding the queries executed and the pattern of usage of the
system, the following are some of the key views in the query history database.
Name                      Description
$v_hist_queries           View to query about completed queries
$v_incomplete_queries     View to query about queries for which data is not captured completely
                          due to system reset or incomplete load of query statistics
$v_successful_queries     Same view as $v_hist_queries but only shows data for successful queries
$v_unsuccessful_queries   Same view as $v_hist_queries but only shows data for unsuccessful queries
$v_hist_log_events        Shows data about all the events which happened in the system
$v_table_access_stats     Shows cumulative stats on all accesses to each table in the system
$v_column_access_stats    Shows cumulative stats on all accesses to each table column
Workload Management
Once the pattern of system usage is understood the workload in the system can be managed so that the
system resources can be utilized efficiently. Netezza provides the following features in order to manage
the workload
Feature: Short Query Bias (SQB)
Description: Reserves system resources for all queries which are estimated to complete in less than
2 seconds. The query time limit of 2 seconds is configurable, along with the resources allocated
for SQB. For configuration changes to take effect, the system requires a pause and resume. The
resources which can be configured are the number of slots in the GRA scheduler and snippet scheduler
queues for short queries, and the memory in the snippet processors and the host for short queries.

Feature: Guaranteed Resource Access (GRA)
Description: User groups called resource sharing groups (RSG) can be created, to which a minimum and
maximum percentage of system resources can be assigned. This makes sure that any job or query
executed by a user attached to the group is guaranteed the minimum percentage of the system
resources. By default the admin user is defined to get 50% of the system resources, and hence it is
advisable to use the admin user sparingly. Also, for administrative tasks like backup, it is good
practice to have a resource sharing group defined with the required privileges and to attach users
to the group so that they can perform the tasks.

Feature: Priority Query Execution (PQE)
Description: A user, group or session can be assigned a priority of critical, high, normal or low,
and the appliance will prioritize the allocation of resources and schedule the queries or jobs
accordingly. Critical and high priority jobs and queries get more resources than normal or low
priority ones. If multiple jobs or queries with differing priorities are executed within the same
RSG, each gets a proportion of the resources assigned to the RSG based on its priority. There are
two hidden priorities which are used by the system: System Critical, the highest priority, for
system operations, and System Background, for low priority system jobs.

Feature: Gate Keeper
Description: Unlike the other 3 features, gate keeper is not enabled by default and requires a
configuration change. Gate keeper can be used to throttle the number of queries in various
categories which can be executed at the same time in the system. By default, configuration
parameters are provided to set the maximum number of jobs or queries which can be executed
concurrently for each of the four job priorities of critical, high, normal and low, and these
numbers can be modified through the configuration parameters. Once the number of jobs or queries
for a particular priority reaches the number set in the configuration, the gate keeper starts
queuing additional jobs in the internal queue for that category. If gate keeper is used along with
GRA, jobs get scheduled based on the priority and on the resource availability for the job in the
RSG to which it belongs. Gate keeper can also be used to configure additional queues which throttle
queries based on their estimated execution time.
The following are some examples of workload management configuration using GRA, PQE and gate
keeper
    -- To create a RSG with minimum resource allocation of 15% and max allocation of 30%
    CREATE GROUP reporting WITH RESOURCE MINIMUM 15 RESOURCE MAXIMUM 30;
    -- To create a user group with default priority of high; all users attached to it get the priority of high
    CREATE GROUP tableau WITH DEFPRIORITY HIGH;
    -- To set the priority of an active session to critical
    ALTER SESSION 501028 SET PRIORITY TO CRITICAL;
    -- To alter the default priority and the maximum priority of a user
    ALTER USER mike WITH DEFPRIORITY LOW MAXPRIORITY HIGH;
11. Best Practices
• Define all constraints and relationships between objects. Even though Netezza doesn't enforce
  them, other than the not null constraint, the query optimizer will still use these details to come
  up with an efficient query execution plan.
• When defining columns, use data types for which Netezza can create zone maps. An easy target is to
  avoid columns defined as numeric(x,0) where an integer type would do.
• If data for a column is known to have a fixed length value, then use char(x) instead of varchar(x).
  Varchar(x) uses additional storage, which will be significant when dealing with TB of data, and
  also impacts query processing since additional data needs to be pulled in from disk for processing.
• Use NOT NULL wherever data permits. This helps improve performance, since the appliance does not
  have to check for the null condition, and reduces storage usage.
• Use the same data type for columns used in joins so that the joins can be executed efficiently,
  which in turn helps queries execute faster.
• Use the same data type and length for columns with the same name in all the tables in the
  database.
• Distribute on columns of high cardinality that are often used in joins. It is best to distribute
  fact and dimension tables on the same column. This reduces data redistribution during queries,
  improving performance.
• Even if the fact and dimension tables can't both be distributed on the same key, make an effort to
  avoid redistribution of the fact table by choosing the right column for distribution.
• Distribute on one column whenever possible and do not create keys for the sake of distribution.
• Use random distribution as the last resort. It is fine to use random distribution on small tables
  since they may get broadcast.
• Define a clustered base table when data in a fact table is often looked up through multiple
  dimensions. Organize the clustered base table on the columns which will be used to look into the
  data through those dimensions.
• Create a materialized view on a small set of the columns from a large table which are often used
  by user queries.
• Create sorted materialized views with the most restricting column in the order by clause of the
  view so that the view can be used as an index.
• Do not drop and recreate materialized views since the OID will change, which may impact other
  dependent objects.
• Schedule grooms after major changes like updates, deletes and alters on tables so that space
  utilization is optimized and query performance improves.
• Schedule regular generate statistics jobs, particularly on larger tables and tables with high
  activity, so that the optimizer can generate optimal execution plans, which improves query
  performance.
• Schedule regular backups, which should also include the host backup, so that the data backup and
  the catalogs are in sync.
• Do not store large quantities of data on the host since it will affect performance. Use network
  mounts or third party vendor products to take and store backups.
• Monitor the system workload at regular intervals and alter the system workload management
  accordingly.
• Use the admin user sparingly. Define a separate group with proper resource allocation and the
  required privileges to perform admin related tasks, and add users to it.
• Prefer joins over correlated sub queries.
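Several of the table design practices above can be seen together in a short DDL sketch. All table and column names here (customer_dim, sales_fact and their columns) are hypothetical, and the choice of customer_id as the distribution key simply assumes it is a high cardinality column used in most joins.

```sql
-- Dimension table: NOT NULL wherever data permits, CHAR for a value of
-- fixed length, constraint declared even though it is not enforced.
CREATE TABLE customer_dim (
    customer_id   INTEGER      NOT NULL PRIMARY KEY,
    country_code  CHAR(2)      NOT NULL,
    customer_name VARCHAR(100) NOT NULL
)
DISTRIBUTE ON (customer_id);

-- Fact table: distributed on the same column, with the same data type,
-- as the dimension it is joined to most often, and organized as a
-- clustered base table on the columns commonly used to restrict queries.
CREATE TABLE sales_fact (
    sale_id     BIGINT        NOT NULL,
    customer_id INTEGER       NOT NULL,
    store_id    INTEGER       NOT NULL,
    sale_date   DATE          NOT NULL,
    amount      NUMERIC(12,2) NOT NULL,
    CONSTRAINT fk_sales_customer FOREIGN KEY (customer_id)
        REFERENCES customer_dim (customer_id)
)
DISTRIBUTE ON (customer_id)
ORGANIZE ON (sale_date, store_id);

-- Maintenance after major data changes
GROOM TABLE sales_fact RECORDS ALL;
GENERATE STATISTICS ON sales_fact;
```

Because both tables are distributed on customer_id, a join between them on that column can be carried out without redistributing the fact table.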
12. Version 7 Key Features

• Page level zone maps instead of extent level.
  o Until version 6.0 zone maps were created at the extent level, which is 3 MB. Starting with
    version 7.0 zone maps are created at the page level, which is 128 KB. This means less data
    will be brought into the SPU, since the system knows what data is stored at a much more
    granular level and can avoid reading unwanted pages.
• Parallelism in query snippet processing
  o Before version 7.0 the snippets in a query plan are processed in sequence. With version 7.0,
    where possible, the snippets in a query will be executed in parallel, which will improve query
    performance significantly.
• Restricted distribution of snippets to SPUs
  o Before version 7.0 the snippets in query plans are scheduled and executed on all the SPUs.
    Since the appliance knows what data is stored on which disk, which in turn is attached to a
    specific SPU, query snippets will only get distributed to the SPUs which have the relevant data
    for the query to be processed.
13. Further Reading
Subject Area
Architecture
Netezza User Objects
Administration
Data Loading, Backup and Restore
Query History, Work Load Mgmt
Stored Procedures
Reference
Netezza System Administration Guide
Netezza User Guide
Netezza User Guide
Netezza Advance Security Administration Guide
Netezza Data Loading Guide
Netezza Stored Procedures Guide