
 Explain the difference between Primary Index and Secondary Index?

The primary index determines where the data resides in Teradata: it decides which AMP gets each data row. Every table in Teradata must have a primary index; if one is not defined, Teradata assigns a primary index automatically. The primary index provides the fastest way to access the data. A primary index may have a maximum of 64 columns. There are two reasons you might pick a Primary Index different from your Primary Key: (1) performance and (2) known access paths. A Primary INDEX is used for finding the best access path for data retrieval and data insertion, whereas a Primary KEY is used for uniquely identifying each row, just as in other RDBMSs.

Primary Index Rules


Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value (the data in the PI columns) can be modified.
Rule 5: The Primary Index definition of a populated table cannot be modified.
Rule 6: A Primary Index has a limit of 64 columns.

Difference between PI and PK:

1. A PRIMARY KEY cannot be NULL; a PRIMARY INDEX can be NULL.
2. A PRIMARY KEY is not mandatory in Teradata; a PRIMARY INDEX is mandatory in Teradata.
3. A PRIMARY KEY does not help in data distribution; a PRIMARY INDEX drives data distribution.
4. A PRIMARY KEY must be unique; a PRIMARY INDEX can be UNIQUE (Unique Primary Index) or NON-UNIQUE (Primary Index).
5. A PRIMARY KEY is a logical implementation; a PRIMARY INDEX is a physical implementation.

Primary index is defined while creating a table. There are 2 types of Primary Indexes.

 Unique Primary Index (UPI)

 Non-Unique Primary Index (NUPI)

 Unique Primary Index (UPI):


If the table is defined with a UPI, the column(s) chosen as the UPI cannot contain duplicate values. If a duplicate value is inserted, the row is rejected with an error:

Failed 2801: Duplicate unique primary key error


A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the
AMPs. UPI access is always a one-AMP operation. 

NULL and Unique Indexes:

For unique indexes, Teradata Database treats nulls as if they are equal rather than unknown. For a single-column unique index, only one row may have NULL as the index value; otherwise a uniqueness violation error occurs.

 Non-Unique Primary Index (NUPI):

If the table is defined with a NUPI, the column(s) chosen as the NUPI can accept duplicate values. A Non-Unique Primary Index will almost never spread the table rows evenly, and an all-AMP operation will take longer when the data is unevenly distributed. You might pick a NUPI over a UPI because the NUPI column may be more effective for query access and joins. A NUPI column can also contain any number of NULL values.

By default, the index is a NUPI if the UNIQUE keyword is not specified. If no primary index is mentioned and Teradata creates a default primary index, it will also be a NUPI. When it is not possible to choose a fully unique column set as the PI, we choose a PI that is almost unique (i.e. has fewer duplicates); this is called a non-unique primary index.
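For illustration, here is a minimal sketch of both kinds of definition (the table and column names are made up for the example):

CREATE TABLE EMPLOYEE
(
EMP_NO INTEGER NOT NULL,
DEPT_NO INTEGER,
EMP_NAME VARCHAR(50)
)
UNIQUE PRIMARY INDEX (EMP_NO);   -- UPI: duplicate EMP_NO values are rejected

CREATE TABLE EMPLOYEE_HIST
(
EMP_NO INTEGER NOT NULL,
DEPT_NO INTEGER,
EMP_NAME VARCHAR(50)
)
PRIMARY INDEX (DEPT_NO);         -- NUPI: duplicate DEPT_NO values are allowed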

What is parsing engine in Teradata?

The Parsing Engine is responsible for receiving queries from the client and preparing an efficient, least-expensive execution plan.

Responsibilities:

 Receive the SQL query from the client


 Parse the SQL query and check for syntax errors
 Check whether the user has the required privileges on the objects used in the SQL query
 Check whether the objects used in the SQL actually exist
 Prepare the execution plan to execute the SQL query and pass it to BYNET (see the EXPLAIN sketch after this list)
 Receive the results from the AMPs and send them to the client
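A quick way to see the plan the Parsing Engine produces is the EXPLAIN modifier placed in front of a query (the ORDER_TABLE used here is the illustrative table defined later in this document):

EXPLAIN
SELECT ORDER_NO, ORDER_TOTAL
FROM ORDER_TABLE
WHERE ORDER_NO = 1001;

The output describes the steps the AMPs will perform, for example a single-AMP retrieve via the primary index when the predicate is on the PI column.
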
What is BYNET?

The BYNET is the communication layer between the PE and the AMPs. It allows communication between the PE and the AMPs and also between the nodes. It receives the execution plan from the Parsing Engine and sends it to the AMPs; similarly, it receives the results from the AMPs and sends them back to the Parsing Engine.

What is AMP (Access Module Processor)?

AMPs, known as Virtual Processors (vprocs), are the components that actually store and retrieve the data. AMPs receive the data and the execution plan from the Parsing Engine, perform any data type conversion, aggregation, filtering and sorting, and store the data on the disks associated with them. Rows of a table are evenly distributed among the AMPs in the system. Each AMP is associated with a set of disks on which its data is stored, and only that AMP can read or write data on those disks.

The AMPs work independently and therefore retrieve data concurrently. Thus all
AMPs perform their work in parallel, hence Parallel processing.

Each AMP attached to the Teradata system listens to the PE via the BYNET for instructions. Each AMP is connected to its own disk and has the privilege to read or write data only on that disk. An AMP can best be thought of as a processor with its own disk attached to it. Whenever it receives instructions from the PE, it fetches the data from its disk and sends it back to the PE through the BYNET. Because each AMP may read and write only on its own disk, this is known as the 'SHARED NOTHING ARCHITECTURE'. Teradata spreads the rows of a table evenly across all the AMPs; when the PE asks for data, all AMPs work simultaneously and read the records from their own disks. Hence a query will be as slow as the slowest AMP in the system. This is known as parallelism.

How does SET table check for duplicate records?

For each row inserted, the system checks whether any existing row has the same row hash. If the table has a UPI defined, a duplicate index value is simply rejected. Otherwise, the system must compare the entire new row against every existing row with the same row hash (the duplicate row check), which can severely affect performance.

Difference between UPI and NUPI in Teradata

Unique primary indexes and non-unique primary indexes are typically associated with SET and MULTISET tables respectively. For a SET table, a unique primary index is usually defined in order to avoid the overhead of the duplicate row check: if no UPI is defined on a SET table, the system must compare entire rows to find duplicates, whereas a UPI proactively rejects duplicate index values. Since fewer columns are involved in the UPI comparison, duplicates are found faster than with a full-row check. A NUPI, on the other hand, is commonly used with MULTISET tables; since a MULTISET table does not look for duplicate rows, the NUPI there simply distributes and indexes the rows.

SET and MULTISET tables:

The default in Teradata mode is SET and the default in ANSI mode is MULTISET. There is no way to change a SET table into a MULTISET table after it has been created. There is a performance impact involved with SET tables: each time a row is inserted or updated, Teradata has to check whether the new row would violate the no-duplicate-rows rule. This is called the DUPLICATE ROW CHECK, and it will seriously degrade performance if many rows with the same primary index value are inserted, because the number of comparisons grows rapidly as more rows with that value are added. There is no such performance impact for a SET table when a UPI (Unique Primary Index) is defined on the table: as the UPI itself ensures uniqueness, no DUPLICATE ROW CHECK is performed.
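As a minimal sketch (the table and column names are made up), the table kind and the primary index are both declared in the CREATE TABLE statement:

CREATE SET TABLE CUSTOMER_SET
(
CUST_NO INTEGER NOT NULL,
CUST_NAME VARCHAR(50)
)
UNIQUE PRIMARY INDEX (CUST_NO);   -- UPI, so no DUPLICATE ROW CHECK is needed

CREATE MULTISET TABLE CUSTOMER_MULTI
(
CUST_NO INTEGER NOT NULL,
CUST_NAME VARCHAR(50)
)
PRIMARY INDEX (CUST_NO);          -- duplicate rows are accepted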

What is the use of Secondary Index?

Secondary Indexes provide an alternate path to the data and should be used for queries that run many times. Teradata runs extremely well without secondary indexes, but because secondary indexes use up space and add overhead, they should only be used for "KNOWN QUERIES", that is, queries that are run over and over again. Once you know the data warehouse environment, you can create secondary indexes to enhance its performance.
A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan. You can drop and recreate secondary indexes dynamically, as they are needed. Secondary Indexes are stored in separate subtables that require additional disk space and maintenance, which is handled automatically by the system. The entire purpose of the Secondary Index subtable is to point back to the real row in the base table via the Row-ID.
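For illustration (the table, column and index names are made up), a unique and a non-unique secondary index can be created and dropped at any time:

CREATE UNIQUE INDEX (EMP_SSN) ON EMPLOYEE;      -- Unique Secondary Index (USI)
CREATE INDEX DEPT_IDX (DEPT_NO) ON EMPLOYEE;    -- Non-Unique Secondary Index (NUSI)
DROP INDEX DEPT_IDX ON EMPLOYEE;                -- dropped once it is no longer needed
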
Secondary Index Rules
Rule 1: Secondary Indexes are optional. 
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be changed.
Rule 6: A Secondary Index has a limit of 64 columns.

List out all forms of LOCKS that are available in Teradata

Locking prevents multiple users from changing the same data at the same time and thereby helps prevent data corruption. Teradata has a dedicated lock manager that automatically locks at the database, table and row-hash level.
Database Lock: the lock applies to all tables and views in the database.
Table Lock: the lock applies to all rows in the table/view.
Row-hash Lock: one or more rows sharing a row hash are locked.

Exclusive - Exclusive locks are placed when a DDL is fired on the database or
table, meaning that the Database object is undergoing structural changes.
Concurrent accesses will be blocked.

Compatibility: NONE

Write - A Write lock is placed during a DML operation; INSERT, UPDATE and DELETE will trigger a Write lock. It may still allow users to fire SELECT queries (with an Access lock), but data consistency is not ensured.

Compatibility: Access locks only. A user requesting an Access lock is not concerned with data consistency; the underlying data may change and the user may get a "Dirty Read".

Read - This lock is placed by a SELECT access. A Read lock is not compatible with Exclusive or Write locks.
Compatibility: supports other Read locks and Access locks.

Access - Placed when a user specifies the LOCKING FOR ACCESS phrase. An Access lock allows users to read a database object that is already under a Write or Read lock. An Access lock is compatible with every lock except an Exclusive lock. It does not ensure data integrity and may lead to a "Stale Read" or "Dirty Read".

Syntax: LOCKING ROW FOR ACCESS

If you put LOCKING ROW (or LOCKING TABLE tablename) FOR ACCESS on a SELECT query, the SELECT will read through any Write lock at the row or table level (the so-called "Dirty Read"). This applies only to SELECT; a Write lock cannot be downgraded to an Access lock.
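A minimal example of the access-lock request (the table name is only illustrative):

LOCKING ROW FOR ACCESS
SELECT ORDER_NO, ORDER_TOTAL
FROM ORDER_TABLE
WHERE ORDER_NO = 1001;

LOCKING TABLE ORDER_TABLE FOR ACCESS
SELECT COUNT(*) FROM ORDER_TABLE;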

The lock compatibility chart shows that a WRITE lock blocks other WRITE locks requested by other users. Additionally, all READ lock requests are also blocked because the current data is being changed and is therefore not available until the change is finished. This is where the ACCESS lock can be useful.

The chart also shows that a WRITE lock does not block an ACCESS lock. Therefore, a user can request an ACCESS lock for a SELECT instead of the default READ lock. This does, however, mean that the data read may or may not be the latest version. Hence the nickname "Dirty Read."

It is very common to use the ACCESS locking when creating a view. Since most views only
SELECT rows, a WRITE lock is not needed. Plus, if maintenance is being performed on a
table, selecting rows using a view with an ACCESS lock is not delayed due to a WRITE
lock. So, users are happy and don't call to complain that the "system is slow."

Another time to use the LOCKING modifier is for multi-step transactions.


Consider this situation: the first step is a SELECT, which obtains a READ lock. This lock allows other users to also SELECT from the table with a READ lock. The next step of the transaction is an UPDATE, which must now upgrade the READ lock to a WRITE lock.
This upgrade cannot occur while other users hold a READ lock on the resource, so the transaction must wait for the READ locks to disappear, which might dramatically increase the time needed to complete the maintenance transaction. Therefore, upgrading the initial default READ lock to a WRITE lock on the SELECT eliminates the potential delay in the middle of the transaction, as sketched below.
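Here is a sketch of that idea in Teradata (BTET) session mode, with made-up table and column names; the LOCKING modifier on the initial SELECT requests the WRITE lock up front:

BT;

LOCKING TABLE ACCOUNT FOR WRITE
SELECT BALANCE
FROM ACCOUNT
WHERE ACCT_NO = 100;        -- WRITE lock taken here instead of the default READ lock

UPDATE ACCOUNT
SET BALANCE = BALANCE - 50
WHERE ACCT_NO = 100;        -- no lock upgrade is needed

ET;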

What is Partitioned Primary Index?

Partitioned Primary Index (PPI) is an indexing mechanism in Teradata Database.


PPI is used to improve performance for large tables when you submit queries that specify a range constraint. PPI allows you to reduce the number of rows to be processed by using partition elimination.
PPI improves performance for incremental data loads, deletes, and data access when working with large tables with range constraints.
If the table is partitioned, each AMP sorts its rows by partition first.
Types of partitioning:

RANGE_N Partitioning

Below is an example of RANGE_N partitioning by week (each partition covers a 7-day interval).


CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY RANGE_N
(ORDER_DATE BETWEEN
DATE '2012-01-01' AND DATE '2012-12-31'
EACH INTERVAL '7' DAY);

CASE_N Partitioning
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY CASE_N
(ORDER_TOTAL < 1000,
 ORDER_TOTAL < 2000,
 ORDER_TOTAL < 5000,
 ORDER_TOTAL < 10000,
 ORDER_TOTAL < 20000,
 NO CASE, UNKNOWN);

The UNKNOWN partition is for rows where ORDER_TOTAL is NULL. The NO CASE partition is for rows that do not meet any of the CASE criteria; for example, an ORDER_TOTAL of 20,000 or more does not fall into any of the defined ranges, so the row goes to the NO CASE partition.
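For instance, a range predicate on the partitioning column of the RANGE_N table above lets the optimizer read only the relevant weekly partitions instead of the whole table:

SELECT ORDER_NO, ORDER_TOTAL
FROM ORDER_TABLE
WHERE ORDER_DATE BETWEEN DATE '2012-03-01' AND DATE '2012-03-31';
-- partition elimination: only the partitions covering March 2012 are scanned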

What is multi-value compression (MVC)?

You can compress data at the column level using multi-value compression, a lossless, dictionary-based compression scheme. With MVC, you specify a list of values to be compressed when defining a column in the CREATE TABLE/ALTER TABLE statement. When you insert a value into the column that matches a value in the compression list, a corresponding compress bit is set instead of storing the actual value, thus saving disk space.

The best candidates for compression are the most frequently occurring values in
each column. MVC is a good compression scheme when there are many
repeating values for a column.
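A minimal sketch of the value list on a column definition (the table, columns and values are made up for the example):

CREATE TABLE EMPLOYEE_MVC
(
EMP_NO INTEGER NOT NULL,
DEPT_NAME VARCHAR(20)
COMPRESS ('Sales', 'Finance', 'HR'),   -- frequent values are stored once in the table header, not per row
EMP_NAME VARCHAR(50)
)
UNIQUE PRIMARY INDEX (EMP_NO);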

If MVC is not an efficient compression scheme for data in a particular column, you can compress the column using algorithmic compression (ALC). You can
specify MVC alone, or both MVC and ALC on the same column. If you define
both on the same column, ALC is applied only to those non-null values that are
not specified in the value compression list of the MVC specification. You can also
use MVC together with block-level compression (BLC).

You can use MVC to compress columns with these data types:

 Any numeric data type


 BYTE
 VARBYTE
 CHARACTER
 VARCHAR
 DATE

To compress a DATE value, you must specify the value as a Date literal
using the ANSI DATE format (DATE 'YYYY-MM-DD'). For example:

COMPRESS (DATE '2000-06-15')

 TIME and TIME WITH TIME ZONE


 TIMESTAMP and TIMESTAMP WITH TIME ZONE

To compress a TIME or TIMESTAMP value, you must specify the value as a TIME
or TIMESTAMP literal. For example:
COMPRESS (TIME '15:30:00')
COMPRESS (TIMESTAMP '2006-11-23 15:30:23')
In addition, you can use COMPRESS (NULL) for columns with these data types:

 ARRAY
 Period
 Non-LOB distinct or structured UDT

 Why use PPI instead of SI? What are the disadvantages of using PPI in Teradata tables?

 Limitations of FastLoad?
 What is NOPI in TD 14?
 Explain the steps you take for performance tuning of a TD query? (Hint: prepare this well; it is a favorite
question in Teradata interviews)
 Why use a secondary index if we already have a primary index?
 Explain the basic structure of a TPT script
 What is a hot standby node? (DBA-related question)
 How do you create a case-sensitive column in Teradata tables? (Look at the DDL of the table)
