The primary index determines where data resides in Teradata: it specifies which
AMP gets each data row. Every table in Teradata must have a primary index; if
one is not defined, Teradata assigns a primary index automatically. The primary
index provides the fastest way to access the data and may include a maximum of 64
columns. There are two reasons you might pick a Primary Index different from your Primary
Key: (1) performance and (2) known access paths. The Primary INDEX is used
to find the best access path for data retrieval and insertion, while the Primary KEY
uniquely identifies each row, just as in other RDBMSs.
The primary index is defined while creating a table. There are two types of primary
index: the Unique Primary Index (UPI) and the Non-Unique Primary Index (NUPI).
For unique indexes, Teradata Database treats nulls as if they were equal rather than unknown.
For a single-column unique index, only one row may have null as the index value; otherwise
a uniqueness violation error occurs.
If the table is defined with a NUPI, the primary index column can accept
duplicate values. A Non-Unique Primary Index will rarely spread the table
rows evenly, and an all-AMP operation takes longer when the data is unevenly
distributed. You might still pick a NUPI over a UPI because the NUPI column
may be more effective for query access and joins. A NUPI column can also hold
any number of null values.
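Both flavors are declared in the CREATE TABLE statement. A minimal sketch (the table and column names here are illustrative, not from the text):

```sql
-- UPI: EMP_NO values must be unique, so rows hash-distribute evenly.
CREATE TABLE EMPLOYEE
(
  EMP_NO   INTEGER NOT NULL,
  DEPT_NO  INTEGER,
  EMP_NAME VARCHAR(50)
)
UNIQUE PRIMARY INDEX (EMP_NO);

-- NUPI: duplicate DEPT_NO values are allowed; all rows with the same
-- DEPT_NO land on the same AMP, which can help joins on DEPT_NO.
CREATE TABLE EMPLOYEE_BY_DEPT
(
  EMP_NO   INTEGER NOT NULL,
  DEPT_NO  INTEGER,
  EMP_NAME VARCHAR(50)
)
PRIMARY INDEX (DEPT_NO);
```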
The Parsing Engine (PE) is responsible for receiving queries from the client and
preparing an efficient, least expensive execution plan.
The BYNET is the communication layer between the PE and the AMPs. It allows
communication between the PE and the AMPs and also between the nodes. It receives
the execution plan from the Parsing Engine and sends it to the AMPs; similarly, it
receives the results from the AMPs and sends them back to the Parsing Engine.
AMPs, known as Virtual Processors (vprocs), are the components that actually store
and retrieve the data. AMPs receive the data and the execution plan from the Parsing
Engine, perform any data type conversion, aggregation, filtering and sorting, and
store the data on the disks associated with them. Rows from the tables are
evenly distributed among the AMPs in the system. Each AMP is associated with
a set of disks on which its data is stored, and only that AMP can read or write
data from those disks.
The AMPs work independently and therefore retrieve data concurrently. Because all
AMPs perform their work at the same time, Teradata achieves parallel processing.
Each AMP attached to the Teradata system listens to the PE via the BYNET for
instructions. Each AMP is connected to its own disk and may read or write data
only on that disk; an AMP can be thought of as a processor with its own disk
attached. Whenever it receives instructions from the PE, it fetches the data
from its disk and sends it back to the PE through the BYNET. Because each AMP
reads and writes its own disk ONLY, this is known as the
'shared nothing architecture'. Teradata spreads the rows of a table evenly
across all the AMPs; when the PE asks for data, all AMPs work simultaneously,
each reading records from its own disk. A query is therefore only as fast as the
slowest AMP in the system, which is why even distribution matters for parallelism.
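One way to check how evenly a table's rows are spread across the AMPs is the classic hash-distribution diagnostic query, sketched below (ORDER_TABLE and its primary index column ORDER_NO are illustrative):

```sql
-- Count rows per AMP for ORDER_TABLE, whose primary index is ORDER_NO.
-- A large spread between the highest and lowest counts indicates skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(ORDER_NO))) AS amp_no,
       COUNT(*)                               AS row_count
FROM ORDER_TABLE
GROUP BY 1
ORDER BY 1;
```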
For each row inserted, the system checks whether any existing row has the same row
hash. If the table has a UPI defined, a row with a duplicate index value is simply
rejected. Otherwise, the system must compare the entire new row against every row
with the same row hash, which can severely affect performance.
Difference between UPI vs PI in Teradata
A Unique primary index and a Non-unique primary index are typically associated with SET
and MULTISET tables respectively. For a SET table, a Unique primary index is
usually defined in order to avoid the overhead of the duplicate-row check: if no
UPI is defined, the SET table must scan and compare entire rows to find
duplicates, whereas a UPI proactively rejects a duplicate index value. Since
fewer columns are involved in the UPI comparison, it finds duplicate
records faster than a full-row check. A NUPI, on the other hand, is commonly used
for MULTISET tables; since a MULTISET table does not look for duplicate entries,
the NUPI simply serves as the distribution and access index.
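The pairing described above can be sketched in DDL (table and column names are illustrative):

```sql
-- SET table: duplicate rows are not allowed; the UPI turns the
-- duplicate check into a cheap index-value comparison.
CREATE SET TABLE ORDERS_SET
(
  ORDER_NO INTEGER NOT NULL,
  CUST_NO  INTEGER
)
UNIQUE PRIMARY INDEX (ORDER_NO);

-- MULTISET table: duplicate rows are allowed, so no duplicate-row
-- check is performed at insert time; the NUPI only drives distribution.
CREATE MULTISET TABLE ORDERS_MULTI
(
  ORDER_NO INTEGER NOT NULL,
  CUST_NO  INTEGER
)
PRIMARY INDEX (CUST_NO);
```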
Secondary Indexes provide an alternate path to the data and should be used
for queries that run many times. Teradata runs extremely well without
secondary indexes, but since secondary indexes use up space and add overhead,
they should only be used for "KNOWN QUERIES", i.e. queries that are run over
and over again. Once you know the data warehouse environment, you can
create secondary indexes to enhance its performance.
A Secondary Index (SI) is an alternate data access path. It allows you to access
the data without having to do a full-table scan. You can drop and recreate
secondary indexes dynamically, as they are needed. Secondary Indexes are
stored in separate subtables that require additional disk space and
maintenance, both of which are handled automatically by the system.
The entire purpose of the Secondary Index subtable is to point back to the
real row in the base table via its Row-ID.
Secondary Index Rules
Rule 1: Secondary Indexes are optional.
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be created and dropped dynamically.
Rule 6: A Secondary Index has a limit of 64 columns.
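Creating and dropping a secondary index can be sketched as follows (the table and column names are illustrative):

```sql
-- Unique secondary index (USI) on a column that must stay unique.
CREATE UNIQUE INDEX (SSN) ON EMPLOYEE;

-- Non-unique secondary index (NUSI), optionally named, on a
-- frequently queried column.
CREATE INDEX idx_dept (DEPT_NO) ON EMPLOYEE;

-- Secondary indexes can be dropped when no longer useful,
-- reclaiming the subtable's disk space.
DROP INDEX idx_dept ON EMPLOYEE;
```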
Locking prevents multiple users from changing the same data at the same time,
which helps prevent data corruption. Teradata has a dedicated lock manager that
automatically locks at the database, table and row-hash level.
Database Lock: the lock is applied to all tables and views in the database.
Table Lock: the lock is applied to all rows in a table or view.
Row-hash Lock: one or more rows in a table are locked.
Exclusive - Exclusive locks are placed when a DDL is fired on the database or
table, meaning that the Database object is undergoing structural changes.
Concurrent accesses will be blocked.
Compatibility: NONE
Write - A Write lock is placed during a DML operation; INSERT, UPDATE and
DELETE all trigger a write lock. SELECT queries are allowed only if they request
an ACCESS lock.
Compatibility: Access locks only
Read - This lock is placed during a SELECT access. A Read lock is not compatible
with Exclusive or Write locks.
Compatibility: Supports other Read locks and Access Locks
Access - This lock is placed when a user specifies the LOCKING FOR ACCESS phrase.
An Access lock allows users to read a database object that is already under a
write or read lock. An Access lock is not compatible with an Exclusive lock.
Because an Access lock does not ensure data integrity, it may lead to a
"stale read" (also called a "dirty read").
The chart in the Figure indicates that a WRITE lock blocks other WRITE locks
requested by other users. All READ lock requests are also blocked, because the
current data is being changed and is therefore not available until the change is
finished. This is where the ACCESS lock can be useful.
The Figure also shows that a WRITE lock does not block an ACCESS lock.
Therefore, a user can request an ACCESS lock for a SELECT instead of the
default READ lock. This does, however, mean that the data read may or may not
be the latest version; hence the nickname "dirty read".
It is very common to use ACCESS locking when creating a view. Since most views only
SELECT rows, a WRITE lock is not needed. Plus, if maintenance is being performed on a
table, selecting rows through a view with an ACCESS lock is not delayed by a WRITE
lock. So, users are happy and don't call to complain that the "system is slow."
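A view of that kind can be sketched as follows (the view name V_ORDERS is illustrative; ORDER_TABLE is the table defined later in this document):

```sql
-- View that reads with an ACCESS lock instead of the default READ
-- lock, so SELECTs through it are not blocked by concurrent writes.
REPLACE VIEW V_ORDERS AS
  LOCKING TABLE ORDER_TABLE FOR ACCESS
  SELECT ORDER_NO, CUST_NO, ORDER_DATE, ORDER_TOTAL
  FROM ORDER_TABLE;
```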
RANGE_N Partitioning
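RANGE_N partitions rows by value ranges, most commonly on a date column. A minimal sketch (the SALES table and its ranges are illustrative):

```sql
-- One partition per month of 2020; rows outside the range go to
-- NO RANGE, and rows with a NULL SALE_DATE go to UNKNOWN.
CREATE TABLE SALES
(
  SALE_ID   INTEGER NOT NULL,
  SALE_DATE DATE NOT NULL,
  AMOUNT    DECIMAL(10,2)
)
PRIMARY INDEX (SALE_ID)
PARTITION BY RANGE_N
  (SALE_DATE BETWEEN DATE '2020-01-01' AND DATE '2020-12-31'
     EACH INTERVAL '1' MONTH,
   NO RANGE, UNKNOWN);
```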
CASE_N Partitioning
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY CASE_N
(ORDER_TOTAL < 1000,
ORDER_TOTAL < 2000,
ORDER_TOTAL < 5000,
ORDER_TOTAL < 10000,
ORDER_TOTAL < 20000,
NO CASE, UNKNOWN);
The UNKNOWN partition is for an Order_Total with a NULL value. The NO CASE
partition is for rows that do not meet any of the CASE conditions.
For example, an Order_Total of 20,000 or more doesn't fall into any of
the defined ranges, so the row goes to the NO CASE partition.
http://www.teradatawiki.net
http://dbmstutorials.com/teradata/teradata_partition_primary_index.html
What is multi-value compression (MVC)?
You can compress data at the column level using multi-value compression, a
lossless, dictionary-based compression scheme. With MVC, you specify a list of
values to be compressed when defining a column in the CREATE TABLE or ALTER
TABLE statement. When you insert a value into the column that matches a
value in the compression list, a corresponding compress bit is set instead of
storing the actual value, thus saving disk storage space.
The best candidates for compression are the most frequently occurring values in
each column. MVC is a good compression scheme when there are many
repeating values for a column.
You can use MVC to compress columns with these data types:
To compress a DATE value, you must specify the value as a DATE literal
using the ANSI DATE format (DATE 'YYYY-MM-DD'). For example:
COMPRESS (DATE '2006-11-23')
To compress a TIME or TIMESTAMP value, you must specify the value as a TIME
or TIMESTAMP literal. For example:
COMPRESS (TIME '15:30:00')
COMPRESS (TIMESTAMP '2006-11-23 15:30:23')
In addition, you can use COMPRESS (NULL) for columns with these data types:
ARRAY
Period
Non-LOB distinct or structured UDT
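Putting these pieces together, an MVC column definition can be sketched as follows (the CUSTOMER table and the compressed value lists are illustrative):

```sql
-- Frequently occurring values are stored once in the table header;
-- matching rows carry only a compress bit instead of the value.
CREATE TABLE CUSTOMER
(
  CUST_NO    INTEGER NOT NULL,
  STATE_CODE CHAR(2)     COMPRESS ('CA', 'TX', 'NY'),
  STATUS     VARCHAR(10) COMPRESS ('ACTIVE', 'CLOSED'),
  JOIN_DATE  DATE        COMPRESS (DATE '2020-01-01')
)
UNIQUE PRIMARY INDEX (CUST_NO);
```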