
Module 8: Access Considerations and Constraints

After completing this module, you will be able to:

- Analyze Optimizer access scenarios.
- Explain partial value searches and data conversions.
- Identify the effects of conflicting data types.
- Determine the cost of I/Os.
- Identify column level attributes and constraints.
- Identify table level attributes and constraints.
- Add, modify, and drop constraints from tables.
- Explain how the Identity column allocates new numbers.

Access Method Comparison


Unique Primary Index
- Very efficient
- One AMP, one row
- No spool file

Non-Unique Primary Index
- Efficient if the number of rows per value is reasonable and there are no severe spikes.
- One AMP, multiple rows
- Spool file if needed

Unique Secondary Index
- Very efficient
- Two AMPs, one row
- No spool file

Non-Unique Secondary Index
- Efficient only if the number of rows accessed is a small percentage of the total data rows in the table.
- All AMPs, multiple rows
- Spool file if needed

Full-Table Scan
- Efficient since each row is touched only once.
- All AMPs, all rows
- Spool file may equal the table in size

The Optimizer chooses the fastest access method. COLLECT STATISTICS to help the Optimizer make good decisions.

Optimizer Access Scenarios


SINGLE TABLE CASE

    WHERE Table_1.Col_1 = :value_1 AND Table_1.Col_2 = :value_2 ;

Each cell names the column (by its index type) that the Optimizer uses for access.

                 Col_1 is:
    Col_2 is:    UPI    NUPI              USI    NUSI                       NOT INDEXED
    USI          UPI    NUPI or USI (1)   USI    USI                        USI
    NUSI         UPI    NUPI              USI    Either, Both, or FTS (2)   NUSI or FTS (3)
    NOT INDEXED  UPI    NUPI              USI    NUSI or FTS (3)            FTS

Notes:

1. The Optimizer prefers Primary Indexes over Secondary Indexes. It chooses the NUPI if only one I/O (block) is accessed. The Optimizer prefers Unique indexes over non-unique indexes. Only one row is involved with a USI even though it is a two-AMP operation.
2. Depending on relative selectivity, the Optimizer may use either NUSI, may use both with NUSI Bit Mapping, or may do a FTS.
3. It depends on the selectivity of the index.

Partial Value Searches


Column values must not be decomposable. LIKE, INDEX, and SUBSTRING operators indicate decomposable data.

Show all calls placed by people within Area Code 415:


    SELECT  . . . , phone, . . .
    FROM    Call
    WHERE   phone LIKE '415%' ;

Always decompose data to the finest level of access usage.

Use the SQL concatenation operator ( || ) to display the data:

    SELECT  . . . , area_code || '/' || phone, . . .
    FROM    Call
    WHERE   area_code = 415 ;

The Teradata Database does a FTS on a partial index value unless the index is ordered by value (Value-ordered NUSI or Hash Index).

Data storage and display should be treated as separate issues.

Data Conversions

- Columns (or values) must be of the same data type to be compared.
- If column (or value) types differ, an internal conversion is performed.
- Character data is compared using the host's collating sequence.
- Unequal-length character strings are converted by right-padding the shorter one with blanks.
- Numeric values are converted to the same underlying representation.
- Character-to-numeric comparison requires the character value to be converted to a numeric value.

Data conversion is expensive and generally unnecessary.

- Implement data types at the Domain level.
- Comparison across data types may indicate that Domain definitions are not clearly understood.
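As a rough illustration of the blank-padding rule for unequal-length character strings, the following Python sketch pads the shorter string with blanks before comparing. It only models the rule as stated above; it is not Teradata code, and the function name `padded_equal` is hypothetical.

```python
def padded_equal(a: str, b: str) -> bool:
    """Compare two strings after right-padding the shorter one with blanks."""
    width = max(len(a), len(b))
    # ljust() pads on the right with spaces up to the requested width.
    return a.ljust(width) == b.ljust(width)


# Under the padding rule, trailing blanks do not affect equality.
print(padded_equal("ABC", "ABC   "))   # True
# A plain byte-for-byte comparison would treat them as different.
print("ABC" == "ABC   ")               # False
```

The padding applies only on the right, so a value with leading blanks still compares as different.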

Storing Numeric Data


When comparing character data to numeric, Teradata will always convert character to numeric, then do the comparison.

Comparison Rules:

- To compare columns, they must be of the same data type.
- Character data types will always be converted to numeric (when comparing character to numeric).

Case 1: Emp_no is stored as character data.

    CREATE TABLE Emp1
      (Emp_no    CHAR(6),
       Emp_name  CHAR(20))
    PRIMARY INDEX (Emp_no);

    Statement 1:  SELECT * FROM Emp1 WHERE Emp_no = '1234';
    Statement 2:  SELECT * FROM Emp1 WHERE Emp_no = 1234;      Results in Full Table Scan

Case 2: Emp_no is stored as numeric data.

    CREATE TABLE Emp2
      (Emp_no    INTEGER,
       Emp_name  CHAR(20))
    PRIMARY INDEX (Emp_no);

    Statement 1:  SELECT * FROM Emp2 WHERE Emp_no = 1234;
    Statement 2:  SELECT * FROM Emp2 WHERE Emp_no = '1234';    Results in unnecessary conversion

Bottom Line: Always store numeric data in numeric data types to avoid unnecessary and costly data conversions.
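One plausible way to see why comparing a character Primary Index column to a numeric literal leads to a full table scan: several different character strings convert to the same number, so every stored value has to be converted and checked. The Python sketch below uses made-up stored values, and Python's `float()` stands in for the database's character-to-numeric conversion; both are assumptions for illustration only.

```python
# Hypothetical character values stored in a CHAR column.
stored = ["1234", "01234", " 1234", "1234.0", "5678"]

# Comparing to the number 1234 means converting every stored value first,
# which is why each row has to be visited.
matches = [s for s in stored if float(s) == 1234]
print(matches)          # ['1234', '01234', ' 1234', '1234.0']

# Comparing to the character literal '1234' is an exact match on one value,
# so a lookup on that single value is enough.
exact = [s for s in stored if s == "1234"]
print(exact)            # ['1234']
```

The four numeric matches are distinct character strings, so a single lookup on one character value could not locate all of them.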

Data Conversion Example


    CREATE SET TABLE TFACT01.Table1
      (col1 CHAR(12) NOT NULL)
    UNIQUE PRIMARY INDEX (col1);

    EXPLAIN SELECT * FROM Table1 WHERE col1 = '8';

    1) First, we do a single-AMP RETRIEVE step from TFACT01.Table1 by way of
       the unique primary index "TFACT01.Table1.col1 = '8' " with no residual
       conditions. The estimated time for this step is 0.03 seconds.
    -> The row is sent directly back to the user as the result of statement 1.
       The total estimated time is 0.03 seconds.

    EXPLAIN SELECT * FROM Table1 WHERE col1 = 8;

    1) First, we lock a distinct TFACT01."pseudo table" for read on a RowHash
       to prevent global deadlock for TFACT01.Table1.
    2) Next, we lock TFACT01.Table1 for read.
    3) We do an all-AMPs RETRIEVE step from TFACT01.Table1 by way of an
       all-rows scan with a condition of ("(TFACT01.Table1.col1 (FLOAT, FORMAT
       '-9.99999999999999E-999')UNICODE)= 8.00000000000000E 000") into Spool 1,
       which is built locally on the AMPs. The size of Spool 1 is estimated
       with no confidence to be 1,001 rows. The estimated time for this step
       is 0.28 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in
       processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of
       statement 1. The total estimated time is 0.28 seconds.

Matching Data Types


The hashing algorithm treats the following data types as identical:

    INTEGER = DATE = DECIMAL (x,0)
    CHAR = VARCHAR = LONG VARCHAR
    BYTE = VARBYTE
    GRAPHIC = VARGRAPHIC

- Administer data type assignments at the domain level.
- Give matching Primary Indexes across tables the same data type.

Counting I/O Operations


Many factors influence the number of physical I/Os in a transaction:

- Cache hits
- Swapping
- Rows per block
- Cylinder splits/migrates
- Mini-Cylpacks
- Number of spool files
- Spool file sizes

Also consider:

- I/Os may be done serially or in parallel.
- Data and index block I/O may or may not require Cylinder Index I/O.
- Changes to data rows and USI rows require Transient Journal I/O.
- I/O counts indicate the relative cost of a transaction.
- A given I/O operation may not cause any actual physical I/O.

Transient Journal I/O


The Transient Journal:

- Is a journal of transaction before-images.
- Provides for automatic rollback in the event of TXN failure.
- Is automatic and transparent.
- Uses space from available free cylinders in the system. When a transaction completes, TJ space is returned to the free cylinder lists.
- Provides Transaction Integrity.

Therefore, when modifying a table, there are I/Os for the data table and for the Transient Journal. Some situations where the Transient Journal is not used include:

- INSERT / SELECT into an empty table
- DELETE FROM tablename ALL
- Utilities such as FastLoad and MultiLoad

INSERT and DELETE Operations


    INSERT INTO tablename . . . ;
    DELETE FROM tablename . . . ;

* = I/O Operations

    DATA ROW
      *  READ DATA BLOCK
      *  WRITE TRANSIENT JOURNAL
         INSERT or DELETE the DATA ROW
      *  WRITE NEW DATA BLOCK
      *  WRITE CYLINDER INDEX

    For each USI
      *  READ INDEX BLOCK
      *  WRITE TRANSIENT JOURNAL
         INSERT or DELETE the INDEX ROW
      *  WRITE NEW INDEX BLOCK
      *  WRITE CYLINDER INDEX

    For each NUSI
      *  READ INDEX BLOCK
         ADD or DELETE the ROWID on the ROWID LIST, or ADD or DELETE the SUBTABLE ROW
      *  WRITE NEW INDEX BLOCK
      *  WRITE CYLINDER INDEX

I/O operations per row = 4 + [ 4 * (#USIs) ] + [ 3 * (#NUSIs) ]

Double for FALLBACK.
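The INSERT/DELETE formula above can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only; the function name `insert_delete_ios` is hypothetical, and the result is a relative cost, not a count of physical I/Os.

```python
def insert_delete_ios(usis: int, nusis: int, fallback: bool = False) -> int:
    """I/O operations per row for an INSERT or DELETE, per the formula above."""
    if usis < 0 or nusis < 0:
        raise ValueError("index counts must be non-negative")
    ios = 4 + 4 * usis + 3 * nusis   # data row + each USI + each NUSI
    return ios * 2 if fallback else ios   # FALLBACK doubles the count


# Table with 1 USI and 2 NUSIs: 4 + 4 + 6 = 14 I/Os per row.
print(insert_delete_ios(usis=1, nusis=2))                  # 14
# The same table with FALLBACK: 28 I/Os per row.
print(insert_delete_ios(usis=1, nusis=2, fallback=True))   # 28
```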

UPDATE Operations
    UPDATE tablename SET colname = exp . . .

* = I/O Operations

    DATA ROW
      *  READ CURRENT DATA BLOCK
      *  WRITE TRANSIENT JOURNAL
         CHANGE DATA COLUMN
      *  WRITE DATA BLOCK
      *  WRITE CYLINDER INDEX

    If colname = USI column
      *  READ CURRENT INDEX BLOCK
      *  WRITE TRANSIENT JOURNAL
         DELETE INDEX ROW
      *  WRITE INDEX BLOCK
      *  WRITE CYLINDER INDEX
      *  READ NEW INDEX BLOCK
      *  WRITE TRANSIENT JOURNAL
         INSERT NEW INDEX ROW
      *  WRITE NEW INDEX BLOCK
      *  WRITE CYLINDER INDEX

    If colname = NUSI column
      *  READ CURRENT INDEX BLOCK
         REMOVE DATA ROW'S ROWID FROM LIST, or REMOVE INDEX ROW IF LAST ROWID
      *  WRITE INDEX BLOCK
      *  WRITE CYLINDER INDEX
      *  READ NEW INDEX BLOCK
         ADD DATA ROW'S ROWID TO LIST, or ADD NEW INDEX ROW
      *  WRITE NEW INDEX BLOCK
      *  WRITE CYLINDER INDEX

I/O operations per row = 4 + [ 8 * (#USIs) ] + [ 6 * (#NUSIs) ]

Here #USIs and #NUSIs count the indexes defined on the updated column(s). Double for FALLBACK.
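As a worked example of the UPDATE formula, the sketch below computes the per-row I/O count in Python. The function name `update_ios` is hypothetical, and the index counts are read as the number of USIs and NUSIs on the column being changed, which is an interpretation of the "If colname = ..." conditions on the slide.

```python
def update_ios(usis_changed: int, nusis_changed: int, fallback: bool = False) -> int:
    """I/O operations per row for an UPDATE, per the formula above."""
    if usis_changed < 0 or nusis_changed < 0:
        raise ValueError("index counts must be non-negative")
    # 4 for the data row, 8 per affected USI (delete old + insert new index row),
    # 6 per affected NUSI (remove old ROWID + add new ROWID).
    ios = 4 + 8 * usis_changed + 6 * nusis_changed
    return ios * 2 if fallback else ios


print(update_ios(0, 0))                   # 4: non-indexed column
print(update_ios(1, 1))                   # 4 + 8 + 6 = 18
print(update_ios(1, 1, fallback=True))    # 36
```

Updating an indexed column costs several times as much as updating a non-indexed one, which is the point of the comparison.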

Primary Index Value Update


    UPDATE tablename SET PI_column = expression_or_new_value . . . ;

Note: All Secondary Index subtable rows have to be updated.

* = I/O Operations

    DATA ROW
      *  READ CURRENT DATA BLOCK
      *  WRITE TRANSIENT JOURNAL
         DELETE the DATA ROW
      *  WRITE NEW DATA BLOCK
      *  WRITE CYLINDER INDEX
      *  READ NEW DATA BLOCK
      *  WRITE TRANSIENT JOURNAL
         INSERT the DATA ROW
      *  WRITE NEW DATA BLOCK
      *  WRITE CYLINDER INDEX

    For each USI
      *  READ INDEX BLOCK
      *  WRITE TRANSIENT JOURNAL
         UPDATE the INDEX ROW with the new ROW ID
      *  WRITE NEW INDEX BLOCK
      *  WRITE CYLINDER INDEX

    For each NUSI
      *  READ INDEX BLOCK
         UPDATE the ROW ID on the ROW ID LIST with the new ROW ID
      *  WRITE NEW INDEX BLOCK
      *  WRITE CYLINDER INDEX

I/O operations per row = 8 + [ 4 * (#USIs) ] + [ 3 * (#NUSIs) ]
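To make the cost of a Primary Index value update concrete, this short Python sketch evaluates the formula above. The function name `pi_update_ios` is hypothetical; every secondary index on the table is counted because all index subtable rows must pick up the new Row ID.

```python
def pi_update_ios(usis: int, nusis: int) -> int:
    """I/O operations per row when the Primary Index value is updated."""
    if usis < 0 or nusis < 0:
        raise ValueError("index counts must be non-negative")
    # 8 for the data row (delete from old block + insert into new block),
    # 4 per USI and 3 per NUSI to record the new Row ID.
    return 8 + 4 * usis + 3 * nusis


# Table with 2 USIs and 1 NUSI: 8 + 8 + 3 = 19 I/Os per row.
print(pi_update_ios(usis=2, nusis=1))   # 19
# Even with no secondary indexes the data row alone costs 8 I/Os.
print(pi_update_ios(usis=0, nusis=0))   # 8
```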

Permanent Journal I/O


These counts include: 1. Write the PJ block, 2. Write the Cylinder Index.

    BEFORE IMAGE   AFTER IMAGE   PJ I/O COUNT (Count)
    NONE           NONE          0
    NONE           SINGLE        2
    SINGLE         NONE          2
    SINGLE         SINGLE        4
    NONE           DUAL          4
    DUAL           NONE          4
    SINGLE         DUAL          6
    DUAL           SINGLE        6
    DUAL           DUAL          8

SINGLE image journaling is not allowed on FALLBACK tables.

The total number of Permanent Journal I/O operations per row is:

    INSERT : Total PJ I/O = Count + (#USIs * Count)
    DELETE : Total PJ I/O = Count + (#USIs * Count)
    UPDATE : Total PJ I/O = Count + (#USIs changed * Count * 2)

- Changes to NUSI columns cause no additional I/Os.
- Changes to PI columns double the counts.

Total I/O = Total PJ I/O + DATA I/O
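The Permanent Journal counts and totals above can be sketched in Python as follows. The function names are hypothetical. The rule "2 I/Os per journal image" is an observation that reproduces every row of the count table (one PJ block write plus one Cylinder Index write per image); it is not stated as a rule in the source. The doubling for Primary Index changes is deliberately left out of the sketch.

```python
# Number of journal images kept for each option.
IMAGES = {"NONE": 0, "SINGLE": 1, "DUAL": 2}


def pj_count(before: str, after: str) -> int:
    """PJ I/O count: 2 I/Os (PJ block + Cylinder Index) per journal image."""
    return 2 * (IMAGES[before] + IMAGES[after])


def pj_insert_or_delete(count: int, usis: int) -> int:
    """Total PJ I/O per row for an INSERT or DELETE."""
    return count + usis * count


def pj_update(count: int, usis_changed: int) -> int:
    """Total PJ I/O per row for an UPDATE (NUSI changes add nothing)."""
    return count + usis_changed * count * 2


count = pj_count("SINGLE", "DUAL")
print(count)                                 # 6
print(pj_insert_or_delete(count, usis=2))    # 6 + 2*6 = 18
print(pj_update(count, usis_changed=1))      # 6 + 1*6*2 = 18
```

Adding the data I/O from the earlier formulas to these totals gives the overall per-row cost.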

Table Level Attributes


    CREATE MULTISET TABLE Table_1, FALLBACK,
        DATABLOCKSIZE = 16384 BYTES,
        FREESPACE = 10 PERCENT,
        CHECKSUM = NONE
      (column1 INTEGER,
       column2 CHAR(5) );

SET / MULTISET

    SET        Don't allow duplicate rows
    MULTISET   Allow duplicate rows (ANSI)

DATABLOCKSIZE = maximum multi-row block size for the table, in:

    BYTES                     Rounded to nearest sector (512)
    KILOBYTES (or KBYTES)     Increments of 1024
    MINIMUM DATABLOCKSIZE     (7168)
    MAXIMUM DATABLOCKSIZE     (130,560)
    IMMEDIATE                 May be used to immediately re-block the data (ALTER)

FREESPACE

    Percent of freespace to keep on cylinder during load operations (0 - 75%).

CHECKSUM = DEFAULT | NONE | LOW | MEDIUM | HIGH | ALL

    Disk I/O Integrity Check (V2R5.1 feature)

Column Level Constraints


    PRIMARY KEY   No Nulls, No Duplicates
    UNIQUE        No Nulls, No Duplicates
    CHECK         Verify values or range
    REFERENCES    Relates to other columns

    CREATE TABLE Table_2
      (col1 INTEGER NOT NULL CONSTRAINT primary_1   PRIMARY KEY,
       col2 INTEGER NOT NULL CONSTRAINT unique_1    UNIQUE,
       col3 INTEGER          CONSTRAINT check_1     CHECK (col3 > 0),
       col4 INTEGER          CONSTRAINT reference_1 REFERENCES Table_3(col_a) );

- All constraints are named.
- All constraints are at column level.
- PRIMARY KEY columns must have the NOT NULL attribute.
- UNIQUE columns must also have the NOT NULL attribute.

Table Level Constraints


    CREATE TABLE Table_4
      (col1 INTEGER NOT NULL,
       col2 INTEGER NOT NULL,
       col3 INTEGER NOT NULL,
       col4 INTEGER NOT NULL,
       col5 INTEGER,
       col6 INTEGER,
       CONSTRAINT primary_1   PRIMARY KEY (col1, col2),
       CONSTRAINT unique_1    UNIQUE (col3, col4),
       CONSTRAINT check_1     CHECK (col2 > 0 OR col4 > 0),
       CONSTRAINT reference_1 FOREIGN KEY (col5, col6) REFERENCES Table_5 (colA, colB),
       CHECK (col4 > col5),
       FOREIGN KEY (col3) REFERENCES Table_6 (colX) );

The first four constraints are named; the last two (CHECK and FOREIGN KEY) are unnamed.

- Some constraints are named.
- Some constraints are unnamed.
- All constraints are at table level.

Example: Department Table with Constraints


    CREATE TABLE Department
      ( dept_number      INTEGER NOT NULL CONSTRAINT primary_1 PRIMARY KEY
       ,dept_name        CHAR(20) NOT NULL UNIQUE
       ,dept_mgr_number  INTEGER
       ,budget_amount    DECIMAL (10,2)
       ,CONSTRAINT refer_1 FOREIGN KEY (dept_mgr_number)
                           REFERENCES Employee (employee_number)
       ,CONSTRAINT dn_1000_plus CHECK (dept_number > 999)
      );

- Some constraints are named, some are not.
- Some constraints are at column level. Some are at table level.

Example: SHOW Department Table


    SHOW TABLE Department;

    CREATE SET TABLE PD.Department , FALLBACK ,
         NO BEFORE JOURNAL,
         NO AFTER JOURNAL,
         CHECKSUM = DEFAULT
         (
          dept_number INTEGER NOT NULL,
          dept_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
          dept_mgr_number INTEGER,
          budget_amount DECIMAL(10,2),
          CONSTRAINT dn_1000_plus CHECK ( dept_number > 999 ),
          CONSTRAINT refer_1 FOREIGN KEY ( dept_mgr_number )
              REFERENCES PD.EMPLOYEE ( EMPLOYEE_NUMBER ))
    UNIQUE PRIMARY INDEX primary_1 ( dept_number )
    UNIQUE INDEX ( dept_name );

Note: The Primary Key constraint defined with the CREATE TABLE doesn't appear in this SHOW TABLE.

Notes:

- The Primary Key constraint becomes a named index.
- The Unique constraint becomes a unique index.
- All constraints are specified at table level.

Altering Table Constraints


To add constraints to a table:
    ALTER TABLE tablename ADD CONSTRAINT constrname CHECK . . .
    ALTER TABLE tablename ADD CONSTRAINT constrname UNIQUE . . .
    ALTER TABLE tablename ADD CONSTRAINT constrname PRIMARY KEY . . .
    ALTER TABLE tablename ADD CONSTRAINT constrname FOREIGN KEY . . .

To modify existing constraints:


ALTER TABLE tablename MODIFY CONSTRAINT constrname . . . ;

Note: The only constraint that can be modified is a named CHECK constraint.

To drop constraints:
ALTER TABLE tablename DROP CONSTRAINT constrname ;

In V2R5, the ALTER TABLE command can also be used to add new columns (up to 2048) to an existing table.

Identity Column Overview


Also known as a DBS Generated Unique Primary Index: a table-level unique number that is system-generated for every row as it is inserted in the table.

Identity Columns may be used to:

- Guarantee row uniqueness in a table
- Guarantee even row distribution for a table
- Optimize and simplify the initial port from other databases that use generated keys

Identity Columns are valid for:

- Single inserts
- Multi-session concurrent insert requests (e.g., TPump)
- INSERT SELECT

Identity Columns save overhead and maintenance costs:

- Reduce need for uniqueness constraints
- Reduce manual coding tasks
- Generate unique PK values
- Comply with the ANSI Standard

Identity Column Implementation


Characteristics of the IDENTITY Column feature are:

- Implemented at column level in a CREATE TABLE statement
  - Data type may be any exact numeric type
- GENERATED ALWAYS always generates a value
- GENERATED BY DEFAULT generates a value only when no value is specified
- GENERATED ALWAYS + NO CYCLE implies uniqueness
- CYCLE restarts numbering after the maximum/minimum number is generated
- A DBSControl setting indicates the number pool size to reserve for generating numbers
  - Each Vproc may reserve 1 to 1,000,000 numbers; the default is 100,000.
- Numbering gaps can occur
  - Generated numbers do not reflect row insertion sequence
  - Exact incrementing is not guaranteed
- Scalability and performance are favored over enforced sequential numbering

Identity Column Example 1


Example 1: GENERATED ALWAYS AS IDENTITY

This option always generates a value. It does not cycle and does not repeat previously used values.

    CREATE TABLE Table_A
      (Cust_Number INTEGER GENERATED ALWAYS AS IDENTITY
          (START WITH 1001
           INCREMENT BY 1
           MAXVALUE 1000000
           NO CYCLE),
       LName       VARCHAR(15),
       Zip_code    INTEGER);

    INSERT INTO Table_A
    SELECT c_custid, c_lname, c_zipcode FROM Customer;

    SELECT * FROM Table_A ORDER BY 1;

    Cust_Number   LName     Zip_Code
           1001   Tatem        89714
           1002   Kroger       98101
           1003   Yang         77481
           1004   Miller       45458
              :   :                :
         101001   Powell       57501
         101002   Gordan       89714
         101003   Smoothe      80002
              :   :                :

- Customer has 500 rows; the new customer numbers generated are not sequentially numbered from 1001 to 1500.
- Numbering gaps can occur; exact incrementing is not guaranteed.
- Pools (ranges of numbers) are reserved and allocated by Teradata software.
- The default for the next allocation pool is the DBSControl parameter value of 100,000.
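The gaps in Example 1 follow from pool-based allocation. The Python sketch below is a simplified, hypothetical model (the class `PoolAllocator` and function `simulate` are illustrative names, not Teradata internals): each vproc reserves a pool of 100,000 numbers and hands out values from its own pool. It assumes pools are reserved in order starting at the START WITH value, and it does not model a vproc exhausting its pool.

```python
from itertools import islice


class PoolAllocator:
    """Hands out consecutive, non-overlapping number pools."""

    def __init__(self, start: int, pool_size: int) -> None:
        self.next_start = start
        self.pool_size = pool_size

    def reserve(self):
        """Reserve the next pool and return an iterator over its numbers."""
        first = self.next_start
        self.next_start += self.pool_size
        return iter(range(first, first + self.pool_size))


def simulate(rows_per_vproc, start=1001, pool_size=100_000):
    """Return the sorted identity values generated across all vprocs."""
    allocator = PoolAllocator(start, pool_size)
    numbers = []
    for rows in rows_per_vproc:
        pool = allocator.reserve()            # each vproc gets its own pool
        numbers.extend(islice(pool, rows))    # one number per inserted row
    return sorted(numbers)


# Two vprocs inserting 4 and 3 rows reproduce the pattern in the example output.
print(simulate([4, 3]))
# [1001, 1002, 1003, 1004, 101001, 101002, 101003]
```

Because each vproc draws from its own range without coordinating with the others, values are unique but not contiguous, which matches the trade-off of scalability over strict sequencing.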

Identity Column Example 2


Example 2: GENERATED BY DEFAULT AS IDENTITY

This option generates a value only when no value is specified for the column.

    CREATE TABLE Table_B
      (Cust_Number INTEGER GENERATED BY DEFAULT AS IDENTITY
          (START WITH 10000000
           INCREMENT BY -1
           MINVALUE 0),
       LName       VARCHAR(15),
       Zip_code    INTEGER);

    INSERT INTO Table_B
    SELECT NULL, c_lname, c_zipcode FROM Customer;

    SELECT * FROM Table_B ORDER BY 1 DESC;

    Cust_Number   LName     Zip_Code
       10000000   Tatem        89714
        9999999   Kroger       98101
        9999998   Yang         77481
        9999997   Miller       45458
              :   :                :
        9900000   Powell       57501
        9899999   Gordan       89714
        9899998   Smoothe      80002
              :   :                :

- Customer has 500 rows; new customer numbers are generated because NULL was part of the SELECT.
- If MINVALUE is not used, the minimum value for an INTEGER is -2,147,483,647.
- The CYCLE option is not used; the default is NO CYCLE.
- GENERATED BY DEFAULT provides the capability of copying the contents of one table with an Identity column into another.

Identity Column Considerations


Generated Always Identity Columns

- Typically define the Primary Index.
- Define as the Primary Index only if it is the primary path.
- If it is also used as an access path, consider it as a Secondary Index.

Generated By Default Identity Columns

- Facilitate copying data from one table into another.
- Use a numeric type large enough to hold all the values that will ever be required.
- Never use as a substitute for a good logical database design.
- May not optimally utilize Teradata join and access capabilities.

Restrictions

- A table can have only 1 Identity column.
- FastLoad and MultiLoad do not support Identity columns with Teradata V2R5.0.
- The ALTER TABLE statement cannot add an Identity Column to an existing table.
- Cannot be part of a composite primary or a composite secondary index.
- Cannot be used with Global Temporary or volatile tables.
- Cannot be used in a join index, hash index, PPI, or value-ordered index.
- Atomic UPSERTs are not supported on a table with an Identity Column as its PI.
- GENERATED ALWAYS Identity Column value updates are not supported.

Note: With Teradata V2R5.1, Identity columns are supported with the FastLoad, MultiLoad, and Teradata Warehouse Builder (TWB) utilities.

Review Questions
1. Which one of the following situations requires the use of the Transient Journal?
   a. INSERT / SELECT into an empty table
   b. UPDATE all the rows in a table
   c. DELETE all the rows in a table
   d. loading a table with FastLoad

2. What is a negative impact of updating a UPI value?
   ______________________________________________________
   ______________________________________________________

3. What are the 4 types of constraints?
   _____________  _____________  _____________  _____________

4. True or False? A primary key constraint is always implemented as a primary index.

5. True or False? A primary key constraint is always implemented as a unique index.

6. True or False? Multi-column constraints must be coded as table level constraints.

7. True or False? Only named check constraints may be modified.

8. True or False? Named primary key constraints may always be dropped if they are no longer needed.

9. True or False? Using the START WITH 1 and INCREMENT BY 1 options with an Identity column will provide sequential numbering with no gaps for the column.

Module 8: Review Question Answers


1. Which one of the following situations requires the use of the Transient Journal?
   a. INSERT / SELECT into an empty table
   b. UPDATE all the rows in a table
   c. DELETE all the rows in a table
   d. loading a table with FastLoad

2. What is a negative impact of updating a UPI value?
   Very I/O intensive: updating the Primary Index requires that (internally) the data row be deleted and re-inserted into the table, as well as updating the existing secondary index references to the new RowID.

3. What are the 4 types of constraints?
   Primary Key, Unique, References, Check

4. True or False? A primary key constraint is always implemented as a primary index.

5. True or False? A primary key constraint is always implemented as a unique index.

6. True or False? Multi-column constraints must be coded as table level constraints.

7. True or False? Only named check constraints may be modified.

8. True or False? Named primary key constraints may always be dropped if they are no longer needed.

9. True or False? Using the START WITH 1 and INCREMENT BY 1 options with an Identity column will provide sequential numbering with no gaps for the column.
