-> index subtable is not required
2) What if RI is defined on two populated tables?
-> violating rows are put into an error table..
3) While inserting a row in a child table, the insert fails as the value is not found in the parent table.. what can be
done to make the insert successful?
-> make the value null
-> insert it in the parent table
-> get a value which is present in the parent table
5) update fails.... what happens???
-> ins stmt will be rolled back
-> update will be rolled back.
-> locks will be released after processing rollback of update
-> locks will not be released...
6) What is joinback??
-> join index is joined back to the base table.
aggr join index.
8) What is true abt global temp tables?
-> uses temp space of the user logged in
-> can use WITH DATA option
-> dictionary info is stored in the DD
9) GENERATED ALWAYS identity col... max 100 min 1
-> cannot insert more than 100 rows.
10) sel * from t1 where upi=1;
sel * from t1 where upi=2;
11) t1 right join t2 left join t3
13) Triggers: when is the ORDER clause used?
-> triggering action should be the same
-> triggered stmts should be the same
-> WHEN condition must be the same
-> LDM, ELDM, PDM, designing app, developing app, assurance testing...
15) Denormalization comes into the picture in
16) Match the following: CSUM, remaining sum, moving sum.. 3 SQLs..
sum(a1) rows 2 preceding
sum(a1) unbounded preceding rows
sum(a1) unbounded following rows..
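The three windowed sums in the matching question can be sketched in plain Python. The column name a1 and the sample values are invented for illustration; each function mirrors one of the window frames above.

```python
# Sketch of the three windowed sums from the matching question.
a1 = [10, 20, 30, 40, 50]

def moving_sum(values, preceding):
    # sum(a1) rows 2 preceding: current row plus the 2 rows before it
    return [sum(values[max(0, i - preceding): i + 1]) for i in range(len(values))]

def cumulative_sum(values):
    # sum(a1) rows unbounded preceding: running total from the first row
    return [sum(values[: i + 1]) for i in range(len(values))]

def remaining_sum(values):
    # sum(a1) rows unbounded following: total of the current row through the last
    return [sum(values[i:]) for i in range(len(values))]

print(moving_sum(a1, 2))     # moving sum:    [10, 30, 60, 90, 120]
print(cumulative_sum(a1))    # CSUM:          [10, 30, 60, 100, 150]
print(remaining_sum(a1))     # remaining sum: [150, 140, 120, 90, 50]
```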
17) When does the optimizer go for a nested join?
18) What are the join types?
cross join, inner join, outer join
19) sel * from dept where mgr not in (sel mgr from employee);
mgr in dept is null.. what is the result of the above query??
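Question 19 turns on standard SQL three-valued logic, which sqlite3 (bundled with Python) also follows, so it can be demonstrated directly. The table and column names follow the question; the rows are invented. A dept row with a NULL mgr can never qualify, and if the subquery result itself contains a NULL, NOT IN returns no rows at all.

```python
# NOT IN with NULLs, demonstrated with Python's built-in sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dept (deptno INTEGER, mgr INTEGER)")
con.execute("CREATE TABLE employee (empno INTEGER, mgr INTEGER)")
con.executemany("INSERT INTO dept VALUES (?, ?)", [(10, 1), (20, None)])
con.executemany("INSERT INTO employee VALUES (?, ?)", [(100, 2), (101, None)])

# NULL in the subquery makes NOT IN evaluate to UNKNOWN for every row.
rows = con.execute(
    "SELECT * FROM dept WHERE mgr NOT IN (SELECT mgr FROM employee)"
).fetchall()
print(rows)  # []

# Filtering NULLs out of the subquery: the dept row whose mgr is NULL is
# still excluded, because NULL NOT IN (...) is UNKNOWN, not TRUE.
rows2 = con.execute(
    "SELECT * FROM dept WHERE mgr NOT IN "
    "(SELECT mgr FROM employee WHERE mgr IS NOT NULL)"
).fetchall()
print(rows2)  # [(10, 1)]
```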
20) TWB??? automatic restart, multiple tables can be loaded, data can be
read from multiple sources..
21) Advantage of NO CHECK option over WITH CHECK OPTION in RI???
-> index subtable is not built...
-> join eliminations possible (possible in both cases)
22) Update cursors are allowed in
-> stored procedures
23) locking t1 for write sel * from t1 where upi=3 will place
-> write lock on table
-> write lock on row hash
24) NUSI on a single col: stats are stored in
25) Which two restrict users' access to a db object?
26) create procedure (a IN, b INOUT, c OUT)
which of the following are true??
c = a+1
b = c+1
c = a+b
b = a+1
Development Life Cycle
A Logical Data Model (LDM) is a relational representation of a business enterprise.
The logical data model is a collection of two-dimensional, column-and-row tables that represent
the real world business enterprise.
Normalization is a technique for placing non-key attributes (columns) into tables in order to
minimize redundancy, provide flexibility, and eliminate update anomalies.
Process of designing the database
Testing the design or model
Consists of three main forms (rules) to which tables must conform.
First Normal Form
Eliminate the practice of "repeating groups."
The relationship of the column to the Primary Key should be one to one.
PK should be unique.
If there are multiples in the column, you need to reconsider your choice of Primary Key.
The solution here is to find a more definitive PK and redescribe the table in this manner, so each
occurrence is in its own row.
Attributes must not repeat within a table.
No repeating groups.
1:1 relationship of column to PK.
The rule of Second Normal Form is not necessary in all cases.
It only comes into play when a table has a multicolumn Primary Key.
In this situation, focus your attention on the combination of columns in the PK.
If the placement came as a result of only one column of the PK, then these columns must be
moved to their own table.
The result here is that you may be establishing a new entity. Remember that the data column is
important and you don't want to just throw it away.
An attribute must relate to the entire Primary Key, not just a part of it.
Tables with a single-column Primary Key (entities) are always
in Second Normal Form.
The final test for the three forms of Normalization is a test of whether or not each data column is
directly related to the Primary Key of the table, or perhaps the column is related to another column
in the table.
This situation usually happens when we have included both the identification code for a set of data
and its description.
Sometimes we do this to keep data together. But in doing so we often lose the value of the data.
For instance, if we keep the job code of the employee along with the job description (note that
these two columns are related to each other), then if one changes both need to change. When
there are no employees with this job code we lose the definition of the job and its description.
They should be kept in their own table for reference. This situation may uncover minor entities that
you have forgotten.
Attributes must relate to the Primary Key and not to each other.
Cover up the PK and any No Duplicate columns; remaining
columns must not describe each other.
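The job-code example above can be made concrete with sqlite3. All table and column names are invented. With the description kept in its own job table (Third Normal Form), deleting the last employee holding a job code does not lose the job's definition.

```python
# 3NF split of the job code / job description transitive dependency.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE job (job_code TEXT PRIMARY KEY, job_desc TEXT)")
con.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, job_code TEXT)")
con.execute("INSERT INTO job VALUES ('J1', 'Analyst')")
con.execute("INSERT INTO employee VALUES (1, 'J1')")

# Remove the only employee holding job J1 ...
con.execute("DELETE FROM employee WHERE emp_id = 1")

# ... and the job description still exists in the reference table.
desc = con.execute("SELECT job_desc FROM job WHERE job_code = 'J1'").fetchone()
print(desc[0])  # Analyst
```

Had job_desc lived only in the employee table, the DELETE would have taken the definition of J1 with it.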
Issues with Normalization
If you normalize to first normal form, the table will have many more rows and you will have to do
many UNIONs or JOINs or CASE operations to get viable reports.
When the data is not in the first normal form it is limited to pre-planned occurrences.
The impact of implementing second normal form is that the data is now separated and therefore
you will have to do more joins to get the data in presentable form.
When the data is not in second normal form, you have anomalies in the update process on
the tables.
Third normal form again brings the necessity of doing additional joins to get the data for presentation.
The impact of not going to 3NF is the potential of missing business data because of the
improbable location of minor entities.
Benefits of normalization
- 3NF can accurately model business relationships.
- Supports the ability to ask any question (ad hoc), not just known questions.
- Supports the function of the warehouse as a repository of detail that is needed to offer complete
analysis against any data at any time.
Repeating Groups are attributes in a denormalized table that would be rows if the table were
normalized.
Temporary tables are created for specific purposes, but are not a part of the Logical Model. As
such, they can be denormalized. This will provide performance benefits while keeping the Logical
Model intact.
Tables can be divided into sub-entity tables with fewer columns.
Many SQL queries will run faster against the sub-entities. It is more difficult to do FDLs, INSERTs
and DELETEs since you have to deal with several tables instead of just one. UPDATEs may
also require more effort.
Teradata stores password information in encrypted form in the DBC.DBase system table.
The PasswordString column from the DBC.DBase table displays encrypted passwords.
The DBC.Users view displays PasswordLastModDate and PasswordLastModTime.
DBC.SysSecDefaults holds the password rules that apply to a user.
GRANT/REVOKE LOGON Statements: DBC.LogonRule ...macro
User Privileges (Access Rights)
Teradata stores access rights information in the system table DBC.AccessRights.
The Extended logical data model is an "extension" to the original logical data model and is used
by the physical database designer to select indexes.
Information provided by the ELDM results from user input about transactions and transaction
demographics:
- Value Access Frequency: how often the table is accessed based on a specific value for this
column (or group of columns).
- Value Access R * F (Rows * Frequency): how many rows will be accessed multiplied by the
frequency of access.
- Join Access Frequency: how often the table is joined to another table based on the value
that appears in this column (or group of columns).
- Join Access Rows: how many rows will be joined multiplied by the frequency of access.
- Distinct Values: number of different values that can be found in this column.
- Maximum Rows per Value: number of rows that can be found for the value occurring most
often in this column. (Nulls are not included in this entry.)
- Rows per Null Value: number of rows that can be found having a NULL value in this
column.
- Typical Rows per Value: number of rows that can be found for a typical value in this
column.
- Change Rating: a relative rating for the frequency with which the values in this column
change.
Teradata's parallelism is never more apparent than when performing queries involving set
manipulation.
Another way to take advantage of the parallelism, and flexibility, of Teradata is by
consolidating data on Teradata instead of preprocessing it on the client or host side prior to data
loads. Load it first and process it later on Teradata!
For the Teradata RDBMS, a PDM is designed:
Based on the LDM, to retrieve and manipulate tables, rows, and columns as physical objects.
To provide optimal performance using Teradata.
Identifies index structures as access paths to the data.
- Primary Index
  - Mandatory and only one allowed per table.
  - Can contain up to 16 columns.
  - Chosen for either access or row distribution.
  - Can be Unique (UPI) or Non-Unique (NUPI).
  - Query access is always a one-AMP operation.
  - Can contain NULLs.
  - Values can be updated.
- Secondary Index
  - Optional; there can be as many as 32 to a table.
  - Can contain up to 16 columns each.
  - Can be Unique (USI) or Non-Unique (NUSI).
  - Chosen for improved access to data.
  - Stored as a sub-table.
  - Can be created/dropped dynamically.
  - Can contain NULLs.
  - Can be used to enforce uniqueness.
Need not be based on the LDM ("de-normalization"). However, conforming to these
rules is strongly advised.
Relationships between tables are seen only through queries on them; an FK is not
required for joins.
The Teradata RDBMS is extremely flexible in allowing you the freedom to perform any
kind of processing at any time!
When developing applications, use set manipulation when possible to maximize
parallelism.
LDM = Logical Data Model
ELDM = Extended Logical Data Model
PDM = Physical Data Model
ETL = Extract, Transformation, and Load
Working with the DBAs
In some cases, it may be helpful to capture SQL. You or the DBA can do this by:
Using access logging
Modifying TDP exit routines
Data Collection for the Application Developer: The Basics
Periodic Collection of Disk Summary by Workload Group
Current, Peak and Max table statistics for Perm, Spool and Temp
tables, found in DBC.DiskSpace.
Note: Peak statistics, by default, accumulate from day 0, and
therefore have no real value. However, if you reset peak statistics
every hour, you will see variations in peak statistics from hour to
hour, day to day, etc.
Periodic Collection of Table Growth
Mbytes and number of rows
Collect User Logons per day per workload
Collect and summarize application user logon/logoff history,
mapped to query throughputs for predictions based on knowing
user logon increases.
PP2 - What is a Preprocessor?
The Teradata Preprocessor is a precompiler for programs that access the Teradata database.
The precompiler is an entity that parses application source code and replaces all
embedded SQL it finds with calls that are acceptable to the native compiler for the host language.
The preprocessor is the precompiler plus the services that execute, or provide runtime
support for, the compiled application. It is a runtime module that is used at program execution. It:
logs on to Teradata,
syntax checks your SQL statements,
verifies table names, columns, and access rights with the Data Dictionary.
The Preprocessor builds code in the data division for data elements.
It builds code in the procedure division to handle passing your SQL statements to Teradata.
It will comment out the source code it uses, and it outputs the COBOL source for input to the
native compiler.
There are two transaction modes for which a program may be precompiled:
COMMIT and BTET.
Using the COMMIT keyword allows you to break your program into multiple transactions.
The Begin Transaction and End Transaction statements are placed to group small
transactions into larger ones by nesting them.
What is Dynamic SQL?
Dynamic SQL means the program dynamically builds SQL requests at runtime. The Preprocessor
doesn't see the SQL requests until runtime.
Because static SQL has limited parameter capabilities, dynamic SQL helps overcome this.
How Does CLI Work?
CLIv2 sends requests to the Teradata server and provides the application with a response
returned from the server.
CLIv2 provides support for:
- Managing multiple serially-executed requests on a session.
- Managing multiple simultaneous sessions to the same or different Teradata servers.
- Using cooperative processing, so that the application can perform operations on the client and
the Teradata server at the same time.
- Communicating with two-phase commit coordinators for CICS and IMS transactions.
- Generally insulating the application from details in communicating with a Teradata server.
- CLI minimizes overhead, allowing developers to create efficient applications.
DDL Requests are not cached because they are not considered repeatable. For example, you
would not be able to repeat the same CREATE TABLE Request.
The Parser parses the SQL statement and prepares AMP steps to execute the request. It also provides a cache for the SQL
and the AMP execution plan.
There are two important reasons to use Macros whenever applicable:
- Macros reduce parcel size, thus dramatically improving performance.
- Macros will increase the likelihood of matching the R-T-S cache because users won't have
to re-enter their SQL.
If an identical Request exists in Request-To-Steps cache:
- Call SECURITY and GNCAPPLY and pass them the memory address of the AMP steps.
- These steps do not have DATA parcel values bound into them; they are called Plastic Steps.
Otherwise, the Request Parcel passes the request to the SYNTAXER.
The larger the Request Parcel, the longer these steps take.
Macros reduce parcel size, dramatically improving performance.
Request-To-Steps Cache Logic
The entire Request Parcel is put through the hashing algorithm to produce a 32-bit hash of
the parcel. If there is an entry in R-T-S cache with the same hash value, the system must do a
byte-by-byte comparison between the incoming text and the stored text to determine if a true
match exists. The larger the size of the Request Parcel, the longer these steps take.
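The two-stage lookup described above can be sketched in Python. Teradata's actual hashing algorithm is not public; zlib.crc32 stands in purely to show the shape of the logic: a cheap 32-bit hash match first, then a byte-by-byte comparison to rule out hash collisions. The cached "plastic steps" here are just placeholder strings.

```python
# Sketch of a Request-To-Steps cache lookup (crc32 as a stand-in hash).
import zlib

cache = {}  # 32-bit hash -> list of (request_text, plastic_steps)

def cache_store(request_text, steps):
    h = zlib.crc32(request_text.encode()) & 0xFFFFFFFF
    cache.setdefault(h, []).append((request_text, steps))

def cache_lookup(request_text):
    h = zlib.crc32(request_text.encode()) & 0xFFFFFFFF
    for stored_text, steps in cache.get(h, []):
        if stored_text == request_text:   # byte-by-byte confirmation
            return steps                  # true match: reuse cached steps
    return None                           # miss: request must be parsed

cache_store("SELECT * FROM t1 WHERE upi = 1", ["plastic steps for t1"])
print(cache_lookup("SELECT * FROM t1 WHERE upi = 1"))  # hit
print(cache_lookup("SELECT * FROM t1 WHERE upi = 2"))  # None (miss)
```

The byte comparison is why a larger Request Parcel makes these steps take longer, and why macros (which shrink the parcel text) help.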
The Syntaxer checks the syntax of an incoming Request Parcel for errors. If the syntax is correct,
the Syntaxer produces an initial Parse Tree, which is then sent to the Resolver.
The Resolver takes the initial Parse Tree from the Syntaxer and replaces all Views and
Macros with their underlying text to produce the Annotated Parse Tree. It uses DD information to
"resolve" View and Macro references down to table references. The DD tables shown in the
diagram on the right-hand page (DBase, AccessRights, TVM, TVFields and Indexes) are the
tables that the Resolver utilizes for information when resolving DML requests.
Nested Views and Macros can cause the Resolver to take substantially more time to do its work.
The nesting of views (building views of views) can have a very negative impact on performance.
At one site, what a user thought was a two-table join was actually a join of two views which were
doing joins of other views, which were doing joins of other views, which were doing joins of base
tables. When resolved down to the table level, the two-"table" join was really doing a 12-table join.
The data the user needed resided in a single table.
DBC.Next is a DD table that consists of a single two-column row. It is used to assign a globally
unique numeric ID to every Database/User, Table, View and Macro. DBC.Next always contains
the next value to be assigned to any of these. Think of the two columns as counters for ID values.
- The DD keeps track of all SQL names and their numeric IDs.
- The RESOLVER uses the DD to verify names and convert them to IDs.
- The AMPs use the numeric IDs supplied by the RESOLVER.
The dictionary cache is a buffer in Parsing Engine memory that stores the most recently used
dictionary information. These entries, which also contain statistical information used by the
Optimizer, are used to convert database object names to their numeric IDs.
The statement, or request, cache stores successfully parsed SQL requests so they can be reused,
thereby eliminating the need for reparsing the same request parcel. The cache is a PE buffer that
stores the steps generated during the parsing of a DML statement. The statement cache is
checked at the start of the parsing process, before the Syntaxer step, but after the Request parcel
has been checked for format errors. If the system finds a matching cached request, the Parser
bypasses the Syntaxer, Resolver, Optimizer, and Generator steps, performs a security check (if
required), and proceeds to the Apply stage.
The DD Cache is part of the cache found on every PE. It stores the most recently used DD
information including SQL names, their related numeric IDs and Statistics.
The DD tables that provide the information necessary to parse DML requests are:
- Access Rights
The Optimizer analyzes the various ways an SQL Request can be executed and determines
which is the most efficient. It acts upon the Annotated Parse Tree after Security has verified the
permissions and generates an Optimized Parse Tree.
DD operations replace DDL statements in the Parse Tree.
- The OPTIMIZER evaluates DML statements for possible access paths:
  - Available indexes referenced in the WHERE clause.
  - Possible join plans from the WHERE clause.
  - Full Table Scan possibility.
- It uses COLLECTed STATISTICS or dynamic samples to make a choice.
- It generates Serial, Parallel, Individual and Common steps.
- OPTIMIZER output passes to either the Generator or the EXPLAIN facility.
- Additional steps are needed for Check Constraints and Triggers.
The Generator acts upon the Optimized Parse Tree from the Optimizer and produces the Plastic
Steps. Plastic Steps do not have data values from the DATA parcel bound in, but do have hard-
coded literal values embedded in them.
Plastic Steps produced by the Generator are stored in the R-T-S Cache unless a request is not
repeatable.
Plastic Steps for Requests with DATA parcels are cached.
Views and Macros cannot be nested beyond eight levels.
Nested Views and Macros take longer for initial parsing.
Multi-statement requests (including macros) generate more Parallel
and Common steps.
Execution plans remain current in cache for up to four hours.
DDL "spoiling" messages may purge DD cache entries at any time.
Requests against purged entries must be reparsed and re-cached.
Primary Index Choice Criteria
There are three Primary Index Choice Criteria: Access demographics, Distribution
demographics, and Volatility.
They are listed in the order that they should be considered when selecting a Primary Index.
Access demographics. Access columns are those that would appear (with a value) in
a WHERE clause in an SQL statement. Choose the column most frequently used for access to
maximize the number of one-AMP operations. Examples of access demographics are join access
frequency and value access frequency.
Distribution demographics. The more unique the index, the better the distribution.
Optimizing distribution optimizes parallel processing. In choosing a Primary Index, there is a trade-
off between the issues of access and distribution. The most desirable situation is to find a PI
candidate that has good access and good distribution. Many times, however, index candidates
offer great access and poor distribution or vice versa. When this occurs, the physical designer
must balance these two qualities to make the best choice for the index.
An important point to note here is that even distribution of data will not only speed up processing,
but will also avoid premature 2644 ("Out of PERM space") and 2646 ("Out of SPOOL space")
errors. Although the PERM space limit for a database may be defined as 50 GB, the real
limit is 50 GB divided by the number of AMPs in the system. Assuming the system has 50
AMPs, even though this database may have plenty of spare space, as soon as you try to use
more than 1 GB of PERM space on any single AMP, Teradata will reject the INSERT and
return a 2644 error code. (The same considerations apply to SPOOL space.)
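The per-AMP arithmetic above is worth working through once. The per-AMP usage figures below are invented to show a badly skewed table: the total is far below the 50 GB database limit, yet one AMP has already blown its 1 GB share, which is the condition that triggers error 2644.

```python
# Per-AMP PERM limit: the database limit divided by the number of AMPs.
perm_limit_gb = 50
amps = 50
per_amp_limit_gb = perm_limit_gb / amps
print(per_amp_limit_gb)  # 1.0

# Hypothetical usage for a skewed table: 49 AMPs hold 0.2 GB each,
# one hot AMP holds 1.3 GB.
usage_gb = [0.2] * 49 + [1.3]
print(sum(usage_gb))                                # ~11.1 GB total used
print(any(u > per_amp_limit_gb for u in usage_gb))  # True: INSERT rejected (2644)
```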
Volatility. How often the data values will change. The Primary Index should not be very
volatile. Any changes to Primary Index values may result in heavy I/O overhead, as the rows
themselves may have to be moved from one AMP to another. Choose a column with stable data
values.
Note: If you find that one AMP has higher CPU usage than other AMPs, and that data is highly
skewed, this may be evidence of a poor choice of primary index.
Primary Indexes (UPI and NUPI)
A Primary Index may be different from a Primary Key.
Every table has only one Primary Index.
A Primary Index may contain null(s).
Single-value access uses ONE AMP and typically one I/O.
Unique Primary Index (UPI)
Involves a single base table row at most.
No spool file is ever required.
The system automatically enforces uniqueness on the index value.
Non-Unique Primary Index (NUPI)
May involve multiple base table rows.
A spool file is created when needed.
Duplicate values go to the same AMP and the same data block.
Only one I/O is needed if all the rows fit in a single data block.
A duplicate row check for a SET table is required if there is no USI on the table.
Multi-Column Primary Indexes
More columns = more uniqueness
The number of distinct values increases.
More columns = less usability
The PI can only be used when values for all PI columns are
provided in the SQL statement.
Partial values cannot be hashed.
Good Distribution Demographics for a Primary Index
Column distribution demographics are expressed in four ways:
Distinct Values
Maximum Rows per Value
Maximum Rows NULL
Typical Rows per Value
Distinct Values is the total number of different values a column contains. For PI selection, the
higher the Distinct Values (in comparison with the table row count), the better. Distinct Values
should be greater than the number of AMPs in the system, whenever possible. We would prefer
that all AMPs have rows from each table.
Maximum Rows per Value is the number of rows in the most common value for the column or
columns. When selecting a PI, the lower this number is, the better the candidate. For a column or
columns to qualify as a UPI, Maximum Rows per Value must be 1.
Maximum Rows NULL should be treated the same as Maximum Rows per Value when being
considered as a PI candidate.
Typical Rows per Value gives you an idea of the overall distribution which the column or
columns would give you. The lower this number is, the better the candidate. Like Maximum Rows
per Value, Typical Rows per Value should be small enough to fit on one data block.
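The four demographics above can be sketched for a candidate PI column as follows. The column values are invented sample data, and using the median as "Typical Rows per Value" is this sketch's assumption, not a definition from the course material.

```python
# Computing the four distribution demographics for a sample column.
from collections import Counter
from statistics import median

col = ["A", "B", "B", "C", "C", "C", None, None]

non_null = [v for v in col if v is not None]
counts = Counter(non_null)                        # value -> row count

distinct_values = len(counts)                     # Distinct Values
max_rows_per_value = max(counts.values())         # Maximum Rows per Value
max_rows_null = col.count(None)                   # Maximum Rows NULL
typical_rows_per_value = median(counts.values())  # Typical Rows per Value (median here)

print(distinct_values, max_rows_per_value, max_rows_null, typical_rows_per_value)
# 3 3 2 2
```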
Secondary Indexes are generally defined to provide faster set selection. The Teradata RDBMS
allows up to 32 SIs per table.
Secondary Index values, like Primary Index values, are input to the Hashing Algorithm.
As with Primary Indexes, the Hashing Algorithm takes the Secondary Index value and outputs a
Row Hash.
These Row Hash values point to a subtable which stores index rows containing the base table SI
column values and Row IDs which point to the row(s) in the base table with the corresponding SI
value.
The Teradata RDBMS can tell whether a table is an SI subtable from the Subtable ID, which is part
of the Table ID.
Subtables store the row hash of the base table secondary index value, the column values, and the
Row ID to the base table rows.
Users cannot access subtables directly.
Secondary Index Considerations
SIs require additional storage to hold their subtables. In the case of a Fallback table, the SI
subtables are Fallback also; twice the additional storage space is required.
SIs require additional I/O to maintain these subtables.
The Optimizer may choose to do a Full Table Scan rather than utilize the NUSI in two cases:
When the NUSI is not selective enough.
When no COLLECTed STATISTICS are available.
As a guideline, choose only those columns having frequent access as NUSI candidates.
After the table has been loaded, create the NUSI indexes, COLLECT STATISTICS on the
indexes, and then do an EXPLAIN referencing each NUSI. If the Parser chooses a Full Table
Scan over using the NUSI, drop the index.
Secondary Index Subtables
This section compares and contrasts examples of Primary (UPIs and NUPIs), Unique Secondary
(USIs) and Non-Unique Secondary Indexes (NUSIs).
Primary Indexes (UPIs and NUPIs)
As you have seen previously, in the case of a Primary Index, the Teradata RDBMS hashes the
value and uses the Row Hash to find the desired row. This is always a one-AMP operation and is
shown in the top diagram on the right-hand page.
Unique Secondary Indexes (USIs)
An index subtable contains index rows, which in turn point to base table rows matching the
supplied index value.
USI rows are globally hash distributed across all AMPs, and are retrieved using the same
procedure as Primary Index data row retrieval. Since the USI row is hash-distributed on different
columns than the Primary Index of the base table, the USI row typically lands on an AMP other
than the one containing the data row. Once the USI row is located, it "points" to the corresponding
data row. This requires a second access and usually involves a different AMP. In effect, a USI
retrieval is like two PI retrievals:
Master Index - Cylinder Index - Index Block
Master Index - Cylinder Index - Data Block.
Non-Unique Secondary Indexes (NUSIs)
NUSIs are implemented on an AMP-local basis.
Each AMP is responsible for maintaining only those NUSI subtable rows that correspond to base
table rows located on that AMP.
Since NUSIs allow duplicate index values and are based on different columns than the PI, data
rows matching the supplied NUSI value could appear on any AMP.
In a NUSI retrieval (illustrated at the bottom of the right-hand page), a message is sent to all
AMPs to see if they have an index row for the supplied value. Those that do use the "pointers" in
the index row to access their corresponding base table rows. Any AMP that does not have an
index row for the NUSI value will not access the base table to extract rows.
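The AMP-local NUSI retrieval described above can be modeled as a toy in Python. Python's hash() stands in for the real hashing algorithm, and the rows, PI values, and NUSI values are all invented; what the sketch preserves is the structure: PI placement decides which AMP owns a row, each AMP indexes only its own rows, and only AMPs holding an index entry for the value touch their base rows.

```python
# Toy model of AMP-local NUSI subtables.
AMPS = 4
amp_rows = {a: [] for a in range(AMPS)}   # base table rows per AMP
amp_nusi = {a: {} for a in range(AMPS)}   # NUSI value -> local row ids

def insert(row_id, pi_value, nusi_value):
    amp = hash(pi_value) % AMPS           # PI decides row placement
    amp_rows[amp].append((row_id, pi_value, nusi_value))
    amp_nusi[amp].setdefault(nusi_value, []).append(row_id)

for rid, (pi, city) in enumerate([(1, "NY"), (2, "LA"), (3, "NY"), (4, "SF")]):
    insert(rid, pi, city)

def nusi_select(value):
    # The message goes to all AMPs; only AMPs with an index row for the
    # value access their base table rows.
    hits = []
    for amp in range(AMPS):
        for rid in amp_nusi[amp].get(value, []):
            hits.extend(r for r in amp_rows[amp] if r[0] == rid)
    return sorted(hits)

print(nusi_select("NY"))  # rows 0 and 2, possibly held by different AMPs
```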
USI Subtable General Row Layout
The layout of a USI subtable row is illustrated at the top of the right-hand page. It is composed of:
The first two bytes designate the row length.
The next 8 bytes contain the Row ID of the row. Within this Row ID:
4 bytes of Row Hash
4 bytes of Uniqueness Value.
The following 2 bytes are additional system bytes (which will be explained later).
The next section contains the SI value. This is the value that was used by the Hashing Algorithm
to generate the Row Hash for this row. This section varies in length depending on the index.
Following the SI value are 8 bytes containing the Row ID of the base table row. The base table
Row ID tells the system where the row corresponding to this particular USI value is located.
The last two bytes contain the reference array pointer at the bottom of the block.
The Teradata RDBMS creates one index subtable row for each base table row.
USI Subtable General Row Layout
- USI rows are distributed on Row Hash, like any other row.
- The Row Hash is based on the base table secondary index value.
- The second Row ID identifies the single base table row that carries the secondary
index value (probably on a different AMP).
- There is one index subtable row for each base table row.
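The byte layout above can be made concrete with struct packing. All field values are invented, the reference array pointer (which lives at the bottom of the block, not in the row) is omitted, and the little-endian format codes are this sketch's assumption; the point is only to see the fields line up: 2-byte length, 8-byte Row ID (4-byte row hash + 4-byte uniqueness), 2 system bytes, the variable-length SI value, then the 8-byte base table Row ID.

```python
# Packing a simplified USI subtable row to see the field widths.
import struct

def pack_usi_row(row_hash, uniq, si_value, base_row_hash, base_uniq):
    si = si_value.encode()
    body = (
        struct.pack("<IIH", row_hash, uniq, 0)   # Row ID (4+4) + 2 system bytes
        + si                                     # variable-length SI value
        + struct.pack("<II", base_row_hash, base_uniq)  # base table Row ID
    )
    length = 2 + len(body)                       # total length incl. length field
    return struct.pack("<H", length) + body

row = pack_usi_row(0xAABBCCDD, 1, "CUST-42", 0x11223344, 7)
print(len(row))  # 2 + 8 + 2 + 7 + 8 = 27 bytes
```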
The only difference between this and the three-part message used in PI access is that the
Subtable ID portion of the Table ID references the USI subtable, not the data table.
Using the DSW for the Row Hash, the Communication Layer directs the message to the correct
AMP, which uses the Table ID and Row Hash as a logical index block identifier and the Row Hash
and USI value as the logical index row identifier. If the AMP succeeds in locating the index row, it
extracts the base table Row ID ("pointer").
The Subtable ID portion of the Table ID is then modified to refer to the base table, and a new
three-part message is put onto the Communications Layer.
Once again, the Communication Layer uses the DSW to identify the correct AMP. That AMP uses
Table ID and Row Hash to locate the correct data block, and then uses Row Hash and Uniqueness
Value (Row ID) to locate the correct row.
NUSI Subtable General Row Layout
There are, however, two major differences:
First, NUSI entries are not distributed by the Hash Map. NUSI subtable rows are built from
the base table rows found on that particular AMP and refer only to the base rows of that AMP.
Second, NUSI rows may point to more than one base table row. There can be many base
table Row IDs in a NUSI row. Because NUSIs are always AMP-local to the base table rows, it is
possible to have the same NUSI value represented on multiple AMPs. A NUSI subtable is just
another table from the perspective of the file system.
NUSI Subtable General Row Layout
- The Row Hash is based on the base table secondary index value.
- The other Row IDs identify the base table rows on this AMP that carry the Secondary
Index value.
- There are one or more subtable rows for each secondary index value on the AMP.
- The Row IDs "point" to base table rows on this AMP only.
- The maximum size of a single NUSI row is 64 KB.
Single NUSI Access (Between, Less Than, or Greater Than)
Utilize the NUSI and do a Full Table Scan (FTS) of the NUSI subtable. In this case, the Row IDs
of the qualifying base table rows would be retrieved into spool. The Teradata RDBMS would use
those Row IDs in spool to access the base table rows themselves.
Ignore the NUSI and do an FTS of the base table itself. In order to make this decision, the
Optimizer requires COLLECTed STATISTICS.
Note: The only way to determine for certain whether an index is being used is to utilize the
EXPLAIN facility.
Dual NUSI Access
AND with Equality Conditions
If one of the two indexes is strongly selective, the system uses it alone for access.
If both indexes are weakly selective, but together they are strongly selective, the system does a
bit-map intersection of the two (BMSMS).
If both indexes are weakly selective separately and together, the system does an FTS.
In any case, any conditions in the SQL statement not used for access (residual conditions)
become row qualifiers.
OR with Equality Conditions
Do an FTS of the two NUSI subtables.
Retrieve Row IDs of qualifying base table rows into two separate spools.
Eliminate duplicates from the two spools of Row IDs.
Access the base rows from the resulting spool of Row IDs.
If only one of the two columns joined by the OR is indexed, the Teradata RDBMS always does an
FTS of the base tables.
NUSI Bit Mapping is a process that determines common Row IDs between multiple NUSI values
by a process of intersection.
When aggregation is performed on a NUSI column, the Optimizer accesses the NUSI subtable,
which offers much better performance than accessing the base table rows. Better performance is
achieved because there should be fewer index blocks and rows in the subtable than data
blocks and rows in the base table, thus requiring less I/O.
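The two dual-NUSI access paths above reduce to set operations on Row IDs. The Row ID sets below are invented; ANDed weakly selective NUSIs intersect their Row ID sets (the idea behind NUSI bit mapping), while ORed NUSIs union two spools of Row IDs and eliminate duplicates before touching the base table.

```python
# Set-based sketch of dual NUSI access.
rowids_city_ny = {10, 11, 12, 13}   # Row IDs qualifying on NUSI #1
rowids_dept_500 = {12, 13, 14}      # Row IDs qualifying on NUSI #2

and_result = rowids_city_ny & rowids_dept_500  # intersection: both conditions
or_result = rowids_city_ny | rowids_dept_500   # union with duplicates removed

print(sorted(and_result))  # [12, 13]
print(sorted(or_result))   # [10, 11, 12, 13, 14]
```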
Explicitly declares a lock type for one or more objects.
Any lock may be upgraded.
Only a READ lock may be downgraded to an ACCESS lock.
Locks are never released or downgraded during a transaction.
The system holds the most restrictive lock.
No SQL "Release Lock" statement exists.
Locks release only at COMMIT/End Transaction, or when Rollback completes.
The Locking Modifier applies to a table, database, view, or row.
LOCKING ROW locks all rows that hash to a specific value. It is a row hash lock, not a row lock.
Explicit locking creates an all-AMP (table or database) operation.
COLLECTed STATÌSTÌCS are stored in one of two Data Dictionary (DD) tables (DBC.TVFields or
you can use the HELP STATÌSTÌCS statement to display information about current column or
HELP STATÌSTÌCS returns the following information about each column or index for which
statistics have been COLLECTed in a single table:
The Date the statistics were last COLLECTed or refreshed.
The Time the statistics were last COLLECTed or refreshed.
The number of unique values for the column or index.
The name of the column or column(s) that the statistics were COLLECTed on.
Use Date and Time to help you determine if your statistics need to be refreshed or DROPped. The
example on the right-hand page illustrates the HELP STATÌSTÌCS output for the employee table.
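A sketch of the statement (table name hypothetical); the output fields listed above appear for each column or index with COLLECTed statistics:

```sql
HELP STATISTICS employee;
-- Typical output columns: Date, Time, Unique Values, Column Names
```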
HELP INDEX is an SQL statement which returns information for every index in the specified table.
An example of this command and the resulting BTEQ output is shown on the right-hand page. As
you can see,
HELP INDEX returns the following information:
Whether or not the index is unique
Whether the index is a PI or an SI
The name(s) of the column(s) which the index is based on
The Index ID Number
The approximate number of distinct index values.
This information is very useful in reading EXPLAIN output. Since the EXPLAIN statement only
returns the Index ID number, you can use the HELP INDEX statement to determine the structure
of the index with that ID.
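A sketch of the statement (table name hypothetical):

```sql
HELP INDEX employee;
-- Typical output: Unique?, Primary or Secondary, Column Names,
-- Index Id, Approximate Count of distinct index values
```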
Aggregation and DISTINCT Summary
Aggregation is fully parallelized.
ARSA is an algorithm that outlines a method for performing
aggregation on a parallel database:
÷ Aggregate locally
÷ Redistribute local aggregation
÷ Sort redistributed aggregation
÷ Aggregate globally
If values are nearly unique, DISTINCT may outperform a GROUP BY
because there is no local aggregation.
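The two equivalent forms can be sketched as follows (names hypothetical); for nearly unique values the DISTINCT form may win because it skips the local-aggregation phase:

```sql
SELECT DISTINCT last_name FROM employee;

-- Same result set via GROUP BY (local, then global aggregation):
SELECT last_name FROM employee GROUP BY last_name;
```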
Run EXPLAIN during Application Development to:
÷ Determine access paths
÷ Show locking profiles
÷ Validate the usage of indexes
÷ Show activities for triggers and join index access, as well as DDL, DML, and DCL.
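For example (names hypothetical), prefixing any request with EXPLAIN returns the plan instead of executing the request:

```sql
EXPLAIN
SELECT e.last_name, d.dept_name
FROM   employee e
INNER JOIN department d
       ON e.dept = d.dept;
```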
. . . (Last Use) . . . A spool file is no longer needed and will be released after this step.
. . . with no residual conditions . . . All applicable conditions have been applied to the rows.
. . . END TRANSACTION . . . Transaction locks are released and changes are committed.
. . . eliminating duplicate rows . . . Doing a DISTINCT operation. (Duplicate rows can
only exist in spool files.)
. . . by way of the sort key in spool field1 . . . Field1 is created to allow a tag sort.
. . . we do a SMS . . . A Set Manipulation Step caused by using a UNION,
MINUS, or INTERSECT operator.
. . . we do a BMSMS . . . A way of handling two or more weakly selective secondary
indexes that are linked by AND in the WHERE clause. (BMSMS = Bit Map Set
Manipulation Step.)
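A query shape that could produce a BMSMS step, assuming weakly selective NUSIs on both columns (names hypothetical):

```sql
-- Bit maps of the two NUSI row-id sets are intersected
-- before the base table is touched.
SELECT *
FROM   employee
WHERE  dept = 403
AND    job_code = 512101;
```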
. . . which is redistributed by hash code to all AMPs . . .
Redistributing each of the data rows for a table to a new AMP based on the hash value
for the column(s) involved in the join.
. . . which is duplicated on all AMPs . . .
Copying all rows of a table onto each AMP in preparation for a join (e.g., a Product Join).
. . . which is built locally on the AMPs . . .
Each AMP vproc builds a portion of a spool file from the data found on its local disk space.
. . . by way of a traversal of index #n extracting row ids only . . .
A spool file is built from the row-id values found in a secondary index (index #n). These
row-ids will be used later for extracting rows from a table.
. . . Aggregate Intermediate Results are computed locally . . .
Aggregation requires no redistribution phase.
. . . Aggregate Intermediate Results are computed globally . . .
Both local and global aggregation phases are performed.
. . . we lock a distinct <dbname>."pseudo table" . . .
A way of preventing global deadlocks, specific to an MPP ("massively parallel processing")
DBMS like the Teradata database.
. . . by way of a RowHash match scan . . .
A merge join on hash values.
. . . where unknown comparison will be ignored . . .
Comparisons involving NULL values will be ignored.
. . . we lock DBC.AccessRights for write on row hash . . .
A DDL request is being executed.
. . . low-end row(s) . . .
Used by the RDBMS to estimate the size of the spool file needed to
accommodate the data.
. . . low-end row(s) to high-end row(s) . . .
The high-end gets propagated to subsequent steps but has no influence on
choosing the plan.
. . . low-end time . . .
Used in choosing the plan (based on the low-end row estimate, cost constants,
and cost functions).
. . . low-end time to high-end time . . .
The high-end has no influence on choosing the plan.
MultiLoad does not have rollback capability.
With regard to Fallback writes, FastLoad performs them at the end of the task, while MultiLoad
and TPump do them during processing.
End-to-End Time to load may include any or all of the following:
W Receipt of source data
W Transformation and Cleansing
W Target table apply
W Fallback processing
W Permanent Journal processing
W Secondary Ìndex maintenance
W Statistics maintenance
Transformation and Cleansing of data is performed before the data is loaded onto Teradata. The
disadvantage is that this is a serial process. The more work you can perform on the
Teradata side, the more you can take advantage of parallelism.
Note: An advantage of TeraBuilder-based utilities is that you can use
Ìn general, if you have to perform statistical analysis of very large data sets, you should use
Teradata to do this.
Transformation and Cleansing
Where to do it?
Consider the impact on load time.
÷ Where is all the required data?
» Move Teradata data to client: "Export-Transform-Load"
» Move client data to Teradata: "Load-Transform" or
÷ Teradata side advantage: Parallelism
÷ Simple transformations:
» Transformation pipe to load utility
÷ Complex transformations:
» Transform on Teradata
÷ When in doubt:
Fastest load rate
÷ Inserts only
÷ Empty target table required
÷ Table not accessible until load is complete
FastLoad ELAPSED SECONDS = Acquisition Elapsed Seconds + Apply Elapsed Seconds
After data acquisition begins, there is an access lock on the data and a write lock
on the table.
During acquisition, you can release MultiLoad and cancel the job. During the apply phase,
however, you cannot release MultiLoad.
MultiLoad is restartable unless you drop the restart log or error table. You can look at a worktable,
restart log, or error table, but you should use access locks on those tables when you look at them,
or you may crash MultiLoad.
÷ Write Lock enables dirty reads, but does not enable other tasks to
modify the table for the duration of the MultiLoad Apply.
USIs / RI / Join Indexes not allowed on target table.
No Rollback, but uses a Checkpoint/Restart.
÷ Acquisition: Checkpoints to your specification. (Client-side)
÷ Apply: Checkpoints every datablock. (Database-side)
MultiLoad on NUPI Tables
W MultiLoad with a highly non-unique NUPI can reduce performance considerably.
÷ Some measurements show NUPI is 7 to 9 times slower than UPI.
÷ MULTISET helps reduce this difference by eliminating duplicate-row checking.
W But if NUPI improves locality of reference, NUPI MultiLoad can be faster than UPI MultiLoad.
W NUPI MultiLoad with few (10 or fewer) rows/value performs like UPI MultiLoad.
Minimize acquisition time by loading multiple tables with 1 source. (Up to 5 target tables.)
Go for the highest hits per datablock ratio.
÷ Do one, not multiple, MultiLoads to a single table.
÷ Do less frequent MultiLoads.
÷ Load to smaller target tables:
» Active vs. archive table partitions
÷ Reduce your row size.
÷ Use large datablock sizes.
Consider dropping and recreating secondary indexes.
Concurrent MultiLoads maximize throughput. (vs. response time)
÷ Two to three concurrent MultiLoads will saturate Teradata in apply phase.
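The large-datablock advice above can be sketched in DDL (names and size are illustrative; the maximum DATABLOCKSIZE depends on the Teradata release):

```sql
CREATE TABLE sales_active ,
    DATABLOCKSIZE = 130560 BYTES  -- larger blocks: more hits per block
( sale_id   INTEGER NOT NULL,
  sale_date DATE,
  amount    DECIMAL(10,2) )
PRIMARY INDEX ( sale_id );
```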
Processes one update at a time; not intended for the higher rates MultiLoad achieves.
Most compatible with mixed workloads and real time data availability goals.
÷ Minimal lock contention (row level write lock)
Checkpoint / Restartable (Client side)
Rollback (Teradata side)
Resource Governors can "slow down¨ the loads.
TPump Updates - Fallback and Journaling Costs
Fallback: Reduces throughput by 2x.
Local Journaling (Before, After): Reduces throughput by 1.2x.
After Journaling: Reduces throughput by 1.75x.
Cost for changing a USI value:
The change row is sent to the AMP owning the USI row.
Additional CPU per USI is 1.0x the CPU path of the primary table.
÷ i.e.: If it takes 100 seconds to do the primary inserts/deletes, it will take an
additional 100 seconds to update each USI.
Cost for changing a NUSI value (w/ 1 row/value):
The NUSI change row is applied locally.
Additional CPU per NUSI is 0.55x the CPU path of the primary table.
÷ i.e.: If it takes 100 seconds to do the primary inserts/deletes, it will take an
additional 55 seconds to update each NUSI.
Too few and you might not meet your window. Too many and you may impact current work. You
must also consider the host resources.
If there is no fallback, indexes or journaling, then only a single step will be required to
perform the prime index modification statement. If fallback is defined on the table, then an
additional step, or task, is required to accomplish the prime index modification statement.
Similarly, additional steps are required for journaling and index updates.
If your goal is to maximize TPump throughput without hindrance from any other
competing workloads, the TPump client will need to drive into Teradata at least 30
concurrent tasks per node.
(If your throughput goal is less than the maximum achievable from Teradata, due to the need to
share resources with competing workloads, then you will not need that many concurrent tasks.)
There are lots of ways to achieve 30 concurrent tasks per node.
When the table has 2 indexes but no fallback or journaling (i.e., 3 tasks per statement),
you can use 2 concurrent multi-statement requests, each with 5 statements.
In TPump terms, the PACK factor represents the number of statements in a multistatement
request. Sessions represent the number of concurrent requests.
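A hedged sketch of the 30-task example as a TPump script fragment (option spellings abbreviated; the error table name is hypothetical). With 2 sessions and PACK 5, each statement driving 3 tasks gives 2 x 5 x 3 = 30 concurrent tasks:

```
.BEGIN LOAD
       SESSIONS 2          /* 2 concurrent requests           */
       PACK 5              /* 5 statements per request        */
       ERRORTABLE mydb.tpump_err;
   ...                     /* .DML labels and DML go here     */
.END LOAD;
```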
For full table updates:
No Index Management is required if the index columns are not the
object of the update.
For full table inserts and deletes:
Secondary Index modifications are done a row at a time.
It's better to drop and recreate the index unless the number of rows
to update is very small (i.e., <= 0.1% of the rows being updated).
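The drop-and-recreate pattern can be sketched as follows (index, table, and staging names hypothetical):

```sql
DROP INDEX idx_dept ON employee;          -- avoid row-at-a-time maintenance

INSERT INTO employee
SELECT * FROM employee_staging;           -- full-table insert

CREATE INDEX idx_dept (dept) ON employee; -- rebuild the SI in one pass
```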
Teradata Join Plans (or Strategies)
W ROW HASH
W ROW ID
Relational Join Types
In a Product Join, every qualifying row of one table is compared to every qualifying row in
the other table. Rows which match on WHERE conditions are saved.
Product Joins occur when:
W The WHERE clause is missing.
W A Join condition is not based on equality (NOT =, LESS THAN, GREATER THAN).
W Join conditions are ORed together.
W There are too few Join conditions.
W A referenced table is not named in any Join condition.
W Table aliases are incorrectly used.
W The Optimizer determines that it is less expensive than the other Join types.
Merge Joins are commonly done when the Join condition is based on equality. They are generally
more efficient than product joins because the number of row comparisons is smaller.
1. Identify the Smaller Table.
2. Put the qualifying data from one or both tables into Spool (if necessary).
3. Move the Spool rows to the AMPs based on the Join Column hash (if necessary).
4. Sort the Spool rows by Join Column Row Hash value (if necessary).
5. Compare those rows with matching Join Column Row Hash values.
W Compare matching join column row hash values for the rows.
W Cause significantly fewer comparisons than a product join.
W Require rows to be on the same AMP to be joined.
Ìt is basically the same query as the previous example except that the join condition is now
Teradata copies the employee rows into Spool and redistributes them on employee.dept Row
Hash. The Merge Join then occurs with the rows to be Joined located on the same AMPs.
M1 occurs when one of the tables is already distributed on the Join Column Row Hash. The Join
Column is the PI of one, not both, of the tables.
Joining on very non-unique values (AMP 2) could cause "insufficient spool¨ errors. Collect
Statistics to help the parser avoid this situation.
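For example (names hypothetical):

```sql
-- Give the parser real demographics for the skewed join column.
COLLECT STATISTICS ON employee   COLUMN dept;
COLLECT STATISTICS ON department COLUMN dept;
```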
DUPLICATE and SORT the smaller table on all AMPs. LOCALLY BUILD a copy of the larger
table and SORT.
M1 and M2 are selected when the join column is the UPI of only one of the tables.
M3 will occur when the Join Column is the Primary Index of both tables.
Merge Join - Matching Primary Indexes
Note that there is no redistribution of rows or sorting, which means that Merge Join Plan M3
is being used. (Remember that M3 occurs when the Join Column is the Primary Index of both
tables.) No redistribution or sorting is needed since the rows are already on the proper AMPs and
in the proper order for joining.
Row Hash Join
In a Row Hash Join, the smaller table is sorted into join column row hash sequence and then
duplicated on all AMPs. The larger table is then processed a row at a time. For those rows that
qualify for joining (WHERE), the Row Hash of the join column(s) is used to do a binary
search of the smaller table for a match.
The Parser can choose this join plan when the qualifying rows of the small table can be held in
AMP memory.
Rows must be on the same AMP to be joined.
The UNION operator is used instead of the OR operator, which means that two separate answer
sets are generated and then combined.
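The rewrite can be sketched as follows (names hypothetical); each branch can use its own index, and UNION removes duplicates when the answer sets are combined:

```sql
SELECT * FROM employee WHERE dept     = 403
UNION
SELECT * FROM employee WHERE job_code = 512101;
```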
Join Strategies Summary
W The optimizer chooses the best join strategy based on available indexes and demographics.
Product Join
W Most general form of join.
W Does not sort rows.
W Number of comparisons is the product of the number of rows in each table.
W Should be avoided if possible.
Merge Join
W Commonly done when the join is based on equality.
W Requires some preparation.
W Generally more efficient than a product join.
W Better performance results if the left table (in the EXPLAIN) is the unique (smaller) table.
Exclusion Merge Join
W Used for finding rows that don't have a match.
W Caused by NOT IN and EXCEPT.
W Prevent null join values in order to get a result other than NULL.
Row Hash Join
W Smaller table is sorted into join column row hash sequence and duplicated on all AMPs.
W Can be used when rows of the smaller table can be held in AMP memory.
W COLLECT STATISTICS on both tables guides the parser.
Nested Join
W Very efficient.
W Doesn't always use all AMPs.
W Requires an equality value for a UPI or USI on Table1 and a join on a column of that single row
to any index on Table2.
Cartesian Product Join
W Unconstrained product join.
W Consumes significant system resources.
Unique constraints are implemented in the Teradata database as Unique Secondary Indexes.
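A sketch of that mapping (names hypothetical): a column-level UNIQUE constraint on a non-PI column is carried out internally as a USI:

```sql
CREATE TABLE employee
( emp_no    INTEGER NOT NULL,
  ssn       CHAR(9) NOT NULL UNIQUE,  -- implemented as a USI
  last_name CHAR(20) )
UNIQUE PRIMARY INDEX ( emp_no );
```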