
Mod 1 - Welcome to the Teradata Database

Objectives

After completing this module, you should be able to:
- Describe the Teradata Database.
- Describe the advantages of the Teradata Database.
- Define the terms associated with relational databases.
- Describe the advantages of a relational database.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

What is the Teradata Database?

The Teradata Database is a relational database management system (RDBMS) that drives a company's data warehouse. It provides the foundation to give a company the power to grow, to compete in today's dynamic marketplace, and to evolve the business by getting answers to a new generation of questions. The Teradata Database's scalability allows the system to grow as the business grows, from gigabytes to terabytes and beyond. Its unique technology has been proven at customer sites across industries and around the world.

The Teradata Database is an open system, compliant with industry ANSI standards. It is currently available on these industry-standard operating systems: UNIX MP-RAS (discontinued with Teradata 13.10), Microsoft Windows 2000, Microsoft Windows 2003, and Novell SUSE Linux. For this reason, Teradata is considered an open architecture.

The Teradata Database is a large database server that accommodates multiple client applications making inquiries against it concurrently. Various client platforms access the database through a TCP/IP connection or across an IBM mainframe channel connection. The Teradata Database is accessed using SQL (Structured Query Language), the industry-standard access language for communicating with an RDBMS. The ability to manage large amounts of data is accomplished using the concept of parallelism, wherein many individual processors perform smaller tasks concurrently to accomplish an operation against a huge repository of data. To date, only parallel architectures can handle databases of this size.
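
For illustration only (the table and column names here are hypothetical, not part of the course's sample database), a client application might submit an SQL request such as:

    SELECT last_name, department_number
    FROM   employee
    WHERE  department_number = 403;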

How Is The Teradata Database Used?


Each Teradata Database implementation can model a company's business. The ability to keep up with rapid changes in today's business environment makes the Teradata Database an ideal foundation for many applications, including:
- Enterprise data warehousing
- Active data warehousing
- Customer relationship management
- Internet and e-business
- Data marts

Just for Fun . . .
Based on what you know so far, what do you think are some Teradata Database features that make it so successful in today's business environment? (Details on the following are coming up next.)
A. Scalability.
B. Single data store.
C. High degree of parallelism.
D. Ability to model the business.
E. All of the above.

Feedback:

That's correct! Teradata has all these features.

What Makes the Teradata Database Unique?

In this Web-Based Training, you will learn about many features that make the Teradata Database, an RDBMS, right for business-critical applications. To start with, this section covers these key features:
- Single data store
- Scalability
- Unconditional parallelism (parallel architecture)
- Ability to model the business
- Mature, parallel-aware Optimizer

Single Data Store


The Teradata Database acts as a single data store, with multiple client applications making inquiries against it concurrently. Instead of replicating a database for different purposes, with the Teradata Database you store the data once and use it for many applications. The Teradata Database provides the same connectivity for an entry-level system as it does for a massive enterprise data warehouse.

Scalability

"Linear scalability" means that as you add components to the system, the performance increase is linear. Adding components allows the system to accommodate increased workload without decreased throughput. Linear scalability enables the system to grow to support more users/data/queries/complexity of queries without experiencing performance degradation. As the configuration grows, performance increase is linear, slope of 1. The Teradata Database was the first commercial database system to scale to and support a trillion bytes of data. The chart below lists the meaning of the prefixes: Prefix Exponent Meaning kilomegagigatera103 106 109 1012 1,000 (thousand) 1,000,000 (million) 1,000,000,000 (billion) 1,000,000,000,000 (trillion)

file://C:\Documents and Settings\PJ186002\Desktop\teradata intoduction wbt.htm

10/20/2011

Symmetric multiprocessing platforms manage gigabytes of data to support an entry-level data warehousing system.000.000. You can start with a couple of nodes. The Teradata Database runs on highly optimized Teradata servers in the following configurations: SMP . without manual interventions such as sorting. and this extends to data loading with the use of parallel loading utilities. The Teradata Database is the only database that is predictably scalable in multiple dimensions.000.htm 10/20/2011 . The Teradata Database efficiently processes increasingly sophisticated business questions as users realize the value of the answers they are getting. Data model .000. The Teradata Database is scalable in multiple ways.000. MPP . However.000. query complexity. An MPP Teradata Database system easily accommodates that growth whenever it happens. Complexity The Teradata Database is adept at complex data models that satisfy the information needs throughout an enterprise. Concurrent Users As is proven in every Teradata Database benchmark. The Teradata Database's scalability provides investment protection for customer's growth and application development. Applications . The Teradata Database has the proven ability to handle from hundreds to thousands of users on the system simultaneously. the data is automatically redistributed through the reconfiguration process. With the Teradata Database. who are often running multiple.000. Platforms . including hardware.Applications you develop for Teradata Database configurations will continue to work as the system grows.000. the Teradata Database can handle the most concurrent users. Adding many concurrent users typically reduces system performance.Page 4 of 137 petaexa- 1015 1018 1. file://C:\Documents and Settings\PJ186002\Desktop\teradata intoduction wbt.The Teradata Database's modular structure allows you to add components to your existing system. protecting your investment in application development.The physical and logical data models remain the same regardless of data volume.000 (quadrillion) 1.Massively parallel processing systems can manage hundreds of terabytes of data.When you expand your system.000 (quintillion) The Teradata Database can scale from 100 gigabytes to over 100 terabytes of data on a single system without losing any performance capability. and later expand the system as your business grows. adding more components can enable the system to accommodate the new users with equal or even better performance. you can increase the size of your system without replacing: Databases .000. and number of concurrent users. unloading and reloading. Hardware Growth is a fundamental goal of business. The Teradata Database provides automatic data distribution and no reorganizations of data are needed. or partitioning. It has the ability to perform large aggregations during query run time and can perform up to 64 joins in a single query. complex queries.

" file://C:\Documents and Settings\PJ186002\Desktop\teradata intoduction wbt. Parallelism uses multiple processors working together to accomplish a task quickly. The Teradata Database processes requests in parallel without mandatory query tuning. parallel lines. the parallel loading of the rides becomes essential to their successful operation.Page 5 of 137 Unconditional Parallelism The Teradata Database provides exceptional performance using parallelism to achieve a single answer faster than a non-parallel system. As the line approaches the boarding platform. groups of people can step into their seats simultaneously. This allows Teradata to interface with 3rd party Business Intelligence (BI) tools and submit queries from other database systems. as guests stand in line for an attraction such as a roller coaster. yet these varying perspectives have a common basis for a "single view of the business. column range constraints. The Teradata Database's parallelism does not depend on limited data quantity. Individual departments can use their own assumptions and views of the data for analysis. That way. or specialized data models -The Teradata Database has "unconditional parallelism. Parallelism is evident throughout a Teradata Database." Teradata supports ad-hoc queries using ANSI-standard SQL. At the biggest amusement parks. The line moves faster than if the guests step onto the attraction one at a time. An example of parallelism can be seen at an amusement park. it typically will split into multiple. Ability to Model the Business A data warehouse built on a business model contains information from across the enterprise. from the architecture to data loading to complex request processing. and includes SQL-ready database management information (log files).htm 10/20/2011 .

With the Teradata Database's centrally located, logical architecture, companies can get a cohesive view of their operations across functional areas to:
- Find out which divisions share customers.
- Track products throughout the supply chain, from initial manufacture, to inventory, to sale, to delivery, to maintenance, to customer satisfaction.
- Analyze relationships between results of different departments.
- Determine if a customer on the phone has used the company's website.
- Vary levels of service based on a customer's profitability.

You get consistent answers from the different viewpoints above using a single business model.

A key Teradata Database strength is its ability to model the customer's business. In a functional model, data is organized according to what is done with it. But what happens if users later want to do some analysis that has never been done before? When a system is optimized for one department's function, the other departments' needs (and future needs) may not be met. A Teradata Database allows the data to represent a business model, with data organized according to what it represents, not how it is accessed, so it is easy to understand. The data model should be designed without regard to usage and be the same regardless of data volume. With a Teradata Database as the enterprise data warehouse, users can ask new questions of the data that were never anticipated, throughout the business cycle and even through changes in the business environment.

The Teradata Database supports business models that are truly normalized, not functional models for different departments, avoiding the costly star schema and snowflake implementations that many other database vendors use. Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model. The Teradata Database can support star schema and other types of relational modeling, but Third Normal Form is the method for relational modeling that we recommend to customers. Our competitors typically implement star schema or snowflake models either because they are implementing a set of known queries in a transaction processing environment, or because their architecture limits them to that type of model. The Teradata Database supports normalized logical models because it is able to perform 64-table joins and large aggregations during queries.

Mature, Parallel-Aware Optimizer

The Teradata Database Optimizer is the most robust in the industry, able to handle:
- Multiple complex queries
- Multiple joins per query
- Unlimited ad-hoc processing

The Optimizer is parallel-aware, meaning that it has knowledge of system components (how many nodes, vprocs, etc.). It determines the least expensive plan (time-wise) to process queries fast and in parallel. The Optimizer is further explained in the next module.
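
As a small sketch of how you might look at the Optimizer's chosen plan: Teradata SQL provides an EXPLAIN modifier that returns the plan in English. The query itself is hypothetical, not course material:

    EXPLAIN
    SELECT   c.customer_name, SUM(o.order_total)
    FROM     customer c
    JOIN     orders   o ON o.customer_id = c.customer_id
    GROUP BY c.customer_name;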

What is a Relational Database?

A database is a collection of permanently stored data that is:
- Logically related (the data was created for a specific purpose)
- Managed (the data integrity and value are maintained)
- Shared (many users may access the data)
- Protected (access to the data is controlled)

The Teradata Database is a relational database. Relational databases are based on the relational model, which is founded on mathematical Set Theory; the relational model uses and extends many principles of Set Theory to provide a disciplined approach to data management. Users and applications access data in an RDBMS using industry-standard SQL statements. SQL is a set-oriented language for relational database management.

A relational database is designed to:
- Represent a business and its business practices
- Be extremely flexible in the way that data can be selected and used
- Be easy to understand
- Model the business, not the applications
- Allow businesses to quickly respond to changing conditions

In addition, a single copy of the data can serve multiple purposes.

Relational databases present data as a set of tables. A table is a two-dimensional representation of data that consists of rows and columns. According to the relational model, a valid table does not have to be populated with data rows; it just needs to be defined with at least one column.

Rows

Each row contains all the columns in the table; a row is one instance of all columns. Each row represents an occurrence of an entity defined by the table, and it is a single entity in the table. An entity is a person, place, thing, or event about which the table contains information. In this example, the entity is the employee, and each row represents a single employee. Each table can contain only one row format. The order of rows is arbitrary and does not imply priority, hierarchy, or significance.

Columns

Each column contains "like data," such as only part names, only supplier names, or only employee numbers. In the example below, the Last_Name column contains last names only, and nothing else. Within a table, the column position is arbitrary. The data in the columns is atomic data, so a telephone number might be divided into three columns (the area code, the prefix, and the suffix) so that the customer data can be analyzed according to area code, etc. Missing data values are represented by "nulls" (the absence of a value).
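
A minimal DDL sketch of such a table (hypothetical names and types in generic ANSI SQL, not the course's sample database):

    CREATE TABLE customer (
      customer_id  INTEGER NOT NULL,  -- common field used to join to other tables
      last_name    VARCHAR(30),       -- "like data": last names only
      area_code    CHAR(3),           -- a telephone number stored atomically,
      phone_prefix CHAR(3),           --   one component per column, so rows can
      phone_suffix CHAR(4)            --   be analyzed by area code
    );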

Answering Questions with a Relational Database

A relational database is a set of logically related tables. Tables are logically related to each other by a common field, so information such as customer telephone numbers and addresses can exist in one table, yet be accessible for multiple purposes. Relational databases do not use access paths to locate data; data connections are made by data values, by matching values in one column with the values in a corresponding column in another table. In relational terminology, this connection is referred to as a join.

The diagrams below show how the values in one table may be matched to values in another table. The tables below show customer, order, and billing statement data. The data contained in the tables can be associated using columns with matching data values. This is done by performing a join between the tables using the common field, Customer ID. The common field of Customer ID lets you look up information such as a customer name for a particular statement number, even though the data exists in two different tables.

Here are a few other examples of questions that can be answered:
"How many mats did customer Wood purchase?"
"What is the statement number for O'Day's purchase of $45.30?"
"For statement #344627, what state did the customer live in?"

To sum up, a relational database is a collection of tables, related by a common field.
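
In SQL, a join like the one described above might be written as follows (a sketch with hypothetical table and column names):

    SELECT c.customer_name, c.customer_state
    FROM   customer          c
    JOIN   billing_statement s
      ON   s.customer_id = c.customer_id   -- match on the common field
    WHERE  s.statement_number = 344627;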

Logical/Relational Modeling

The logical model should be independent of usage. A variety of front-end tools can be accommodated so that the database can be created quickly. The design of the data model is the same regardless of data volume. An enterprise model is one that provides the ability to look across functional processes.

Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model. Normalization theory is constructed around the concept of normal forms that define a system of constraints; if a relation meets the constraints of a particular normal form, we say that relation is "in normal form." The intent of normalizing a relational database is to put one fact in one place. By decomposing your relations into normalized forms, you can eliminate the majority of update anomalies that can occur when data is stored in de-normalized tables.

As a model is refined, it passes through different states which can be referred to as normal forms. A normalized model includes:
- Entities
- Attributes
- Relationships

First normal form rules state that each and every attribute within an entity instance has one and only one value. No repeating groups are allowed within entities.

Second normal form requires that the entity conform to the first normal form rules, and that every non-key attribute within an entity be fully dependent upon the entire key (key attributes) of the entity, not a subset of the key.

Third normal form requires that the entity conform to the first and second normal form rules. In addition, no non-key attribute within an entity may be functionally dependent upon another non-key attribute within the same entity.

A slightly more detailed statement of this principle is the definition of a relation (or table) in a normalized relational database: a relation consists of a primary key, which uniquely identifies any tuple, and zero or more additional attributes, each of which represents a single-valued (atomic) property of the entity type identified by the primary key. A tuple is an ordered set of values, and the separator for each value is often a comma. Common uses for the tuple as a data type are:
1. Passing a string of parameters from one program to another
2. Representing a set of value attributes in a relational database

3NF vs. Star Schema

The star schema (sometimes referenced as a star join schema) is the simplest style of data warehouse schema. The star schema consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables, and is considered an important special case of the snowflake schema. Some characteristics of a Star Schema model include:
- They tend to have fewer entities
- They advocate a greater level of denormalization

While the Teradata Database can support any data model that can be processed via SQL, an advantage of a normalized data model is the ability to support previously unknown (ad-hoc) questions.
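
To make the contrast concrete, here is a small sketch of Third Normal Form decomposition (hypothetical tables, not from the course):

    -- A de-normalized order row such as
    --   (order_number, customer_id, customer_name, order_total)
    -- repeats customer_name, which depends on customer_id rather than on
    -- the key order_number. Decomposing puts one fact in one place:
    CREATE TABLE customer (
      customer_id   INTEGER NOT NULL,
      customer_name VARCHAR(30)        -- stored once, in one place
    );

    CREATE TABLE orders (
      order_number  INTEGER NOT NULL,
      customer_id   INTEGER NOT NULL,  -- relates back to customer
      order_total   DECIMAL(8,2)
    );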

Primary Key

In the relational model, a Primary Key (PK) is used to designate a unique identifier for each row when you design a table. A Primary Key can be composed of one or more columns; the Primary Key may span more than one column. In any given row, the value of the Primary Key uniquely identifies the row, and there is only one Primary Key. In the example below, the Primary Key is the employee number.

Primary Key Rules

Rules governing how Primary Keys must be defined and how they function are:
Rule 1: A Primary Key is required.
Rule 2: A Primary Key value must be unique.
Rule 3: The Primary Key value cannot be NULL.
Rule 4: The Primary Key value should not be changed.
Rule 5: The Primary Key column should not be changed.
Rule 6: A Primary Key may be any number of columns.

Rule 1: A Primary Key is Required

In the logical model, each table requires a Primary Key because that is how each row is able to be uniquely identified. Each table must have one, and only one, Primary Key.

Rule 2: Unique PK

Within the column(s) designated as the Primary Key, the values in each row must be unique; no duplicate values are allowed. In a multi-column Primary Key, the combined value of the columns must be unique, even if an individual column in the Primary Key has duplicate values.

Rule 3: PK Cannot Be NULL

Within the Primary Key column, each row must have a Primary Key value and cannot be NULL (without a value). The Primary Key's purpose is to uniquely identify a row; because NULL is indeterminate, it cannot "identify" anything.

Rule 4: PK Value Should Not Change

Primary Key values should not be changed. If you changed a Primary Key value, you would lose all historical tracking of that row.

Rule 5: PK Column Should Not Change

Additionally, the column(s) designated as the Primary Key should not be changed. If you changed a Primary Key column, you would lose all the information relating that table to other tables.

Rule 6: No Column Limit

In the relational model, there is no limit to the number of columns that can be designated as the Primary Key, so it may consist of one or more columns. For example, in the table below, the Primary Key consists of three columns: EMPLOYEE NUMBER, LAST NAME, and FIRST NAME.

Foreign Key

A Foreign Key (FK) is an identifier that links related tables. Each Foreign Key references a matching Primary Key in another table in the database; a Foreign Key defines how two tables are related to each other. In the example below, the Department Number column that is a Foreign Key actually exists in another table as a Primary Key.

Having tables related to each other gives users the flexibility to look at the data in different ways, without the database administrator having to manage and maintain many tables of duplicate data for different applications.

Foreign Key Rules

Rules governing how Foreign Keys must be defined and how they operate are:
Rule 1: Foreign Keys are optional.
Rule 2: A Foreign Key value may be non-unique.
Rule 3: The Foreign Key value may be NULL.
Rule 4: The Foreign Key value may be changed.
Rule 5: A Foreign Key may be any number of columns.
Rule 6: Each Foreign Key must exist as a Primary Key in a related table.

Rule 1: Optional FKs

Foreign Keys are optional; not all tables have them. Tables that do have them can have multiple Foreign Keys, because a table can relate to many other tables. In fact, a table can have an unlimited number of foreign keys.

In the example table below:
- The Department Number Foreign Key relates to the Department Number Primary Key in the Department table.
- The Job Code FK relates to the Job Code PK in the Job Code table.

Having tables related to each other makes a relational database flexible, so that different users can look up the information they need, while simplifying database administration so the data doesn't have to be duplicated for each purpose or application.

Rule 2: Unique or Non-Unique FKs

Duplicate Foreign Key values are allowed. More than one employee could be assigned to the same department.

Rule 3: FKs Can Be NULL

NULL (missing) Foreign Key values are allowed. For example, under special circumstances, an employee might not be assigned to a department.

Rule 4: FK Value Can Change

Foreign Key values may be changed. For example, if Arnando Villegas moves from Department 403 to Department 587, the Foreign Key value in his row would change.

Rule 5: FK Has No Column Limit

The Foreign Key may consist of one or more columns; in the relational model, there is no limit to the number of columns that can be designated as a Foreign Key. A multi-column Foreign Key is used to relate to a multi-column Primary Key in a related table.

Rule 6: FK Must Be PK in Related Table

Each Foreign Key value must exist as a Primary Key value in a related table. A department number that does not exist in the Department Table would be invalid as a Foreign Key value in the Employee Table. This rule can apply even if the Foreign Key is NULL. Remember, a missing value is defined as a non-value: there is no value present, or it is missing. So the rule could be better stated: if a value exists in the Foreign Key column, it must match a Primary Key value in the related table.
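
The Primary and Foreign Key rules above can be sketched in DDL. This is generic ANSI SQL with hypothetical column types; the course defines these keys in its logical model rather than prescribing this syntax:

    CREATE TABLE department (
      department_number INTEGER NOT NULL PRIMARY KEY    -- unique, never NULL
    );

    CREATE TABLE employee (
      employee_number   INTEGER NOT NULL PRIMARY KEY,
      department_number INTEGER,                        -- FK: may be NULL, may repeat
      FOREIGN KEY (department_number)
        REFERENCES department (department_number)       -- must match a PK value (Rule 6)
    );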

Just for Fun . . .

To check your understanding of Primary Keys and Foreign Keys: according to the relational model, a single table can have either: (Choose two.)
A. Multiple primary keys.
B. No primary keys.
C. Multiple foreign keys.
D. No foreign keys.

Feedback: To review these topics, click Primary Key or Foreign Key.

Exercise 1.1

Choose the best answer from the pull-down menu:
A ___ contains "like data."
A ___ can contain only one row format.
A ___ is one instance of all columns in a table.

Feedback: To review these topics, click Rows or Columns.

Exercise 1.2

Which statement is true?
A. A database is a two-dimensional array of rows and columns.
B. Teradata is an ideal foundation for customer relationship management, e-commerce, and active data warehousing applications.
C. A Primary Key must contain one, and only one, column.
D. Foreign Keys have no relationship to existing Primary Key selections.

Feedback: To review these topics, click How is the Teradata Database Used? or What is a Relational Database?.

Exercise 1.3

Create a relationship between the two tables by clicking on:

The Foreign Key column in the Product table
The Primary Key column in the Vendor table

Feedback: To review these topics, click Foreign Key or Primary Key.

Exercise 1.4

Click on the name of the customer who placed order 7324.

Feedback: To review this topic, click Primary Key or Foreign Key.

Exercise 1.5

How many calendars were shipped on 4/15? (These same tables were used in the previous exercise.)

A. 10
B. 2
C. 40
D. 30

Feedback: To review this topic, click Primary Key or Foreign Key.

Exercise 1.6

Which one is NOT a unique feature of the Teradata Database?
A. Provides linear scalability, so there is no performance degradation as you grow the system.
B. Provides a mature, parallel-aware Optimizer that chooses the least expensive plan for the SQL request.
C. Provides automatic and even data distribution for faster query processing via its unconditional parallel architecture.
D. Gives each department in the enterprise a self-contained, functional data store for their own assumptions and analysis.
E. Ability to model the business, with data organized according to what it represents.

Feedback: To review these topics, click Single Data Store, Scalability, Unconditional Parallelism, Ability to Model the Business, and Mature, Parallel-Aware Optimizer.

Exercise 1.7

True or False: The logical model should be independent of usage.

A. True
B. False

Feedback: To review this topic, click Logical/Relational Modeling.

Mod 2 - Teradata Database and Data Warehouse Architecture

Objectives

After completing this module, you should be able to:
- Identify the different types of enterprise data processing.
- Define a data warehouse, an active data warehouse, and a data mart.
- List and define the different types of data marts.
- Explain the advantages of detail data over summary data.
- List and describe major Teradata Database hardware and software components and their functions.
- Describe the overall Teradata Database parallel architecture.
- Explain how the architecture helps to maintain high availability and reliability for Teradata Database users.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Evolution to Active Data Warehousing

Data Warehouse Usage Evolution

There is an information evolution happening in the data warehouse environment today. Changing business requirements have placed demands on data warehousing technology to do more things faster. Data warehouses have moved from back room strategic decision support systems to operational, business-critical components of the enterprise. As your company evolves in its use of the data warehouse, what you need from the data warehouse evolves too.

Stage 1 Reporting: The initial stage typically focuses on reporting from a single view of the business to drive decision-making across functional and/or product boundaries. Questions are usually known in advance, such as a weekly sales report.

Stage 2 Analyzing: Focuses on why something happened, such as why sales went down, or discovering patterns in customer buying habits. Users perform ad-hoc analysis, slicing and dicing the data at a detail level, and questions are not known in advance.

Stage 3 Predicting: Analysts utilize the system to leverage information to predict what will happen next in the business to proactively manage the organization's strategy. This stage requires data mining tools and building predictive models using historical detail. As an example, users can model customer demographics for target marketing.

Stages 1 to 3 focus on strategic decision-making within an organization.

Stage 4 Operationalizing: Providing access to information for immediate decision-making in the field enters the realm of active data warehousing. Stage 4 focuses on tactical decision support, which is not focused on developing corporate strategy but rather on supporting the people in the field who execute it. Examples: inventory management with just-in-time replenishment; scheduling and routing for package delivery; altering a campaign based on current results.

Stage 5 Active Data Warehousing: The larger the role an ADW plays in the operational aspects of decision support, the more incentive the business has to automate the decision processes. As technology evolves, more and more decisions become executed with event-driven triggers that initiate fully automated decision processes. Interactive customer relationship management (CRM) on a web site or at an ATM is about making decisions to optimize the customer relationship through individualized product offers, pricing, content delivery, and so on. You can automate decision-making when a customer interacts with a web site. Example: determine the best offer for a specific customer based on a real-time event, such as a significant ATM deposit.

Active Enterprise Intelligence

Active Enterprise Intelligence (AEI) is a business strategy for providing strategic and operational intelligence to back office and front line users from a single enterprise data warehouse. It is the seamless integration of the ADW into the customer's existing business and technical architectures. Most importantly, it enables the linkage and alignment of operational systems, business processes, and people with corporate goals so companies may execute on their strategies, and it enables new operational users.

The technology that enables that business value is the Teradata Active Data Warehouse (ADW). The ADW is an extension of our existing Enterprise Data Warehouse (EDW). The Teradata ADW is a combination of products, features, services, and business partnerships that support the Active Enterprise Intelligence business strategy.

The Active Enterprise Intelligence environment:
Active - Is responsive, agile, and capable of driving better, faster decisions that drive intelligent, and often immediate, actions.
Enterprise - Provides a single view of the business, across appropriate business functions, processes, and applications.
Intelligence - Supports traditional strategic users and new operational users of the Enterprise Data Warehouse.

Active Data Warehouse

Data warehouses are beginning to take on mission-critical roles supporting CRM, one-to-one marketing, and minute-to-minute decision-making. Businesses today want more than just strategic insight from their data warehouse implementations; they want better execution in running the business through more effective use of information for the decisions that get made thousands of times per day. We refer to this capability as "tactical" decision support. Tactical decisions are the drivers for day-to-day management of the business. Decisions such as when to replenish Barbie dolls at a particular retail outlet may not be strategic at the level of customer segmentation or long-term pricing strategies, but when executed properly, they make a big difference to the bottom line.

The origin of the active data warehouse is the timely, integrated store of detail data available for analytic business decision-making. It is only from that source that the additional traits needed by the active data warehouse can evolve. These new "active" traits are supplemental to data warehouse functionality. For both strategic and tactical decisions to be useful to the business, today's data, this hour's data, even this minute's data has to be available, as well as continuous updating of information so data is always fresh and accurate. While accessing the detail data directly remains an important opportunity for analytical work, tactical work may thrive on shortcuts and summaries, such as operational data store (ODS) level information. Data volumes and user concurrency levels may explode upward beyond expectation, and the work mix in the database still includes complex decision support queries, background data feeds, and possibly event-driven updates all at the same time. Restraints may need to be placed on the longer analytical queries in order to guarantee tactical work throughput.

Data warehousing requirements have evolved to demand a decision capability that is not just oriented toward corporate staff and upper management, but actionable on a day-to-day basis. The Teradata Database is positioned exceptionally well for stepping up to the challenges related to high availability, large multi-user workloads, and handling the complex queries that are required for an active data warehouse implementation. The Teradata Database technology supports evolving business requirements by providing high performance and scalability for:
- Mixed workloads (both tactical and strategic queries) for mission critical applications
- Large amounts of detail data
- Concurrent users

The Teradata Database provides 7x24 availability and reliability.

Evolution of Data Processing

Traditionally, data processing has been divided into two categories: on-line transaction processing (OLTP) and decision support systems (DSS). For either, requests are handled as transactions. A transaction is a logical unit of work, such as a request to update an account.

An RDBMS is used in the following main processing environments:
- DSS
- OLTP
- OLAP
- Data Mining

Decision Support Systems (DSS)

In a decision support environment, users submit requests to analyze historical detail data stored in the tables. The results are used to establish strategies, reveal trends, and make projections. These types of questions are essential for long range, strategic planning. DSS systems often process huge volumes of detail data, and a database used as a decision support system usually receives fewer, very complex, ad-hoc queries that may involve numerous tables. Decision support systems have evolved to include batch reports, which roll up numbers to give business the big picture, as well as analysis and predictive what-if type queries that are often complex and unpredictable in their processing. Instead of routine, pre-written scripts, users now require the ability to perform ad hoc queries (i.e., perform queries as the need arises).

On-line Transaction Processing (OLTP)

Unlike the DSS environment, an on-line transaction processing (OLTP) environment typically has users accessing current data to update, insert, and delete rows in the data tables. OLTP is typified by a small number of rows (or records) of a few of many possible tables being accessed in a matter of seconds or less. Very little I/O processing is required to complete the transaction. This type of transaction takes place when we take out money at an ATM: once our card is validated, a debit transaction takes place against our current balance to reflect the amount of cash withdrawn. This type of transaction also takes place when we deposit money into a checking account and the balance gets updated. We expect these transactions to be performed quickly; they must occur in real time, as sketched below.
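
A sketch of such a short OLTP transaction, using generic ANSI-style SQL and a hypothetical account table (not course material):

    -- Debit an ATM withdrawal against the current balance
    UPDATE checking_account
    SET    balance = balance - 100.00
    WHERE  account_number = 1234567;

    COMMIT;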

On-line Analytical Processing (OLAP)

OLAP is a modern form of analytic processing within a DSS environment. OLAP tools from companies like MicroStrategy and Cognos provide an easy-to-use Graphical User Interface to allow "slice and dice" analysis along multiple dimensions (for example, products, locations, sales teams, inventories, etc.). With OLAP, the user may be looking for historical trends over time, such as sales rankings or seasonal inventory fluctuations for the entire corporation. Usually, this involves a lot of detail data to be retrieved, processed, and analyzed. Therefore, response time can be in seconds or minutes.

Data Mining

Data Mining (predictive modeling) involves analyzing moderate to large amounts of detailed historical data to detect behavioral patterns (for example, buying, attrition, or fraud patterns) that are then used to predict future behavior. There are two phases to data mining:
Phase 1: An "analytic model" is built from historical data incorporating the detected behavior patterns (takes minutes to hours).
Phase 2: The model is then applied against current detail data of customers (that is, customers are scored) to predict likely outcomes (takes seconds or less). Scores can indicate a customer's likelihood of purchasing a product, switching to a competitor, or being fraudulent.

Advantages of Using Detail Data

Until recently, most business decisions were based on summary data. The problem is that summarized data is not as useful as detail data and cannot answer some questions with accuracy. Think of your monthly bank statement that records checking account activity. If it only told you the total amount of deposits and withdrawals, would you be able to tell if a certain check had cleared? To answer that question you need a list of every check received by your bank. You need detail data. Here's another example: with summarized data, peaks and valleys are leveled when the peaks fall at the end of a reporting period and are cut in half.

Decision support - answering business questions - is the real purpose of databases. To answer business questions, decision-makers must have four things:
- The right data
- Enough detail data
- Proper data structure
- Enough computer power to access and produce reports on the data

Consider your own business and how it uses data. Is that data detailed or summarized?

If it's summarized, are there questions it cannot answer?

Check Your Understanding

Which type of data processing supports answering this type of question: "How many women's dresses did our store sell in December of last year?"
A. OLTP
B. Data Mining
C. OLAP
D. DSS

Feedback:

Row vs. Set Processing

Row-by-Row Processing

Row-by-row processing is where there are many rows to process: one row is fetched at a time, all calculations are done on it, and then it is updated or inserted. Then the next row is fetched and processed, and so on. This is row-by-row processing, and it makes for a slow program. A benefit of row processing is that there is less lock contention.

Set Processing

A lot of data processing is set processing, which is what relational databases do best. Instead of processing row-by-row sequentially, you can process relational data set-by-set, without a cursor. Both cursor and set processing define set(s) of rows of the data to process, and both can be processed with a single command; but while a cursor processes the rows sequentially, set processing takes its sets at once. For example, to sum all payment rows with 100 or less balances, a single SQL statement completely processes all rows that meet the condition as a set, as sketched below. With sufficient rows to process, this can be 10 to 30 or more times faster than row-at-a-time processing.
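
That set-oriented example might look like this (hypothetical table and column names):

    -- One statement processes every qualifying row as a set;
    -- no cursor loop fetching and testing one row per iteration.
    SELECT SUM(balance)
    FROM   payment
    WHERE  balance <= 100;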

Some good uses of SET processing include:
- An update with all AMPs involved
- Single-session processing which takes advantage of parallel processing
- Efficient updates of large amounts of data

Response Time vs. Throughput

When determining how fast something is, there are two kinds of measures. You can measure how long it takes to do something, or you can measure how much gets done per unit time. The former is referred to as response time; the latter is referred to as throughput.

Response Time

This speed measure is specified by an elapsed time from the initiation of some activity until its completion. The phrase response time is often used in operating systems contexts; it may also be called access time, transmission time, or execution time, depending on the context.

Throughput

A throughput measure is an amount of something per unit time. For processors, the number of instructions executed per unit time is an important component of performance. For storage systems or networks, throughput is measured as bytes or bits per unit time. For operating systems, throughput is often measured as tasks or transactions per unit time.
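
As an illustrative example (the numbers are invented for the arithmetic only): if a system completes 120 queries between 9:00 and 10:00, its throughput is 120 queries per hour; if one of those queries was submitted at 9:00:00 and finished at 9:01:30, its response time is 90 seconds.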

What Does this Mean to Teradata?

Throughput measures the quantity of queries completed during a time interval. It is:
- a measure of the amount of work processed
- how many queries were processed
- the number of queries executed in an hour

Response Time measures the average duration of queries. It is:
- a measure of process completion
- how long that processing takes
- the elapsed time per query

In order to improve both response time and throughput on a Teradata system, you can:
- Increase CPU power (i.e., add nodes)
- Implement workload management to control resources
- Decrease the number of concurrent users

The Data Warehouse

A data warehouse is a central, enterprise-wide database that contains information extracted from the operational systems. Data Warehousing is a process, not a product: it is a technique to properly assemble and manage data from various sources to answer business questions not previously possible or known. Many data warehouses get their data directly from operational systems so that the data is timely and accurate. A data warehouse has a centrally located logical architecture which minimizes data synchronization and provides a single view of the business. While data warehouses may begin somewhat small in scope and purpose, they often grow quite large as their utility becomes more fully exploited by the enterprise. Data warehouses have become more common in corporations where enterprise-wide detail data may be used in on-line analytical processing to make strategic and tactical business decisions. Warehouses often carry many years' worth of detail data so that historical trends may be analyzed using the full power of the data.

Data Marts

A data mart is a special-purpose subset of enterprise data used by a particular department, function or application. Data marts may have both summary and detail data for a particular use, rather than for general use. Usually the data has been pre-aggregated or transformed in some way to better handle the particular type of requests of a specific user community.

Data Models - Enterprise vs. Application

To build an EDW, an enterprise data model should be leveraged. An enterprise data model serves as a neutral data model that is normalized to address all business areas, not specific to any function or group, whereas an application model is built for a specific business area. The application data model only looks at one aspect of the business, whereas an enterprise logical data model integrates all aspects of the business. In addition, an enterprise data model is more extensible than an application data model.

Independent Data Marts

Independent data marts are created directly from operational systems, just as a data warehouse is. In the data mart, the data is usually transformed as part of the load process. Data might be aggregated, dimensionalized or summarized historically, as the requirements of the data mart dictate.

Logical Data Marts

Logical data marts are not separate physical structures or a data load from a data warehouse, but rather are an existing part of the data warehouse. Because in theory the data warehouse contains the detail data of the entire enterprise, a logical view of the warehouse might provide the specific information for a given user community, much as a physical data mart would. Without the proper technology, a logical data mart can be a slow and frustrating experience for end users. With the proper technology, it removes the need for massive data loading and transforming, making a single data store available for all user needs.

Dependent Data Marts

Dependent data marts are created from the detail data in the data warehouse. While having many of the advantages of the logical data mart, this approach still requires the movement and transformation of data, but it may provide a better vehicle for performance-critical user queries.

Data Mart Pros and Cons

Independent Data Marts

Independent data marts are usually the easiest and fastest to implement, and their payback value can be almost immediate. Some corporations start with several data marts before deciding to build a true data warehouse. This approach has several inherent problems:
- While data marts have obvious value, they are not a true enterprise-wide solution and can become very costly over time as more and more are added.
- A major problem with proliferating data marts is that, depending on where you look for answers, there is often inconsistency.
- Because data marts are designed to handle specific types of queries from a specific type of user, they are often not good at ad hoc, or "what if," queries like a data warehouse is.
- They may not provide the historical depth of a true data warehouse.

Logical Data Marts

Logical data marts overcome most of the limitations of independent data marts. They provide a single view of the business that is intended to encompass the entire enterprise. There is no historical limit to the data, and "what if" querying is entirely feasible. The major drawback to logical data marts is the lack of physical control over the data. Because data in the warehouse is not pre-aggregated or dimensionalized, performance against the logical mart may not be as good as against an independent mart. However, use of parallelism in the logical mart can overcome some of the limitations of the non-transformed data.

Dependent Data Marts

Dependent data marts provide all the advantages of a logical mart and also allow for physical control of the data as it is extracted from the data warehouse. Because dependent marts use the warehouse as their foundation, they are generally considered a better solution than independent marts, but they take longer and are more expensive to implement.

A Teradata Database System

A Teradata Database system contains one or more nodes. A node is a term for a processing unit under the control of a single operating system; the node is where the processing occurs for the Teradata Database. There are two types of Teradata Database systems:

Symmetric multiprocessing (SMP) - An SMP Teradata Database has a single node that contains multiple CPUs sharing a memory pool.

Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger, MPP implementation of a Teradata Database. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.

To manage a Teradata Database system, you use:
SMP system: System Console (keyboard and monitor) attached directly to the SMP node

Hardware components are shown on the left side of the node and software components are shown on the right side. a user typically logs on through one of multiple client platforms (channel-attached mainframes or network-attached workstations).htm 10/20/2011 . Shared Nothing Architecture The Teradata Database virtual processors. Client access is discussed in the next module. Each AMP uses system resources independently of the other AMPs so they can all work in parallel for high system performance overall. For a description. A conceptual diagram of a node and its major components is shown below. The main component of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk) and this space is not shared with other AMPs. file://C:\Documents and Settings\PJ186002\Desktop\teradata intoduction wbt. click on each component. or vprocs (which are the PEs and AMPs). and contains a large number of hardware and software components.Page 34 of 137 SMP node MPP system: Administration Workstation (AWS) To access a Teradata Database system. share the components of the nodes (memory and cpu). Node Components A node is the basic building block of a Teradata Database system.

Check Your Understanding

Which of the following statements is true?
- PDE is an application that runs on the Teradata Database software.
- AMPs manage system disks on the node.
- The host channel adapter card connects to "bus and tag" cables through a Teradata Gateway.
- An Ethernet card is a hardware component used in the connection between a network-attached client and the node.

Feedback:

Teradata Virtual Storage

What is Teradata Virtual Storage?

Teradata Virtual Storage, introduced with Teradata 13.00, is a change to the way in which Teradata accesses storage. Teradata Virtual Storage pools all of the cylinders within a clique's disk space and allocates cylinders from this storage pool to individual AMPs. You can add storage to the clique storage pool rather than to every AMP, which allows sharing of storage devices among AMPs. Since storage is pooled and shared by the AMPs, adding drives does not require adding AMPs. Teradata Virtual Storage enables the mixing of drive sizes, speeds, and technologies, and it is designed to allow the Teradata Database to make use of new storage technologies, such as adding fast Solid State Disks (SSDs) to an existing system with a different disk technology/speed/capacity.

The purpose is to manage a multi-temperature warehouse: Teradata Virtual Storage allows you to store data that is accessed more frequently ("hot data") on faster devices and data that is accessed less frequently ("cold data") on slower devices, and it can automatically migrate the data based on access frequency.

Teradata Virtual Storage is responsible for:
- Pooling clique storage and allocating cylinders from the storage pool to individual AMPs
- Tracking where data is stored on the physical media
- Maintaining statistics on the frequency of data access and on the performance of physical storage media
- Migrating frequently used data ("hot data") to fast disks and data used less frequently ("cold data") to slower disks

Benefits and Key Concepts

Teradata Virtual Storage provides the following benefits:

Storage Optimization, Data Migration, and Data Evacuation

Teradata Virtual Storage maintains statistics on the frequency of data access ("data temperature") and on the performance ("grade") of physical media. This allows the Teradata Virtual Storage product to intelligently place more frequently accessed data on faster physical storage. As data access patterns change, Teradata Virtual Storage can move ("migrate") storage cylinders to faster or slower physical media within each clique. This can improve system performance over time.

Teradata Virtual Storage can also migrate data away from a physical storage device in order to prepare for removal or replacement of the device. This process is called "evacuation." Complete data evacuation requires a system restart, but Teradata Virtual Storage supports a "soft evacuation" feature that allows much of the data to be moved while the system remains online. This can minimize system down time when evacuations are necessary.

Lower Barriers to System Growth

Device management features of Teradata Virtual Storage provide the ability to pool storage within each clique. Each storage device (pdisk) can be shared, if necessary, by all AMPs in the clique. If the number of storage devices is not a multiple of the number of AMPs in the clique, the extra storage will be shared. Consequently, storage can be added to the system in smaller increments, as needs and opportunities arise.

This diagram illustrates the conceptual differences with and without Teradata Virtual Storage:

Pre-Teradata Virtual Storage: Cylinders were addressed by drive # and cylinder #.

After Teradata Virtual Storage: Cylinders are assigned a unique cylinder ID (virtual ID) across all of the pdisks. All of the cylinders in a clique are effectively in a pool that is managed by the TVS vproc. AMPs don't know the physical location of a cylinder, and it can change.

With Teradata Virtual Storage you can easily add storage to an existing system.

Before Teradata Virtual Storage:
- Existing systems have an integral number of drives per AMP.
- Adding storage requires an additional drive per AMP, which means a 50% or 100% increase in capacity.

With Teradata Virtual Storage:
- You can add any number of drives.
- Added drives are shared by all AMPs.
- These new drives may have different capacities and/or performance than the drives which already reside in the system.

Using the BYNET

The BYNET (pronounced "bye-net") is a high-speed interconnect (network) that enables multiple nodes in the system to communicate. The BYNET handles the internal communication of the Teradata Database; all communication between PEs and AMPs is done via the BYNET. When the PE dispatches the steps for the AMPs to perform, they are dispatched onto the BYNET. Depending on the nature of the dispatch request, the communication between nodes may be to all nodes (a broadcast message) or to one specific node (a point-to-point message) in the system. The messages are routed to the appropriate AMP(s), where result sets and status information are generated. This response information is also routed back to the requesting PE via the BYNET.

BYNET Unique Features

The BYNET has several unique features:

Scalable: As you add more nodes to the system, the overall network bandwidth scales linearly. This linear scalability means you can increase system size without a performance penalty, and sometimes even increase performance.

High performance: An MPP system typically has two BYNET networks (BYNET 0 and BYNET 1). Because both networks in a system are active, the system benefits from having full use of the aggregate bandwidth of both networks.

Fault tolerant: Each network has multiple connection paths. If the BYNET detects an unusable path in either network, it will automatically reconfigure that network so all messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled and messages are rerouted to BYNET 1.

Additionally, in the rare case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled and messages are rerouted to BYNET 1.

Load balanced: Traffic is automatically and dynamically distributed between both BYNETs.

BYNET Hardware and Software
The BYNET hardware and software handle the communication between the vprocs and the nodes.

Hardware: The nodes of an MPP system are connected with the BYNET hardware, consisting of BYNET boards and cables.

Software: The BYNET driver (software) is installed on every node. This BYNET driver is an interface between the PDE software and the BYNET hardware.

SMP systems do not contain BYNET hardware. The PDE and BYNET software emulate BYNET activity in a single-node environment.

Just for Fun . . .
When a message is delivered to a node using BYNET hardware and software, PDE software on the node has the ability to route the message to which three? (Choose three.) (Note: You do not need to know this information for the certification exam. For more information on communication between the vprocs and nodes, click here.)

A. A single vproc on a node
B. A group of vprocs on a node
C. All vprocs on a node
D. All vprocs on all nodes

Cliques
A clique (pronounced "kleek") is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique. The cabling determines which nodes are in which cliques -- the nodes of a clique are connected to the disk array controllers of the same disk arrays. Multiple cliques in the system should have the same number of nodes.

Cliques in a System
Vprocs are distributed across all nodes in the system. The overall system is connected by the BYNET. The nodes in each clique are cabled to the same disk arrays. The diagram below shows three cliques.

Cliques Provide Resiliency
In the event of a node failure, cliques provide for data access through vproc migration. If one node goes down in a clique, the vprocs will migrate to the other nodes in the clique, so data remains available. When a node resets, the following happens to the AMPs:

1. When the node fails, the Teradata Database restarts across all remaining nodes in the system.
2. The vprocs (AMPs) from the failed node migrate to the operational nodes in its clique.
3. Disks managed by the AMP remain available, and processing continues while the failed node is being repaired.
4. The PE vprocs will migrate as follows: LAN attached PEs will migrate to other nodes in the clique. Channel attached PEs will not migrate; while that node remains down, that channel connection is not available.

However, system performance decreases due to the loss of a node, and system performance degradation is proportional to clique size. With a hot standby node, the performance degradation is 0%.

Hot Standby Node
A Hot Standby Node (HSN) is a node that is a member of a clique that is not configured (initially) to execute any Teradata vprocs. If a node in the clique fails, the AMPs from the failed node move to the hot standby node. When the failed node is recovered/repaired and restarted, it becomes the new hot standby node. A second restart of Teradata is not needed.

Characteristics of a hot standby node are:
A node that is a member of a clique.
Does not normally participate in the trusted parallel application (TPA).
Can be brought into the configuration when a node fails in the clique.
Eliminates the need for a restart to bring a failed node back into service.
Helps with unplanned outages.

Hot Standby Nodes are positioned as a performance continuity feature.

1. If a node in the clique fails, its AMPs are moved to the Hot Standby Node. Performance degradation is 0% as AMPs are moved to the Hot Standby Node.
2. When node 1 is recovered, it becomes the new Hot Standby Node.

Software Components
A Teradata Database node requires three distinct pieces of software. For each node in the system, you need both of the following:
Operating system license (UNIX, Microsoft Windows, or Linux)
Teradata Database software license


Operating System

The Teradata Database can run on the following operating systems:
UNIX MP-RAS (not supported beyond Teradata 13.10)
Microsoft Windows 2000
SuSE Linux

Parallel Database Extensions (PDE)
The Parallel Database Extensions (PDE) software layer was added to the operating system to support the parallel software environment. The PDE controls the virtual processor (vproc) resources.

Trusted Parallel Application (TPA)
A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The Teradata Database is classified as a TPA. The four components of the Teradata Database TPA are:


AMP (Top Right)
PE (Bottom Right)
Channel Driver (Top Left)
Teradata Gateway (Bottom Left)

Teradata Database Software: PE
A Parsing Engine (PE) is a virtual processor (vproc) that manages the dialogue between a client application and the Teradata Database, once a valid session has been established. Each PE can support a maximum of 120 sessions. The PE handles an incoming request in the following manner:

1. The Session Control component verifies the request for session authorization (user names and passwords), and either allows or disallows the request.

2. The Parser does the following:
Interprets the SQL statement received from the application.
Verifies SQL requests for the proper syntax and evaluates them semantically.
Consults the Data Dictionary to ensure that all objects exist and that the user has authority to access them.

3. The Optimizer is cost-based and develops the least expensive plan (in terms of time) to return the requested response set. Processing alternatives are evaluated, and the fastest alternative is chosen. This alternative is converted into executable steps, to be performed by the AMPs, which are then passed to the Dispatcher. The Optimizer is "parallel aware," meaning that it has knowledge of the system components (how many nodes, vprocs, etc.), which enables it to determine the fastest way to process the query. In order to maximize throughput and minimize resource contention, the Optimizer must know about the system configuration, available units of parallelism (AMPs and PEs), and data demographics. The Teradata Database Optimizer is robust and intelligent, and enables the Teradata Database to handle multiple complex, ad-hoc queries efficiently.

4. The Dispatcher controls the sequence in which the steps are executed and passes the steps received from the Optimizer onto the BYNET for execution by the AMPs.

5. After the AMPs process the steps, the PE receives their responses over the BYNET.


6. The Dispatcher builds a response message and sends the message back to the user. To review the PE software, click the buttons (rectangles) on the PE.
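The plan the Optimizer produces can be inspected with the SQL EXPLAIN modifier, which returns the plan instead of executing the request. A minimal sketch follows; the employee table and its columns are hypothetical, used only for illustration:

    EXPLAIN
    SELECT last_name, first_name
    FROM   employee
    WHERE  department_number = 403;

Rather than a result set, the PE returns an English description of the executable steps the Optimizer chose, including estimated row counts and whether each step runs on one AMP or all AMPs.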


Teradata Database Software: AMP
The AMP is a vproc in the Teradata Database's shared-nothing architecture that is responsible for managing a portion of the database. Each AMP will manage some portion of each table on the system. AMPs do the physical work associated with generating an answer set (output), including sorting, aggregating, formatting, and converting. The AMPs retrieve and perform all database management functions on the required rows from a table. An AMP accesses data from its single associated vdisk, which is made up of multiple ranks of disks. An AMP responds to Parser/Optimizer steps transmitted across the BYNET by selecting data from or storing data to its disks. For some requests, the AMPs may redistribute a copy of the data to other AMPs.

The Database Manager subsystem resides on each AMP. This subsystem will:
Lock databases and tables.
Create, modify, or delete definitions of tables.
Insert, delete, or modify rows within the tables.
Retrieve information from definitions and tables.
Return responses to the Dispatcher.

Earlier in this course, we discussed the logical organization of data into tables. The Database Manager subsystem provides a bridge between that logical organization and the physical organization of the data on disks. The Database Manager also performs a space-management function that controls the use and allocation of space.

To review the AMP software, click the buttons (rectangles) on the AMP.
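As a sketch of the kinds of requests the Database Manager subsystem services (the department table is hypothetical, used only to illustrate the operations listed above):

    -- Create a table definition
    CREATE TABLE department
      (department_number SMALLINT NOT NULL
      ,department_name   CHAR(30))
    UNIQUE PRIMARY INDEX (department_number);

    -- Insert, modify, and retrieve rows
    INSERT INTO department VALUES (403, 'Education');

    UPDATE department
    SET    department_name = 'Technical Training'
    WHERE  department_number = 403;

    SELECT * FROM department;

Each statement is parsed and optimized by a PE, but the locking, row maintenance, and retrieval are carried out by the Database Manager on the AMPs that own the affected rows.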


Teradata Database Software: Channel Driver
Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. There is one Channel Driver per node. In the diagram below, the blue dots show the communication from the channel-attached client, to the host channel adapter in the node, to the Channel Driver software, to the PE, and back to the client.

Teradata Database Software: Teradata Gateway
Teradata Gateway software is the means of communication between an application and the PEs assigned to network-attached clients. There is one Teradata Gateway per node. In the diagram below, the blue dots show the communication from the network-attached client, to the Ethernet card in the node, to the Teradata Gateway software, to the PE, and back to the client.

Teradata Purpose-Built Family Platform
Each platform is purpose built to meet different analytical requirements. They all leverage the Teradata Database. Customers may easily migrate applications from one platform to another without having to change data models, ETL, or underlying structures.

Teradata Extreme Data Appliance 1550
The Teradata Extreme Data Appliance 1550 provides for deep strategic intelligence from extremely large amounts of detailed data. It supports very high-volume, non-enterprise data/analysis requirements for a small number of power users in specific workgroups or projects that are outside of the enterprise data warehouse (EDW).

This appliance is based on the field-proven Teradata Active Enterprise Data Warehouse 5550 processing nodes and provides the same scalability and data warehouse capabilities as any other Teradata platform.

Teradata Active Enterprise Data Warehouse - 5550 H and 5555 C/H
These models are targeted to the full-scale large data warehouse. They offer expansion capabilities up to 1024 TPA and non-TPA nodes. The power of the Teradata Database combined with the throughput, power and performance of both the Intel® Xeon™ quad-core processors and BYNET V3 technologies offers unsurpassed performance and capacity within the scalable data warehouse.

Teradata Data Mart Appliance 2500/2550/2555
The Teradata Data Mart Appliance 2500 is a server that is optimized specifically for high DSS performance. These systems are optimized for fast scans and heavy "deep dive" analytics. The Teradata Data Mart Appliance 2550 and 2555 have similar characteristics to the 2500, but are approximately 40% - 45% faster on a per node basis. Characteristics of the Teradata Data Mart Appliance 2500/2550/2555 include:
Delivered ready to run
Integrated system fully staged and tested
Includes a robust set of tools and utilities
Rapid time to value with system live within hours
Competitive price point
Capacity on demand available if needed
Easy migration to an EDW/ADW

Exercise 2.1
Select the answers from the options given in the drop-down boxes that correctly complete the sentences.
In a Teradata Database System, a ________ is a group of nodes with access to the same disk arrays.
The ________ carries the communication between nodes in a system.
A ________ causes vprocs to migrate to other nodes.
A copy of ________ is installed on each node in the system.

To review these topics, click Cliques Provide Resiliency, Cliques, or Using the BYNET.

Exercise 2.2
Which three statements about the Teradata Database are true? (Choose three.)
A. Runs on UNIX MP-RAS (discontinued after Teradata 13.10), Windows 2000, and Linux.
B. There are two types of virtual processors: AMPs and PEs.
C. Runs on a foundation called a TPA.
D. PDE is a software layer that allows TPAs to run in a parallel software environment.

To review these topics, click Software Components or Operating System.

Exercise 2.3
Four of these components are contained in the TPA software. Click each of your choices and check the Feedback box below each time to see if you are correct.

To review this topic, click Trusted Parallel Application (TPA).

Exercise 2.4
Select AMP, BYNET, Parallel Database Extensions (PDE), or PE in the pull-down menu as the component responsible for each of the following tasks:
Carries messages between nodes, facilitating AMP/PE communication.
Sorts, aggregates, and formats data in the processing of requests.
Accesses data on its assigned vdisk.

Chooses the least expensive plan for creating a response set.
Transports responses from the AMPs back to the PEs.
Distributes incoming data or retrieves rows being requested to generate an answer set.
Can manage up to 120 sessions.

To review these topics, click Teradata Database Software: PE, Teradata Database Software: AMP, Communication Between Vprocs, or Communication Between Nodes.

Exercise 2.5
From the drop-down box below, select the answer that correctly completes the sentence.
In processing a request, the ________ determines the most efficient plan for processing the requested response.

To review this topic, click Node Components.

Exercise 2.6
Select OLAP, OLTP, DSS, or Data Mining (DM) in the pull-down menu as the appropriate type of data processing for the following requests:
Withdraw cash from ATM.
Show the top ten selling items for 1997 across all stores.
How many blue jeans were sold across all of our Eastern stores in the month of March in child sizes?

To review these topics, click Evolution of Data Processing.

Exercise 2.7
From the drop-down box below, select the answer that correctly completes the sentence.
A(n) ________ may contain detail or summary data and is a special purpose subset of enterprise data for a particular function or application, rather than for general use.

To review this topic, click Data Marts.

Exercise 2.8
From the drop-down box below, select the answer that correctly completes the sentence.
________ enable(s) the mixing of drive sizes, speeds, and technologies so you can "mix" storage devices.

To review this topic, click Teradata Virtual Storage.

Exercise 2.9
From the drop-down box below, select the answer that correctly completes the sentence.
A(n) ________ supports the coexistence of tactical and strategic queries.

To review this topic, click Active Data Warehouse.

Exercise 2.10
Select Teradata Extreme Data Appliance (e.g. 1550), Teradata Active Enterprise Data Warehouse (e.g. 5550), or Teradata Data Mart Appliance (e.g. 2550) in the pull-down menu as the appropriate platform for each description:
A server that is optimized specifically for high DSS performance such as fast scans and heavy "deep dive" analytics.
Provides for deep strategic intelligence from extremely large amounts of detailed data and supports very high-volume, non-enterprise data/analysis requirements for a small number of power users in specific workgroups or projects that are outside of the enterprise data warehouse (EDW).
Scalable data warehouse targeted to the full-scale large DW with expansion up to 1024 TPA and non-TPA nodes.

To review these topics, click Teradata Purpose-Built Family Platform.

Exercise 2.11

Match the performance term to its definition:
Measures how long it takes to do something.
Measures how much gets done per unit time.

To review these topics, click Response Time vs. Throughput.

Exercise 2.12
True or False: Both cursor row processing and set processing define set(s) of rows of the data to process and can be processed with a single command, but set processing takes its sets at once, while a cursor processes the rows sequentially.
A. True
B. False

To review this topic, click Row vs. Set Processing.

Mod 3 - Client Access

Objectives
After completing this module, you should be able to:
Describe how the clients access the Teradata Database.
Describe the Teradata client utilities and their use.
Illustrate how the Teradata Database processes a request.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Client Connections
Users can access data in the Teradata Database through an application on both channel-attached and network-attached clients. Teradata client software is installed on each client (channel-attached, network-attached, or node) and communicates with RDBMS software on the node. Additionally, the node itself can act as a client.

You may hear either type of client referred to by the term "host," though this term is not typically used in documentation or product literature. The client may be a mainframe system, such as IBM or Amdahl, which is channel-attached to the Teradata Database, or it may be a PC, UNIX, or Linux-based system that is LAN-attached. The client application submits an SQL request to the database, receives the response, and submits the response to the user. This application could be a business intelligence (BI) tool or a data integration (DI/ETL/ELT) tool, submitting queries to Teradata or loading/updating tables in the database.

Channel-Attached Client
Channel-attached clients are IBM-compatible mainframe systems supported by the Teradata Database. The following software components, installed on the mainframe (the channel-attached client), are responsible for communications between client applications and the Channel Driver on a Teradata Database node:
Teradata Director Program (TDP) software to manage session traffic.
Call-Level Interface (CLI), a library of routines that are the lowest-level interface to the Teradata Database.

Communication with the Teradata Database System
Communication from client applications on the mainframe goes through the mainframe channel, to the Host Channel Adapter on the node, and to the Channel Driver software.

Network-Attached Client
The Teradata Database supports network-attached clients connected to the node over a LAN. The following software components, installed on the network-attached client, are responsible for communication between client applications and the Teradata Gateway on a Teradata Database node:

Open Database Connectivity (ODBC) is an application programming standard that defines common database access mechanisms to simplify the exchange of data between a client and server. ODBC-compliant applications connect with a database through the use of a driver that translates the application's ODBC commands into database syntax.

Call-Level Interface, Version 2 (CLIv2) is a library of routines that enable an application program to access data stored in the Teradata Database. When used with network-attached clients, CLIv2 contains the following components:
CLI (Call-Level Interface)
MTDP (Micro Teradata Director Program)
MOSI (Micro Operating System Interface)

Java Database Connectivity (JDBC) is an Application Programming Interface (API) that allows platform-independent Java applications to access a DBMS using Structured Query Language (SQL). JDBC enables the development of web-based Teradata end user tools that can access Teradata through a web server. JDBC also provides support for access to other commercial databases.

WinCLI is an additional, legacy API to Teradata from a network host.

Communication with the Teradata Database System
Communication from applications on the network-attached client goes over the LAN, to the Ethernet card on the node, and to the Teradata Gateway software.

Node
The node is considered a network-attached client. If you install application software on a node, it will be treated like an application on a network-attached client. In other words, communications from applications on the node go through the Teradata Gateway, as they would over a network-attached client connection. On the database side, the Teradata Gateway software and the PE provide the connection to the Teradata Database.

An application on a node can be executed through:
System Console that manages an SMP system.
Remote login.

The Teradata Database is configured with two LAN connections for redundancy. This ensures high availability.

Just for Fun . . .
As a review, answer this question: Which two can you use to run an application that is installed on a node? (Choose two.)
A. Mainframe terminal
B. Bus terminal
C. System console
D. Network-attached workstation

Request Processing

The steps for processing a request like the one above are somewhat different, depending on whether the user is accessing the Teradata Database through a channel-attached or network-attached client:

1. SQL request is sent from the client to the appropriate component on the node:
Channel-attached client: request is sent to Channel Driver (through the TDP).
Network-attached client: request is sent to Teradata Gateway (through CLIv2 or ODBC).
2. Request is passed to the PE(s).
3. PEs parse the request into AMP steps.
4. PE Dispatcher sends steps to the AMPs over the BYNET.
5. AMPs perform operations on data on the vdisks.
6. Response is sent back to PEs over the BYNET.
7. PE Dispatcher receives response.
8. Response is returned to the client (channel-attached or network-attached).

Mainframe Request Flow

Workstation Request Flow

Teradata Client Utilities
Teradata has a robust suite of client utilities that enable users and system administrators to enjoy optimal response time and system manageability. Teradata utilities leverage the Teradata Database's high performance capabilities and are fully parallel and scalable. The same utilities run on smaller entry-level systems and the largest MPP implementations. Various client utilities are available for tasks from loading data to managing the system.

Teradata Database client utilities include the following, described in this section:

Query Submitting Utilities
BTEQ
Teradata SQL Assistant

Load and Unload Utilities
FastLoad
MultiLoad
TPump
FastExport
Teradata Parallel Transporter (TPT)

Administrative Utilities
Teradata Manager
Teradata Dynamic Workload Manager (TDWM)
Priority Scheduler
Database Query Log (DBQL)
Teradata Workload Analyzer
Performance Monitor (PMON)
Teradata Active Systems Management (TASM)
Teradata Analyst Pack

Archive Utilities
Archive Recovery Facility (ARC)
NetVault (third party)
NetBackup (third party)

Query Submitting Utilities
The Teradata Database provides tools that are front-end interfaces for submitting SQL queries. Two mentioned in this section are BTEQ and Teradata SQL Assistant.

BTEQ
BTEQ (Basic Teradata Query), pronounced "BEE-teek," is a Teradata Database tool used for submitting SQL queries on all platforms (a sample script appears after this list). BTEQ provides the following functionality:
Standard report writing and formatting.
Basic import and export of small amounts of data to and from the Teradata Database across all platforms. For tables of more than a few thousand rows, the Teradata Database load utilities are recommended for more efficiency.
Ability to submit SQL requests in the following ways:
Interactive
Batch
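A minimal sketch of a batch BTEQ script; the logon string, table, and file names are hypothetical placeholders:

    .LOGON tdpid/dbadmin,secret

    .EXPORT REPORT FILE = sales_report.txt

    SELECT   store_id
            ,SUM(amount) AS total_sales
    FROM     sales
    GROUP BY 1
    ORDER BY 1;

    .EXPORT RESET
    .LOGOFF
    .QUIT

Lines beginning with a period are BTEQ commands; everything else is ordinary SQL sent to the Teradata Database. The same statements can be typed interactively or submitted as a batch job.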

Teradata SQL Assistant
Teradata SQL Assistant (formerly known as Queryman) is an information discovery/query tool that runs on Microsoft Windows. Teradata SQL Assistant enables you to access the Teradata Database as well as other ODBC-compliant databases. Some of its features include:
Ability to save data in PC-based formats, such as Microsoft Excel, Microsoft Access, and text files.
Import and export of small amounts of data to and from ODBC-compliant databases. For tables of more than a few thousand rows, the Teradata Database load utilities are recommended for more efficiency.
History of submitted SQL syntax, to help you build scripts for data mining and knowledge discovery.
Help with SQL syntax.

Data Load and Unload Utilities

In a data warehouse environment, the database tables are populated from a variety of sources, such as mainframe applications, operational data marts, or other distributed systems throughout a company. These systems are the source of data such as daily transaction files, orders, usage records, ERP (enterprise resource planning) information, and Internet statistics. Teradata provides a suite of data load and unload utilities optimized for use with the Teradata Database. They run on any of the supported client platforms:
Channel-attached client
Network-attached client
Node

Using Teradata Load and Unload Utilities
Teradata load and unload utilities are fully parallel. Because the utilities are scalable, they accommodate the size of the system. Performance is not limited by the capacity of the load and unload tools. The load and unload utilities are:
FastLoad
MultiLoad
TPump
FastExport
Teradata Parallel Transporter (TPT)

The concurrency limit for utilities is now 60:
Up to 30 concurrent FastLoad and MultiLoad jobs.
Up to 60 concurrent FastExport jobs (assuming no FastLoad or MultiLoad jobs).

The utilities have full restart capability. This feature means that if a load or unload job should be interrupted for some reason, it can be restarted again from the last checkpoint, without having to start the job from the beginning.

FastLoad
Use the FastLoad utility to load data into empty tables. FastLoad loads data into an empty table in parallel, using multiple sessions to transfer blocks of data, and loads to a single empty table at a time. After the data load is complete, the table can be made available to users. FastLoad achieves high performance by fully exploiting the resources of the system. A typical use is for mini-batch or frequent batch loads, where you load the data to an empty "staging" table and then use an SQL INSERT/SELECT command to move it to an existing table.
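A sketch of a FastLoad script for the staging-table pattern just described; the logon, table, and file names are hypothetical, and the job assumes a comma-delimited input file:

    SESSIONS 4;
    LOGON tdpid/dbadmin,secret;

    DROP TABLE stg_sales;
    DROP TABLE stg_sales_err1;
    DROP TABLE stg_sales_err2;

    CREATE TABLE stg_sales
      (store_id  INTEGER
      ,sale_date DATE
      ,amount    DECIMAL(10,2))
    PRIMARY INDEX (store_id);

    SET RECORD VARTEXT ",";
    DEFINE in_store (VARCHAR(10))
          ,in_date  (VARCHAR(10))
          ,in_amt   (VARCHAR(12))
    FILE = sales.csv;

    BEGIN LOADING stg_sales ERRORFILES stg_sales_err1, stg_sales_err2;
    INSERT INTO stg_sales (store_id, sale_date, amount)
    VALUES (:in_store, :in_date, :in_amt);
    END LOADING;
    LOGOFF;

Because FastLoad only loads empty tables, a follow-up INSERT INTO sales SELECT * FROM stg_sales; (run in BTEQ, for example) moves the staged rows into the populated target table.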

MultiLoad
Use the MultiLoad utility to maintain tables by:
Inserting rows into a populated or empty table.
Updating rows in a table.
Deleting multiple rows from a table.

MultiLoad can load multiple input files concurrently and work on up to five tables at a time, using multiple sessions. MultiLoad is optimized to apply multiple rows in block-level operations, and it places a lock on the destination table(s) to prevent user queries from getting inconsistent results before the data load or update is complete. Access locks may be used to query tables being maintained with MultiLoad. MultiLoad usually is run during a batch window.
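A rough sketch of a MultiLoad job; the log table, layout, and file names are hypothetical, and production jobs usually add error handling and more DML:

    .LOGTABLE sales_restartlog;
    .LOGON tdpid/dbadmin,secret;

    .BEGIN MLOAD TABLES sales;

      .LAYOUT sales_layout;
        .FIELD in_store * VARCHAR(10);
        .FIELD in_date  * VARCHAR(10);
        .FIELD in_amt   * VARCHAR(12);

      .DML LABEL ins_sales;
        INSERT INTO sales (store_id, sale_date, amount)
        VALUES (:in_store, :in_date, :in_amt);

      .IMPORT INFILE sales.dat
        FORMAT VARTEXT ','
        LAYOUT sales_layout
        APPLY ins_sales;

    .END MLOAD;
    .LOGOFF;

The log table named in the first line is what gives the job its restart capability: rerunning the same script after an interruption resumes from the last checkpoint.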

TPump
Use TPump to:
Continuously load, update, or delete data in tables.
Update lower volumes of data using fewer system resources than other load utilities.
Vary the resource consumption and speed of the data loading activity over time.

TPump performs the same operations as MultiLoad, but TPump updates a row at a time and uses row hash locks, which eliminates the need for table locks and the "batch windows" typical with MultiLoad. Users can continue to run queries during TPump data loads. TPump maintains up to 60 tables at a time.

TPump has a dynamic throttle that operators can set to specify the percentage of system resources to be used for an operation. This enables operators to set when TPump should run at full capacity, during low system usage, or within limits when TPump may affect other business users of the Teradata Database.

FastExport
Use the FastExport utility to export data from one or more tables or views on the Teradata Database to a client-based application. FastExport is a data extract utility. It transfers large amounts of data using block transfers over multiple sessions and writes the data to a host file on the network-attached or channel-attached client. You can export data from any table or view on which you have the SELECT access rights. The destination for the exported data can be a:
Host file: A file on your channel-attached or network-attached client system.
User-written application: An Output Modification (OUTMOD) routine you write to select, validate, and preprocess the exported data.

Typically, FastExport is run during a batch window, and the tables being exported are locked.
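A minimal FastExport script sketch, again with hypothetical names:

    .LOGTABLE sales_exportlog;
    .LOGON tdpid/dbadmin,secret;

    .BEGIN EXPORT SESSIONS 4;
    .EXPORT OUTFILE sales_extract.dat;

    SELECT store_id, sale_date, amount
    FROM   sales
    WHERE  sale_date >= DATE '2011-01-01';

    .END EXPORT;
    .LOGOFF;

The SELECT runs in parallel on the AMPs, and the result blocks are returned over multiple sessions, which is what gives FastExport its throughput on large extracts.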

Teradata Parallel Transporter
Teradata Parallel Transporter is a load/update/export tool that enables the data extraction, transformation, and loading processes common to all data warehouses. Teradata Parallel Transporter combines the functionality of the Teradata utilities (FastLoad, MultiLoad, FastExport, and TPump) in a single parallel environment. Teradata Parallel Transporter provides a single, SQL-like scripting language, as well as a GUI to make scripting faster and easier. You can do the extract, some transformation, and loads all in one SQL-like scripting language. Using multiple, parallel tasks, a single Teradata Parallel Transporter script can load data from disparate sources into the Teradata Database in the same job. Its extensible environment supports FastLoad INMODs, FastExport OUTMODs, and Access Modules to provide access to all the data sources you use today. There is a set of open APIs (Application Programmer Interfaces) to add third party or custom data transformations to Teradata Parallel Transporter scripts.

Teradata Parallel Transporter Operators
The operators are components that "plug" into the Teradata Parallel Transporter infrastructure and actually perform the functions. The Load, Update, Export, and Stream operators are similar to the current FastLoad, MultiLoad, FastExport, and TPump utilities, but built for the Teradata PT parallel environment. The Data Connector operator is an adapter for the Access Module or non-Teradata files. The SQL Select and Insert operators submit the Teradata SELECT and INSERT commands. The FastLoad INMOD and FastExport OUTMOD operators support the current FastLoad and FastExport INMOD/OUTMOD features. The Load, Update, Export, and Stream operators are purchased separately; the Data Connector operator, the INMOD and OUTMOD adapters, and the SQL Select/Insert operators are included when you purchase the Infrastructure, as indicated by the green arrow.

To simplify these new concepts, let's compare the Teradata Parallel Transporter operators with the classic utilities that we just covered.

TPT Operator / Teradata Utility / Description

LOAD / FastLoad / A consumer-type operator that uses the Teradata FastLoad protocol. Both support Multi-Value Compression and PPI. Supports error limits and checkpoint/restart.

UPDATE / MultiLoad / Utilizes the Teradata MultiLoad protocol to enable job-based table updates. This allows highly scalable and parallel inserts and updates to an existing table.

EXPORT / FastExport / A producer operator that emulates the FastExport utility.

STREAM / TPump / Uses multiple sessions to perform DML transactions in near real-time.

DataConnector / N/A / This operator emulates the Data Connector API. Reads external data files, writes data to external data files, and reads an unspecified number of data files.

ODBC / N/A / Reads data from an ODBC Provider.

Administrative Utilities
Administrative utilities use a graphical user interface (GUI) to monitor and manage various aspects of a Teradata Database system. The administrative utilities are:

Workload Management:
Teradata Manager
Teradata Dynamic Workload Manager (TDWM)
Priority Scheduler
Database Query Log (DBQL)
Teradata Workload Analyzer
Performance Monitor
Teradata Active Systems Management (TASM)
Teradata Analyst Pack

Workload Management
Workload Management in Teradata is used to control system resource allocation to the various workloads on the system. Some of the components that make up Teradata's Workload Management capability are:

Teradata Manager

Teradata Manager is a production and performance monitoring system that helps a DBA or system manager monitor, control, and administer one or more Teradata Database systems through a GUI. Running on LAN-attached clients, Teradata Manager has a variety of tools and applications to gather, manipulate, and analyze information about each Teradata Database being administered, including database system CPU and disk utilization, network activity, and number of users. For examples of Teradata Manager functions, click here: Teradata Manager Examples

Teradata Dynamic Workload Manager (TDWM)
Teradata Dynamic Workload Manager (also known as Teradata DWM or TDWM) is a query workload management tool that can restrict (run, schedule later, or reject) queries based on current workload and set thresholds. TDWM provides a graphical user interface (GUI) for creating rules that manage database access, increase database efficiency, and enhance workload capacity. For example, with TDWM a request can be scheduled to run periodically or during a specified time period. Results can be retrieved any time after the request has been submitted by TDWM and executed.

Via the rules created through TDWM, queries can be rejected, throttled, or executed when they are submitted to the Teradata Database. TDWM can restrict queries based on factors such as:

Analysis control thresholds - TDWM can restrict requests that will exceed a certain processing time, or whose expected result set size exceeds a specified number of rows.

Object control thresholds - TDWM can limit access to and use of static criteria such as database objects and other items. Object controls can control workload requests based on user IDs, databases, tables, views, macros, and group IDs.

Environmental factors - TDWM can manage requests based on dynamic environment factors, such as date and time.

Teradata Dynamic Workload Manager is a key supporting product component for Teradata Active Systems Manager (TASM), described in another sub-topic below. TDWM allows the Database Administrator to provide operational control of, and to effectively manage and regulate access to, the Teradata Database. The database administrator can use the following capabilities of TDWM to manage work submitted to the database in order to maximize system resource utilization:

Query Management: With Query Management, database query requests are intercepted within the Teradata Database, and their components are compared against criteria that are defined by the administrator. Requests that fail to meet the criteria are restricted: either run, scheduled later, suspended, or rejected.

Scheduled Requests: With Scheduled Requests, clients can submit SQL requests to be executed at scheduled off-peak times.

The major functions performed by the DBA are to:
Define Filters and Throttles.
Define Workloads, a new concept as of Teradata V2R6.1, and their operating periods, goals, and Priority Scheduler facility (PSF) mapping/weights.
Define general TASM controls. TASM automates the allocation of resources to workloads to assist the DBA or application developer with system performance management.

Priority Scheduler
Priority Scheduler is a resource management tool that is used to assign resources and controls how computer resources (e.g., CPU) are allocated to different users in a Teradata system. This resource management function is based on scheduler parameters that satisfy site-specific requirements and system parameters that depict the current activity level of the Teradata Database system. You can provide Priority Scheduler parameters to directly define a strategy for controlling resources.

Database Query Log (DBQL)
The Database Query Log (DBQL) logs query processing activity for later analysis. Query counts and response times can be charted, and SQL text and processing steps can be compared to fine-tune applications for optimum performance.

Teradata Workload Analyzer
Teradata Workload Analyzer recommends candidate workloads for analysis (i.e., Workload Definitions and recommended Service Level Goals) by using either:
1. Statistics from DBQL data
2. Migrated current Priority Scheduler settings

In addition, it provides the following capabilities:
Identifies classes of queries and candidate workloads for analysis and recommends workload definitions and operating rules.
Establishes workload definitions from query history or directly.
Provides recommendations for appropriate workload Service Level Goals (SLGs).
Recommends workload allocation group mappings and Priority Scheduler facility (PSF) weights.
Provides the ability to migrate existing Priority Scheduler Definitions (PD Sets) into new workloads.
Can be used "iteratively" to analyze and understand how well existing workload definitions are working and modify them if necessary.

Teradata Workload Analyzer can also apply best practice standards to workload definitions, such as assistance in Service Level Goal (SLG) definition and priority scheduler setting recommendations. In addition, Workload Analyzer creates a Workload Rule set.

Performance Monitor
The Performance Monitor (formerly called PMON) collects near real-time system configuration, resource usage, and session information from the Teradata Database, either directly or through Teradata Manager. Performance Monitor formats and displays this information as requested. Performance Monitor allows you to analyze current performance and both current and historical session information, and to abort sessions that are causing system problems.
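As an illustration of the Database Query Log described above, DBQL can be enabled and examined with SQL along these lines (a sketch only; logging options and DBQL table columns vary by release):

    -- Begin logging, capturing up to 500 characters of each SQL statement
    BEGIN QUERY LOGGING LIMIT SQLTEXT=500 ON ALL;

    -- Later: review logged activity from the DBQL base table
    SELECT QueryID, StartTime, FirstRespTime, AMPCPUTime, QueryText
    FROM   DBC.DBQLogTbl
    ORDER BY StartTime;

    -- Stop logging when the analysis is complete
    END QUERY LOGGING ON ALL;

In practice, sites often query through the views supplied in database DBC rather than reading the base table directly.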

The workload management components can be summarized by when they act:

Query Pre-Execution - Application flow control: resource control prior to execution; controls what and how much work is allowed to begin execution. (Teradata Dynamic Workload Manager)
Query Executes - Resource management: resource control during execution; manages the level of resources allocated to different priorities of executing work. (Priority Scheduler)
During Query Execution - Allows the DBA or user to examine the active workload. (Performance Monitor)
Application/Query Post-Execution - Analyze query performance and behavior after completion. (Database Query Log)

Teradata Active Systems Management (TASM)
Teradata Active System Management is made up of several products/tools that assist the DBA or application developer in defining and refining the rules that control the allocation of resources to workloads running on a system. These rules include filters, throttles, and "workload definitions". Workload definitions are rules to control the allocation of resources to workloads and are new with Teradata V2R6.1.

TASM is primarily comprised of three products that are used to create and manage "workload definitions":
Teradata Dynamic Workload Manager (TDWM) (enhanced with TASM)
Teradata Workload Analyzer (TWA), which recommends candidate workloads for analysis (new with TASM)
Teradata Manager, which reviews historical workloads (enhanced with TASM)

Tools are also provided to monitor workloads in real time and to produce historical reports of resource utilization by workloads. By analyzing this information, the workload definitions can be adjusted to improve the allocation of system resources. Teradata Active Systems Management (TASM) allows you to perform the following:
Limit user concurrency
Monitor Service Level Goals (SLGs) on a system
Optimize mixed workloads
Reject queries based on table access
Prioritize workloads
Provide more consistent response times and influence response times
React to hardware failures
Block access on a table to a user
Determine the workload on a system

Teradata Analyst Pack
Teradata Analyst Pack is a suite of the following products: Teradata Visual Explain, Teradata System Emulation Tool, Teradata Index Wizard, and Teradata Statistics Wizard.

Teradata Visual Explain
Teradata Visual Explain makes query plan analysis easier by providing the ability to capture and graphically represent the steps of the plan and to perform comparisons of two or more plans. It is intended for application developers, database administrators, and database support personnel, to better understand why the Teradata Database Optimizer chooses a particular plan for a given SQL query. All information required for query plan analysis, such as database object definitions, data demographics, and cost and cardinality estimates, is available through the Teradata Visual Explain interface. It is helpful in identifying the performance implications of data skew and bad or missing statistics. Visual Explain uses a Query Capture Database to store query plans, which can then be visualized or manipulated with other Teradata Analyst Pack tools.

Teradata System Emulation Tool (Teradata SET)
Teradata SET simplifies the task of emulating a target system by providing the ability to export and import all information necessary to fake out the optimizer in a test environment. This information can be used along with the Target Level Emulation feature to generate query plans on the test system as if they were run on the target system. This feature is useful for verifying queries and reproducing optimizer related issues in a test environment. Teradata SET allows the user to capture the following by database, query, or workload:
System cost parameters
Object definitions
Random AMP samples
Statistics
Query execution plans
Demographics

This tool does not export user data.

Teradata Index Wizard
Teradata Index Wizard automates the process of manual index design by recommending secondary indexes for a particular workload. Teradata Index Wizard provides a graphical user interface (GUI) that guides the user through analyzing a database workload and provides recommendations for improving performance through the use of indexes. In addition, Teradata Index Wizard provides support for Partitioned Primary Index (PPI) recommendations. PPI is discussed in the Indexes module of this course.

Teradata Statistics Wizard
Teradata Statistics Wizard is a graphical tool that has been designed to automate the collection and re-collection of statistics, resulting in better query plans and helping the DBA to efficiently manage statistics. As changes are made within a database, the Statistics Wizard identifies those changes and recommends which tables should have statistics collected, based on age of data and table growth, and which columns/indexes would benefit from having statistics defined and collected for a specific workload. The DBA is then given the opportunity to accept or reject the recommendations.

The Statistics Wizard enables the DBA to:
Specify a workload to be analyzed for recommendations to improve the performance of the queries in that workload.
Select databases, tables, or columns for analysis.
Schedule the COLLECT STATISTICS activity for the collection or recollection of statistics (sample statements appear below this list).
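The statements that the Statistics Wizard schedules can also be issued manually; a small sketch using a hypothetical employee table:

    -- Collect statistics on a column and on an index
    COLLECT STATISTICS ON employee COLUMN department_number;
    COLLECT STATISTICS ON employee INDEX (employee_number);

    -- Refresh all previously collected statistics for the table
    COLLECT STATISTICS ON employee;

    -- Review what has been collected
    HELP STATISTICS employee;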

Archival Utilities
Teradata provides the Archive Recovery utility (ARC) to perform backup and restore operations on tables, views, and other objects. ARC archives and restores database objects, allowing recovery of data that may have been damaged or lost. With the ARC utility you can also copy a table and restore it to another Teradata Database, or archive a single partition. ARC is scalable and parallel, and can run on a channel-attached client, a network-attached client, or a node. In addition, ARC interfaces to third party products to support backup and restore capabilities in a network-attached environment.

There are several scenarios where restoring objects from external media may be necessary:
Restoring non-Fallback tables after a disk failure.
Restoring tables that have been corrupted by batch processes that may have left the data in an uncertain state.
Restoring tables, views, databases, or macros that have been accidentally dropped by the user.
Miscellaneous user errors resulting in damaged or lost database objects.

Archiving on Channel-Attached Clients
In a channel-attached (mainframe) client environment, ARC is used to back up and restore data. ARC may be running on the node or on the channel-attached client. It supports commands written in Job Control Language (JCL), and will back up data directly across the channel into the mainframe-attached tape subsystem.

Archiving on Network-Attached Clients
In a network-attached client environment, ARC is used to back up data, along with one of the following tape management products:
NetVault (from BakBone Software Inc.)
Veritas NetBackup (from Symantec Software)

These products provide modules for Teradata Database systems that run on network-attached clients or a node (Microsoft Windows or UNIX MP-RAS). Data is backed up through these interfaces into a tape storage subsystem using the ARC utility.

Exercise 3.1
Processing a Request: Drag an icon from the group on the right to its correct position in the empty boxes on the left. Correctly placed icons will stay where you put them.

To review this topic, click Request Processing.

Exercise 3.2
Select the appropriate Teradata load or unload utility from the pull-down menus.
Uses parallel processing to load an empty table.
Updates, inserts, or deletes rows in empty or populated tables (block level operation). Performs the same function as the UPDATE Teradata Parallel Transporter operator.
Enables constant loading (streaming) of data into a table to keep data fresh. Performs the same function as the STREAM Teradata Parallel Transporter operator.
Data extract utility that exports data from a Teradata table and writes it to a host file.

To review these topics, click FastLoad, MultiLoad, TPump, and FastExport.

Exercise 3.3
Move the software components required for a channel connection into the appropriate blue squares. Correctly placed components will stay where you put them.

To review this topic, click Channel Attached Client.

Exercise 3.4
Which three statements are true? (Choose three.)
A. BTEQ runs on all client platforms to access the Teradata Database.
B. Teradata SQL Assistant and TDWM are the two utilities used for Teradata system management.
C. NetVault and Veritas NetBackup are utilities used for network management.
D. TDWM can reject a query based on current workload and set thresholds.
E. Archive Recovery (ARC) is used to copy and restore a table to another Teradata Database.

To review these topics, click BTEQ, Teradata SQL Assistant, Teradata Manager, TDWM, Archiving on Channel-Attached Clients, and Archiving on Network-Attached Clients.

Exercise 3.5
Select the correct type of connection (network-attached client or channel-attached client) from the drop-down boxes below that corresponds to the listed software and hardware components:
Teradata Gateway
Teradata Director Program
Channel Driver
Ethernet Card
"mainframe host"

To review this topic, click Channel Attached Client or Network Attached Client.

Exercise 3.6
Select the correct Teradata Analyst Pack tool from the drop-down menus below.
Uses a Query Capture Database to store query plans.
Verifies queries and reproduces optimizer related (query plan) issues in a test environment.
Recommends one or more Secondary Indexes for a table.
Recommends and automates the Statistics Collection process.

To review this topic, click Teradata Analyst Pack.

Exercise 3.7
__________ is made up of several products/tools that assist the DBA or application developer in defining and refining the rules (i.e., filters, throttles, and workload definitions) that control the allocation of resources to workloads running on a system.
A. Teradata Workload Analyzer
B. Database Query Log
C. Teradata Active Systems Manager
D. Performance Monitor

To review this topic, click Administrative Utilities.

Exercise 3.8
True or False: Workload definitions are rules to control the allocation of resources to workloads.
A. True
B. False

To review this topic, click Administrative Utilities.

Exercise 3.9
Select the correct term from the drop-down menus below.
________ is an application programming standard that defines common database access mechanisms to simplify the exchange of data between a client and server. ________-compliant applications connect with a database through the use of a driver that translates the application's ________ commands into database syntax.
________ is a library of routines that enable an application program to access data stored in the Teradata Database.
________ is an Application Programming Interface (API) that allows platform independent Java applications to access a DBMS using Structured Query Language (SQL). It enables the development of web-based Teradata end user tools that can access Teradata through a web server and also provides support for access to other commercial databases.
________ is an additional legacy API that allows access to Teradata from a network host.

To review this topic, click Network Attached Client.

Mod 4 - Data Structure

Objectives
After completing this module, you should be able to:
Distinguish between a Teradata Database and a Teradata User.
List and define the Teradata Database objects.
Define Perm Space, Temp Space, and Spool Space, and explain how each is used.
Describe the function of the Data Dictionary.
List methods for authentication and security on Teradata.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Creating Databases and Users

In the Teradata Database, Databases (including a special category of Databases called Users) have attributes assigned to them:

Access Rights: Privileges that allow a User to perform operations (such as CREATE, DROP, and SELECT) against database objects. A User must have the correct access rights to a database object in order to access it.

Perm Space: The maximum amount of Permanent Space assigned and available to a User or Database to store tables. Unlike some other relational databases, the Teradata Database does not physically pre-allocate Perm Space for Databases and Users when they are defined. Only the Permanent Space limit is defined at object definition time; the space is then consumed dynamically as needed.

Spool Space: The amount of space assigned and available to a User or Database to gather answer sets. For example, when executing a conditional query, qualifying rows are temporarily stored using Spool Space, and these results remain available to the User until the session is terminated. Permanent Space not being used for tables is available for Spool Space. Depending on how the system is set up, a single query could temporarily use all available system space to store its result in spool.

Temp Space: The amount of space used for global temporary tables. Tables created in Temp Space will survive a restart. Permanent Space not being used for tables is available for Temp Space as well as Spool Space.

A Logical Database Hierarchy
In a logical, hierarchical organization, Databases (including Users) are created subordinate to existing Databases or Users. The owning Database or User is called the parent; the subordinate Database or User is called the child. All Databases have a defined upper limit of Permanent Space, and Permanent Space for a new Database or User comes from its immediate parent.

When the Teradata Database software is first installed, all Permanent Space is assigned to Database DBC (also a User in Teradata Database terminology, because you can log on to it with a userid and password). During installation, the following Databases are created:
Database Crashdumps (initially empty)
User SystemFE (with its views and macros)
User SysAdm (with its views and macros)

Because Database DBC is the immediate parent of these child Databases, Permanent Space limits for the children are subtracted from Database DBC.

Creating a New Database
After the initial installation, you will create your database hierarchy. One way to set up this hierarchy would be to create a Database Administrator User directly subordinate to Database DBC, with the majority of the system's Perm Space assigned to it (we called it SysDBA; you can assign it any name). Next, all other Users and Databases would be created from the database administrator User, and their Permanent Space limits would be subtracted from the Database Administrator User's space limit.

This setup gives you the freedom to have multiple administrators logging on to the Database Administrator User, and limits the number of people logging on directly to Database DBC (which has more access rights than any other User).

Your hierarchy would look like this:
Database DBC at the highest level.
User SysDBA, the parent of all other Databases (including Users), with most of the system's Permanent Space assigned to it.
All other Databases and Users in the system created from User SysDBA.

Each table, view, macro, stored procedure, and trigger is owned by a Database (or User).

Data Layers
There are several "layers" built in to the EDW environment. These layers include:

Staging - the primary purpose of the staging layer is to perform data transformation, either in the ETL or ELT process.

Physical - the physical layer is where denormalizations that will make access more efficient occur: join indexes, pre-aggregations, summary tables, etc.

Semantic - this layer is the "access" layer. The purpose of this layer is to provide efficient, friendly access to end users, whether through a Teradata application or a 3rd party tool. Access is often provided via views and business intelligence (BI) tools.

Maximum Perm Space Allocations: An Example
Below is an example of how Permanent Space limits for Users and Databases come from the immediate parent User or Database. In this case, the User SysDBA has 500 GB of maximum Permanent Space assigned to it.

The User HR is created from SysDBA with 200 GB of maximum Permanent Space. The 200 GB for HR is subtracted from SysDBA, which now has 300 GB (500 GB minus 200 GB).

The User Payroll is created as a child of HR with 100 GB of Permanent Space. The 100 GB for Payroll is subtracted from HR, which now has 100 GB (200 GB minus 100 GB).

At a different level under SysDBA, Database Marketing is created as a child of SysDBA with 100 GB of maximum Permanent Space. The 100 GB for Marketing comes from its parent, SysDBA, which now has 200 GB (300 GB minus 100 GB).
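The hierarchy in this example could be built with SQL along these lines (a sketch only; the passwords are placeholders, and 1 GB is written as 1e9 bytes):

    CREATE USER SysDBA FROM DBC AS
      PERMANENT = 500e9,   -- 500 GB, subtracted from DBC
      PASSWORD = secret1;

    CREATE USER HR FROM SysDBA AS
      PERMANENT = 200e9,   -- 200 GB, subtracted from SysDBA
      PASSWORD = secret2;

    CREATE USER Payroll FROM HR AS
      PERMANENT = 100e9,   -- 100 GB, subtracted from HR
      PASSWORD = secret3;

    CREATE DATABASE Marketing FROM SysDBA AS
      PERMANENT = 100e9;   -- 100 GB, subtracted from SysDBA; a Database has no password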

A Teradata Database

In Teradata Database systems, the words "database" and "user" have specific definitions.

Database: The Teradata Definition

In Teradata, a "database" is a logical grouping of information contained in tables. A Teradata Database also provides a key role in space allocation and access control. A Teradata Database is a defined, logical repository that can contain objects, including:

Database: A defined object that may contain other database objects.
Table: A two-dimensional structure of columns and rows of data. (Requires Perm Space)
View: A virtual "window" into subsets of one or more tables or other views. It is pre-defined using a single SELECT statement. (Uses no Perm Space)
Macro: A definition composed of one or more Teradata SQL and report formatting commands. (Uses no Perm Space)
Trigger: One or more Teradata SQL statements attached to a table and executed when specified conditions are met. (Uses no Perm Space)
Stored Procedure: A combination of procedural and non-procedural statements run using a single CALL statement. (Requires Perm Space)
User Defined Function: Allows authorized users to write external functions. Teradata allows users to create scalar functions to return single value results, aggregate functions to return summary results, and table functions to return tables. UDFs may be used to protect sensitive data such as personally identifiable data.

These Teradata Database objects are created, maintained, and deleted using SQL. Note: A Database with no Perm Space can contain views, macros, and triggers, but no tables or stored procedures.
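As a quick illustration of two objects that consume no Perm Space, here is a hedged sketch of a view and a macro; the HR.Employee table and its column names are assumptions for the example:

   CREATE VIEW HR.Employee_V AS
     SELECT employee_id, last_name, dept_no
     FROM HR.Employee;                  -- a virtual "window"; no Perm Space used

   CREATE MACRO HR.DeptCount (dept INTEGER) AS (
     SELECT COUNT(*) FROM HR.Employee
     WHERE dept_no = :dept;             -- macro parameters are referenced with a colon
   );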

Note: In this course. we will use uppercase "U" for User and uppercase "D" for Database when referring to these specific Teradata Database objects.htm 10/20/2011 . the Spool Space limit for the User or Database is inherited from its parent. To log on to a Teradata Database. but the Database or User's maximum spool allocation can only be as large as its immediate parent. if no Spool Space limit were defined for any Users or Databases." Spool Space is work space used to hold intermediate answer sets. For this reason. If it is not defined. an erroneous SQL request could create a "runaway transaction" that consumes all of the system's resources. file://C:\Documents and Settings\PJ186002\Desktop\teradata intoduction wbt. defining Spool Space limits for a User or Database is highly recommended. you need to specify a user (which is simply a database with a password). a user is the same as a database except that a user can actually log on to the database. The Spool Space limit for a Database or User is not subtracted from its immediate parent. and has attributes in addition to the ones listed above: User ID Password So. For example: Database A has a Spool Space limit of 500 GB. You cannot log on to a database because it has no password. Spool Space Maximum Spool Space As mentioned previously in "Creating Databases and Users. Database B is created as a child of Database A. Any Perm Space currently unassigned is available as Spool Space. Thus. The maximum Spool Space that can be allocated to Database B is 500 GB.Page 78 of 137 user is a specific type of database. Defining a Spool Space limit is not required when Users and Databases are created.


Database C is created as another child of Database A. The maximum Spool Space that can be allocated to Database C is also 500 GB.

Because Spool Space is work space, temporarily used and released by the system as needed, the total maximum Spool Space allocated for all the Databases and Users on the system can actually exceed the total system disk space. But this is not the amount of Spool Space actually consumed.
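A Spool Space limit is set (or changed) with the same DDL used for space allocation. A minimal sketch, with the user name and sizes as illustrative assumptions:

   CREATE USER Analyst1 FROM SysDBA
   AS PERMANENT = 0,                 -- a query-only User may need no Perm Space
      SPOOL     = 50000000000,       -- 50 GB cap on intermediate answer sets
      PASSWORD  = anPass1;

   MODIFY USER Analyst1 AS SPOOL = 20000000000;   -- the limit can be lowered later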

Consuming Spool Space
The maximum Spool Space for a Database (or User) is merely an upper limit of the Spool Space that the Database can use while processing a transaction. There are two limits to Spool Space utilization:

The maximum Spool Space assigned to a User or Database. If a transaction is going to exceed its assigned limit, it is aborted and an error message is given stating that the maximum Spool Space was exceeded.

Physical limitation of disk space. For a specific transaction, the system can only use the amount of Spool Space actually available on the system at that particular time, whether a maximum spool limit has been defined or not. If a job is going to exceed the Spool Space available on the system, an error message is given stating that there is not enough space to process the job.

As the amount of Permanent Space used to store data varies over a long period of time, so will the amount of space available for spool (work space).


Temporary Space
Temporary Space is Permanent Space currently not being used. Temporary Space is used for global temporary tables, and these results remain available to the user until their session is terminated. Tables created in Temp Space will survive a restart.

Check Your Understanding
Which statement is true? (Check the best answer.)

A. The Spool Space used by a request is limited to the amount of Spool Space assigned to the originating user and the physical space available on the system at that point in time.
B. A request can use as much Spool Space as necessary as long as it does not exceed the system's total installed physical space limit.
C. A request can use as much Spool Space as necessary as long as it does not exceed the Spool Space limit of the originating user, regardless of the space available on the system.
D. The Spool Space used by a request is limited only by the maximum Perm Space of the originating user.


Feedback:

Data Dictionary

The Data Dictionary is a set of relational tables that contains information about the RDBMS and the database objects within it. It is like the metadata or "data about the data" for a Teradata Database (except that it does not contain business rules, like true metadata does). The Data Dictionary resides in Database DBC. Some of the items it tracks are:

Disk space
Access rights
Ownership
Data definitions

Disk Space
The Data Dictionary stores information about how much space is allocated for perm and spool for each Database and User. The table below shows an example of Data Dictionary information for space allocations. In this example, the Users Payroll and Benefits have no Permanent Space allocated or consumed because they do not contain tables.
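Space accounting like the example above can be queried directly from the Data Dictionary. A hedged sketch (the view is DBC.DiskSpaceV on recent releases; older systems expose it as DBC.DiskSpace):

   SELECT DatabaseName,
          SUM(MaxPerm)     AS MaxPermBytes,
          SUM(CurrentPerm) AS CurrentPermBytes
   FROM   DBC.DiskSpaceV            -- one row per AMP, so the totals are summed
   GROUP  BY 1
   ORDER  BY 1;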

Access
The Data Dictionary also stores information about which Users can access which database objects. System Administrators are often responsible for archiving the system. In the example below, it is likely that the SysAdm User would have access to the tables in the Employee and Crashdumps databases, as well as other objects. When you grant and revoke access to any User for any database object, privileges are stored in the AccessRights table in the Data Dictionary.
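For example, granting a privilege records a row that can later be read back from the dictionary. A minimal sketch (DBC.AllRightsV is assumed here; older releases use DBC.AllRights):

   GRANT SELECT ON Employee TO SysAdm;

   SELECT UserName, DatabaseName, TableName, AccessRight
   FROM   DBC.AllRightsV
   WHERE  UserName = 'SysAdm';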


Owners
The Data Dictionary also stores information about which Databases and Users own each database object.

Definitions
The Data Dictionary stores definitions of all database objects, their names, and their place in the hierarchy.


For macros, the Data Dictionary also stores the actual SQL statements of the macro. While stored procedures also contain statements (SQL and SPL statements), the statements for each stored procedure are kept in a separate table and distributed among the AMPs (like regular user data), rather than in the Data Dictionary.

Database Security

There are several mechanisms for implementing security on a Teradata Database. These mechanisms include authenticating access to the Teradata Database with the following:

LDAP
Single Sign-On
Passwords

Authentication

After users have logged on to the Teradata Database and have been authenticated, they are authorized access to only those objects allowed by their database privileges.

Additional Security Mechanisms

In addition to authentication, there are several database objects or constructs that allow for a more secure database environment. These include:

Privileges, or Access Rights
Views
Macros
Stored Procedures
User Defined Functions (UDFs)
Roles – a collection of Access Rights

A privilege (access right) is the right to access or manipulate an object within Teradata. Privileges control user activities such as creating, executing, inserting, viewing, modifying, deleting, or tracking database objects and data. Privileges may also include the ability to grant privileges to other users in the database. Roles, which are a collection of access rights, can be granted to groups of users to further protect the security of data and objects within Teradata. In addition to access rights, the database hierarchy can be set up such that users access tables or applications via the semantic layer, which could include Views, Macros, Stored Procedures, and even UDFs. A view is a "virtual table" that does not exist as an actual table.
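As a sketch of how roles bundle access rights (the object names are illustrative assumptions):

   CREATE ROLE HR_Read;
   GRANT SELECT ON HR TO HR_Read;   -- a database-level privilege goes into the role
   GRANT HR_Read TO Payroll;        -- members of the role inherit the right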


Exercise 4.1

When you log on to the Teradata Database, you must specify:

A. An IP address.
B. A User and password.
C. A SELECT command.
D. The path to the data.

Feedback: To review this topic, click A Teradata Database.

Exercise 4.2

Database_Employee was created with 500 GB of Perm Space. If Database_Addresses (100 GB of Perm Space) and Database_Compensation (100 GB of Perm Space) both are created from Database_Employee, how much available Perm Space does Database_Employee have now?

A. 300 GB
B. 500 GB
C. 600 GB
D. 700 GB

Feedback: See Calculation. To review this topic, click Space Allocations: An Example.

Exercise 4.3

Select the answers from the options given in the drop-down boxes that correctly complete the sentences.

Permanent Space is pre-defined and allocated for a Database or User.
Perm Space is assigned to a User or Database to gather answer sets.
Temp Space is used for global temporary tables.
Perm Space limits apply to Databases, Users, tables, views, macros, triggers, and stored procedures.
Users must have privileges to access any database object.

Feedback: Show Answers Reset. To review these topics, click Creating Databases and Users, Creating a New Database.

Exercise 4.4

Select the choice from the drop-down box that corresponds to each statement:

Privileges granted to Users and Databases.
Maximum space allocated to Databases and Users for data.
Work area consumed by the system as it processes requests.

Feedback: Show Answers Reset. To review these topics, click Creating Databases and Users.

Exercise 4.5

A Teradata User is a special type of database:

A. Always
B. Sometimes
C. Never

Feedback: To review this topic, click A Teradata Database.

Exercise 4.6

True or False: A User-Defined Function (UDF) allows authorized users to write external functions. Teradata allows users to create scalar functions to return single value results, aggregate functions to return summary results, and table functions to return tables. UDFs may be used to protect sensitive data such as personally identifiable data.

A. True
B. False

Feedback: To review this topic, click A Teradata Database.

Exercise 4.7

The three Teradata Database security mechanisms for authenticating access to the Teradata Database are? (Choose three.)

A. LDAP
B. Single Sign-On
C. User Defined Functions
D. Passwords

Feedback: Check Answer Show Answer. To review these topics, click Database Security.

Exercise 4.8

Match the data "layers" built into the Teradata EDW environment to their definitions:

The primary purpose for this layer is to perform data transformation, either in the ETL or ELT process.
This layer is where denormalizations that will make access more efficient occur: summary tables, preaggregations, join indexes, etc.
This is the "access" layer. The purpose of this layer is to provide efficient, friendly access to end users, whether a Teradata application or a 3rd party tool. Access is often provided via views and business intelligence (BI) tools.

Feedback: Show Answers Reset. To review these topics, click Data Layers.

Mod 5 - Data Protection

Objectives

After completing this module, you should be able to:

Describe the types of data protection and fault tolerance used by the Teradata Database.
Explain basic data storage concepts.
Discuss the types of RAID protection used on Teradata Database systems.
Explain the concept of Fallback tables.
Describe the function of recovery journals, transient journals, and permanent journals.
List the types and levels of locking provided by the Teradata Database.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Protecting Data

Several types of data protection are available with the Teradata Database. All the data protection methods shown on this page are covered in further detail later in this module.

RAID

Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level. It uses groups of disk drives called "arrays" to ensure that data is available in the event of a failed disk drive or other component. The word "redundant" implies that either data, functions, and/or components have been duplicated in the array's architecture. The industry has agreed on six RAID configuration levels (RAID 0 through RAID 5). The classifications do not imply superiority of one mode over the other, but differentiate how data is stored on the disk drives. With the Teradata Database, the two RAID technologies that are supported are RAID 1 and RAID 5. On systems using EMC disk drives, RAID 5 is called RAID S.

Disk arrays contain the following major components:

SCSI bus
Physical disks
Disk array controllers

For maximum availability and performance, the Teradata Database uses dual redundant disk array controllers. Having two disk array controllers provides a level of protection in case one controller fails, and provides parallelism for disk access.

Fallback

Fallback is a Teradata Database feature that protects data against AMP failure. Fallback uses clusters of AMPs that provide for data availability and consistency if an AMP is unavailable.

Locks

Locks can be placed on database objects to prevent multiple users from simultaneously changing them. The four types of locks are:

Exclusive
Write
Read
Access

Journals

The Teradata Database has journals that are used for specific types of data or process recovery:

Recovery Journals
Permanent Journals

RAID 1

RAID 1 is a data protection scheme that uses mirrored pairs of disks to protect data from a single drive failure. Recovery with RAID 1 is faster than with RAID 5. The highest level of data protection is RAID 1 with Fallback.

RAID 1: Effects on Your System

RAID 1 requires double the number of disks because every drive has an identical mirrored copy.

RAID 1: How It Works

RAID 1 protects against a single disk failure using the following principles:

Mirroring: RAID 1 maintains a mirrored disk for each disk in the system. Note: If you configure more than one pair of disks per AMP, the RDAC stripes the data across both the regular and mirror disks. This does not so much protect data as provide a performance benefit.

Reading: Using both copies of the data, the system reads data blocks from the first available disk.

RAID 1: How It Handles Failures

If a disk fails, the Teradata Database is unaffected and the following are each handled in a different way:

Reads
Writes
Replacements

Reads: When a drive is down, the system reads the data from the other drive. There may be a minor performance penalty because the read will occur from one drive instead of both.

Writes: When a drive is down, the system writes to the functional drive. No mirror image exists at this time.

Replacements: After you replace the failed disk, the disk array controller automatically reconstructs the data on the new disk from the mirror image. Normal system performance is affected during the reconstruction of the failed disk.

RAID 5

RAID 5 is a data protection scheme that uses parity striping in a disk array to protect data from the failure of a single drive.

RAID 5: Effects on Your System

The number of disks per rank varies from vendor to vendor. The number of disks in a rank impacts space utilization:

4 drives per rank requires a 33% increase in data space.
5 drives per rank requires a 25% increase in data space.

RAID 5 also uses some overhead during a write operation, because it has to read the data, then calculate and write the parity.

RAID 5: How It Works

RAID 5 uses a data parity scheme, based on a binary "exclusive-or" (XOR) algorithm, to provide data protection:

Rank: For the Teradata Database, RAID 5 uses the concept of a rank, which is a set of disks working together. Note that the disks in a rank are not directly cabled to each other.

Parity: In RAID 5, data is striped across a rank of disks (spread across the disk drives) one segment at a time. Parity is also striped across all disk drives, interleaved with the data. A "parity byte" is an extra byte written to a drive in a rank. If one of the disk drives in the rank becomes unavailable, the system uses the parity byte to calculate the missing data from the down drive so the system can remain operational.

In the example below, data bytes are written to disk drives 1, 2, and 3. The system calculates the parity byte using the binary XOR algorithm and writes it to disk drive 4. With a rank of 4 disks, if a disk fails, any missing data block may be reconstructed using the other 3 disks.

The process of writing data and parity to the disk drives includes a read-modify-write operation for each new segment:

1. Read existing data on the disk drives in the rank.
2. Read existing parity in that rank for the corresponding segment.
3. Calculate the parity: existing data + new data + existing parity = new parity.
4. Write new data.
5. Write new parity.

RAID 5: How It Handles Failures

If a disk fails, the Teradata Database is unaffected and the following are each handled in different ways:

Reads: Data is reconstructed on-the-fly as users request data, using the binary XOR algorithm with known data values to calculate the missing data.

Writes: When a drive is down, the system writes to the functional drives, but not to the failed drive.

Replacements: After you replace the failed disk, the disk array controller automatically reconstructs the data on the new disk. Normal system performance is affected during reconstruction of the failed disk.

Give It a Try

In the example below, Disk 2 has experienced a failure. To allow users to still access the data while Disk 2 is down, the system must calculate the data on the missing disk drive using the parity byte. What would be the missing byte for this segment?

A. 1111 0011
B. 0111 1011
C. 0010 0110
D. 0000 1100

Feedback:

Fallback

Fallback is a Teradata Database feature that protects data in the case of an AMP vproc failure. Fallback protects your data by storing a second copy of each row of a table on a different AMP in the same cluster. If an AMP fails, the system accesses the Fallback rows to meet requests. Fallback provides AMP fault tolerance at the table level. With Fallback tables, if one AMP fails, all data is still available, and users may continue to use Fallback tables without any loss of access to data. It is especially useful in applications that require high availability.

During table creation or after a table is created, you may specify whether or not the system should keep a Fallback copy of the table. If Fallback is specified, it is automatic and transparent. You can specify Fallback protection at the table or database level. Fallback guarantees that the two copies of a row will always be on different AMPs; if either AMP fails, the alternate row is still available on the other AMP. Fallback guarantees the maximum availability of data.

Fallback: Effects on Your System

Fallback has the following effects on your system:

Space

In addition to the original database size, you need space for:

Fallback-protected tables (100% additional storage space for each Fallback-protected table)
RAID protection of Fallback-protected tables

Performance

There is a benefit to protecting your data, but there are costs associated with that benefit. With Fallback use, you need twice the disk space for storage and twice the I/O required for INSERTs, UPDATEs, and DELETEs of rows in Fallback protected tables. The Fallback option does not require any extra I/O for SELECTs, as the system will read from one copy or the other, and the Fallback I/O will be performed in parallel with the primary I/O so there is no performance hit.

Fallback benefits include:

A level of protection beyond RAID disk array protection.
Can be specified on a table-by-table basis to protect data requiring the highest availability.
Permits access to data while an AMP is off-line.
Automatically restores data that was changed during the AMP off-line period.

The highest level of data protection is Fallback and RAID 1.

Fallback: Software Tools

The following Teradata utilities are used to recover a failed AMP:

Vproc Manager: Enables you to display and modify vproc states, and initiate Teradata Database restarts.
Recovery Manager: Lets you monitor recovery processing.
Table Rebuild: Reconstructs tables on an AMP from data on other AMPs in the cluster.

Fallback: How It Works

Fallback is accomplished by grouping AMPs into clusters. When a table is defined as Fallback-protected, the system stores a second copy of each row in the table on a "Fallback AMP" in the AMP cluster. Below is a cluster of four AMPs. Each AMP has a combination of Primary and Fallback data rows (P=Primary, F=Fallback):

Primary Data Row: A record in a database table that is used in normal system operation.
Fallback Data Row: The online backup copy of a Primary data row that is used in the case of an AMP failure.

Write: Each Primary data row has a duplicate Fallback row on another AMP. The Primary and Fallback data rows are written in parallel.

Read: When an AMP is down with a table that is defined as Fallback, Teradata will access the Fallback copies of the rows.

More Clusters: The diagram below shows how Fallback data is distributed among multiple clusters (P=Primary, F=Fallback).
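Fallback is requested in ordinary DDL. A minimal sketch, with the table and column names as illustrative assumptions:

   CREATE TABLE HR.Employee, FALLBACK     -- keep a second copy of every row
     (employee_id INTEGER NOT NULL,
      last_name   VARCHAR(30),
      dept_no     INTEGER)
   UNIQUE PRIMARY INDEX (employee_id);

   ALTER TABLE HR.Employee, NO FALLBACK;  -- the option can be changed later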

Fallback: How It Handles Failures

If two physical disks fail in the same RAID 5 rank or RAID 1 mirrored pair, the associated AMP vproc fails, so the system cannot access any of that AMP's disk space. Fallback protects against the failure of a single AMP in a cluster. If two AMPs in a cluster fail, the system halts and must be restarted manually.

Reads: When an AMP fails, the system reads all rows it needs from the remaining AMPs in the cluster. Copies of its unavailable primary rows are available as Fallback rows on the other AMPs in the cluster. If the system needs to find a Primary row from the failed AMP, it reads the Fallback copy of that row, which is on another AMP.

Writes: A failed AMP is not available, so changes intended for its rows are directed to the Fallback copies on the other AMPs in the cluster, and are updated there.

Replacement: Repairing the failed AMP requires replacing the failed physical disks and bringing the AMP online. Once the AMP is online (recovered by replacing the failed disks), the system uses the Fallback data on the other AMPs to automatically reconstruct the data on the newly replaced disks.

Disk Allocation

The operating system, PDE, and the Teradata Database do not recognize the physical disk hardware. Each software component recognizes and interacts with different components of the data storage environment:

Operating system: Recognizes a logical unit (LUN). The operating system recognizes the LUN as its "disk," and is not aware that it is actually writing to spaces on multiple disk drives. This technique enables the use of RAID technology to provide data availability without affecting the operating system.

PDE: Translates LUNs into vdisks using slices (in UNIX) or partitions (in Microsoft Windows and Linux) in conjunction with the Teradata Parallel Upgrade Tool.

Teradata Database: Recognizes a virtual disk (vdisk). Using vdisks instead of direct connections to physical disk drives enables the use of RAID technology with the Teradata Database.

Creating LUNs

Space on the physical disk drives is organized into LUNs. The RAID level determines how the space is organized. For example, if you are using RAID 5, a LUN includes a region of space from each of the physical disk drives in a rank.

Pdisks: User Data Space

After a LUN is created, it is divided into partitions. In UNIX systems, a LUN consists of one partition, which is further divided into slices:

Boot slice (a very small slice, taking up only 35 sectors)
User slices for storing data. These user slices are called "pdisks" in the Teradata Database.

In Microsoft Windows systems, a LUN consists of multiple partitions. LUNs in Microsoft Windows do not have a boot slice. Instead, they contain a "Master Boot Record" that includes information such as the partition layout. The partitions store data and are called "pdisks" in the Teradata Database. Linux systems are similar to Microsoft Windows; both use a Master Boot Record and an MS DOS style partition table. In summary, pdisks are the user slices (UNIX) or partitions (Microsoft Windows and Linux) and are used for storage of the tables in a database.

Assigning Pdisks to AMPs

The pdisks (user slices or partitions, depending on the operating system) are assigned to an AMP through the software. No cabling is involved. A LUN may have one or more pdisks. Although numerous configurations are possible, generally all pdisks from a rank (RAID 5) or mirrored pair (RAID 1) are assigned to the same AMP for optimal performance.

Vdisks and Ranks

Each AMP in the system is assigned one vdisk. The combined space on the pdisks is considered the AMP's vdisk. An AMP manages only its own vdisk (disk space assigned to it), not the vdisk of any other AMP. The AMP has no control over the physical disks or ranks that compose the vdisk; thus, an AMP recognizes only the vdisk, not slices. All AMPs then work in parallel, processing their portion of the data.

From the Teradata Database point of view, the vdisk is the collective name for all the logical disk space that an AMP manages; it is composed of all the pdisks assigned to that AMP (as many as 64 pdisks).

Reviewing the Terminology

To help review the terminology you just learned, choose the correct term from the pulldown boxes next to each definition:

This is the collective name for all the logical disk space that an AMP manages.
A logical unit that is composed of a region of space from each of the physical disk drives in a rank. The operating system sees this logical unit as its "disk," and is not aware that it is actually writing to spaces on multiple disk drives.
For a UNIX system, a portion of physical disk drive space that is used for storing data. One of these from each disk drive in a rank composes a LUN.
For a Microsoft Windows system, a portion of physical disk drive space that is used for storing data. One of these from each disk drive in a rank composes a LUN.
This is Teradata Database terminology for a user slice (UNIX) or partition (Microsoft Windows) that stores data. It is just another name for user slice or partition. These are assigned to AMPs, which manage the data stored.

Feedback: Show Answers Reset

Journals for Data Availability

The following journals are kept on the system to provide data availability in the event of a component or process failure in the system:

Recovery Journals
Permanent Journals

Recovery Journals

The Teradata Database uses Recovery Journals to automatically maintain data integrity in the case of:

An interrupted transaction (Transient Journal)
An AMP failure (Down-AMP Recovery Journal)

Recovery Journals are created, maintained, and purged by the system automatically, so no DBA intervention is required. Recovery Journals are tables stored on disk arrays like user data is, so they take up disk space on the system.

Transient Journal

A Transient Journal maintains data integrity when in-flight transactions are interrupted (due to aborted transactions, system restarts, and so on). Data is returned to its original state after transaction failure. A Transient Journal is used during normal system operation to keep "before images" of changed rows so the data can be restored to its previous state if the transaction is not completed. This happens on each AMP as changes occur. When a transaction is started, the system automatically stores a copy of all the rows affected by the transaction in the Transient Journal until the transaction is committed (completed). Once the transaction is complete, the "before images" are purged. In the event of a transaction failure, the "before images" are reapplied to the affected tables and deleted from the journal, and the "rollback" operation is completed.

Down-AMP Recovery Journal

The Down-AMP Recovery Journal allows continued system operation while an AMP is down (for example, when two disk drives fail in a rank or mirrored pair). A Down-AMP Recovery Journal is used with Fallback-protected tables to maintain a record of write transactions (updates, inserts, deletes, creates, etc.) on the failed AMP while it is unavailable. The Down-AMP Recovery Journal starts automatically after the loss of an AMP in a cluster. Any changes to the data on the failed AMP are logged into the Down-AMP Recovery Journal by the other AMPs in the cluster. When the failed AMP is brought back online, the restart process includes applying the changes in the Down-AMP Recovery Journal to the recovered AMP. The journal is discarded once the process is complete, and the AMP is brought online, fully recovered.

Permanent Journals

Permanent Journals are an optional feature used to provide an additional level of data protection. You specify the use of Permanent Journals at the table level. A Permanent Journal provides full-table recovery to a specific point in time. It can also reduce the need for costly and time-consuming full-table backups.

When you create a table with Permanent Journaling, you must specify whether the Permanent Journal will capture:

Before images -- for rollback, to "undo" a set of changes to a previous state.
After images -- for rollforward, to "redo" to a specific state.

You can also specify that the system keep both before images and after images. In addition, you can choose that the system captures:

Single images (the default) -- this means that the Permanent Journal table is not Fallback protected.
Dual images -- this means that the Permanent Journal table is Fallback protected.

Permanent Journals are tables stored on disk arrays like user data is, so they take up additional disk space on the system. The additional disk space required may be calculated in advance to ensure adequate resources. The Database Administrator maintains the Permanent Journal entries (deleting, archiving, and so on). Periodically, the Database Administrator must dump the Permanent Journal to external media.

How Permanent Journals Work

A Database (object) can have one Permanent Journal. The Permanent Journal captures images concurrently with standard table maintenance and query activity, thus reducing the need for full-table backups since only changes are backed up rather than the entire database.
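A hedged sketch of how journaling options can appear in DDL (the database name, sizes, and exact option set are illustrative assumptions; consult the DDL reference for the full syntax):

   CREATE DATABASE Sales_DB FROM SysDBA
   AS PERMANENT = 50000000000,
      DEFAULT JOURNAL TABLE = Sales_DB.Sales_Jrnl;   -- table that holds the images

   CREATE TABLE Sales_DB.Sales_Hist,
      FALLBACK,
      DUAL BEFORE JOURNAL,     -- two copies of the "undo" images
      NO AFTER JOURNAL
     (sale_id   INTEGER NOT NULL,
      sale_date DATE)
   PRIMARY INDEX (sale_id);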

Locks

Locking prevents multiple users who are trying to access or change the same data simultaneously from violating data integrity. This concurrency control is implemented by locking the target data. Locks are automatically acquired during the processing of a request and released when the request is terminated.

Levels of Locking

Locks may be applied at three levels:

Database Locks: Apply to all tables and views in the database.
Table Locks: Apply to all rows in the table or view.
Row Hash Locks: Apply to a group of one or more rows in a table.

Types of Locks

The four types of locks are described below.

Exclusive

Exclusive locks are applied to databases or tables, never to rows. They are the most restrictive type of lock. With an exclusive lock, no other user can access the database or table. Exclusive locks are used when a Data Definition Language (DDL) command is executed (i.e., CREATE TABLE). An exclusive lock on a database or table prevents other users from obtaining any lock on the locked object.

Write

Write locks enable users to modify data while maintaining data consistency. While the data has a write lock on it, other users can only obtain an access lock. During this time, all other locks are held in a queue until the write lock is released.

Read

Read locks are used to ensure consistency during read operations. Several users may hold concurrent read locks on the same data, during which time no data modification is permitted. Read locks prevent other users from obtaining the following locks on the locked data:

Exclusive locks
Write locks

Access

Access locks can be specified by users unconcerned about data consistency. The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on tables that are updated only by small, single-row changes. Access locks are sometimes called "stale read" locks, because you may get "stale data" that has not been updated. Access locks prevent other users from obtaining only Exclusive locks on the locked data.
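An access lock can be requested explicitly with the LOCKING modifier. A minimal sketch (the table name is an illustrative assumption):

   LOCKING TABLE Sales_DB.Sales_Hist FOR ACCESS   -- "stale read": do not block writers
   SELECT sale_date, COUNT(*)
   FROM   Sales_DB.Sales_Hist
   GROUP  BY 1;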

What Type of Lock?

Match the type of lock to the descriptions:

Allows other users to see a stable version of the data, but not make any modifications.
Allows other users to obtain an access lock only, not any other type of lock.
This kind of lock cannot be applied to rows.

Feedback: Show Answers Reset

Exercise 5.1

True or False: If a single disk drive fails, the Teradata Database halts, then restarts.

A. True
B. False

Feedback: To review this topic, click RAID 1: How It Handles Failures or RAID 5: How It Handles Failures.

Exercise 5.2

RAID 5 protects data from disk failures using:

A. DARDAC
B. Mirroring
C. Parity Striping
D. Partitioning

Feedback: To review this topic, click RAID 5: How It Works.

Exercise 5.3

Match the type of journal to the appropriate phrase:

Stores before-images and after-images.
Protects data from a transaction that does not complete.
Starts logging changes for a Fallback table when an AMP goes down.

Feedback: Show Answers Reset. To review this topic, click Down-AMP Recovery Journal, Transient Journal, Archival Utilities, or Permanent Journals.

Exercise 5.4

Which three statements are true? (Choose three.)

A. Locks prevent multiple users from simultaneously changing the same data.
B. A clique provides protection in the case of a node failure.
C. ARC protects disk arrays from electrostatic discharge.
D. Fallback protects data from the failure of one AMP per cluster.

Feedback: Check Answer Show Answer. To review these topics, click Fallback, Cliques Provide Resiliency, or Locks.

Exercise 5.5

1. True or False: Restoration of Fallback-protected data starts automatically when a failed AMP is brought online.

A. True
B. False

2. True or False: Fallback protection is specified at the row hash level.

A. True
B. False

Feedback:

To review these topics, click Fallback, Fallback: How It Works, or Fallback: How It Handles Failures.

Exercise 5.6

From the drop-down boxes below, match the storage concepts to the descriptions:

The collection of pdisks used to store data. This space is assigned to an AMP.
A collection of areas across the disk drives in a rank. The operating system sees this as its logical "disk."
A collection of disk drives used to provide data availability.
An area of a LUN (also known as a user slice in UNIX or partition in Microsoft Windows) that stores user data.
A collection of AMPs that keeps Fallback copies of rows for each other in case one AMP fails.

Feedback: Show Answers Reset. To review these topics, click Creating LUNs, Pdisks: User Data Space, Assigning Pdisks to AMPs, or RAID 5: How It Works.

Mod 6 - Indexes

Objectives

After completing this module, you should be able to:

List tasks Teradata Database Administrators never have to perform.
Distinguish between a primary index and a primary key.
Define primary and secondary indexes and their purposes.
Distinguish between a UPI and a NUPI.
Distinguish between a USI and a NUSI.
Explain the roles of the hashing algorithm and hash map in locating a row.
Explain the makeup of the Row-ID and its role in row storage.
Describe the sequence of events for locating a row.
Define a Partitioned Primary Index (PPI) and its purpose.
Describe the operation of a full-table scan.

HOT TIP: This module contains links to important supplemental course information. Please be sure to click on each hotword link to capture all of the training content.

Indexes in the Teradata Database

Indexes are used to access rows from a table without having to search the whole table. In the Teradata Database, an index is made up of one or more columns in a table.

Once Teradata Database indexes are selected, they are maintained by the system. In the Teradata Database, there are two types of indexes:

Primary Indexes define the way the data is distributed. You specify which column or columns are used as the Primary Index when you create a table.
Secondary Index columns can be specified when you create a table or at any time during the life of the table.

Primary Indexes and Secondary Indexes are used to locate the data rows more efficiently than scanning the whole table.

Data Distribution

When the Primary Index for a table is well chosen, the rows are evenly distributed across the AMPs for the best performance. Even data distribution is critical to performance because it optimizes the parallel access to the data. Each AMP is responsible for a subset of the rows in a table. If the data is evenly distributed, the work is evenly divided among the AMPs so they can work in parallel and complete their processing at about the same time. If distribution is skewed, an all-AMP operation will take longer than if all AMPs were evenly utilized; the slowest AMP becomes a bottleneck. Unevenly distributed data, also called "skewed data," causes slower response time as the system waits for the AMP(s) with the most data to finish their processing.

The way to guarantee even distribution of data is by choosing a Primary Index whose columns contain unique values. The values do not have to be evenly spaced, or even "truly random"; they just have to be unique to be evenly distributed. While other vendors may require data partitioning or index maintenance, these tasks are unnecessary with the Teradata Database.

When data is loaded into the Teradata Database:

The system automatically distributes the data across the AMPs based on row content (the Primary Index values).
Data is not distributed in any particular order.
The distribution is the same regardless of the data volume being loaded. In other words, large tables are distributed the same way as small tables.

The benefits of having unordered data are that they don't need any maintenance to preserve order, and they are independent of any query being submitted. The automatic, unordered distribution of data eliminates tasks for a Teradata Database Administrator that are necessary with some other relational database systems.

Teradata Database Manageability

A key benefit of the Teradata Database is its manageability. The DBA does not waste time on labor-intensive data maintenance tasks.

Things Teradata Database Administrators Never Have to Do

Teradata Database Administrators never have to do the following tasks:

Reorganize data or index space.
Pre-prepare data for loading (convert, sort, split, etc.).
Pre-allocate table/index space.
Physical partitioning of disk space.
Write or run programs to split input source files into partitions for loading.
Unload/reload data spaces due to expansion.

While it is possible to have partitioned indexes in the Teradata Database, they are not required, and they are created logically. Teradata Database Administrators know that if data doubles, the system can expand easily to accommodate it. With the Teradata Database, the data can be redistributed on the larger configuration with no offloading and reloading required. The list of tasks that Teradata Database Administrators do not have to do is long, and illustrates why the Teradata Database system is so easy to manage and maintain compared to other databases. With the Teradata Database, the workload for creating a table of 100 rows is the same as creating a table with 1,000,000 rows.

If a table has 103 rows and there are 4 AMPs in the system, each AMP will not have exactly the same number of rows from that table. However, if the Primary Index is chosen well, each AMP will still contain some rows from that table.

How Other Databases Store Rows and Manage Data

Even data distribution is not easy for most databases to do. Many databases use range distribution, which creates intensive maintenance tasks for the DBA. They might place an entire table in a single partition. The disadvantage of this approach is it creates a bottleneck for all queries against that data. It is not the most efficient way to either store or access data rows. Others may use indexes as a way to select a small amount of data to return the answer to a query. They use them to avoid accessing the underlying tables if possible. The assumption is that the index will be smaller than the tables, so they will take less time to read. Because they scan indexes and use only part of the data in the index to search for answers to a query, they can carry extra data in the indexes, duplicating data in the tables. This way they do not have to read the table at all in some cases. This is not as efficient as the Teradata Database's method of data storage and access.

With other databases, adding, updating and deleting data affects manual data distribution schemes, thereby reducing query performance and requiring reorganization. Many other databases require the DBAs to manually partition the data. Other DBAs have to ask themselves questions like: How should I partition the data? How large should I make the partitions? Where do I have data contention? How are the users accessing the data?

The Teradata Database provides huge cost advantages, especially when it comes to staffing Database Administrators. Customers tell us that their DBA staff requirements for administering non-Teradata databases are three to four times higher. With the Teradata Database, your DBA can spend more time with users developing strategic applications to beat your competition!

What Do You Think?

Which two statements are true about data distribution and Teradata Database indexes? (Choose two.)

A. A Teradata Database provides high performance because it distributes the data evenly across the AMPs for parallel processing. No partitioning or data re-organizations are needed.
B. The rows of a table are stored on a single disk for best access performance.
C. Skewed data leads to poor performance in processing data access requests.

D. Teradata Database performance can be increased by maintaining the indexes and conducting periodic data partitioning and sorting.

Feedback: Check Answer Show Answer

Primary Index

A Primary Index (PI) is the physical mechanism for assigning a data row to an AMP and a location on the AMP's disks. It is also used to access rows without having to search the entire table; a Primary Index operation is always a one-AMP operation. The Primary Index is the way the system determines where a row will be physically stored. You specify the column(s) that comprise the Primary Index for a table when the table is created. Choosing a Primary Index for a table is perhaps the most critical decision a database designer makes, because this choice affects both data distribution and access.

Primary Index Rules

The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:

Rule 1: One Primary Index per table.
Rule 2: A Primary Index value can be unique or non-unique.
Rule 3: The Primary Index value can be NULL.
Rule 4: The Primary Index value can be modified.
Rule 5: The Primary Index of a populated table cannot be modified.
Rule 6: A Primary Index has a limit of 64 columns.

Rule 1: One PI Per Table

Each table must have a Primary Index. While a Primary Index may be composed of multiple columns, the table can have only one (single- or multiple-column) Primary Index. For a given row, the Primary Index value is the combination of the data values in the Primary Index columns.

Rule 2: Unique or Non-Unique PI

There are two types of Primary Index:

Unique Primary Index (UPI) - For a given row, the combination of the data values in the columns of a Unique Primary Index are not duplicated in other rows within the table. This uniqueness guarantees even data distribution and direct access. With a UPI, there is no duplicate row checking done during a load, which makes it a faster operation. For example, in the case where old employee numbers are sometimes recycled, the combination of the Social Security Number and Employee Number columns would be a UPI, so the columns are unique.

Non-Unique Primary Index (NUPI) - For a given row, the combination of the data values in the columns of a Non-Unique Primary Index can be duplicated in other rows within the table; there can be more than one row with the same PI value. A NUPI can cause skewed data, but in specific instances can still be a good Primary Index choice. For example, either the Department Number column or the Hire Date column might be a good choice for a NUPI if you will be accessing the table most often via these columns.

Rule 3: PI Can Be NULL

The Primary Index value can be NULL. If the Primary Index is unique, you could have one row with a null value. If you have multiple rows with a null value, the Primary Index must be Non-Unique.

Rule 4: PI Value Can Be Modified

The Primary Index value can be modified. In the table below, if Loretta Ryan changes departments, the Primary Index value for her row changes. When you update the index value in a row, the Teradata Database re-hashes it and redistributes the row to its new location based on its new index value.

Rule 5: PI Cannot Be Modified

The Primary Index of a populated table cannot be modified. In the event that you need to change the Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table. The ALTER TABLE statement allows you to change the PI of a table only if the table is empty.

Rule 6: PI Has 64-Column Limit

You can designate a Primary Index that is composed of 1 to 64 columns.

SQL Syntax for Creating a Primary Index

When a table is created, it must have a Primary Index specified. The Primary Index is designated in the CREATE TABLE statement in SQL. If you do not specify a Primary Index in the CREATE TABLE statement, the system will use the Primary Key as the Primary Index. If a Primary Key has not been specified, the system will choose the first unique column. If there are no unique columns, the system will use the first column in the table and designate it as a Non-Unique Primary Index.

Creating a Unique Primary Index

The SQL syntax to create a Unique Primary Index is:

   CREATE TABLE sample_1
     (col_a INT,
      col_b INT,
      col_c INT)
   UNIQUE PRIMARY INDEX (col_b);

Creating a Non-Unique Primary Index

The SQL syntax to create a Non-Unique Primary Index is:

   CREATE TABLE sample_2
     (col_x INT,
      col_y INT,
      col_z INT)
   PRIMARY INDEX (col_x);

Modifying the Primary Index of a Table

As mentioned in the Primary Index rules, you cannot modify the Primary Index of a populated table. In the event that you want to change the Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table.
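For an empty table, the change can be made in place with ALTER TABLE, as noted under Rule 5. A minimal sketch using the sample_2 table above (the MODIFY PRIMARY INDEX form is assumed here; it fails if any rows remain):

   DELETE FROM sample_2 ALL;                          -- the table must be empty
   ALTER TABLE sample_2 MODIFY PRIMARY INDEX (col_y); -- rows will now hash on col_y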

Data Mechanics of Primary Indexes

This section describes how Primary Indexes are used in:

Data distribution
Data access

Distributing Rows to AMPs

The Teradata Database uses hashing to randomly distribute data across all AMPs for balanced performance. Each AMP holds a portion of the rows of each table. An AMP is responsible for the storage, maintenance, and retrieval of the data under its control. The Teradata Database's automatic hash distribution eliminates costly data maintenance tasks and results in evenly distributed workloads. An advantage of the Teradata Database is that the Teradata File System manages data and disk space automatically, which eliminates the need to rebuild indexes when tables are updated or structures change.

Row Distribution Process

The process the system uses for inserting a row on an AMP is described below:

1. The system uses the Primary Index value in each row as input to the hashing algorithm.
2. The output of the hashing algorithm is the row hash value (in this example, 646).
3. The system looks at the hash map, which identifies the specific AMP where the row will be stored (in this example, AMP 3).
4. The row is stored on the target AMP.

Rows are distributed to AMPs during the following operations:

Loading data into a table (one or more rows, using a data loading utility)
Inserting or updating rows (one or more rows, using SQL)
Changing the system configuration (redistribution of data, caused by reconfigurations to add or delete AMPs)

When loading data or inserting rows, the data being affected by the load or insert is not available to other users until the transaction is complete. During a reconfiguration, no data is accessible to users until the system is operational in its new configuration. For example, in a two clique system, data is hashed across all AMPs in the system for even data distribution.

Duplicate Row Hash Values

It is possible for the hashing algorithm to end up with the same row hash value for two different rows.

There are two ways this could happen:

Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate NUPI values will produce the same row hash value.
Hash synonym: Also called a hash collision, this occurs when the hashing algorithm calculates an identical row hash value for two different Primary Index values. Hash synonyms are rare. In most cases, you will still get uniform data distribution.

To differentiate each row in a table, every row is assigned a unique Row ID. The Row ID is the combination of the row hash value and a uniqueness value:

Row ID = Row Hash Value + Uniqueness Value

The uniqueness value is used to differentiate between rows whose Primary Index values generate identical row hash values. The first row inserted with a particular row hash value is assigned a uniqueness value of 1. The uniqueness value is incremented by 1 for any additional rows inserted with the same row hash value. When each row is inserted, the AMP adds the row ID, stored as a prefix of the row. When using a Unique Primary Index, only the row hash value portion of the Row ID is needed to locate the row.

Duplicate Rows

A duplicate row is a row in a table whose column values are identical to another row in the same table. In other words, the entire row is the same, not just the index. Although duplicate rows are not allowed in the relational model (because every Primary Key must be unique), the ANSI Standard does allow duplicate rows, and the Teradata Database supports that. When you create a table, the following definitions determine whether or not it can contain duplicate rows:

SET tables: The default. The Teradata Database checks for and does not permit duplicate rows.
MULTISET tables: May contain duplicate rows. The Teradata Database will not check for duplicate rows.

Because duplicate rows are allowed in the Teradata Database, how does it affect the UPI, which, by definition, is unique? If a SET table is created with a Unique Primary Index, the check for duplicate rows is replaced by a check for duplicate index values.
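A hedged sketch of the two table kinds (the table and column names are illustrative assumptions):

   CREATE SET TABLE Sales_Set          -- default: duplicate rows are rejected
     (store_no INTEGER,
      sale_amt DECIMAL(8,2))
   PRIMARY INDEX (store_no);

   CREATE MULTISET TABLE Sales_Multi   -- duplicate rows are permitted
     (store_no INTEGER,
      sale_amt DECIMAL(8,2))
   PRIMARY INDEX (store_no);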

Accessing a Row With a Primary Index

When a user submits an SQL request against a table using a Primary Index, the request becomes a one-AMP operation, which is the most direct and efficient way for the system to find a row. The process is explained below.

Hashing Process

1. The primary index value goes into the hashing algorithm.
2. The output of the hashing algorithm is the row hash value.
3. The hash map points to the specific AMP where the row resides.
4. The PE sends the request directly to the identified AMP.
5. The AMP locates the row(s) on its vdisk.
6. The data is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

Choosing a Unique or Non-Unique Primary Index

Criteria for choosing a Primary Index include:

Uniqueness: A UPI is often a good choice because it guarantees even data distribution and eliminates duplicate row checking during a load, which makes it a faster operation. A NUPI with few duplicate values could provide good (if not perfectly uniform) distribution, and might meet the other criteria better.

Known Access Paths - Use in value access: Retrievals, updates, and deletes that specify the Primary Index are much faster than those that do not. Because a Primary Index is a known access path to the data, it is best to choose column(s) that will be frequently used for access. For example, the following SQL statement would directly access a row based on the equality WHERE clause:

SELECT * FROM employee WHERE employee_ID = 'ABC456789';

A NUPI may be a better choice if the access is based on another, mostly unique column. For example, a column containing room numbers or mail stops may not be unique if employees share offices, but it may be a better choice for access because the table may be used by the Mail Room to track package delivery.

Join Performance - Use in join access: SQL requests that use a JOIN statement perform the best when the join is done on a Primary Index. For join performance, consider Primary Key and Foreign Key columns as potential candidates for Primary Indexes, to minimize redistribution of table rows. For example, if the Employee table and the Payroll table are related by the Employee ID column, then the Employee ID column could be a good Primary Index choice for one or both of the tables. In that case, a NUPI can be a better choice than a UPI.

Non-volatile values: Look for columns where the values do not change frequently. For example, in an Invoicing table, the outstanding balance column for all customers probably has few duplicates, but probably changes too frequently to make a good Primary Index. A customer ID, statement number, or other more stable columns may be better choices.

When choosing a Primary Index, try to find the column(s) that best fit these criteria and the business need.

What Do You Think?
Which three are key considerations in choosing a Primary Index? (Choose three.)
A. Column(s) frequently used in queries to access data or to join tables.
B. Column(s) containing unique (or nearly unique) values for uniform distribution.
C. Column(s) with values that are stable (do not change frequently).
D. Column(s) with values in sequential order for best load and access performance.
E. Column(s) with many duplicate values for redundancy.

Feedback:

Partitioned Primary Index

The Teradata Database provides an indexing mechanism called Partitioned Primary Index (PPI). PPI is used to:

Improve performance for large tables when you submit queries that specify a range constraint.
Increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints.
Instantly drop old data and rapidly add new data.
Avoid full-table scans without the overhead of a Secondary Index.
Reduce the number of rows to be processed by using a technique called partition elimination.

How Does PPI Work?

Data distribution with PPI is still based on the Primary Index:

Primary Index -> Hash Value -> Determines which AMP gets the row

With PPI, the ORDER in which the rows are stored on the AMP is affected. Using the traditional method, No Partitioned Primary Index (NPPI), the rows are stored in row hash order. Using PPI, the rows are stored first by partition and then by row hash. In our example, there are four partitions. Within the partitions, the rows are stored in row hash order.

4 AMPs with Orders Table Defined with NPPI
4 AMPs with Orders Table Defined with PPI on O_Date
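The course shows the Orders table only as a figure; as a hedged sketch (hypothetical column names and date range), the PPI from the figure could be declared like this:

CREATE TABLE orders
  (o_id    INTEGER NOT NULL,
   o_date  DATE NOT NULL,
   o_total DECIMAL(10,2))
PRIMARY INDEX (o_id)        -- a NUPI here; a UPI would have to include the partitioning column
PARTITION BY RANGE_N (o_date BETWEEN DATE '2011-01-01'
                             AND     DATE '2011-12-31'
                             EACH INTERVAL '1' MONTH);

Rows still hash to AMPs by o_id; within each AMP they are grouped by the month partition of o_date, then stored in row hash order.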

With PPI, the Optimizer uses partition elimination to eliminate partitions that are not included in the query. This reduces the number of partitions to be accessed and rows to be processed. For example, in the table above, a query specifying the date 02/09 allows the Optimizer to eliminate the other partitions so each AMP can access just the 02/09 partition to retrieve the rows.

The multilevel PPI feature improves response to business questions. Specifically, it improves the performance of queries that can take advantage of partition elimination. For example, a retailer may commonly run an analysis of retail sales for a particular district (such as eastern Canada) for a specific timeframe (such as the first quarter of 2004) on a table partitioned by date of sale and sub-partitioned by sales district. Similarly, an insurance claims table could be partitioned by claim date and then subpartitioned by state. The analysis performed for a specific state (such as Connecticut) within a date range that is a small percentage of the many years of claims history in the data warehouse (such as March 2006) would take advantage of partition elimination for faster performance.

Data Storage Using PPI

To store rows using PPI, specify partitioning in the CREATE TABLE statement. The query will run through the hashing algorithm as normal, and come out with the Base Table ID, the Partition number(s), the Row Hash, and the Primary Index values.
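A hedged example of the kind of query that benefits, using the 02/09 date from the figure with an assumed year:

SELECT *
FROM orders
WHERE o_date = DATE '2011-02-09';   -- every partition except the one holding 02/09 is eliminated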

ALL-AMPs . Access Without a PPI QUERY PLAN SELECT * FROM Store_NPPI WHERE Location_Number = 3.Full-Table Scan file://C:\Documents and Settings\PJ186002\Desktop\teradata intoduction wbt.htm 10/20/2011 . If you query on Location 3 on this NPPI table. the entire table will be scanned to find records for Location (Full-Table Scan).Page 120 of 137 Access Without a PPI Let's say you have a table with Store information by Location and did not use a PPI.

Access With a PPI

In the same example for a PPI table, you would partition the table with as many Locations as you have (or will soon have in the future). Then if you query on Location 3, each AMP will use partition elimination, and each AMP only has to scan partition 3 for the query. This query will run much faster than the full-table scan in the previous example.

QUERY PLAN
SELECT * FROM Store WHERE Location_Number = 3;
ALL-AMPs - Single Partition Scan

Multi-Level Partitioned Primary Index

Multi-level partitioning allows each partition (i.e., PPI) to be sub-partitioned. Each partitioning level is defined independently using a RANGE_N or CASE_N expression. With MLPPI you can use multiple partitioning expressions instead of only one for a table or a non-compressed join index. With a multi-level PPI (MLPPI), you create multiple access paths to the rows in the base table that the Optimizer can choose from. This improves response to business questions by improving the performance of queries which take advantage of partition elimination. Note: an MLPPI table must have at least two partition levels defined.

For example, an insurance claims table could be partitioned by claim date and then subpartitioned by state. The analysis performed for a specific state (such as Connecticut) within a date range that is a small percentage of the many years of claims history in the data warehouse (such as March 2006) would take advantage of partition elimination for faster performance.

Syntax:
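The original syntax diagram is not reproduced here; the following is a minimal sketch of a two-level MLPPI for the insurance-claims example, with hypothetical names, ranges, and states:

CREATE TABLE claims
  (claim_id   INTEGER NOT NULL,
   claim_date DATE NOT NULL,
   state_code CHAR(2) NOT NULL,
   claim_amt  DECIMAL(10,2))
PRIMARY INDEX (claim_id)
PARTITION BY (RANGE_N (claim_date BETWEEN DATE '2000-01-01'
                                  AND     DATE '2011-12-31'
                                  EACH INTERVAL '1' MONTH),   -- level 1: claim date
              CASE_N (state_code = 'CT',
                      state_code = 'NY',
                      NO CASE));                              -- level 2: state

A query constrained on both claim_date and state_code can then eliminate partitions at both levels.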

Advantages and Disadvantages

Advantages of partitioned tables:
They provide efficient searches by using partition elimination at the various levels or combination of levels.
They reduce the I/O for range constraint queries.
They take advantage of dynamic partition elimination.
They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices.
They allow for range queries without having to use a secondary index.
They may replace a Value-Ordered NUSI for access.
Specific partitions may be archived or deleted.
Partitioned tables allow for fast deletes of data in a partition.

Disadvantages of partitioned tables:
Rows in a partitioned table are 2 bytes longer.
Access via the Primary Index may take longer.
Full table joins to a NPPI table with the same PI may take longer.

What is a NoPI Table?

A NoPI Table is simply a table without a primary index. It is a Teradata 13.00 feature; NoPI stands for No Primary Index.

Prior to Teradata Database 13.00, Teradata tables required a primary index. The primary index was primarily used to hash and distribute rows to the AMPs according to hash ownership. The objective was to divide data as evenly as possible among the AMPs to make use of Teradata's parallel processing. Each row stored in a table has a RowID which includes the row hash that is generated by hashing the primary index value. The Primary Index may be either a UPI or a NUPI. For example, the optimizer can choose an efficient single-AMP execution plan for SQL requests that specify values for the columns of the primary index, a NUPI allows local joins to other similar entities, and row hash locks are used for SELECT with equality conditions on the PI columns.

Starting with Teradata Database 13.00, a table can be defined without a primary index. This feature is referred to as the NoPI Table feature. NoPI tables may be created as base tables, global temp tables, and volatile tables. Without a PI, the hash value as well as AMP ownership of a row is arbitrary, and you can use last name or some other value that is more readily available to query on. Within the AMP, there are no row-ordering constraints, and therefore rows can be appended to the end of the table as if it were a spool table. As rows are inserted into a NoPI table, rows are always appended at the end of the table and never inserted in the middle of a hash sequence. Organizing/sorting rows based on row hash is therefore avoided. Each row in a NoPI table has a hash bucket value that is internally generated.
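A minimal NoPI sketch (hypothetical staging table; note that NoPI tables are MULTISET tables, since duplicate-row checking would require hash ordering):

CREATE MULTISET TABLE sales_stage
  (sale_id  INTEGER,
   sale_amt DECIMAL(10,2))
NO PRIMARY INDEX;

FastLoad or a TPump array INSERT can then append rows without the hash-ordering work a PI table would require.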

A NoPI table is internally treated as a hashed table; it is just that typically all the rows on one AMP will have the same hash bucket value.

Benefits:
A NoPI table will reduce skew in intermediate ETL tables which have no natural PI.
Loads (FastLoad and TPump array insert) into a NoPI staging table are faster.

Secondary Index

A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan. You can drop and recreate secondary indexes dynamically, as they are needed. Unlike Primary Indexes, Secondary Indexes do not affect how rows are distributed among the AMPs. Secondary Indexes do require some system resources: they are stored in separate subtables that require extra overhead in terms of disk space, and maintenance which is handled automatically by the system.

What Do You Think? In what instances would it be a good idea to define a Secondary Index for a table? (This information will be covered in this module, but here is a preview.)
A. The table already has a Unique Primary Index, but a second column must also have unique values. The column is specified as a Unique Secondary Index (USI) to enforce uniqueness on the second column.
B. The Product table is accessed by the retailer (who accesses data based on the retailer's product code column), and by a vendor (who accesses the same data based on the vendor's product code column).
C. The Primary Index exists for even data distribution and access, but a Secondary Index is defined to efficiently generate reports based on a different set of columns.
D. All of the above.

Feedback:

Secondary Index Rules

Several rules that govern how Secondary Indexes must be defined and how they function are:

Rule 1: Secondary Indexes are optional.
Rule 2: Secondary Index values can be unique or non-unique.
Rule 3: Secondary Index values can be NULL.
Rule 4: Secondary Index values can be modified.
Rule 5: Secondary Indexes can be changed.
Rule 6: A Secondary Index has a limit of 64 columns.

Rule 1: Optional SI
While a Primary Index is required, a Secondary Index is optional. If one path to the data is sufficient, no Secondary Index need be defined. Different groups of users may want to access the data in various ways, so you can define a Secondary Index for each heavily used access path. You can define 0 to 32 Secondary Indexes on a table for multiple data access paths.

Rule 2: Unique or Non-Unique SI
Like Primary Indexes, Secondary Indexes can be unique or non-unique. A Unique Secondary Index (USI) serves two possible purposes:
Enforces uniqueness on a column or group of columns. The database will check USIs to see if the values are unique. For example, if you have chosen different columns for the Primary Key and Primary Index, you can make the Primary Key a USI to enforce uniqueness on the Primary Key.
Speeds up access to a row (data retrieval speed). Accessing a row with a USI requires one or two AMPs, which is less direct than a UPI (one AMP) access.
A Non-Unique Secondary Index (NUSI) is usually specified to prevent full-table scans, in which every row of a table is read. The Optimizer determines whether a full-table scan or NUSI access will be more efficient, then picks the best method.

Accessing a row with a NUSI requires all AMPs, but is more efficient than a full-table scan.

Rule 3: SI Can Be NULL
As with the Primary Index, the Secondary Index column may contain NULL values.

Rule 4: SI Value Can Be Modified
The values in the Secondary Index column may be modified as needed.

Rule 5: SI Can Be Changed
Secondary Indexes can be created and dropped dynamically as needed. When the index is dropped, the system physically drops the subtable that contained it.

Rule 6: SI Has 64-Column Limit
You can designate a Secondary Index that is composed of 1 to 64 columns. To use the Secondary Index below, the user would specify both Budget and Manager Employee Number.
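As a hedged sketch of Rules 5 and 6 (table name, column names, and values are hypothetical):

-- A two-column NUSI like the Budget + Manager Employee Number example.
CREATE INDEX (budget, mgr_employee_number) ON department;

-- To use it, the request must supply a value for every column in the index.
SELECT *
FROM department
WHERE budget = 400000
  AND mgr_employee_number = 1019;

-- Rule 5: indexes are dynamic. Dropping one physically drops its subtable.
DROP INDEX (budget, mgr_employee_number) ON department;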

Other Secondary Indexes

Join Index
Join indexes have several uses:
Define a pre-join table on frequently joined columns (with optional data aggregation) without denormalizing the database.
Define a summary table without denormalizing the database.
Create a full or partial replication of a base table with a primary index on a foreign key column to facilitate joins of very large tables by hashing their rows to the same AMP as the large table.
You can define a join index on one or several tables. Single-table join index functionality is an extension of the original intent of join indexes, hence the confusing adjective "join" used to describe a single-table join index.

Hash Index
Hash indexes are used for the same purposes as single-table join indexes. Hash indexes create a full or partial replication of a base table with a primary index on a foreign key column to facilitate joins of very large tables by hashing them to the same AMP. You can only define a hash index on a single table. Hash indexes are not indexes in the usual sense of the word. They are base tables that cannot be accessed directly by a query.

Value-Ordered NUSI
Value-ordered NUSIs are very efficient for range constraints and conditions with an inequality on the secondary index column set. Because the NUSI rows are sorted by data value, it is possible to search only a portion of the index subtable for a given range of key values. Thus, the major advantage of a value-ordered NUSI is in the performance of range queries. Value-ordered NUSIs have the following limitations:
The sort key is limited to a single numeric column.
The sort key column cannot exceed four bytes.
They count as two indexes against the total of 32 non-primary indexes you can define on a base or join index table.

Sparse Index
Any join index, whether simple or aggregate, multi-table or single-table, can be sparse. A sparse join index uses a constant expression in the WHERE clause of its definition to narrowly filter its row population. This is known as a Sparse Join Index.
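Hedged sketches of the last two variations, reusing the hypothetical tables from earlier examples:

-- Value-ordered NUSI: the subtable is sorted by o_date (a 4-byte value),
-- so a range predicate reads only a slice of the subtable.
CREATE INDEX (o_date) ORDER BY VALUES (o_date) ON orders;

-- Sparse join index: the constant WHERE clause keeps only Connecticut rows.
CREATE JOIN INDEX ct_claims_ji AS
  SELECT claim_id, claim_date, claim_amt
  FROM claims
  WHERE state_code = 'CT';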

Join Indexes

A Join Index is an optional index which may be created by a User. A join index is a system-maintained index table that stores and maintains the joined rows of two or more tables (multiple-table join index) and, optionally, aggregates selected columns. Join indexes are defined in a way that allows join queries to be resolved without accessing or joining their underlying base tables. A join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query. For obvious reasons, such an index is often referred to as a covering index. Join indexes are also useful for queries that aggregate columns from tables with large cardinalities. These indexes play the role of pre-join and summary tables without denormalizing the logical design of the database and without incurring the update anomalies presented by denormalized tables.

Join indexes provide additional processing efficiencies:
Eliminate base table access
Eliminate aggregate processing
Reduce joins
Eliminate redistributions

The three basic types of join indexes commonly used with Teradata will be described first:

Single-Table Join Index
Distributes the rows of a single table on the hash value of a foreign key value. Facilitates the ability to join the foreign key table with the primary key table without redistributing the data. Useful for resolving joins on large tables without having to redistribute the joined rows across the AMPs.

Multi-Table Join Index
Pre-joins multiple tables: stores and maintains the result from joining two or more tables. Facilitates join operations by possibly eliminating join processing or by reducing/eliminating join data redistribution.

Aggregate Join Index
Aggregates one or more columns of a single table or multiple tables into a summary table, referred to as an aggregate join index (AJI). The preaggregated values are contained in the AJI instead of relying on base table calculations. Facilitates aggregation queries by eliminating aggregation processing.
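Minimal sketches of a multi-table and an aggregate join index, using hypothetical customer/orders tables:

-- Multi-table join index: pre-joins customer and orders rows.
CREATE JOIN INDEX cust_ord_ji AS
  SELECT c.customer_number, c.last_name, o.o_id, o.o_total
  FROM customer c
  INNER JOIN orders o ON c.customer_number = o.customer_number;

-- Aggregate join index: pre-computes daily totals so queries skip aggregation.
CREATE JOIN INDEX daily_sales_aji AS
  SELECT o_date, COUNT(*) AS order_count, SUM(o_total) AS sales_total
  FROM orders
  GROUP BY o_date;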

Using Secondary Indexes

In the table below, users will be accessing data based on the Department Name column. The values in that column are unique, so it has been made a USI for efficient access. In addition, the company wants reports on how many departments each manager is responsible for, so the Manager Employee Number can also be made a secondary index. It has duplicate values, so it is a NUSI. (A sketch of the corresponding index DDL appears at the end of this page.)

How Secondary Indexes Are Stored

Secondary indexes are stored in index subtables. (As you remember, the base table rows are distributed based on the Primary Index value.) The subtables for USIs and NUSIs are distributed differently:

USI: The Unique Secondary Indexes are hash distributed separately from the data rows, based on their USI value. The subtable row may be stored on the same AMP or a different AMP than the base table row, depending on the hash value.

NUSI: The Non-Unique Secondary Indexes are stored in subtables on the same AMPs as their data rows. This reduces activity on the BYNET and essentially makes NUSI queries an AMP-local operation - the processing for the subtable and base table are done on the same AMP. However, in all NUSI access requests, all AMPs are activated because the non-unique value may be found on multiple AMPs.

Data Access Without a Primary Index

You can submit a request without specifying a Primary Index and still access the data. The following access methods do not use a Primary Index:
Unique Secondary Index (USI)
Non-Unique Secondary Index (NUSI)
Full-Table Scan
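The index DDL promised above - a hedged sketch with hypothetical table and column names:

-- Department Name is unique, so enforce and exploit that with a USI.
CREATE UNIQUE INDEX (department_name) ON department;

-- Manager Employee Number repeats across departments, so it becomes a NUSI.
CREATE INDEX (mgr_employee_number) ON department;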

Accessing Data with a USI

When a user submits an SQL request using the table name and a Unique Secondary Index, the request becomes a one- or two-AMP operation, as explained below.

USI Access
1. The SQL is submitted, specifying a USI (in this case, a customer number of 56).
2. The hashing algorithm calculates a row hash value (in this case, 602).
3. The hash map points to the AMP containing the subtable row corresponding to the row hash value (in this case, AMP 2).
4. The subtable indicates where the base row resides (in this case, row 778 on AMP 4).
5. The message goes back over the BYNET to the AMP with the row, and the AMP accesses the data row (in this case, AMP 4).
6. The row is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

As shown in the example above, accessing data with a USI is typically a two-AMP operation. However, because the subtable row and the base table row are hashed separately, it is possible that they could end up being stored on the same AMP. If both were on the same AMP, the USI request would be a one-AMP operation.
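The request behind this walk-through might look like the following (hypothetical table and column names):

SELECT *
FROM customer
WHERE customer_number = 56;   -- USI equality: subtable AMP first, then the base-row AMP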

Accessing Data with a NUSI

When a user submits an SQL request using the table name and a Non-Unique Secondary Index, the request becomes an all-AMP operation, as explained below.

NUSI Access
1. The SQL is submitted, specifying a NUSI (in this case, a last name of "Adams").
2. The hashing algorithm calculates a row hash value for the NUSI (in this case, 567).
3. All AMPs are activated to find the hash value of the NUSI in their index subtables. The AMPs whose subtables contain that value become the participating AMPs in this request (in this case, AMP 1 and AMP 2). The other AMPs discard the message.
4. Each participating AMP locates the row IDs (row hash value plus uniqueness value) of the base rows corresponding to the hash value (in this case, the base rows corresponding to hash value 567 are 640, 222, and 115).
5. The participating AMPs access the base table rows, which are located on the same AMP as the NUSI subtable (in this case, one row from AMP 1 and two rows from AMP 2).
6. The qualifying rows are sent over the BYNET to the PE, and the PE sends the answer set on to the client application (in this case, three qualifying rows are returned).

Full-Table Scan - Accessing Data Without Indexes

In the Teradata Database, you can access data on any column, whether that column is an index or not. You can ask any question, of any data, at any time. A full-table scan is another way to access data without using Primary or Secondary Indexes. In evaluating an SQL request, the Optimizer examines all possible access methods and chooses the one it believes to be the most efficient. While Secondary Indexes generally provide a more direct access path, in some cases the Optimizer will choose a full-table scan because it is more efficient. If the request does not use a defined index, the Teradata Database does a full-table scan.
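Two hedged examples against the same hypothetical customer table - a NUSI retrieval and a query that has no index to use:

SELECT *
FROM customer
WHERE last_name = 'Adams';    -- NUSI: every AMP probes its local subtable for hash 567

SELECT *
FROM customer
WHERE city = 'Boise';         -- city is a hypothetical non-indexed column: full-table scan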

A full-table scan is an all-AMP operation. Every data block must be read, and each data row is accessed only once. While full-table scans are impractical and even disallowed on some commercial database systems, the Teradata Database routinely permits ad-hoc queries with full-table scans. As long as the choice of Primary Index has caused the table rows to distribute evenly across all of the AMPs, the parallel processing of the AMPs working simultaneously can accomplish the full-table scan quickly. However, if a Primary Index causes skewed data distribution, all AMP operations will take longer.

A request could turn into a full-table scan when:

An SQL request searches on a NUSI column with many duplicates. For example, if a request using last names in a Customer database searched on the very prevalent "Smith" in the United States, the optimizer may determine that there is no selective SI, hash, or join index and that most of the rows in the table would qualify for the answer set if a NUSI were used. When choosing between a NUSI and a full-table scan, the Optimizer may choose a full-table scan to efficiently find all the matching rows in the result set. If statistics are stale or have not been collected on the NUSI column(s), the optimizer may choose to do a full-table scan, as it does not have updated data demographics (a sketch of collecting statistics follows at the end of this page).

An SQL request uses a non-equality WHERE clause on an index column. For example, if a request searched an Employee database for all employees whose annual salary is greater than $100,000, then a full-table scan would be used, even if the Salary column is an index. In this example, a full-table scan can be avoided by using an equality WHERE clause on a defined index column.

An SQL request uses a range WHERE clause on an index column. For example, if a request searched an Employee database for all employees hired between January 2001 and June 2001, then a full-table scan would be used, even if the Hire_Date column is an index.

For all requests, you must specify a value for each column in the index, or the Teradata Database will do a full-table scan.

Summary of Keys and Indexes

Some fundamental differences between Keys and Indexes are shown below:

Keys:
A relational modeling convention used in a logical data model.
Uniquely identify a row (Primary Key).
Establish relationships between tables (Foreign Key).

Indexes:
A Teradata Database mechanism used in a physical database design.
Used for row distribution (Primary Index).
Used for row access (Primary Index and Secondary Index).
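As noted in the trigger list above, the Optimizer needs fresh demographics to judge NUSI selectivity. A minimal sketch of collecting them, with hypothetical names (this is the classic statement form; newer releases accept additional syntax):

COLLECT STATISTICS ON customer COLUMN (last_name);
-- Re-running COLLECT STATISTICS ON customer; refreshes all previously defined statistics.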

While most commercial database systems use the Primary Key as a way to retrieve data, a Teradata Database system does not. The Teradata Database's parallel architecture uses Primary Indexes to distribute and access the data rows, and can function fully with no awareness of Primary Keys. The Teradata Database itself does not require keys in order to manage the data. In the Teradata Database, you use the Primary Key only when designing a database.

A Primary Index is always required when creating a Teradata Database table. A Primary Index may include the same columns as the Primary Key, but does not have to. In some cases, you may want the Primary Key and Primary Index to be different. For example, a credit card account number may be a good Primary Key, but customers may prefer to use a different kind of identification to access their accounts.

Rules for Keys and Indexes

A summary of the rules for keys (in the relational model) and indexes (in the Teradata Database) is shown below.

Rule 1 - Primary Key: One PK. Foreign Key: Multiple FKs. Primary Index: One PI. Secondary Index: 0 to 32 SIs.
Rule 2 - Primary Key: Unique values. Foreign Key: Unique or non-unique. Primary Index: Unique or non-unique. Secondary Index: Unique or non-unique.
Rule 3 - Primary Key: No NULLs. Foreign Key: NULLs allowed. Primary Index: NULLs allowed. Secondary Index: NULLs allowed.
Rule 4 - Primary Key: Values should not change. Foreign Key: Values may be changed. Primary Index: Values may be changed (redistributes row). Secondary Index: Values may be changed.
Rule 5 - Primary Key: Column should not change. Foreign Key: Column should not change. Primary Index: Column cannot be changed (drop and recreate table). Secondary Index: Index may be changed (drop and recreate index).
Rule 6 - Primary Key: No column limit. Foreign Key: No column limit. Primary Index: 64-column limit. Secondary Index: 64-column limit.
Rule 7 - Primary Key: n/a. Foreign Key: FK must exist as PK in the related table. Primary Index: n/a. Secondary Index: n/a.

Defining Primary and Foreign Keys in the Teradata Database

Although Primary Indexes are required and Primary Keys are not, you do have the option to define a Primary Key or Foreign Key for any table, as a mechanism for maintaining referential integrity according to relational theory. When you define a Primary Key in a Teradata Database table, the RDBMS will implement the specified column(s) as an index. Because a Primary Key requires unique values, a defined Primary Key is implemented as one of the following:
Unique Primary Index (if the DBA did not specify the Primary Index in the CREATE TABLE statement)
Unique Secondary Index (if the PK was not chosen to be the PI)
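A hedged sketch of both implementation paths (hypothetical account tables):

-- No explicit PRIMARY INDEX: the PRIMARY KEY column becomes the Unique Primary Index.
CREATE TABLE account_a
  (account_number INTEGER NOT NULL PRIMARY KEY,
   nickname       VARCHAR(30));

-- Explicit PRIMARY INDEX on another column: the PRIMARY KEY is implemented as a USI.
CREATE TABLE account_b
  (account_number INTEGER NOT NULL PRIMARY KEY,
   nickname       VARCHAR(30) NOT NULL)
PRIMARY INDEX (nickname);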

When a Primary Key is defined in Teradata SQL and implemented as an index, the rules that govern that type of index now apply to the Primary Key. For example, if you specify a Primary Key in Teradata SQL, the 64-column limit for indexes now applies to that Primary Key. However, in relational theory, there is no limit to the number of columns in a Primary Key.

What Do You Think? Which statement is true? (Choose the best answer.)
A. A Primary Key is used to access data, while a Primary Index is used to uniquely identify a row.
B. A Primary Index is used to distribute data, while a Primary Key is used to uniquely identify a row.
C. A Primary Index is used to distribute data, while a Primary Key is converted to a hash map.
D. In a Teradata Database system, "Primary Key" means the same thing as "Primary Index."

Feedback:

Exercise 6.1
Which one provides uniform data distribution through the hashing algorithm?
A. UPI
B. NUPI
C. Both UPI and NUPI
D. Neither UPI nor NUPI

Feedback: To review this topic, click Rule 2: Unique or Non-Unique PI or Distributing Rows to AMPs.

Exercise 6.2
The output from the hashing algorithm is the:
A. hash map

B. uniqueness value
C. row ID
D. row hash

Feedback: To review this topic, click Distributing Rows to AMPs.

Exercise 6.3
Choose the appropriate answers from the drop-down boxes that complete each sentence:

Accessing a row with a Unique Primary Index (UPI) accesses ___ row(s) on one AMP.
Accessing a row with a Non-Unique Primary Index (NUPI) accesses multiple rows on ___ AMP(s).
Accessing a row with a Unique Secondary Index (USI) typically requires ___ AMP(s).
Accessing a row with a Non-Unique Secondary Index (NUSI) requires ___ AMP(s).
A full-table scan accesses ___ row(s).

Feedback: To review these topics, click Accessing a Row With a Primary Index, Accessing Data with a USI, Accessing Data with a NUSI, or Full-Table Scan - Accessing Data Without Indexes.

Exercise 6.4
Which column should be selected as the Primary Index in the CUSTOMER table below? The table contains information on 50,000 customers of this regional telecommunication services company. Whenever a customer calls, the call center operator should be able to easily access and confirm customer information. In addition, the company wants to track all service activities on a per-household basis. Select the best Primary Index for the business use.

A. Column 1, because it is the Primary Key and its unique values will cause table rows to be distributed evenly for best performance. Customers must give their Customer ID when calling for service.
B. Column 2, because most of the customers with the same last name belong to a single household.
C. Columns 2 and 3 together, because the combination is nearly unique, and can be used for householding.
D. Column 4, because each address is clearly a household, which is what is being tracked.
E. Column 5, because it is nearly unique, easy to remember and input.

Feedback: To review this topic, click Choosing a UPI or NUPI.

Exercise 6.5
The row ID helps the system to locate a row in case of a(n):
A. even distribution of rows
B. multi-AMP request
C. hash synonym

Feedback: To review this topic, click Distributing Rows to AMPs or Accessing a Row With a Primary Index.

Exercise 6.6
Which task does a Teradata Database Administrator have to perform? (Choose one.)
A. Select Primary Indexes
B. Re-organize data
C. Pre-prepare data for loading
D. Pre-allocate table space

Feedback: To review this topic, click Teradata Database Manageability.

Exercise 6.7
With a ______ you create multiple access paths to the rows in the base table that the Optimizer can choose from, which improves response to business questions by improving the performance of queries which take advantage of partition elimination?

A. Multi-Level Partitioned Primary Index (MLPPI)
B. NUPI
C. Partitioned Primary Index (PPI)
D. NoPI

Feedback: To review this topic, click Multi-Level PPI or Partitioned Primary Index.

Exercise 6.8
True or False: A NoPI Table is simply a table without a primary index. As rows are inserted into a NoPI table, rows are always appended at the end of the table and never inserted in the middle of a hash sequence. Organizing/sorting rows based on row hash is therefore avoided.
A. True
B. False

Feedback: To review this topic, click What is a NoPI Table?

Exercise 6.9
True or False: If statistics are stale or have not been collected on the NUSI column(s), the optimizer may choose to do a full-table scan, as it does not have updated data demographics.
A. True
B. False

Feedback: To review this topic, click Choosing a Unique or Non-Unique Primary Index or Data Access without a Primary Index.

Teradata Certification

Now that you have learned about the Teradata Database basics, consider the first level of Teradata Certification: Teradata Certified Professional. Candidates for the Teradata Certified Professional Certification must pass the Teradata 12 Basics Certification exam administered at Prometric testing centers listed on the TCPP website. Information on the Teradata Certified Professional Program (TCPP), including exam objectives, practice questions, test center locations, and registration information, is located on the TCPP website. We recommend you review the WBT content and the practice questions located on the TCPP website before signing up for the official Teradata 12 Basics Certification exam.
