Series 7 Version 2
DecisionStream for Data
Warehouse Developers
Instructor Guide
Printed in Canada (05/03)
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Part 35101
DS7001
Published TBD, 2003
© 2003, Cognos Incorporated

While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical or technical errors may exist. Cognos cannot accept responsibility for any kind of loss resulting from the use of this document.

This page shows the original publication date. The information contained in this book is subject to change without notice. Any improvements or changes to either the product or the course will be documented in subsequent editions.

This guide contains proprietary information, which is protected by copyright. All rights are reserved. No part of this document may be photocopied, reproduced, or translated into another language without the prior written consent of Cognos Incorporated.

Portions copyright (C) Microsoft Corporation, One Microsoft Way, Redmond, Washington 98052-6399 USA. All rights reserved.

Sample product images with the pound symbol (#) in the lower right hand corner are copyright (C) 1998 PhotoDisc, Inc.

Cognos, the Cognos logo, Better Decisions Every Day, Axiant, Cognos Accelerator, COGNOSuite, DecisionStream, Impromptu, NovaView, PowerCube, PowerHouse, PowerPlay, Scenario and 4Thought are trademarks or registered trademarks of Cognos Incorporated in the United States and/or other countries. All other names are trademarks or registered trademarks of their respective companies.
Contents
Contents
Post-Class Agenda
Course Overview
Series 7 Version 2 DecisionStream for Data Warehouse Developers is a five-day, instructor-led course designed for enterprise data mart builders. It teaches users how to move, merge, consolidate, and transform data from a range of data sources to build and maintain subject-area data marts. DecisionStream targets all the major relational databases and is typically used to create best-practice dimensional data mart solutions (star schemas).
Specifically, the course deals with the dimensional framework, builds, and
templates. The dimensional framework is a repository for reference
structures (dimensions, hierarchies, lookups, and dimension templates)
that promotes the reuse of these structures. Builds are units of work that
progress through data acquisition, transformation, and delivery.
Templates are objects that define source and target attributes for
dimension data.
Course Prerequisites
Participants should have:
• knowledge of Windows
• knowledge of database concepts
• knowledge of dimensional analysis concepts
• working knowledge of SQL
Day 1
Start End Description Time (hr:min)
9:00 AM 9:15 AM Introduction 0:15
9:15 AM 10:30 AM Module 1 - Lecture 1:15
10:30 AM 10:45 AM Break 0:15
10:45 AM 11:45 AM Module 2 - Lecture 1:00
11:45 AM 12:00 PM Module 2 - Workshop 0:15
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 2:45 PM Module 3 - Lecture 1:45
2:45 PM 3:00 PM Break 0:15
3:00 PM 3:15 PM Module 3 - Workshop 0:15
3:15 PM 4:45 PM Module 4 - Lecture 1:30
4:45 PM 5:00 PM Wrap Up - Day 1 0:15
Day 2
Start End Description Time (hr:min)
9:00 AM 9:10 AM Review Day 1 0:10
9:10 AM 10:25 AM Module 5 - Lecture 1:15
10:25 AM 10:40 AM Module 6 - Lecture 0:15
10:40 AM 10:50 AM Break 0:10
10:50 AM 12:00 PM Module 6 - Lecture 1:10
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 1:25 PM Module 6 - Workshop 0:25
1:25 PM 2:35 PM Module 7 - Lecture 1:10
2:35 PM 2:45 PM Module 8 - Lecture 0:10
2:45 PM 3:00 PM Break 0:15
3:00 PM 3:45 PM Module 8 - Lecture 0:45
3:45 PM 4:45 PM Module 9 - Lecture 1:00
4:45 PM 5:00 PM Wrap Up - Day 2 0:15
Day 3
Start End Description Time (hr:min)
9:00 AM 9:15 AM Review Day 2 0:15
9:15 AM 10:45 AM Module 10 - Lecture 1:30
10:45 AM 11:00 AM Break 0:15
11:00 AM 12:00 PM Module 11 - Lecture 1:00
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 2:30 PM Module 12 - Lecture 1:30
2:30 PM 2:45 PM Module 13 - Lecture 0:15
2:45 PM 3:00 PM Break 0:15
3:00 PM 4:30 PM Module 13 - Lecture 1:30
4:30 PM 4:45 PM Module 13 - Workshop 0:15
4:45 PM 5:00 PM Wrap Up - Day 3 0:15
Day 4
Start End Description Time (hr:min)
9:00 AM 9:10 AM Review Day 3 0:10
9:10 AM 10:30 AM Module 14 - Lecture 1:20
10:30 AM 10:45 AM Module 15 - Lecture 0:15
10:45 AM 11:00 AM Break 0:15
11:00 AM 12:00 PM Module 15 - Lecture 1:00
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 1:45 PM Module 16 - Lecture 0:45
1:45 PM 3:00 PM Module 17 - Lecture 1:15
3:00 PM 3:15 PM Break 0:15
3:15 PM 4:45 PM Module 18 - Lecture 1:30
4:45 PM 5:00 PM Wrap Up - Day 4 0:15
Day 5
Start End Description Time (hr:min)
9:00 AM 9:15 AM Review Day 4 0:15
9:15 AM 10:30 AM Module 19 - Lecture 1:15
10:30 AM 10:45 AM Break 0:15
10:45 AM 12:00 PM Module 20 - Lecture 1:15
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 2:30 PM Module 21 - Lecture 1:30
2:30 PM 2:45 PM Break 0:15
2:45 PM 3:15 PM Module 22 - Lecture 0:30
3:15 PM 4:15 PM Wrap Up - Course 1:00
Instructional Materials
Student Guide
The Student Guide contains explanations and features of the product,
along with the presentation slides that are presented by the instructor.
Student demos and workshops are incorporated in the course to enrich
the learning experience through hands-on practice.
Demos
Workshops
Instructor Guide
The Instructor Guide contains the same content presented in the Student
Guide, along with additional notes to supplement and add value to the
lecture. The information can be generic, non-technical information, such
as multiple ways to perform the same command or a more in-depth
discussion of a topic. It may also be used to address more technical
questions from participants or as supplementary technical discussion, at
the discretion of the instructor. It helps to provide the appropriate level of
information to a specific audience.
Instructor Installation CD
The Instructor Installation CD contains an executable file that can install
any or all of the following files. By inserting the CD into your computer
and following the prompts as the auto install runs, these files will be
installed in C:\Edcognos\DS7001.
Instructor Slides
These files contain the Microsoft PowerPoint slide presentation for each
module of the course as presented in the Student Guide:
• StartDS7001.ppt
• IntroDS7001.ppt
• Mod1DS7001.ppt
• Mod2DS7001.ppt
• Mod3DS7001.ppt
• Mod4DS7001.ppt
• Mod5DS7001.ppt
• Mod6DS7001.ppt
• Mod7DS7001.ppt
• Mod8DS7001.ppt
• Mod9DS7001.ppt
• Mod10DS7001.ppt
• Mod11DS7001.ppt
• Mod12DS7001.ppt
• Mod13DS7001.ppt
• Mod14DS7001.ppt
• Mod15DS7001.ppt
• Mod16DS7001.ppt
• Mod17DS7001.ppt
• Mod18DS7001.ppt
• Mod19DS7001.ppt
• Mod20DS7001.ppt
• Mod21DS7001.ppt
• Mod22DS7001.ppt
The Instructor Guide Microsoft Word documents are also provided as PDF files.
Student Data
The Student Data files and folders are required for the completion of the
demos and workshops. This is the same data that is contained on the
Student Data CD described earlier and is available with the Instructor
Installation CD. The student data files and folders are installed in
C:\Edcognos\DS7001. They are:
The previous items cannot be accessed directly from the CD. They must
be installed on your computer by using the EXE auto install.
Setup Complete
• Windows 2000
• Viewer for Microsoft PowerPoint 2000 or the full PowerPoint 2000 application on the instructor's computer
• What hours are available for accessing the teaching site, copying
the files to the hard disk, tuning the color on the PC viewer, and
so on?
Prepare to Teach
After you have configured the instructor and student computers, consider
the following:
• Make sure you complete each of the demos before teaching the
course so that you become familiar with each step required.
• Make sure that there is a Student Guide for each participant and
that they have the student data files so that they can practice after
leaving the course.
Document Conventions
Conventions used in this guide follow Microsoft Windows application
standards, where applicable. As well, the following conventions are
observed:
PowerPoint Tips
Here are valuable keyboard commands you can use to improve your
presentation.
Command Key(s)
Help ?
Move between PowerPoint and the product ALT+TAB, or click the application name on the status bar
You can also jump to a specific slide by typing its slide number and pressing the ENTER key. However, the slide number is not the same as the printed page number, because a page may be built from several slides to produce an animation sequence.
Important Tips:
Class Format (slide):
• lecture with slides
• student guides as reference material
• hands-on demos to learn and practice
• independent workshop exercises for more practice
Use this slide to explain the class format and emphasize that participants are encouraged to actively perform the hands-on demos while following along with the instructor.
Mention that the Student Guide contains copies of the slides and further supporting notes for the participants to use as reference material in the future.
Introduction
Module 1: Getting Started
Module 2: Create a Catalog
Module 3: Create Hierarchies
Module 4: Create Basic Builds
Module 5: Conformed Dimensions
Module 6: Derivations
Module 7: Templates, Lookups, and Attributes
Module 8: Fact Builds
Module 9: History Preservation
Module 10: Hierarchical Dimensions
Module 11: Facts in Depth Merging
Module 12: User-Defined Functions and Variables
Module 13: JobStreams
Module 14: Facts and History Preservation
Module 15: Aggregation
Module 16: Pivoting
Module 17: Ragged Hierarchies
Module 18: Packaging and Navigator
Module 19: Resolving Data Quality Issues
Module 20: Troubleshooting and Tuning
Module 21: Delivery in Depth
Module 22: The Command Line Interface
Appendix A: Step-by-Step Solutions
Appendix B: Entity-Relationship Diagram of the
GO_Demo Database
Post-Class Agenda
• Make notes for yourself about what went well during the course
and what needs improvement. When you are preparing for your
next teach, you can refer to these.
Your Feedback
These course materials were designed by a group of instructional
designers in Ottawa, Canada.
Cognos Incorporated
K1G 4K9
Course Objectives
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
7. Templates, Lookups, and Attributes
8. Fact Builds
9. History Preservation
10. Hierarchical Dimensions
11. Facts in Depth Merging
12. User-Defined Functions and Variables
13. JobStreams
14. Facts and History Preservation
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Audience:
Prerequisites:
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Task-oriented online help
Use it when you are working in the product and need specific task-oriented help.
Find it on the Help menu on the DecisionStream main menu bar.

Books for Printing (.pdf)
Use them when you want to use search engines to find information. You can then print out selected pages, a section, or the whole book. Use the Step-by-Step online books (.pdf) if you want to know how to get something done but prefer to read about it in a book. The Step-by-Step online books contain the same information as the online help, although the method of presentation may be different.
Find them in the Cognos Series 7 Version 2 folder. Navigate from Start to Programs/Cognos Series 7 Version 2/Documentation/Cognos DecisionStream.

Documentation Roadmap
Use it when you have to know which type of Help will provide you with answers to your questions.
Find it on the Help menu on the DecisionStream main menu bar.

Cognos on the Web
Use it when you want to access any of the following:
• online support
Find it on the Help menu on the DecisionStream main menu bar.
Task-Oriented Help
Contents
The Help function is always available from the main menu bar.
From the Help menu, click Contents and Index, or press F1.
Index
An index is a tool that points to or leads you to the related topic. Each topic in a
help file has one or more index terms from which that topic can be accessed.
Either type the term you are looking for or scroll through the interactive list of
terms available.
Find
The Find tab accesses a search engine that will search for instances of a term
within the contents of the help file. Use this tab when you cannot find a term in
the index.
For example, node is not an index term. Using the Find tab, type node, and the search engine will find this term. You can then display the help topic, and all instances of the term node will be highlighted.
Click the Find tab to search for words or phrases that do not have an index entry.
You must install Adobe Acrobat Reader to view the .pdf files.
Guide Description
New Features for Series 7 This guide outlines the new features for DecisionStream Series 7.
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is DecisionStream?
Data warehouses are becoming ever larger, with increasing demands for faster warehouse population and refresh, and support for multiple, dependent data marts. DecisionStream is a scalable, high-performance, flexible, multi-platform data transformation tool that addresses these needs. DecisionStream extracts data and transforms it to deliver dimensional data marts.
(Slide diagram: data flows in from source systems, through DecisionStream, and out to users.)
Instructional Tips
Explain the meaning of an ETL tool and discuss how DecisionStream distinguishes itself from the many other ETL tools that exist. Specifically, discuss the integration of conformed data marts and the manner in which DecisionStream produces valuable data, block by block, on a frequent basis.
Data Warehouse
A data warehouse is a database that is used to hold data for reporting and analysis. The data may be accessed directly by users, or it may be used to feed data marts. The data warehouse is used as a source of reporting data for the whole enterprise.
A successful data warehouse must:
• provide an integrated view of the enterprise, across many subject areas
• be implemented in phases, delivering business value at each stage
The data staging area is where the raw operational data is acquired, transformed, cleaned, and combined so that it can be reported on and queried by users. This area lies between the operational source systems and the user database and is typically not accessible to users.
The data warehouse database contains the data that is organized and stored specifically for direct user queries and reports. It differs from an OLTP database in that it is designed primarily for reads, not writes.
Questions
Ask the class, "Why would you not put crude oil into an engine?" It is because the oil has to be refined to be used by an internal combustion engine.
Instructional Tips
The points in the slide are from Kimball et al.'s The Data Warehouse Lifecycle Toolkit (1998).
Implementation Phase
Using DecisionStream, you create data marts that present an integrated view of your business by consolidating the data into a common structure. However, there is no single correct view. How you choose to design your data warehouse depends on your organization's requirements, environment, and readiness for data warehousing.
The user's input into the requirements of the data warehouse is paramount. Although the user will not design or build the database itself, user requirements dictate how changing dimensions are handled, how summarization is tackled, the hierarchical structure of the data in the dimension tables, and so on. It is important to maintain contact with the users during the more technical phases of the project.
In most cases, it is important that the data warehouse and operational system be kept separate because their characteristics are so different. The warehouse must not be a mirror image of the operational system.
Instructional Tips
Data warehousing design has been called an art rather than a science, and students are often frustrated when there are no scientific, clear-cut answers to the problems facing them. Students must understand the various solutions and the pros and cons of each decision they make.
(Slide diagram: a Sales fact table at the centre of shared dimensions such as Branches, Products, Customer, Channels, and Time, alongside other subject areas such as HR, Inventory, and Manufacturing.)
Ralph Kimball et al. (1998) define a data mart as "a logical subset of the complete data warehouse."
Should we build a single comprehensive data warehouse and extract data marts from it, or should we build data marts and combine them into a data warehouse? Building a single data warehouse:
• is difficult to maintain
The danger of data marts is that if they are not integrated, they can become
"islands of information," where each subject area is isolated from the others. The
way to address this situation is to use conformed dimensions. This topic is
discussed further in Module 5, "Conformed Dimensions."
(Slide diagram: normalized OLTP tables for orders — including Product, Customer, Sales Area, Order, and Order Line — contrasted with the Orders data mart. The Time table is constructed from static data; it is not derived from the Orders OLTP system.)
An OLTP system is different from a data mart in that it is designed for writing
transactional data, not querying or reporting. OLTP systems are normalized,
which means that the data to be written is broken down into its simplest form,
removing all redundancy from the data. All of these tables are related through
referential integrity, which makes writing new data to the OLTP database fast and
efficient.
The biggest difference between a typical data mart and the OLTP system that it is
built from is the number of tables. In the slide, the Orders data mart has only one
central table containing mostly numeric data, along with four other tables with
detailed information relating to these numbers. This is a typical "star schema,"
discussed further in Module 4, "Create Basic Builds."
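To make the star schema concrete, here is a minimal sketch of a query against such a data mart; the table and column names (orders_fact, product_dim, time_dim, and so on) are illustrative assumptions, not the actual course schema.

    -- Sum sales by product line and quarter from a star schema:
    -- the central fact table joins directly to each dimension table.
    SELECT p.product_line,
           t.quarter,
           SUM(f.sale_amount) AS total_sales
    FROM orders_fact f
    JOIN product_dim p ON f.product_key = p.product_key
    JOIN time_dim t ON f.time_key = t.time_key
    GROUP BY p.product_line, t.quarter

Answering the same question against the normalized OLTP schema would require joining many more tables, which is why the star schema suits reporting.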
What is a Dimension?
(Slide: aggregating across locations.)
Dimensions provide context for the key performance indicators (KPIs) that a
business uses to measure its performance.
For example, a retail chain-store may categorize its sales data by the products that
it sells, by its retail outlets, and by fiscal periods. This organization has the
business dimensions Product, Location, and Time. The measures of the business,
such as how much it sells, lie at the intersection of these dimensions.
You can derive summary information by aggregating data along one or more
dimensions. The slide example on the right shows the aggregation of data along
the Location dimension to give the total sales of widgets during July.
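As a rough illustration (the table and column names here are assumptions), aggregating along the Location dimension simply means leaving Location out of the query, so every location rolls up into the total:

    -- Total widget sales for July, rolled up across all locations.
    SELECT SUM(f.sale_amount) AS total_sales
    FROM sales_fact f
    JOIN product_dim p ON f.product_key = p.product_key
    JOIN time_dim t ON f.time_key = t.time_key
    WHERE p.product_name = 'Widget'
      AND t.month_name = 'July'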
(Slide diagram: a data warehouse bus matrix. The Sales, Distribution, Marketing, and HR fact tables are rows; shared dimensions such as Products, Branches, Channels, Customer, Time, Promotion, Training, and Shipping are columns; an X marks each dimension that a fact table uses.)
Various business topics become natural data marts. A fully integrated data
warehouse will have data marts that use conformed dimensions and each data
mart will have a set of measures (fact table).
Conformed dimensions are the common elements in each data mart, which, when
combined into the data warehouse, form the overlapping glue. A conformed
dimension is re-useable and identical in every data mart in which it is used.
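At the table level, conformance can be pictured as two fact tables keyed to one physical dimension table; a minimal sketch, with illustrative names:

    -- One conformed Product dimension shared by two data marts.
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(50),
        product_line VARCHAR(50)
    );

    -- Both fact tables reference the same dimension, so results
    -- from the two marts can be combined ("drilled across") by product.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim (product_key),
        time_key    INTEGER,
        sale_amount DECIMAL(12,2)
    );

    CREATE TABLE inventory_fact (
        product_key INTEGER REFERENCES product_dim (product_key),
        time_key    INTEGER,
        stock_count INTEGER
    );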
(Slide diagram: OLTP, ERP, and text data sources flow into DecisionStream, which delivers integrated data marts and metadata.)
DecisionStream provides the extract, transform, and load functions required to restructure operational data into formats suitable for general reporting and OLAP (online analytical processing).
DecisionStream produces a multidimensional model of the data warehouse and can use this framework to populate many OLAP targets. DecisionStream can read from many different relational and text data sources.
DecisionStream can also deliver data to multiple target data stores, including relational databases and flat files.
Lastly, DecisionStream can deliver metadata to PowerPlay Transformer, Impromptu, Architect, and Microsoft SQL Server Analysis Services.
Technical Information
DecisionStream targets Online Transaction Processing (OLTP) to Online Analytical Processing (OLAP) transformations for constructing data marts.
DecisionStream carries out many types of transformations. It can merge data from multiple, heterogeneous data sources. It also has fast data aggregation capabilities and a rich library of built-in functions.
DecisionStream is also capable of delivering reporting data models, including metadata schema for Architect, PowerPlay, and Impromptu.
Build slide.
2 clicks to complete.
DecisionStream Architecture
(Slide diagram: the DecisionStream architecture. The Design Client runs on Windows NT, 2000, or XP; the Server Engine runs on Windows NT or UNIX — HP/UX, Solaris, IBM AIX, and Compaq Tru64. Operational sources include Oracle, SQL Server, DB2, Informix, ODBC, Sybase, Teradata, text files, SAP R/3, and custom sources. Targets include DBMS tables, flat files, DB2 OLAP, Impromptu, Transformer, Architect, ERP systems, and Microsoft SQL Server Analysis Services. API scripts and process metadata drive the server engine.)
The two main parts of the DecisionStream architecture are a design client (the DecisionStream Designer, which runs on the Windows operating system) and a server engine (which runs on UNIX and Windows). In a typical production environment, the two components are deployed on separate machines.
Designs are created using a graphical user interface. The design metadata is stored in any RDBMS. The server engine then reads this metadata at run time.
A wide variety of sources and targets are supported: all popular DBMSs, as well as several MOLAP structures on the target side. Metadata may be integrated with Impromptu, PowerPlay, and Architect. The delivered data may also be partitioned in arbitrary ways across these targets.
A key point is the availability of both an API and a scripting language to drive the server engine. The API and scripting language make it possible for data mart building jobs to be created once through the Designer and then scheduled to run on a regular basis.
DecisionStream has built-in support for data models such as star and snowflake schemas, and its component-based architecture scales to support very large systems. The wide platform coverage of DecisionStream provides unparalleled flexibility of deployment, evolution, and migration.
Technical Information
Starting with DecisionStream Series 7, Windows 95, 98, and ME are not supported.
During the development process, it is common to run both the Designer and the engine on the same machine. In production, you would typically devote one or more machines to perform the "heavy lifting" using the server engine component.
Build slide.
6 clicks to complete.
How DecisionStream Creates Data Marts
(Slide diagram: a fact build. (1) The dimensional framework — Product, Time, Location — supplies reference data. (2) DataStreams acquire data from the data sources. (3) The dimensional keys map back to the dimensional framework. The build then delivers (5a) the fact table, (5b) dimension tables, and (5c) metadata.)
DecisionStream creates data marts through a process called a fact build. The fact build may contain the following steps:
1. The dimensional framework defines the hierarchical structure of the reference data. Most builds use this framework. For example, the fact build uses the Product, Time, and Location dimensions. Dimension data is read in first to provide reference for the fact (transactional) data.
3. The dimensional key elements (Product, Time, and Location) map back to the dimensional framework to check referential integrity. The dimensional framework is also used to identify the grain of the transactional data, to provide rollup levels, and as a source of BI metadata.
5. The main purpose of a fact build is to deliver a fact table (5a) into a data mart. It may optionally deliver dimension tables (5b) and BI metadata (5c).
Technical Information
The term "grain" in step 3 refers to the level of detail of the data being retrieved from the transactional data source. For example, records of individual sales orders are at a lower level of detail than monthly totals of sales orders.
DecisionStream Interface
(Slide: the DecisionStream Designer window, showing the toolbar, the navigation Tree pane, and the Visualization area.)
The DecisionStream Designer provides a graphical interface to use with the Windows environment. When you have opened a catalog, the full functionality of the DecisionStream Designer window is available. The window consists of the following elements:
Technical Information
By default, DecisionStream displays information in the Visualization pane at 100% of the actual size. However, you can change the size of the display by clicking the Change the zoom level on the visualization button on the toolbar.
Element Purpose
Menu Selects an option relevant to the function that you are
performing.
Toolbar Provides quick access to the main components of
DecisionStream. For a description of each of the buttons,
see the online help.
Tree pane Displays the builds, JobStreams, and reference structures in
the current catalog.
Visualization pane Displays information about the selected build or JobStream.
You can double-click any item in this pane to find out more
about it.
Builds folder Contains all the fact builds and dimension builds in the
current catalog.
Library folder Contains all the reference dimensions (including hierarchies,
auto-level hierarchies, lookups and templates), connections,
and user-defined functions in the current catalog.
JobStreams folder Contains the collections of processes used to deliver the
data warehouse.
The fact build visualization provides a generic view of the build that you are
working with. The fact build process consists of the following elements:
Element Function
When you click a reference structure in the Tree pane, DecisionStream shows a
visual representation of the whole structure in the Visualization pane.
(Slide diagram, left to right: the hierarchy in the dimensional framework; the dimension build that processes data from the hierarchy; the template listing the columns in the dimension table, plus their behavior; the dimension table containing data from the hierarchy; and the database that holds the dimension table.)
A dimension build reads reference data from a hierarchy in the dimensional framework and delivers this data to a dimension table. A dimension build consists of the following elements:
Element Function
Key Information
Dimension builds do not use a transformation model.
JobStreams Visualization
The JobStream may include alerts that can be used by NoticeCast, or instructions to send email notifications upon the completion of other tasks. The Visualization pane shows all of the nodes contained in the selected JobStream, as well as the order of their execution.
You can use JobStreams to schedule a set of builds that, once executed, will
create and update a data warehouse. JobStreams are discussed further in Module
13, "JobStreams."
(Slide: the components of a catalog — Build, DataStream, Transformation Model, and Delivery Modules.)
Element Function
JobStream
Dimension
Hierarchy
Templates
Connections
Functions
Hierarchies and Lookups Structures that hold related members organized into levels. A lookup has only one level, while hierarchies can have one or more levels.
DecisionStream Tools
Tool Function
From DecisionStream, you can customize your system environment. From the
Tools menu, click Options to modify the look and feel of DecisionStream. From
the Options dialog box, you can:
• have the transformation model elements and build details appear in the
Visualization pane by default (it is best practice to have these options
selected)
Demo 1-1
Purpose:
We want to open an existing catalog and examine its elements,
as well as become familiar with the DecisionStream and
SQLTerm interfaces.
Instructional Tips
This may be a good time to point out to the class where they can access the online documentation from outside of DecisionStream. Have the students navigate from the Start menu to Programs/Cognos Series 7 Version 2/Documentation/Cognos DecisionStream.
5. From the Help menu, click Contents and Index.
The Help Topics: DecisionStream Help dialog box appears.
6. Examine the available Help topics.
7. Close Help.
12. Right-click the Transformation Model box, and then click Show Build
Details.
The result appears as shown below.
5. In the Database for SQL Operations box, click DS_Sources, and then in the Database Objects pane, click the plus sign (+) to expand the DS_Sources database.
There are seven tables in this data source.
A: The Executed SQL tab shows the last SQL statement that was executed by SQLTerm. This tab only applies to data sources that are accessed through Universal Data Access (UDA).
Today’s Goal
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is a Catalog?
A DecisionStream catalog provides a central repository for the information that defines how DecisionStream extracts, transforms, and delivers data. The catalog stores DecisionStream builds, connection specifications, JobStreams, user-defined functions, and the dimensional framework.
You cannot use DecisionStream unless you first select and open a catalog or create a catalog. Only one catalog can be open at a time by a single instance of DecisionStream Designer (you can start multiple instances of DecisionStream Designer).
After working with the catalog, you save your changes by clicking Save Catalog from the File menu or clicking the Save button on the toolbar.
Note: The catalog does not contain the data that DecisionStream will manipulate and deliver. It holds the configuration details that determine where the source data is coming from and how it will be transformed and loaded into the target data mart(s).
Technical Information
The catalog may take the form of a text file (with a .ctg file extension) if it is a backed-up version of another catalog.
You can produce HTML documentation that summarizes or provides detailed information about the contents of a catalog (builds, connections, and so on).
Catalog Database
(Slide diagram: during development, the catalog tables reside in a catalog database; DecisionStream reads the catalog and uses the source and target connections it defines.)
Each catalog contains a Library, which, in turn, holds the dimensions that make up the dimensional framework, connections to various data sources, and user-defined functions.
Items in the Library may be used throughout the catalog in different fact builds, dimension builds, and JobStreams. The components stored in the Library are used throughout the catalog, enabling the reuse of these components. You can build multiple projects using the same supporting Library components, which shortens the development time for these projects.
Instructional Tips
JobStreams will be discussed in Module 13, "JobStreams." The dimensional framework will be covered in Module 3, "Create Hierarchies." Fact and dimension builds will be discussed in Module 4, "Create Basic Builds." User-defined functions will be covered in Module 12, "User-Defined Functions and Variables."
Create a Catalog
After you create the database, you can create the catalog. Click New Catalog from the File menu, or click the New catalog button on the toolbar. You must then type a name for the new catalog and (if preferred) a business name and description of the catalog. Click Next to finish creating the catalog.
From the left pane, you select the physical database (the one you just created) that will hold the catalog tables. The New Catalog dialog box will show fields that are appropriate to the type of database that you have selected (such as ODBC or SQL Server).
Instructional Tips
If a catalog is already open, DecisionStream displays a message informing you that the current catalog will be disconnected. Click Yes to acknowledge this message.
The shortcut for creating a new catalog is Ctrl+N.
Connect to Sources and Targets
(Slide diagram: connections link DecisionStream to source and target databases and flat files.)
Each connection provides information so that DecisionStream can link to a data source or target. The connection:
• identifies the particular data source or target
• specifies the connection method that must be used to connect to the data
• provides information, such as a user name and password, that the database management system requires when DecisionStream connects to the data
• specifies the dialect of SQL used by the connection (either native SQL or Cognos SQL)
The connections are contained within the DecisionStream catalog and are specific to that catalog.
The source data may come from relational databases or flat files. Flat files are described in definition files (.def) by using the SQLTXT Designer tool and then accessed in the same way as regular relational databases. You can define several different connections within a catalog, including ones to DB2, Oracle, and Microsoft SQL Server data sources. You can deliver the transformed data to various targets, including databases and flat files. The related metadata may be delivered to PowerPlay Transformer models, Architect models, and Impromptu catalogs and reports.
Select the Cognos SQL check box to use Cognos SQL when you construct an SQL statement for components using this connection. If you clear this check box, you must use native SQL for the database you are accessing.
The selection that you make here determines the default for the Cognos SQL check box in other components that use this connection.
Technical Information
Many data sources can be used with DecisionStream, but not every connection method may be available on your computer. If you do not have a specific connection method, DecisionStream will indicate this in the Connection Properties dialog box. The list of available connection methods depends on the scope of your DecisionStream license.
Cognos SQL is an extension of SQL 99. Using Cognos SQL, you have a greater degree of portability between mixed database environments because a common dialect can be used.
By default, a connection accepts any vendor-specific SQL SELECT statement in a data source, including nonstandard SQL extensions and hints. A connection or data source can optionally use Cognos SQL. You cannot use Cognos SQL for SQLTXT connections.
What is SQLTerm?
Once connections to data sources are established within a catalog, you can use
SQLTerm against them to view the data they contain, as well as change the
structure and contents of that data.
SQLTerm is the DecisionStream terminal for SQL. You can run SQL statements
against any data source that DecisionStream can access.
Using SQLTerm, you can compose and run different types of SQL statements by
using:
To display SQLTerm, you can either click SQLTerm from the Tools menu, or
click the Run the SQLTerm Tool button on the toolbar.
(Slide: the SQLTerm toolbar and panes. Toolbar buttons let you interrupt current processing; execute the SQL query and limit results to one row; execute the SQL query; clear the SQL query and results; choose the database for SQL operations; and specify that you want to use Cognos SQL instead of native SQL. The Executed SQL tab displays the SQL statements that were executed (not available for SQLTXT data sources), and the Database Objects pane lists the objects in the selected database.)
Using SQLTerm, you can write and test SQL statements against your data connections. Also, because you can view the data itself, SQLTerm can give you greater insight into what each data source contains and what it can be used for.
The SQLTerm interface shows a list of the database connections. You select a database from this list and then run the SQL operations you need.
Pane Description
Database Objects Displays a "tree view" of the tables within the current database.
Technical Information
The Executed SQL tab is especially useful if you are using Cognos SQL to execute your queries. The Executed SQL tab will show you the actual SQL that has been executed in the database's native dialect.
To create an SQL statement for execution, you can construct the entire statement
by hand. You can also right-click a table or column in the Database Objects pane
and click one of the most commonly used options, such as Select rows, from the
shortcut menu. This will automatically produce a statement in the SQL Query
pane that selects all the rows from a table.
You can also use any of several "click-drag" options to create your SQL
statement. For example, using the "control-click-drag" option on a table object
will create an SQL SELECT statement that explicitly includes all columns from
the table.
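For example, control-click-dragging a hypothetical ds_sales table might generate a statement of this form (the column names here are assumed for illustration):

    SELECT ds_sales.product_no,
           ds_sales.sale_date,
           ds_sales.quantity,
           ds_sales.amount
    FROM ds_sales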
Select the Cognos SQL check box to specify that you want to use Cognos SQL
when you construct the SQL statement. If you clear this check box, you must use
native SQL for the database you are accessing.
Note: The default for the Cognos SQL check box is determined by whether
you selected the Cognos SQL check box in the Connection Properties
dialog box. You can modify this setting if necessary.
When you add an SQL statement in a window or dialog box, DecisionStream displays an SQL Helper button, which opens the SQL Helper window. This interface is similar to that of SQLTerm. Again, this interface makes it easier to specify the tables and columns that the catalog will use in the builds and the dimensional framework. You can write and test SQL statements before running builds or exploring hierarchies.
The only difference between SQLTerm and SQL Helper is that SQL Helper has OK and Cancel buttons. Because SQL Helper is usually passed an SQL statement, which you can modify, you can keep changes by clicking the OK button or discard them by clicking the Cancel button.
Instructional Tips
Emphasize that you can use SQL Helper to help create hierarchies, lookups, and fact builds. Essentially, any catalog component that uses a DataStream lets you access the SQL Helper tool.
Demo 2-1
Purpose:
We must develop a DecisionStream catalog to hold a simple
data mart that we will later analyze by using PowerPlay. First,
we must create a data source to hold the catalog. Then, we
must create the catalog itself. Finally, we must add a
connection within the catalog to a database that contains
transactional data.
Task 1. Create a data source to hold the catalog.
1. From the Tools menu, click ODBC Administrator.
The ODBC Data Source Administrator dialog box appears.
2. Click the System DSN tab, and then click Add.
The Create a New Data Source dialog box appears.
Instructional Tips
You can also access the ODBC Data Source Administrator in Windows 2000 through the Start menu. Click Settings/Control Panel/Administrative Tools/Data Sources (ODBC).
5. Click dssales.mdb in the Database Name list, and then click OK.
6. In the Data Source Name box, type DS Sales.
7. Click OK.
We have successfully created a data source that can be used within the
catalog.
8. Repeat steps 2 to 7 to add three data sources named DS Reference, DS
Stock and DS Output.
The DS Reference data source will be based on the dsref.mdb database.
The DS Stock data source will be based on the dsstock.mdb database.
The DS Output data source will be based on the dsout.mdb database. All
three of these databases are in the C:\Edcognos\DS7001 folder.
9. Click OK to close the ODBC Data Source Administrator.
Task 4. Define a connection to the Sales data source.
1. Right-click Connections, and then click Insert Connection.
The Connection Properties dialog box appears.
2. In the Alias box, type Sales.
3. Click the Connection Details tab, and then click ODBC in the list of
databases on the left side.
4. In the Data Source Name box, click DS Sales, and then click Test
Connection.
A dialog box appears, indicating that the connection is OK.
5. Click OK to close the DecisionStream Designer dialog box, and then
click OK to close the Connection Properties dialog box.
We have successfully connected to a transactional data source.
Task 5. Use SQLTerm to view the data in the Sales data source.
Instructional Tips
You can also type any valid SQL statement in the SQL Query window and run it in the same fashion.
1. From the Tools menu, click SQLTerm.
SQLTerm opens.
2. Maximize the window if necessary, and in the Database for SQL
Operations box, beside the toolbar, click Sales.
3. In the Database Objects pane, double-click Sales.
The tables ds_forecast and ds_sales are now available for analysis.
6. Click the Clear SQL Query and Results button to clear both
panes.
7. Repeat Steps 4 through 6 to view the data in the ds_forecast table, and
then close SQLTerm.
8. From the File menu, click Save Catalog.
Results:
We have created a DecisionStream catalog and data sources,
and then added a connection within the catalog to one of these
data sources. We then viewed the data by using SQLTerm.
This transactional data will be used to populate the data mart.
It is a good practice to back up your catalogs, which involves writing a text version of the catalog tables to a file (with a .ctg file extension). The backup process is useful for recovery and emergency situations. For example, changes may have been made to a catalog, but these changes later caused problems. The catalog can then be restored (using the backed-up text file) to an earlier version.
Backups are invaluable if a catalog has been permanently corrupted and needs to be replaced. Backups can also help you move from one DBMS environment to another, or send copies of the catalog to others without sending the entire database.
When restoring a catalog from a .ctg file, first create another .ctg file of the current catalog that you want to replace, because DecisionStream deletes all data from the current catalog before performing the restoration. If no copy of the catalog is created, and the .ctg file is defective in any way, the contents of the current catalog will be lost.
Instructional Tips
You can back up to an existing .ctg file or create a new file.
The restoration process may take a long time, especially if the catalog has a lot of components that must be reproduced.
Demo 2-2
Purpose:
It is possible that Day1Catalog will become unusable. Also, we
may have to move this catalog to a different system later on.
As a result, we must back up Day1Catalog to a Catalog Backup
(.ctg) file. We will then restore the current catalog by using this
file.
5. Expand Connections.
The result appears as shown below.
Results:
We have backed up Day1Catalog to a text file and then
restored the catalog by using the text file.
SQLTXT is an implementation of SQL over text files. With the SQLTXT DBMS driver, you can access data in delimited text format through SQL. This is especially important because, in many cases, much of the data to be stored in the warehouse must be obtained from individual flat files instead of relational databases.
SQL usually can be used only against relational databases. However, DecisionStream has its own SQL parser that makes SQL access on a flat file possible.
The SQLTXT component creates a definition file that you can use to specify one or more flat file definitions, ASCII or EBCDIC. The definitions are stored in a file that has a .def file extension. The definition file (.def) can be maintained through the SQLTXT Designer or can be edited manually in a regular text editor such as Notepad.
SQLTXT restrictions are:
• single-table SELECTs (no table joins)
• no updates
• no sorting or grouping
Instructional Tips
You cannot use the JOIN clause within SQLTXT; therefore, use single-table SELECT statements. Also, you cannot use the GROUP BY or ORDER BY clauses. Use the SELECT DISTINCT clause instead.
Technical Information
DecisionStream cannot be used to directly access mainframe files (such as those existing on an MVS computer). However, this is not necessarily a bad thing, because we usually do not want users to have such direct access anyway. Mainframe files can be exported to text and transferred elsewhere by using File Transfer Protocol. They can then be accessed with SQLTXT like other text files.
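The sketch below shows what these restrictions mean in practice, using a hypothetical flat-file table named stock_txt with illustrative columns:

    -- Allowed: a single-table SELECT with a WHERE clause.
    SELECT product_no, warehouse, quantity
    FROM stock_txt
    WHERE quantity > 0

    -- Allowed: SELECT DISTINCT in place of GROUP BY.
    SELECT DISTINCT warehouse
    FROM stock_txt

    -- Not allowed against SQLTXT: JOIN, GROUP BY, ORDER BY, and UPDATE.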
SQLTXT has a user interface to maintain specifications showing how text files are interpreted as database tables. This interface includes table and column maintenance facilities.
The records of text files can be delimited (typically by a carriage return) or of a fixed length.
Fields within each record may also be delimited (typically by a comma or a tab) or of a fixed length.
To add a table manually, select the file that holds the data, and then add the required columns.
Technical Information
You can also use the STREAM database to access text files directly without using SQLTXT Designer to configure them first. See the User Guide for more information.
To add columns to the table, select the table and then select Insert Column. You must then provide the values in the following table:
Field Description
Data Type Select from BOOLEAN, CHARACTER, DATE, FLOAT, INTEGER, and NUMBER.
Length Type the length of the data type if you selected INTEGER, CHARACTER, or FLOAT.
Format This field is available for date formats if you selected a date data type.
Nullable Select the Nullable check box if the column can accept null values.
Key Select the Key check box if the column forms part of the primary key of the table. Flat files typically do not have keys.
Tables and columns can be edited, renamed, or deleted.
Technical Information
SQLTXT data types have various characteristics:
• BOOLEAN: represents TRUE or FALSE in one or more characters.
• CHAR: holds string values, the length of which cannot exceed the column length specification. For delimited files, you must enclose the string value in quotation marks. For fixed-width files, it left-justifies the string value and right-pads it with spaces to the column length.
• DATE: represents date and time values.
• FLOAT: represents floating point numbers as strings. If the record is fixed length, the number is right-justified and left-padded with spaces to the column width.
• NUMBER: can be any number of any size.
• PACKED: stores decimal numbers with two digits per byte. This data type is available only for fixed-width files.
• ZONED: stores decimal numbers with one digit per byte. This data type is available only for fixed-width files.
Step Description
Select whether records are delimited If the records are delimited, choose the delimiter (for example, CR - carriage return). If the records are of a fixed length, specify the record size.
Preview rows Select the number of rows that you want SQLTXT to sample.
Header rows Select the number of header rows that the file contains.
Specify whether columns are fixed or delimited If the columns are delimited, select the delimiter used to separate the columns in the table. If the columns are of a fixed width, you will specify the width of the columns in the next step.
Name and format The wizard prompts you to name and format each column.
The first screen in the Import Wizard prompts for file type and file header
information.
If the columns are delimited, you can specify the delimiter type in the second
screen of the Import Wizard. If the columns are of a fixed width, you can specify
the width of each column.
In the third screen of the Import Wizard, you may change the column headings
and data types for each column.
Note: If you change the data type for a column, that type will change only for
the number of preview rows set in the first screen. To apply the change to
all the rows, click the Process All Records button.
Demo 2-3
Purpose:
We have transactional data that we want to incorporate into
our data mart. This data consists of sales figures within a flat
file, which must be configured to produce a definition file. We
then must access this file within Day1Catalog through a data
source connection. Finally, we want to view the contents of
this file by using SQLTerm.
5. Close SQLTerm.
Task 4. Save and back up the catalog.
1. From the File menu, click Save Catalog.
2. From the File menu, click Backup Catalog.
The Backup Catalog dialog box appears.
3. In the Save in list, navigate to C:\Edcognos\DS7001.
4. In the File name box, type Demo 2-3, and then click Save.
Results:
We have configured a flat file to produce a definition file,
accessed this file within Day1Catalog through a data source
connection, and then viewed its contents by using SQLTerm.
Summary
Workshop 2-1
Workshop Format
The following workshops have been designed to allow you to work at your own
pace. The workshops are structured as outlined in the following sections.
The Business Question Section
The first page of each workshop presents a business-type question followed by a
series of steps. These steps provide additional information to help guide the
student through the workshop. Within each step, there may be numbered
questions relating to the step. Solve the tasks by using the skills you learned in this
module and in the previous ones. If you need more assistance, you can refer to
the Task Table section that provides more detailed instruction.
The Task Table Section
The second page of the workshop is a Task Table that presents the question as a
series of numbered tasks to be accomplished. The first column in the table states
the task to be accomplished. The second column, Where to Work, indicates the
area of the product to work in. Finally, the third column provides some hints that
may help you complete the workshop. If you need more assistance to complete
the workshop, please refer to the Step-by-Step section in Appendix A.
The Workshop Results Section
This section contains one or more screen captures of an interim or final report, and/or answers to the questions asked in the Business Question section.
The Step-by-Step Section
The Step-by-Step instructions for completing all the tasks are in Appendix A of
the Student Guide. Each task in the Task Table is expanded into numbered steps,
scripted like the demos.
The first data source contains inventory information about each product that the
company has in stock. The second data source holds reference data (including a
standard list of the 16 products that the company sells), and the third data source
will contain the completed data mart.
• use SQLTerm to view the data in the Stock and Reference data sources.
This will give us some idea of what we have to work with. Make sure you
save your catalog when you are finished.
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is a Dimension?
Dimensions provide context for the key performance indicators (KPIs) that a
business uses to measure its performance.
For example, a retail chain-store may categorize its sales data by the products that
it sells, by its retail outlets, and by fiscal periods. This organization has the
business dimensions Product, Location, and Time. The measures of the business,
such as how much it sells, lie at the intersection of these dimensions.
You can derive summary information by aggregating data along one or more
dimensions. The slide example on the right shows the aggregation of data along
the Location dimension to give the total sales of widgets during July.
Data marts are dimensional. The dimensional framework that can be designed
using DecisionStream, as shown in the slide example, permits the reuse of
common dimensions. The slide diagram demonstrates a data mart that is not a
"stovepipe" but one that can be used for more detailed analysis. Users can drill
across from one subject area to another through the shared dimensions.
You may use any approach suited to your organization in developing your data marts. We recommend that you create dimensions that can be shared across data marts. When done correctly, sharing makes it easier to link separate fact tables and lets users develop reports across the enterprise, not just on a single area of the business. This is the result of a strong dimensional model. The conformed dimensions approach is outlined in greater detail in Ralph Kimball's writings.
Build slide.
6 clicks to complete.
How DecisionStream Creates the Data Warehouse
(Slide diagram: the fact build process, as before. (1) The dimensional framework — Product, Time, Location — supplies reference data. (2) DataStreams acquire data from the data sources. (3) The dimensional keys map back to the framework. The build delivers (5a) the fact table, (5b) dimension tables, and (5c) metadata.)
After the required data sources are identified and the catalog is created, the first step in developing a conformed data mart is to develop the dimensional framework.
The dimensional framework consists of multiple hierarchies that represent the logical structure of the data, independent of any physical data source. The dimensional framework is defined in the catalog. It represents the core components of your business. The dimensional framework represents the way the organization thinks about its data, instead of the way it is physically stored.
Instructional Tips
It is important to orient your students on a continual basis so that they are aware of where they currently are in the development lifecycle of their data mart.
Within its catalogs, DecisionStream uses hierarchies to define business dimensions. This is indicated at the top of the slide.
Dimensional Framework
The dimensional framework shown in the slide is made up of the Date, Fiscal, Location, and Product dimensions. The Hierarchy wizard can help you construct a dimensional framework of this sort.
DecisionStream uses SQL to retrieve the required data from the source database. The DataStream gathers together the data sources and defines where the data is assigned in the hierarchy. For example, the Product dimension requires information about product types. The product data resides in the source database. The DataStream assigns the product type information to the Product Type level in the Product hierarchy.
Instructional Tips
Remind students that connecting to the required data sources is the first step in the process of creating the dimensional framework. You could pose it as a question.
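The SQL behind such a DataStream might look like the following single-table query; the table and column names are assumptions for illustration only:

    -- One row per product, carrying the product type columns that the
    -- DataStream maps to the Product Type level of the Product hierarchy.
    SELECT product_no,
           product_name,
           product_type_code,
           product_type_name
    FROM products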
Dimension Components
Component Function
Hierarchy Components
Dimension
Hierarchy
Level
DataStream
Data Source
Static Members
Template
To enable the flow of data from a data source to a reference structure, you must map the data source columns to the DataStream items, and then map the DataStream items to the level attributes.
Mapping lets you incorporate multiple data sources and transfer identical data from these sources into the same attributes of a hierarchy level. As shown in the slide example, you might have two tables that contain similar information but in different languages, such as English and French. In this case, you will want to map the data coming from the columns 'state_name' (in the English source) and 'nom_du_province' (in the French source) into the same level attribute (in this case, "state_name").
Instructional Tips
Note that if you want to modify the SQL (for example, to add more columns to the SELECT statement), you must go back and re-map the added or modified columns to the level attributes within the DataStream.
Hierarchy Wizard
The Hierarchy wizard helps you create hierarchies and lets you choose a table structure that most accurately describes the reference data source. Using the Hierarchy wizard, you can select tables and columns and use expressions for member Ids, captions, and parents. The Hierarchy wizard creates an ALL level at the top of the hierarchy by default. It also assists in creating levels and defining additional attributes, and it generates a template to define the properties of the data attributes in the data source.
Instructional Tips
You might want to inform the students at this point that constructing a hierarchy from the rows of one table will usually result in ragged hierarchies. Creating a hierarchy from multiple tables (snowflake schema) usually implies that the source data is very well normalized.
Type Usage
Hierarchy from the Use this option when your source data is in a single
columns of one table table with specific columns representing levels in the
hierarchy.
Hierarchy from the rows Use this option when your source data is in a single
of one table table with sets of rows for each level in the hierarchy,
each related by a parent Id column.
Hierarchy from multiple Use this option when your source data comes from
tables multiple tables, with each table representing a single
level in the hierarchy. Each row has a parent Id that
relates to another row in its parent table.
Note: You must be connected to your reference data sources to use the
Hierarchy wizard.
Demo 3-1
Purpose:
As mentioned earlier, we are creating a data mart that the
instructor can use to analyze the company's product inventory
in the U.S. by using PowerPlay. Therefore, we must look at our
data sources to determine the types of hierarchies that we
must create.
Results:
We have looked at the reference data sources and have
determined the three different types of hierarchies we must
create. We will create one hierarchy from the columns of a
table, one from the rows of a table, and one from multiple
tables.
(Slide: a fiscal hierarchy with levels Year, Quarter, and Period, built from this source table:)
Year   Quarter   Period
1995   1995Q1    199501
1995   1995Q1    199502
1995   1995Q1    199503
1995   1995Q2    199504
The slide example shows a fiscal hierarchy based on the relationship between
columns in the same data row. The source table includes year, quarter, and period
columns. Each year contains quarters, and each quarter contains periods. The
ascending hierarchical order is therefore period→quarter→year. Each data row
identifies the year, quarter, and period to which it relates.
Note: The example in the slide is not standard for an operational system.
However, an earlier attempt at a data model might present the data in this
manner. This is also the structure you might see in the result set of
complex SQL queries that join multiple tables.
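A data source query for this kind of hierarchy simply selects the level columns from the one table. A minimal sketch, borrowing the ds_fiscal table and column names that appear later in Workshop 3-1:

    SELECT fiscal_yr,  fiscal_yr_desc,
           fiscal_qtr, fiscal_qtr_desc,
           period_no,  period_no_desc
    FROM   ds_fiscal

Each Id and caption column pair (for example, fiscal_yr and fiscal_yr_desc) supplies one level of the hierarchy.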
Using the Hierarchy wizard, follow these steps to define a hierarchy from the
columns of one table.
• Name the hierarchy: provide a descriptive and meaningful name for the hierarchy. You can also provide a caption and notes.
• Select the source table: DecisionStream queries the list of tables in the connection once. If you cannot connect to the data source, the query will fail.
• Define a static top level: this step is optional. You can create an artificial ALL level, and DecisionStream automatically supplies a name, caption, and Id value for the ALL level.
• Insert each level: you can add, delete, edit, or re-order the levels in a hierarchy. The wizard then creates the level definitions and generates the data sources for the levels.
• Add attributes to the structure: you can add additional attributes to each level in the hierarchy.
This type of hierarchy is based on relationships between rows of the same table. In relational terms, these are recursive relationships.

In the slide example, the source table includes Type, Parent, and Location. Within the Parent column, each row refers to the Location value of its parent.

Note: This type of recursive data source relationship produces a fixed number of levels. The levels are identified and named by a column. In the slide example, the Type column identifies levels. If there is no fixed number of levels (that is, ragged hierarchies), you need an auto-level hierarchy to identify the levels.

Instructional Tips
Ragged hierarchies are discussed in Module 17, "Ragged Hierarchies."
Using the Hierarchy wizard, follow these steps to define a hierarchy from the rows of one table.

• Name the hierarchy: provide a descriptive and meaningful name for the hierarchy. You can also provide a caption and notes.
• Select the hierarchy source: DecisionStream queries the list of tables in the connection once. If you cannot connect to the data source, the query will fail.
• Define the source columns: the Id column is the unique identifier for a level. Caption is a column containing a description associated with the Id; this information is often used for display and presentation purposes rather than the Id or name. Parent is the column that identifies the parent. The Level Name is the column that names the levels in the hierarchy, or it is Auto-Level.
• Define a static ALL level: this step is optional. You can create an artificial ALL level and DecisionStream automatically supplies a name, caption, and Id value for the ALL level.
• Assign remaining attributes: you can add attributes to each level of the hierarchy.

Instructional Tips
If you leave "Select the column for the level name" as "Auto-Level," then the hierarchy will automatically be created as an auto-level hierarchy, even if you do not specify a column for the top parent Id.

Key Information
There is a bug in Series 7 Version 2 (version 7.1.60.0 on the About dialog box) where the product crashes when this wizard option is used. A run-time error is generated when pressing Next after selecting columns for the Id, Caption, Parent, and level name attributes. See problem number 398291.0 in Trakker for more information.
Slide example (clothing source table, with the resulting hierarchy shown beside it):

ID    Type       Description    Parent_ID
1     CLASS      MEN
2     CLASS      WOMEN
101   CATEGORY   FORMAL         1
102   CATEGORY   FORMAL         2
103   CATEGORY   CASUAL         1
S86   PRODUCT    DRESS SHIRT    101
S87   PRODUCT    BLUE JEANS     103

Level details:
Name: ID
Caption: Description
Parent: Parent_ID
Level Name: Type
Instructional Tips
Expand on the example shown in the slide with the text below.

In the slide example, the data source contains information on clothing. A hierarchy will be based on the rows of this data source.

The data source has two top-level categories, a Men's class and a Women's class. Each of these classes has two children, a formal category and a casual category. Below each of these categories are the products that reside in the Men's formal and casual categories and the Women's formal and casual categories.
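A data source query for a row-based hierarchy returns the Id, caption, parent, and level-name columns for every row. A minimal sketch using the column names from the level details above (the table name ds_clothing is hypothetical):

    SELECT ID, Description, Parent_ID, Type
    FROM   ds_clothing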
Slide: a hierarchy from multiple tables — a Class table (for example, Household), a Family table with a parent Class column (for example, Books under Household), and a Product table with a parent Family column (for example, Dictionary under Books).
This type of hierarchy is based on relationships between multiple data tables. The
hierarchy follows one-to-many relationships between the tables.
In the slide example, one family of products may consist of many products, but a
product must belong to only one family. A class of products may consist of many
product families, but a product family may belong to only one product class.
Three tables contribute to this hierarchy.
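Because each level resides in its own table, a hierarchy from multiple tables typically uses one query per level, with each child table carrying the Id of its parent row. A minimal sketch with hypothetical table and column names:

    SELECT class_id, class_name
    FROM   class

    SELECT family_id, family_name, class_id      -- class_id identifies the parent row
    FROM   family

    SELECT product_id, product_name, family_id   -- family_id identifies the parent row
    FROM   product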
Using the Hierarchy wizard, follow these steps to define a hierarchy from
multiple tables:
• Name the hierarchy: provide a descriptive and meaningful name for the hierarchy. You can also provide a caption and notes.
• Select the hierarchy source: DecisionStream queries the list of tables in the connection once. If you cannot connect to the data source, the query will fail.
• Define a static ALL level: this step is optional. You can create an artificial ALL level and DecisionStream automatically supplies a name, caption, and Id value for the ALL level.
• Insert each level: you can add, delete, edit, or re-order the levels in the hierarchy.
Demo 3-2
Purpose:
We must develop a hierarchy so that we can look at our
business from a geographical point of view. The Location
hierarchy will organize this area of our business into time
zones and states.
Note: The next time you open the catalog, DecisionStream displays the
hierarchies in alphabetical order.
Level Attributes
• map the columns that the SQL statement returns, as well as any literal
values, to DataStream items.
In the slide example, the Location hierarchy contains the State level.
Mapping is used to specify the relationship between the columns of source data and the DataStream items.

To view the mapping of a DataStream, right-click the DataStream, and then click Properties.

• Data Source: shows the column(s) that the SQL data source(s) returns.
• Maps To: shows the DataStream items to which the data source columns are mapped.

Technical Information
When you create a data source query for a hierarchy, lookup, or fact build, the SQL statement can be parsed or prepared. Parsing does not send the SQL statement to the database, so it can be used when you are unable to connect to the database. DecisionStream parses the SQL statement and returns the result set of columns as written in the SQL statement. When you use parse, the SQL statement must begin with SELECT. Parse may not evaluate database-specific syntax correctly. Parsing is quicker than using prepare; however, it can fail if the SQL is too complex.
Mapping is also used to specify the relationship between the DataStream items and the attributes of a level.

To view the mapping of a level, right-click the level, and then click Properties.
The attributes for a specific level contain the Id, caption, and parent attributes, as well as any other attributes that you must include for that level (such as Price). In the slide example, the level attributes contain the timezone_cd and state_name attributes, which serve as the Id and caption, respectively, for this level.
A great deal of flexibility is available as to where you can insert data sources. You
can insert them at several different levels in a hierarchy, including above the top
level of the hierarchy. However, you must follow these rules:
• If all the columns from a query only provide information to a single level,
then the data source may reside at that level. The columns in this query
will not be visible at any other level.
Use the Reference Explorer to run the SQL to temporarily populate the hierarchy
for testing. Testing the hierarchy tells you:
You can use the Reference Explorer to view a hierarchy in two ways. In the
hierarchy view, the reference data is displayed in true hierarchical structure with
members linked to their parent. In the level view, the hierarchical members are
linked under the levels they belong to with no specific reference to the parent.
Note: The members are not stored. They are only loaded into memory as
needed for processing.
Build slide.
3 clicks to complete.
Date Hierarchy Wizard
Often, there is no database table that contains a full range of dates for use in a date hierarchy. To assist you in creating a date hierarchy, DecisionStream provides the Date Hierarchy wizard. You can choose to include levels for year, quarter, month, week, and day.

When you use the Date Hierarchy wizard, the hierarchy is not based on source data. Therefore, all members are static members and are physically stored in the catalog.

Instructional Tips
Discuss the inherent difference that exists between using the Date Hierarchy wizard and constructing a hierarchy based on time. Point out that using the Date Hierarchy wizard will generate static data only.
When you provide the level details you must type or select relevant values.
Detail Description
Level Name You can change the default level name.
Id Format From the list, select the format for the level. Alternatively,
you can type a format. The Id Sample box shows the
current date and time in the selected format.
Caption Format (Optional) From the list, select the caption for the level.
The Caption Sample box shows an example of the
selected format.
Week Parent Relationship This option is available only for a week level. From the
list, select the rollup option to use.
Days to Include This option is available only for a day level. From the list,
select the days for which you want to display data. You
can select from Every Day, Weekdays Only, or Weekdays
and Saturdays.
Handle Weeks
Slide: a calendar showing August and September, with Week 34 through Week 38 straddling the month boundary.
A week may straddle the start and end of its parent, perhaps a month or year.
To help you resolve this problem, you can use the Date Hierarchy wizard to
specify how you want DecisionStream to handle weeks that cross a parental
boundary. You can choose from the following options:
• Weeks roll to parent start: weeks roll to the parent month in which the week began.
• Weeks roll to parent end: weeks roll to the parent month in which the week ended.
• Weeks roll to parent start and end: this option splits the week between the two parent months.
• Weeks on same level as parent: weeks do not roll to the month level.
Demo 3-3
Purpose:
We will use the Date Hierarchy wizard to create a static Date
hierarchy that we can use in the analysis of our business. This
hierarchy will represent the time dimension for the first three
years of our business.
Results:
We have used the Date Hierarchy wizard to create a static date
hierarchy that will represent the time dimension for the first
three years of our business.
Summary
Workshop 3-1
• Use the ds_fiscal table to create a hierarchy called Fiscal. You do not
require a static ALL level.
Fiscal:
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
2. Create the Fiscal hierarchy. (Hierarchy wizard)
   • Create the hierarchy from the columns of one table (Star Schema).
3. Create the Year level for the hierarchy. (Level Details window)
   • Use the fiscal_yr column for the Id and the fiscal_yr_desc column as the caption.
4. Create the Quarter level. (Level Details window)
   • Use the fiscal_qtr column for the Id and the fiscal_qtr_desc column as the caption.
5. Create the Month level and complete the hierarchy. (Level Details window)
   • Use the period_no column as the Id and the period_no_desc column as the caption.
6. Examine the Fiscal hierarchy. (Build tree; Data Source Properties window, SQL tab; DataStream Properties window)
   • Expand and collapse the Fiscal hierarchy to review the components.
   • Review the Fiscal data source SQL.
   • Review the properties of the DataStream.
7. Use the Reference Explorer to examine the Fiscal hierarchy. (Reference Explorer)
   • Click the plus sign (+) to view the members within 1999.
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
5. Conformed Dimensions
6. Derivations
Objectives
What is a Build?
Builds are the foundation of the data warehouse, which is designed to provide
structured and clean data to users. Ralph Kimball (1998) noted that good data
access is the foundation for excellent decision making.
There are two types of builds in DecisionStream. Fact builds deliver transactional
data. Optionally, they may also deliver dimension data and metadata. Dimension
builds deliver only dimension data.
DecisionStream offers considerable build flexibility. Within the same build, it can
acquire, merge, and aggregate data from different data sources. It can also deliver
fact data, dimension data, and metadata to multiple targets.
Slide: a dimension build delivers a reference structure to a dimension table created in the target data mart, guided by a template outlining the columns of the dimension table and the behavior of these columns.
A dimension build delivers data that describes a single dimension of the business,
such as Product or Customer. It acquires dimension data from the hierarchy that
you specify and loads it into the data mart in the form of one or more dimension
tables.
Because a fact build can also deliver dimension data, it may seem unusual to do
this in a separate process. However, there are several instances when you want to
use a dimension build instead of a fact build to deliver dimension data. For
example, you may have a number of fact tables that share the same set of
dimension tables (conformed dimensions). You may also want to deliver all the
dimension data, although there may not be supporting fact data at the moment.
Finally, you may want to prepare the dimension tables prior to loading the fact
data (for example, you have to make gradual changes to dimension attributes).
You can use the Dimension Build wizard to quickly create a dimension build.
The wizard involves the following steps:
2. Define the schema that you want to use and the dimension (and
associated reference structure) that you want to deliver.
3. Define the schema naming conventions (how the table(s) and column(s)
will be named).
5. Define the build properties (such as how to handle multiple parents).
6. Define attributes for slowly changing dimensions (SCDs).

Instructional Tips
SCDs are covered in Module 9, "History Preservation."
Demo 4-1
Purpose:
We want to create a simple dimension build based on the
existing Product hierarchy so that we can report on our
company's product inventory. This task will result in one
dimension table that utilizes a star schema.
7. Click OK.
The Product dimension build runs in a separate DOS window and
delivers 16 rows to the D_Product table of the data mart.
8. Press Enter to close the DOS window.
Results:
We have created a simple dimension build based on the
existing Product hierarchy. Implementing this build produced
one dimension table that used a star schema.
Build slide.
6 clicks to complete.
How DecisionStream Creates Data Marts
Slide diagram: data sources feed DataStreams; the dimensional framework (Product, Time, Location) supplies reference data; the transformed data flows to the dimension delivery, fact delivery, and metadata delivery modules (numbered steps 1 through 5c).
In a fact build, the columns extracted from the transactional data (through SQL)
map to elements in the build's transformation model. DecisionStream uses build
elements as a transformation area in memory to manipulate the source data
before it is delivered to the target.
After the data has been transformed, DecisionStream organizes its delivery to
three types of modules: fact, dimension, and metadata. Each delivery, in turn,
may subscribe to some or all of the build elements.
Slide diagram: two data sources feed DataStreams into the transformation model and on to delivery.
1. Populate hierarchy in memory.
2. Read fact source data.
3. Deliver fact data.
DecisionStream can merge data from multiple sources, partition this data, and
deliver it to multiple targets.
First, the hierarchies are populated with reference data, which is then stored in memory. This reference data is used to perform data integrity checks on the incoming fact data; for example, to determine if a sales record refers to a legitimate product.
Fact data consists of the transactional information acquired through SQL queries
usually from OLTP systems. For example, the fact data for a department store
chain includes sales figures, such as the number of products sold by location
during the month of July. These figures are obtained mainly from the transactions
recorded by the cash registers in each store. However, additional sales records
may exist in separate data stores on other systems.
Once DecisionStream reads the fact data, it can be transformed, merged, and
aggregated as necessary, and then delivered to the data mart.
Slide: the element types of the transformation model, including Dimension and Derivation (a calculation or transformation).
Four types of elements can be part of a delivered fact table. During the build
definition process, these elements form the transformation model.
You can create a basic fact build quickly by using the Fact Build wizard.
The wizard involves the following steps:

• Define the purpose of the build: name the new build, select the build style (for example, Cognos BI Mart), and indicate the target connection.
• Create the DataStream: select the source tables and columns. The wizard will create an SQL query based on these selections. A build element will be created for each query column.
• Assign the element types: if necessary, modify the type of each element and the order of the elements in the transformation model.
• Define the dimensions: select a hierarchy or lookup for each dimension element of the transformation model.
• Define the fact delivery: select the physical structures into which the fact data will be delivered. You can also specify naming conventions for these structures.
• Define the dimension delivery: select the schema that you want to use for the dimension delivery modules and how the tables and columns of these modules will be named.

Instructional Tips
You can access the Fact Build wizard from the toolbar or by selecting Tools/Fact Build Wizard. You typically only deliver to an Architect model if you have access to Impromptu or PowerPlay, or both, as well.

Key Information
There are essentially three ways to create a fact build:
1. Use the Fact Build wizard to guide you through all the steps required.
2. Create the entire build manually.
3. Use the wizard to prepare the basic structure of the build (such as the DataStream and elements of the transformation model) and then refine it manually.
The last option is probably the most efficient, because the basic structure of a build is likely to change at least somewhat.
Note: If you want the build to give you the option of creating aggregate fact
tables, you must select the Allow Aggregation on Build Dimensions box.
Use a data transfer fact build to move data from one place to
another quickly and efficiently.
A data transfer fact build provides the default settings for copying data from one
DBMS to a single fact table in another DBMS. It is recommended that you use
this option when you just want to move data from one place to another, rather
than creating a data mart.
By default, when the data transfer option is selected, the Fact Build wizard creates
all transformation model elements as attributes. It creates neither dimension data
nor metadata deliveries.
In the slide example, a data transfer fact build was created using the Fact Build
wizard. When this fact build was executed, data was transferred from the
SourceConnect database to a single fact table (F_DataTransfer) in the
TargetConnect database.
Types of BI Marts
Star:
creates a star schema
creates one detail fact table and (optionally) a number of
aggregate fact tables, as well as a separate table for each
dimension
Snowflake:
creates a snowflake schema
creates one detail fact table and (optionally) a number of
aggregate fact tables, as well as a separate table for each
level in each dimension
The BI Mart created by the wizard, in either a star or snowflake design, can
deliver fact data, dimension data, and metadata in a form suitable for Impromptu,
PowerPlay Transformer, and Architect.
In a star schema, a single fact table is created, and all the data from each
dimension is stored in its own separate table. The primary key of this table is the
key (either business or surrogate) of the lowest level in the dimension. For
example, a Product dimension table has a primary key of Product Id, not Product
Type Id (because that is the unique Id of the next-highest level). Columns in the
table represent hierarchical levels in the dimension. The schema can therefore be
viewed as a fully denormalized representation of the dimension.
A data source for the fact build is chosen from the available
connections. This source contains the transactional data to be
transformed.
You must first select or add a data source for the incoming data.
Give the data source a name and select a connection from within the catalog.
In the slide example, a Stock data source will read from the Stock connection that
has already been created within the current catalog. In the next step, the Fact
Build wizard will create an SQL statement that actually reads the data.
After you identify your connection, you create the source SQL from within the Data Source wizard. Using this wizard, you can browse and select tables and columns for your query. As you select tables and columns, the query is built automatically.

If you select more than one table, DecisionStream joins them where possible (it issues an error if the join is not possible). In the right pane of the window, DecisionStream inserts the SQL statement that corresponds to your selections. You may edit this statement manually if preferred.

Alternatively, you can type the SQL statement directly in the right pane. Click the SQL Helper button to display SQL Helper, which can assist you in testing the statement.

When you click the Rebuild SQL Statement button, DecisionStream re-creates the SQL statement by using the selections that you initially made in the left pane. This is useful if you have edited the SQL statement and want to return to the original statement that DecisionStream created.

Instructional Tips
You can modify the SQL statement after you create the build by opening the Properties window of the data source and clicking the SQL tab. This window also gives you access to SQL Helper, which you can use to test any changes that you have made.
The Fact Build wizard creates a transformation model (internal) element for each column in the SQL query. The source data columns in the SQL statement are mapped to the corresponding model elements, which makes it possible for DecisionStream to transform the source data as necessary before loading it into the target data mart.

The mapping shown in the slide example is performed automatically by the wizard. Later, you can modify the mappings by right-clicking the DataStream icon and then clicking Properties. This is necessary if changes are made to the DataStream; for example, if new columns are added to the query, a new data source is inserted manually, or literals are added to an existing data source.

Instructional Tips
Derivations can be used to enforce business rules across the enterprise. For example, it may be worthwhile to have a calculation that does not already exist in the source data, such as Gross Profit Margin. Calculating this figure from the existing source data may provide the additional information needed to make better business decisions. Derivations are covered further in Module 6, "Derivations."
You cannot create derivations by using the Fact Build wizard because these
columns do not exist in your source data. You must add derivations manually.
Once you have queried the source table, the Fact Build wizard automatically
creates the elements of the transformation model based on the type of data in
each column. The types of these elements may be modified if necessary.
Build slide.
2 clicks to complete.
How DecisionStream Creates Data Marts
Slide diagram: two data sources feed DataStreams; the dimensional framework contains the Product, Time, and Location dimensions.
So far, by using the Fact Build wizard, we have created the data source (the SQL
query) and the module elements, mapped them, and declared the element types.
The next step is to link the dimension elements to the dimensional framework.
The dimension element properties that you can set by using the Fact Build wizard
are Use Reference and Aggregate.
You use the Use Reference property to select the reference item that you want to use for each dimension element. You can choose from hierarchies, auto-level hierarchies, and lookups. Each dimension element (such as Product Number) can be associated with only one reference item (in this case, the Product hierarchy).
If you select the Aggregate box for a dimension element, DecisionStream creates
an aggregate fact table for each level of the dimension. If you do not select the
Allow Aggregation on Build Dimensions box on the first screen of the Fact Build
wizard, you will not have this option.
Note: Selecting the Aggregate box for any dimension element can potentially
create a large number of aggregate fact tables.
Build slide.
3 clicks to complete.
How DecisionStream Creates Data Marts
Slide diagram: data sources feed DataStreams; the dimensional framework (Product, Time, Location) supplies reference data; the transformed data flows to the dimension delivery, fact delivery, and metadata delivery modules (numbered steps 1 through 5c).
Now we can use the Fact Build wizard to define the deliveries. There are three
types of deliveries: fact, dimension, and metadata.
In the first step of the Fact Build wizard, you can select a target connection to determine where DecisionStream will deliver the transformed data.

Now you can further configure the delivery of the fact data by determining what type of delivery module will hold it. DecisionStream has a variety of choices. For example, you can select a simple relational table (the default) or an Oracle SQL Loader.

Technical Information
Usually, the indexing strategy will be "No indexes." Indexing the data warehouse tables is usually left to the DBA.
You can then set the properties of the selected delivery module type. For
example, if you choose to deliver the fact data to a relational table, you can
choose how you want the data to be refreshed and the interval at which the data
will be committed.
Finally, you can determine how the tables and columns of the delivery module
will be named. For a relational table, this determines what names the fact table
and its columns will have after it has been delivered to the target data mart.
Note that the data will not actually be transformed and loaded into any delivery
modules until the fact build is implemented. A build can be implemented by
selecting Actions/Execute or by clicking the Execute the current item button on
the toolbar.
Demo 4-2
Purpose:
As noted previously, we are creating a data mart so that the
instructor can analyze product inventory by using PowerPlay.
Using a star schema, we want to create a simple fact build
based on the columns of the ds_stock table. The result will be
one fact table and three dimension tables.
5. In the left pane, click the ds_stock check box to select it.
A SELECT statement appears in the right pane of the Data Source
wizard showing the columns that will be included in the build.
The result appears as shown below.
Technical Information
The SQL created by the Data Source wizard is native to the database in question. In this case, because our source data resides in an Access database, each column is delineated by back accents. When working with source or target databases, you must use the variety of SQL that is native to the database.

6. Click Finish to close the Data Source wizard.
7. Click Next to accept the default data source mapping.
8. Click Next to accept the default transformation model.
Task 3. Relate the reference structures (hierarchies) to the
dimensional elements of the transformation model.
1. Next to period_no, click (no reference), and then click the Browse button.
2. In the list, double-click Fiscal, and then click the Fiscal hierarchy.
3. Repeat steps 1 and 2 for the state_cd and product_cd elements.
These will use the Location and Product hierarchies, respectively, as the
reference structures.
The result appears as shown below.
7. Click the Architect and Impromptu check boxes to clear them, and then click Next.
8. Click Next to accept the default values for the properties of the PowerPlay Transformer model.
9. Click Finish to accept the summary of the fact build.
10. In the left pane, under the Builds folder, click Stock.
If necessary, in the Visualization pane, right-click Transformation Model, and then click Show Build Elements.
If necessary, in the Visualization pane, right-click Transformation Model, and then click Show Build Details.
The result appears as shown below.

Key Information
If you are using the cer2 release of DecisionStream, you may encounter a series of "DS-HANDLE-E100: Handle is null or invalid" error messages when you click the Back button at any step of the wizard. You may encounter the message again when you click the Next button after dismissing the error message. This is a bug.

To have the build elements and build details appear automatically in the Visualization pane, from the Tools menu, click Options, and then select the appropriate check boxes.
Results:
We have used the Fact Build wizard to create a basic fact build
by using a star schema template based on the columns of the
ds_stock table and the reference structures in the dimensional
framework.
Slide callouts: transactional data sources; dimensions associated with the build; fact and dimension tables; metadata delivery to Impromptu, PowerPlay, and Architect.
After a fact build has been created, we can view a graphical representation of
everything it contains. By clicking the first tab of the Build Visualization pane, we
can view the build's components:
• The connection from which DecisionStream acquires the source data.
• The data source that DecisionStream uses to extract data from the source database. This data source contains the actual SQL query.
• The DataStream that indicates the mapping of source columns in the data source(s) (SQL queries) to the fact build elements.
• The transformation model, where the processing of the fact build data occurs. Here the transactional data is transformed and prepared for loading into the delivery module.
• A fact data delivery. If the delivery has a level or an output filter, the filter icon (a funnel) is added.
• A dimension delivery.
• A metadata delivery.
By clicking the DataStream tab on the Build Visualization pane, you can view the
data source mapping that the Fact Build wizard performs. As noted earlier, you
can open the build's DataStream Properties sheet to modify this mapping.
Keep in mind that derivations in the transformation model do not map to any
literal values or columns that SQL returns.
Mapped Reference
Structures
By clicking the Transformation Model tab of the Build Visualization pane, you
can view how each dimension element in the transformation model links to the
dimensional framework. In the slide example, the period_no dimension element
maps to the Fiscal hierarchy, the state_cd dimension element maps to the
Location hierarchy, and the product_cd dimension element maps to the Product
hierarchy.
The check marks in the dimensions indicate the granularity of the transactional
data coming in (input) and the fact data being delivered (output).
By clicking the Fact Delivery tab, you can view the mapping
of the elements to the target fact table.
This slide diagram shows how transformation model elements in the fact
build map to columns of the target fact table.
Delivery Modules
Dimension delivery modules deliver data that describes a single dimension (such as Product) to the target database. DecisionStream lets you send the data to a single table (star schema) or multiple tables (snowflake schema). There can also be more than one dimension delivery per dimension, for example, two star schemas.

Fact delivery modules deliver the fact data that a build produces. A build may have multiple fact deliveries, and each fact delivery may subscribe to some or all of the build elements, which makes it possible for you to perform vertical partitioning. You can also perform horizontal partitioning by using output and level filters.

Metadata delivery modules deliver information about fact or dimension data, or both, to specific applications such as Impromptu, PowerPlay Transformer, Architect, and Microsoft SQL Server Analysis Services. This information forms the backbone of BI.

When you run a build, fact delivery modules are delivered first, followed by dimension delivery modules and then metadata modules.

Instructional Tips
Partitioning refers to the delivery of data to different targets according to specified criteria. Vertical partitioning involves delivering the elements of the build to different delivery modules. For example, you may want to deliver the Product dimension element and related measures to a relational table and the other elements to a text file. Horizontal partitioning involves adding filters that determine which data rows DecisionStream will deliver to which areas of the data mart. Level filters configure delivery of only specific dimensions and hierarchy levels. Output filters are expressions that result in either TRUE or FALSE when applied to each data row. For example, you may want to include only those products that have a cost of more than $25.00 (Price > 25.00). Each delivery module may have several level filters but only one output filter.
Running a fact build actually delivers data to the target. You can access the Execute Build dialog box by first selecting the build and then clicking Actions/Execute. From here you can modify the default options.

To bypass the dialog box, click the Execute Build button.

You can also run fact builds entirely from the command line, where you can include additional options. The command that the DecisionStream engine implements is shown in the Command Line box.

You can choose from three execution modes.

• Normal: runs a build stored in the DecisionStream catalog. DecisionStream processes the whole build and delivers the required data.
• Object Creation: creates the delivery modules but does not implement the fact build. This mode will create the physical delivery structures but will not deliver the data into those structures. This mode is generally used when developing a new build.
• Check Only: used for performance testing and resource estimating. Check Only does not create the physical delivery structures and does not deliver data or metadata. With this style of implementation, you do not even need a target database.

Instructional Tips
Almost everything that is done within the DecisionStream Designer GUI can be done from the command line in Windows or UNIX. This will be covered further in Module 22, "The Command Line Interface."

Technical Information
The command-line code is typically used in a batch file that is run at regular intervals to keep the data warehouse up to date. In this case, you must remove the -P option from the command; this ensures that DecisionStream does not prompt the user to press Enter when the build finishes running, which would halt the entire process.
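For example, the command shown in the Command Line box might be captured in a batch file similar to the following sketch (hypothetical; only the databuild executable and the -P option are named in this guide, so the remaining arguments are placeholders):

    rem nightly_refresh.bat -- run the build on a schedule
    rem the -P option is omitted so that no prompt halts the batch run
    databuild <catalog and build arguments>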
When you run a fact build, the DataBuild executable file opens a new process in a
command window. This window writes trace information that is saved
automatically as a log file. By consulting these files, you can fine-tune the build so
that it efficiently meets your specific requirements.
You can control the feedback information contained in the log file in the Fact Build Properties window. By default, the only logging property set is Progress. You can specify the level of detail to include in the log file by selecting the appropriate boxes in the Trace list.
Type of Information Description
Progress Details the overall progress of the build
implementation.
Detail Includes more detailed progress messages. This option also displays additional progress messages as each given number of rows is processed (by default, every 5000 rows).
Internal Includes internal DecisionStream activity messages,
such as resource usage (memory usage, paging
information, and so on). This is useful for performance
testing and resource estimating.
SQL Includes all SQL statements that DecisionStream uses
at each stage of executing the build. This information is
useful in resolving database errors.
ExecutedSQL Includes the executed SQL for SELECT statements.
User Includes all application messages written to the log file
by using the LogMsg() function.
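For example, a derivation or filter expression could write its own message to the log by calling LogMsg() (a sketch; the exact signature of LogMsg() is an assumption here):

    LogMsg('Price exceeded the expected range')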
Demo 4-3
Purpose:
We have to run the Stock fact build so that we can load data
into the target data mart. After the build is implemented, we
want to view the log file to evaluate the progress of the build.
Task 1. Run the Stock fact build and view its log file.
1. In the left pane, under the Builds folder, click the Stock fact build if it is
not already selected.
2. In the toolbar, click Execute.
Databuild.exe runs in DOS and 10861 rows of transactional data are
loaded into the F_Stock table of the data mart database.
3. Press Enter to close the DOS window.
4. From the Tools menu, click Browse Log Files.
The log window opens, displaying the log files that have been created for builds that have been run.

Instructional Tips
The log file may have a different number, but it will still be prefaced with "Build_Stock."
5. Double-click the Build_Stock_0001.log file.
The log file created for the Stock build opens in Notepad.
This file shows the progress details that were displayed in the DOS
window as databuild.exe ran. It indicates that data was loaded into the
F_Stock fact table and three dimension tables, and that a Transformer
exported model was created for this build.
6. Close the Build_Stock_0001.log file, and then close the log window.
Results:
We have run a fact build and viewed the resulting log file to
evaluate its progress.
Demo 4-4
Purpose:
We want to use a Cognos BI tool to open our data mart. We
will use PowerPlay to view product inventory data by creating
a PowerCube and a report.
Result:
We have generated a PowerPlay cube and report for the Stock
fact build to view product inventory data.
Document a Catalog
Slide callout: HTML template to use (if applicable).
When you create a catalog, DecisionStream automatically creates the eight tables it requires. However, for some installations, where a database administrator must set up the tables manually, you can use the database schema function to specify the tables that have to be created.

You can use the buttons at the bottom of the Database Schema window to add the SQL statements for the function that you want to perform. You can view and edit the statements in the SQL window.

• Create: creates the required tables. You can copy these statements to the Clipboard and paste them into other applications to inform your database administrators of the tables to create.
• Grant: grants all permissions to all users for all tables in the schema. However, this may not apply to all databases.
• Drop: drops all tables of the schema, thereby removing the schema from the database.
Demo 4-5
Document a Catalog
Purpose:
We have to create full HTML documentation for the
Day1Catalog. Using this document, we can view detailed
information about the contents of the catalog.
Results:
We have created an HTML document that contains a detailed
description of the contents of the Day1Catalog.
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
5. Conformed Dimensions
6. Derivations
Objectives
It is possible to deliver a single, private data mart using a fact build. To deliver
this, you must include a fact delivery and one or more dimension deliveries. In
Demo 4-2, we used the Fact Build wizard to create one fact delivery and three
dimension deliveries. When we executed the fact build, the result was one fact
table and three dimension tables in the data mart.
However, what if we want to track a new fact (such as sales) that will use the
same dimension tables? Or what if we want to make a change to the
D_Stock_Location table? Any changes that we make will be overwritten when
the fact build is executed again.
To deal with these issues, we must create our dimension tables using a separate
process. We must design our dimension tables so that several fact tables can
reference them. When multiple fact tables can use a dimension, we say that the
dimension is conformed.
Conformed Dimensions
Because most companies have integrated business operations, they have a strong
requirement for an integrated enterprise data warehouse that mimics the business
structure. This large data warehouse must be flexible, fast processing, easy to
maintain, secure, and complete.
In the slide example, three fact tables retrieve reference information from six
dimension tables: Location, Customer, Product, Time, Distributor, and
Promotion. However, each dimension table is created only once, and data inside
the tables is standardized.
Conformed dimensions are the elements that make the data warehouse an
integrated whole. They also reduce overall time for data warehouse development,
because each dimension is analyzed, designed, and created only once.
Slide: a bus matrix of six dimensions (Distributor, Promotion, Customer, Location, Product, Time) against three fact tables — the Sales Fact references five of the dimensions, the Distribution Fact four, and the Order Fact three.
You have to decide from the beginning which dimensions will be referenced by a
single fact table. If the same dimension is referenced in several tables, it becomes
the conformed dimension.
Note: In the architecture phase that precedes the implementation of any of the data marts, the goals are to produce a master suite of conformed dimensions and to standardize the definitions of facts. The resulting set of standards is called the Data Warehouse Bus Architecture. For more information, see Ralph Kimball et al.'s The Data Warehouse Lifecycle Toolkit.
It is also important to identify the dimension and fact attributes. Data often
comes from different operational sources. These separate sources may have
different names for identical entities. For example, one application may refer to a
client as a customer, whereas another application refers to the client as a prospect.
Establishing standards for dimension and fact table attributes ensures that
attributes with different interpretations get different names and, equally
important, that attributes with the same interpretation get the same name.
D_Customer
Customer Id
Last Name
First Name
Address
To provide maximum flexibility for business users, populate the fact tables with atomic-level or transaction-level data. Therefore, you should also define the conformed dimension tables at the lowest level of granularity. The grain of the Product dimension is a single product, the grain of the Customer dimension is an individual customer, and the grain of the Time dimension is a single day.

This absolute lowest level of granularity ensures that all business questions can be answered at any level of summarization. It provides flexibility for users, who can add new reference data and extract information on each individual product, customer, or location.

Instructional Tips
Point out to the students that, in this example, even the Product_Id may not be the lowest level of granularity. A product tracked by this dimension table may be made up of individual components that must be tracked in a separate Components dimension table.
The low level of granularity does not eliminate the rolled-up fact and dimension
tables. On the contrary, it is good practice for any data warehouse to have both
transaction and snapshot fact tables. For example, when a company wants to
have a snapshot of fact data on monthly orders, it is better to create a summary
table on orders, which references the month level of the Time dimension.
Slide: transaction and snapshot fact tables sharing conformed dimensions —
Customer (Customer Id, Last Name, First Name, Address)
Order Fact (Day Id, Product Id, Customer Id, Cost, NumberOrdered)
Time(Day) dimension (Day Id, Day, Month Id, Period)
Sales Fact (Month Id, Product Id, Customer Id, AmountSold, Revenue)
Product (Product Id, Description, Product Type, Product Line)
Time(Month), shown as a view (Month Id, Month, Period)
The best practice is to use a star schema for data marts joined by conformed
dimensions, where each dimension is represented by a single table that has all
possible levels of granularity. Because this data structure is so simple, even
non-technical analysts can easily understand and maintain the data model.
The combination of atomic-level fact and dimensional data supports queries that
can report and summarize across any combination of dimensional attributes.
Most contemporary reporting and analysis tools are designed for exactly this type
of database.
However, as shown in the slide example, some situations require multiple levels
within a single dimension. For example, if it is known that the Time dimension
will be queried by Month and Day on a regular basis, a designer has two options:
Star and snowflake schema designs will be discussed later in this module.
Slide: the Product hierarchy, Product(H), delivers the D_Product dimension table (ProductNumber, ProductKey) through a dimension build; a Product lookup, Product(L), references D_Product, and the Sales fact build uses the lookup to deliver the F_Sales fact table (ProductKey).
A fact table often references a single level in each dimension. When checking data integrity, only that single level needs to be referenced. For performance reasons, it is often best to use a lookup instead of a hierarchy in these cases. Lookups usually require fewer columns in the data source and less memory to process.

Unless you want the reference attributes to be available for calculations, the data integrity lookup usually needs no more than two attributes: business key and surrogate key. Surrogate keys will be covered further in Module 9, "History Preservation."

In the slide example, data from the Product hierarchy is delivered to the D_Product conformed dimension table through a dimension build. This dimension table references a template, which lists the columns in the table and how they behave.

The D_Product table is then referenced by a lookup. This lookup only contains the data necessary to perform data integrity checking. The Sales fact build, in turn, references this lookup. When the Sales fact build processes incoming transactional data, the lookup is used to ensure that each transaction refers to a product that already exists in the D_Product table.

Templates will be covered further in Module 7, "Templates, Lookups, and Attributes."

Instructional Tips
Point out to the students that there are usually two hierarchies for every dimension. One hierarchy points to the operational system, and another hierarchy or lookup points to the dimension table in the warehouse, as shown in the slide. The hierarchy that points to the dimension table should use the same template that was created in the dimension build; this helps to reduce the number of templates. This is indicated in the slide by showing only one template.

Key Information
This slide is one of the most important in the entire course. It gives a high-level outline of why conformed dimensions and templates are so important and how they are used for checking the validity of incoming fact data. It is recommended that you refer to this slide as often as necessary to continually emphasize the "best practice" way of using conformed dimensions.
Slide: reference data sources feed the Product and Customer input hierarchies; dimension builds (Product, Customer) deliver and update the conformed dimension tables, such as D_Customer.
Using DecisionStream, users can create conformed dimensions. The process involves creating a dimensional framework, including the dimensions and the hierarchies to be used in the data warehouse. DecisionStream delivers the dimension structure through dimension builds. One build is created for each dimension. The build creates a conformed dimension table in the data warehouse.

Specify that you want to include surrogate keys in the resulting dimension tables. Do not specify surrogate keys for the input data, as these will be ignored when the data is processed. You are not maintaining surrogate keys in the operational system, only in the data warehouse.

Note: If you have determined that you require a snowflaked table (as noted on the previous page), you can deliver more than one physical table in a single dimension build.

Instructional Tips
Do not get sidetracked on snowflaking. Students only need to know that it is possible to deliver multiple tables in a single dimension build if they require different levels of granularity in the dimension build.
Build slide.
3 clicks to complete.
Conformed Dimensions: Use
Slide: sale transactions, returns, and updates (carrying ProductNumber and CustomerCode) flow through fact builds into the data warehouse, where the same ProductNumber and CustomerCode columns appear.
To use the dimensions created in the previous slide, you first create hierarchies or
lookups that reference the dimension tables in the data warehouse.
When you created the dimension tables, you specified all levels of your hierarchies.
This permits you to deliver all relevant attributes and levels to the dimension
tables. You may need all these attributes for reporting.
When you reference dimension data in a fact build through dimension elements,
you rarely need all these attributes. You need to create a smaller hierarchy or a
lookup containing all necessary attributes that will be referenced by this particular
fact build.
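For example, a data integrity lookup on the Product dimension usually needs only the business key and surrogate key columns of the dimension table. A minimal sketch, using the column names shown in the earlier D_Product slide:

    SELECT ProductNumber, ProductKey
    FROM   D_Product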
When you use the Dimension Build wizard, you have the option of delivering dimension data using a specific schema. The most common schema options are star and snowflake.

A star schema will deliver the data to one dimension table.

A snowflake schema will deliver the data to several dimension tables.

DecisionStream also offers other schemas that are outside the scope of this course.

When you use the Dimension Build wizard, you choose one of five schemas to organize the result. The star and parent-child schemas organize each dimension in a single dimension table, whereas the snowflake, optimal snowflake, and optimal star schemas create more than one dimension table for each dimension.
Star Schema
Slide: a star-schema Product dimension table (Product Cd, Product Name, Product Type Cd, Product Type Desc, Product Line Cd, Product Line Name) and a Date dimension table (Order Date, Week, Month, Year).
A star schema represents a dimension in a single table with the levels of the
associated hierarchy represented as columns within that table. The primary key is
the member Id of the lowest level of the hierarchy.
A variant of the star schema is the optimal star schema, which is similar to a
star schema; however, the optimal star schema removes the descriptive
attributes from all non-base levels of the hierarchy and puts them in their
own tables, along with the Id attributes related to these attributes. This
structure saves storage space, and optimizes reporting performance.
The slide diagram depicts a star schema with four dimensions. The Product,
Sales_Staff, Customer and Date dimension tables were created in separate
dimension build processes. The primary key of each dimension table is linked to
one of the four dimension element columns (Customer_Cd, Sales_Rep_Cd,
Product_Cd and Order_Date) in the Order_Fact table. "Collapsing" the data
integrity legs of the appropriate tables in the original OLTP data source (Product
Line, Product Type and Product) produced the Product hierarchy, which was
then used to create the Product dimension table.
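As a sketch, the star schema tables from the slide might be created as follows (the data types are assumptions; the table and column names follow the slide):

    CREATE TABLE Product (
        Product_Cd         VARCHAR(10) PRIMARY KEY,  -- key of the lowest level
        Product_Name       VARCHAR(50),
        Product_Type_Cd    VARCHAR(10),
        Product_Type_Desc  VARCHAR(50),
        Product_Line_Cd    VARCHAR(10),
        Product_Line_Name  VARCHAR(50)
    );

    CREATE TABLE Order_Fact (
        Customer_Cd   VARCHAR(10),  -- links to the Customer dimension table
        Sales_Rep_Cd  VARCHAR(10),  -- links to the Sales_Staff dimension table
        Product_Cd    VARCHAR(10),  -- links to the Product dimension table
        Order_Date    DATE          -- links to the Date dimension table
    );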
Snowflake Schema
Slide: a snowflake schema. A time-based dimension is typically not snowflaked.
Date (Order Date, Week, Month, Year)
Product Line (Product Line Cd, Product Line Name)
Product Type (Product Type Cd, Product Type Desc, Product_Line Cd)
Customer Type (Customer Type Cd, Customer Type Desc)
Customer (Customer Cd, Customer Name, Customer Type Cd)
Product (Product Cd, Product Name, Product Type Cd)
Order Fact (Customer Cd, Sales Rep Cd, Product Cd, Order Date, Order_Qty, Order_Line_Value)
The slide diagram depicts a snowflake schema with four dimensions. Three of the
dimensions (for example, Product) have a separate dimension table for each of
their levels. The lowest level dimension table links to the fact table (Order_Fact)
through its primary key (in this case, Product_Cd). A time-based dimension
(represented in the slide example by the Date dimension table) is typically not
snowflaked.
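A corresponding sketch of the snowflaked Product tables (again, the data types are assumptions; the names follow the slide):

    CREATE TABLE Product_Line (
        Product_Line_Cd    VARCHAR(10) PRIMARY KEY,
        Product_Line_Name  VARCHAR(50)
    );

    CREATE TABLE Product_Type (
        Product_Type_Cd    VARCHAR(10) PRIMARY KEY,
        Product_Type_Desc  VARCHAR(50),
        Product_Line_Cd    VARCHAR(10)  -- links to Product_Line
    );

    CREATE TABLE Product (
        Product_Cd       VARCHAR(10) PRIMARY KEY,
        Product_Name     VARCHAR(50),
        Product_Type_Cd  VARCHAR(10)   -- links to Product_Type
    );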
In a fact build, you do not necessarily need all dimension table attributes. Fact
builds use dimension data in four ways:
• Surrogate key substitution. Replace the natural or business key with the
surrogate key before inserting it into the fact table.
If you are performing only data integrity checking or surrogate key substitution,
you only need the atomic-level business key and surrogate from the dimension
table. A lookup is often sufficient for this purpose.
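Conceptually, surrogate key substitution has the same effect as the following join (a sketch only; DecisionStream performs the substitution in memory through the lookup, and the staging table name incoming_facts is hypothetical):

    INSERT INTO F_Sales (ProductKey)
    SELECT d.ProductKey
    FROM   incoming_facts f
    JOIN   D_Product d ON d.ProductNumber = f.ProductNumber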
Slide: fact table records arrive with product Ids (ProductNumber); the lookup replaces ProductNumber with the surrogate ProductKey before the fact records are loaded into the DBMS.
When creating the data integrity lookup, you must use surrogate keys instead of business (production) keys. There is an option in the lookup properties that you can use to replace the business key with the surrogate key. However, for the Time dimension, you have two options: if you want to see the real dates in the fact table, it is better to keep the business key for reference rather than the meaningless surrogate key.

If the surrogate key becomes a primary key in the dimension table, the fact table must reference this key to preserve the data integrity. By selecting the Use surrogates when available box, you opt to reference the dimension through the surrogate key.
Build slide.
3 clicks to complete.

Design Data Integrity Lookups Based on Conformed Dimensions
The process of creating a lookup is similar to creating a hierarchy within a dimension.

The slide example shows the data integrity lookup. The Product conformed dimension table already exists in the data mart. The lookup uses an existing template to access that table.

Instructional Tips
In general, anytime you access data from a dimension table in the data warehouse, you should use a template rather than a DataStream to access the data.
1. Select the dimension in which you want to create a lookup. The
dimension will already exist because it was used to create the data mart
dimension table.
2. Create and name the lookup.
3. On the Attribute tab, select the template used to deliver the data mart
dimension table. Also, in the Available Attributes box, select only
those attributes necessary for data integrity checking. These keys will
usually be the business key and the surrogate key of the dimension
level that you are checking.
4. On the Data Access tab, click the Use template for the data access option
button to select it, and specify the database connection and the table to be
used as a source.
The attributes of the template used to create the data mart dimension table also
named the columns in that table. Because the attributes and column names
match, DecisionStream automatically maps the table columns to the template
attributes.
Note: If the attribute or column names have been changed, you must use data source access and map the columns manually.
In the fact build, declare a dimension element and select the lookup on the
Reference tab.
Because Product is a dimension, ensure that the Use surrogates when available
box is selected. The new fact rows being inserted into the data mart must use the
surrogate keys of the existing dimension rows.
In order for this to work, there must be an attribute in the template that has a
behavior of surrogate key.
D_ProductH
D_StaffH
D_TimeH
D_VendorH
After you establish the conformed dimensions and create the dimensional framework, deliver the dimension data into the dimension tables. Each dimension build references and delivers one dimension.

When delivering into a data warehouse, where the dimensions are conformed, the dimension builds become the important component in the dimension delivery. In the slide, four dimension builds are created: Product, Staff, Time, and Vendor. Each dimension is represented by a table: D_ProductH, D_StaffH, D_TimeH, and D_VendorH accordingly.

Technical Information
It is very important to stress to the students that Cognos recommends that users create and maintain dimensions through dimension builds and not as a part of the fact build.
Slide: the same product in two source systems — in one, the Product table holds ProductCode P_112, ProdName Tent, and ProdDesc Canvas 2 man pup; in the other, the product is identified by ProdKey 1212 — merged into the D_Product table with ProductSID 120.
For example, the Product table exists in two data sources, Oracle and Sybase.
However, the structure of the two tables is different. In the Oracle database, the
table Product has ProductCode, ProdName, and ProdDesc as attributes. In the
Sybase database, the Product table has ProdKey, Name, and Desc as the
attributes. When you merge these two tables, you must decide on common
column names.
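A minimal sketch of how the two sources might be reconciled: each source gets
its own data source SELECT, and both result sets are mapped to one common set
of DataStream items. The table and column names are taken from the example
above; the queries themselves are illustrative, not product-generated SQL.

    -- Oracle data source (column names as given in the example)
    SELECT ProductCode, ProdName, ProdDesc
    FROM Product

    -- Sybase data source (column names as given in the example)
    SELECT ProdKey, Name, Desc
    FROM Product

Mapping both SELECTs to common DataStream items (for example, ProductCode,
ProductName, and ProductDescription) lets the build see one consistent set of
columns regardless of the source.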
A more significant problem is that the data values between the source tables may
differ. In the slide example, a single product is identified by different product
codes and even different descriptions in the two separate systems. To merge
these together as a single member in the product hierarchy, the data must be
"cleansed." Data cleansing is a complicated and time-consuming problem in data
warehousing.
Technical Information
Cleansing data is not an easy task. This topic will not be covered to the extent
required in this course.
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is a Derivation?
You use data source or DataStream derivations when you perform calculations
on source data in a fact build or dimension build. A data source derivation lets
you perform a calculation on data that is accessed from a single data source. A
DataStream derivation lets you perform a calculation on data that is accessed
from multiple data sources.
Instructional Tips
You can use derivations in an output filter. Output filters are discussed in
Module 21, "Delivery in Depth."
A derivation is a calculated value that can contain numeric and character
constants, operators, functions, and the names of other DecisionStream objects.
The simplest expression is either a literal value or the name of a DecisionStream
object. DecisionStream has a rich library of built-in functions to assist you with
calculating derivations.
Key Information
Using DecisionStream, a user can create the KPIs on which the enterprise can
gauge the success of its critical areas.
Calculations stored in the data mart assist in standardization: because every user
applies the same stored formula, the organization uses the data mart
consistently. For example, you may need to concatenate a customer's first name
and last name to produce a full name for use in reports and queries.
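A minimal sketch of such a derivation expression, using the Concat function that
also appears in a later demo in this course (FirstName and LastName are
hypothetical attribute names):

    Concat( FirstName, ' ', LastName )

The expression is entered on the Calculation tab of the Derivation Properties
window.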
Mathematical: ( ), &, +, -, *, /
Logical: =, !=, >=, <=, >, <, <>, IS, AND, OR, [NOT] LIKE, [NOT] BETWEEN,
[NOT] IN, NOT ( ... )
Binary logical operators compare two values. Unary logical operators operate on a
single value. The result of a logical operation is either TRUE or FALSE.
Expressions can contain more than one operator. In such cases, DecisionStream
applies the operators in order of precedence.
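For example, in a hypothetical filter expression such as

    Quantity * UnitCost > 100 AND ProductLine = 'Golf'

the multiplication is applied first, then the comparisons, and finally the AND.
Parentheses can be used to override the default precedence.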
You can use the functions in the slide in calculations to provide values for
derivations, for filters, and in the SQLTXT Designer.
Build slide. 3 clicks to complete.
How to Define a Data Source Derivation
Add
Calculate
Test
Because a derivation does not come from the data source directly, you do not
have to modify the data source SQL statement. However, the fact build or
dimension build must contain the DecisionStream objects that are included in the
calculation.
• Provide a name for the derivation, and a calculated expression that can be
built using operators, functions, or control statements.
• Test the expression. If you get no result, or an incorrect one, the
expression is invalid.
Build slide.
3 clicks to complete.
How to Define a DataStream Derivation
Add
Calculate
Test
Where there are multiple data sources set up for a build, a DataStream derivation
can include DecisionStream objects from any of the data sources.
• Provide a name for the derivation, and a calculated expression that can be
built using operators, functions, or control statements.
• Test the expression. If you get no result, or an incorrect one, the
expression is invalid.
Build slide. 2 clicks to complete.
How to Define a Transformation Model Derivation
Insert
Calculate
Test
• Test the expression. If you get no result, or an incorrect one, the
expression is invalid.
Slide: Revenue = Quantity * UnitCost; AverageRevenue = AVG(Quantity * UnitCost).
• Perform the calculation first, and then aggregate (summarize) the
calculated results. By selecting the first option, you can eliminate
rounding errors in the summary data.
• Aggregate the data first, and then calculate the derivations.
Key Information
The second option makes it possible to process data faster.
Aggregation is a process of taking data across one hierarchical level and
summarizing it to the higher level.
To aggregate the derivation, you must select the Calculate at Source check box to
activate the Aggregation tab, and then select a function from the list on that tab.
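A worked sketch of why the order matters, using two hypothetical rows with
(Quantity, UnitCost) values of (2, 10) and (4, 5), and a Revenue derivation of
Quantity * UnitCost:

    Calculate first: per-row Revenue = 2 * 10 = 20 and 4 * 5 = 20, so SUM(Revenue) = 40.
    Aggregate first: if Quantity aggregates with SUM and UnitCost with AVG, then
    SUM(Quantity) = 6, AVG(UnitCost) = 7.5, and 6 * 7.5 = 45.

The two orders can produce different summary values, which is why the choice
between them is a design decision and not only a performance setting.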
Where you create a derivation determines when the derivation is calculated in the
fact build process.
A derivation created in the data source is calculated as each row is retrieved from
the data source.
Demo 6-1
Purpose:
We want to see the best-selling products. Therefore, we must
create a derivation that calculates the total sales for each
product.
7. Click Calculate.
The result appears as follows:
Results:
We have added a derivation element called SalesTotal to the
transformation model of the DemoSales build. The derivation
calculates the total sales for each product.
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
Hierarchy Lookup
Slide: a hierarchy with levels AllProducts, Product Line, Product Type, and
Product.
There are two types of reference structure:
• hierarchies
• lookups
Performance is the same for both types of reference structure. They are treated
the same internally.
DataStreams gather together data source(s), each of which can contain SQL
statements, literals, and mapping information. DecisionStream uses a data source
to obtain the data from the database tables. You specify a data source by entering
an SQL SELECT statement to extract data from the database, and then adding
any literal(s) that may be required.
Once you have defined the data sources, you must create DataStream items to
map the data source columns returned by the SELECT statement(s) to the
attributes in each level of the hierarchy.
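A minimal sketch of such a data source query for a Vendor level (the GOVVendor
table and its columns appear in Demo 7-4 later in this module; the SELECT
itself is illustrative):

    SELECT VendorCode, CompanyName, VendorTypeCode
    FROM GOVVendor

The three returned columns would then be mapped, through DataStream items,
to the Id, caption, and parent attributes of the Vendor level.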
Members are unique instances of data at one level. Each member is defined by
level attributes.
In the slide example, the Vendor level has four attributes: VendorCode,
CompanyName, VendorTypeCode, and VendorCodeMR. The VendorCode
attribute serves as the ID for that level. Each vendor at that level has a different
value for the ID attribute, and this value uniquely identifies that vendor from all
the others.
Each vendor also has a caption of CompanyName, which may or may not be
unique. The VendorTypeCode attribute is the parent for the Vendor level. This
attribute links each vendor to the type it belongs to at the next level, which is
VendorType.
Questions
Ask the students why you would want to add an attribute. Explain that a table
may require modification if additional source data can be added that would
increase value to the dimension table.
When you define a hierarchy using the Hierarchy wizard, the wizard only
generates those attributes that are fundamental building blocks in the hierarchy
definition. These attributes include ID, caption, and parent.
After you have created the hierarchy and its levels with the Hierarchy wizard, you
can enhance data analysis by manually adding other attributes. You define the
properties of these additional attributes in the template for that level. All the
attributes on one level are part of that level's dataset.
You can add attributes to a single level of a hierarchy. For example, in the
Product hierarchy, only members at the Product level will have a Color attribute.
You can also add attributes to multiple levels in the same hierarchy. For example,
in the Location hierarchy, the members at the Country, Region, and City levels
may have a Population attribute. The attributes that you choose to include in your
hierarchies will reflect your specific data analysis requirements.
In the slide example, we want to add a new attribute to the Products level of the
ProductH hierarchy. Since this attribute does not exist in the underlying
ProductT template, we must first add it to this template's list of attributes.
Templates will be discussed later in this module.
Build slide. 2 clicks to complete.
Add Attributes to a Hierarchy (cont’d)
In the slide example, an attribute called MyNewAttribute has been added to the
ProductT template. This adds MyNewAttribute to the list of available attributes
for the Products level.
Technical Information
Sometimes SQL columns can play more than one role in a hierarchy. A column
returned by a SELECT statement can simultaneously be the ID and the caption
for a level.
MyNewAttribute can then be added to the list of attributes in this level's dataset.
This level already has attributes specified as ID, caption, and parent. As a result,
MyNewAttribute does not have to be designated as having one of these three
roles.
After you add attributes to a hierarchy, you map them to items in the level or
hierarchy DataStream. The DataStream items are, in turn, mapped to columns
from tables in the source database. Once this mapping process is complete, data
can be delivered to the hierarchy. Mapping will be covered later in this module.
Hierarchies consist of levels, which in turn are made up of attributes, such as the
ID number of each product. Before developing the hierarchies in the dimensional
framework, the organization should decide on how these attributes will be named
and what attribute each will represent. If the hierarchy attributes have clearly
defined, standardized names, then it is easier to develop conformed dimension
tables from these hierarchies.
The slide example refers to a proposed hierarchy structure that will contain
product data. A naming convention has been decided upon for each of the
important attributes of the lowest level, product. For example, all the surrogate
key values will be tracked by the product_skey attribute.
When you create a hierarchy, you must define a template. If you use the
Hierarchy wizard, the templates are defined dynamically as you create the
various levels.
If you create a hierarchy manually, you must specify a template in each level of
the hierarchy before you can add the level to the hierarchy. You can choose an
existing template, or create one.
If you choose an existing template, it must reside in the Templates folder of the
dimension to which the hierarchy level is being added.
Create a Template
You can:
• create the template before you add the level to the hierarchy
• insert the level and create the template at the same time
When creating a hierarchy with multiple levels, you have two options regarding
templates:
• Create the template first, add the level to the hierarchy, and then use the
template that you just created to define the attributes for that level. When
using this method, you do not have to add any attributes before you save
the template.
• Insert the level into the hierarchy and create the template that will
hold the required attributes for that level at the same time. The
template must contain at least one attribute. You must define at least
one attribute from the list of attributes in the template as the ID
before you can create the level.
The best practice, whenever possible, is to define all attributes within one
template, and then use that template when adding each level.
Slide: the Product(H) hierarchy and its Product(L) level reference a template that
lists attributes such as ProductNumber. The D_Product dimension table (with
columns ProductNumber and ProductKey) references a second template. The
F_Sales fact table holds the Sales measure and joins to D_Product through
ProductKey.
In the slide example, the ProductH hierarchy references a template, which lists
the attributes used by the hierarchy, such as ProductNumber and ProductName.
This ProductH hierarchy will be delivered to the data warehouse through the
D_Product dimension table.
The dimension table, in turn, references a second template that lists the columns
of the table as well as their behavior (such as surrogate key). This second template
automatically generates surrogate key values (in this case, ProductKey).
The second template is also important for slowly changing dimensions, which are
discussed in Module 9, "History Preservation."
A DataStream gathers together a number of data sources. Each data source
contains an SQL SELECT statement, and may contain literal values.
DecisionStream uses data sources to extract data from database tables.
A hierarchy may get its data from a single table that contains data for all the
levels. A hierarchy may also get its data from multiple tables. In the latter case,
you may have to define multiple data sources to extract data for the entire
hierarchy, or separate data sources at each hierarchical level. Whether you use one
data source or multiple data sources will be determined by the complexity of the
hierarchy and its levels.
The ProductH hierarchy on the left side of the slide example gets its data on a
level-by-level basis: each level of the hierarchy has its own data source, each of
which contains a separate SQL SELECT statement. By contrast, the
VendorCustomerH hierarchy on the right side has one data source at the top.
This data source contains a single SQL SELECT statement that retrieves data for
all the levels in the hierarchy.
Technical Information
You can use an unlimited number of tables in the data sources of a DataStream.
The tables must be related in some way to make the data in the hierarchy
relevant.
When you use the Hierarchy wizard, creating a hierarchy from multiple tables
necessarily creates one SQL statement per level. In more complex situations, one
SQL statement provides attributes for only some of the levels. In this case,
several SQL statements could be required to populate a single level.
Each SQL statement runs against one single connection. However, not all SQL
statements within a hierarchy have to originate from the same connection.
Therefore, each level has the potential to run against many data sources to map
to that level's attributes.
You can access data for each level of a hierarchy using either:
• a template
• a DataStream
A template creates its own SQL when it accesses data, which is very powerful
when maintaining slowly changing dimensions (SCDs). However, the designer has
no direct control over the SQL. If you want to write custom SQL statements to
acquire data for the hierarchy level, then you cannot use template data access.
If you want to acquire data from operational source systems, you will likely
require custom SQL. Therefore, template data access is rarely applicable against
operational data, unless you can use a simple query. A basic SELECT statement
only contains column names from a single table. If you use template access, you
cannot join tables, calculate fields, or specify a WHERE, ORDER BY, or
GROUP BY clause.
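To illustrate the limitation, compare the kind of single-table query a template
can produce with the custom SQL a DataStream data source allows. The table and
column names below are hypothetical:

    -- Template-style access: one table, no joins or clauses
    SELECT ProductNumber, ProductName
    FROM Product

    -- Custom SQL in a DataStream data source
    SELECT p.ProductNumber, p.ProductName, t.ProductTypeName
    FROM Product p, ProductType t
    WHERE p.ProductTypeCode = t.ProductTypeCode
    ORDER BY p.ProductNumber

Only the second form can join tables, calculate fields, or use WHERE, ORDER BY,
or GROUP BY.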
On the Data Access tab of the Level Properties dialog box, indicate how you
want to obtain the source data. If you select template data access, you must
further indicate the data source and table that contain the data you require for
that level.
Key Information
The SourceConnect data source indicated on the left side of the slide example is
listed in the catalog's Connections folder. The Connections folder is, in turn,
stored in the Library folder.
In the top slide example, each level of the ProductH hierarchy uses a DataStream
for data access. Each DataStream contains one data source, and each data source
contains a single SQL SELECT statement that retrieves data for one level.
In the bottom slide example, all the levels of the Product hierarchy use a template
for data access. As indicated by the Level Properties dialog box, the ProductLine
level gets its data from the ProductLine table contained in the GOSales data
source.
Demo 7-1
Purpose:
Managers want to create reports about the sales staff for the Great
Outdoors. To do so, we must create a dimension that will deliver
the necessary levels of information.
6. Click the Prepare button to select it, and then click Refresh.
The columns referenced in the SELECT statement are prepared for use
in the hierarchy.
7. Click OK to accept the SQL code.
The data source is added to the DataStream.
We now must map the columns to the DataStream items.
8. In the SalesCountry level, right-click DataStream, and then click
Properties.
The DataStream Properties window opens.
9. Click the Auto Map button.
The columns in the data source are mapped to items in the DataStream.
The mapping appears as shown below.
12. In the Level Attributes pane, click and drag SalesBranchCode to the
Maps To column beside SalesBranchCode in the DataStream Item
column.
13. Repeat step 12 for the remaining level attributes.
The result appears as shown below.
14. Click OK to close the DataStream Mapping window, save the catalog,
and then keep DecisionStream open for the next demo.
Results:
By creating a hierarchy that has various levels, managers can
create detailed reports about the sales staff for the company.
Create Literals
Slide: the Products level draws on two data sources, Golf and Other; a Focus
Group attribute records the literal (Golf or Other) for each row.
Literal values are static pieces of data that the DataStream can return. The literal
values remain constant for every row that is returned. Use a literal value to flag a
piece of data that is returned from the data source.
You can also use a literal value when accessing two data sources in the
DataStream to provide data for a hierarchy level. An example of a literal value
might be the letter C for Current and H for Historical. In this case, use the literal
to represent whether a value comes from current data or historical data.
In the slide example, we want to find information regarding a range of sports
equipment. The DataStream references two different data sources: Golf
Equipment and Other Equipment. Each data source returns values that fall within
a certain range, which is determined by the SQL code. We create a literal called
Golf to flag any equipment that the first data source returns. We flag any
equipment that the second data source returns with another literal called Other.
The literals indicate which data source the values are being returned from. The
two literals are included under a new attribute, such as Focus Group, for that
level.
Technical Information
The slide example shows two different data sources being used for one level
called Products. The level obtains other data, as well as a fourth attribute called
Focus Group, that indicates which source the data is coming from. Source 1 is a
table called Golf, and Source 2 is a table called Other. The literal returned from
each source will indicate which table was accessed for the data.
As another example, two different data sources are used to produce dimension
data. One source uses French data, while the other uses German data. You define
a literal that indicates which source the data came from: the French data source
or the German data source.
DecisionStream adds these literal values to each row. You can achieve the same
result by inserting a constant in the SQL SELECT statement. Using a constant,
however, is less efficient because the database adds the constant to the row
before sending it to DecisionStream. For a million rows of data, adding a
constant such as "Golf" to the SQL would equate to four million additional bytes
of data that would have to be transmitted across the network.
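For contrast, the less efficient alternative embeds the constant in each data
source query. A sketch (GolfEquipment and its columns are hypothetical names
based on the slide):

    SELECT ProductCode, ProductName, 'Golf' AS FocusGroup
    FROM GolfEquipment

Here the database materializes and transmits the 'Golf' string with every row,
whereas a DataStream literal is added by DecisionStream after the rows arrive.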
Build slide. 2 clicks to complete.
Map Literals to DataStreams and Attributes
Map the literals for each data source to the new attribute for the level in the
hierarchy.
To return a literal value along with the other source data acquired through SQL,
you must perform mapping in two places:
• in the DataStream Properties dialog box, from the literal to one or more
DataStream items
• in the Level Properties dialog box, from the resulting DataStream item to
one or more level attributes
In the slide example, the Products level gets its data from two sources: one
returning golf product data (Golf), and the other returning data about other
products (OtherProducts). Each data source includes a literal that indicates
whether each row is derived from the Golf or OtherProducts data source. The
Golf and Other literals are mapped from the data source to the appropriate items
in the DataStream (Golf or Other).
"ALL"
Static
member
Product Class
Classes table
Product
Family
Dynamic Families table
members
Products Product
table
Instructional Tips
DecisionStream can use both dynamic members and static members to populate Static members are members that are not
each hierarchy level. already in the source data.
Slide: a Continent level containing Europe and ?UnknownContinent, and a
Country level containing England, France, and ?UnknownCountry.
By default, DecisionStream provides a foster parent for any member that has a
missing or unknown parent. This mechanism is known as fostering.
In a typical hierarchy, each level contains a set of members. At every level except
the highest, each member is related to a member at the next-highest level: the
parent of the member. Each member is also related to members at the
next-lowest level: the children of the member.
A foster parent is an artificially introduced member that acts as a parent for
members that either have no defined parent or whose defined parent cannot be
found at the next highest hierarchy level.
Instructional Tips
In the slide, there are three levels: Region, Country, and Rep. The example shows
Tom Green being fostered. When Tom Green is added to the hierarchy, he has
no parent level. He is therefore fostered under the ?UnknownCountry level,
which in turn is fostered under the ?UnknownContinent level, and finally under
the ALL level.
The default name for a foster parent is the name of the level,
prefixed with Unknown.
You can rename the foster parent of a level by using a static
member.
You can rename a foster member, substituting the default name assigned by
DecisionStream with a name of your choice. You rename foster members by
using static members.
For each level of a hierarchy, you can set one static member as the foster
member. You can either create a static member that serves specifically as the
foster member, or use an already existing static member.
If you want to use an existing static member as the foster member, select the
Foster box adjacent to the required static member. Otherwise, create a static
member and then select the adjacent box in the Foster column.
In the slide example, the product types that did not roll up into an existing
product line were assigned a default foster parent with a caption of Unknown
ProductLine. To replace this, a static member was created that has an Id of
00000000000000 and a caption of Default ProductLine. This new static member
was assigned the foster parent role.
Demo 7-2
Purpose:
We have been asked to add an additional level to an existing
hierarchy that will provide further details for reporting and will
group records that have no parent into a separate group for
later analysis.
Results:
By adding a top level to the hierarchy, any record without a
parent will be grouped into the level and can then be analyzed
further.
Demo 7-3
Purpose:
Certain records in the VendorCustomerH hierarchy have no
associated region. We have been asked to update the
VendorCustomerH hierarchy to accommodate these records
by using City as the region.
9. Repeat step 8 to map all of the level attributes to the DataStream items.
The result appears as shown below.
Instructional Tips
Each DataStream item must be mapped to
the correct level attribute, as shown in step
9. Otherwise, you may receive an error
when you attempt to explore the hierarchy.
Results:
By adding the additional SQL code, the records that did not
have a region will now show the City value as the region.
Demo 7-4
Purpose:
Vendors are customers of the Great Outdoors and sell
products distributed by the Great Outdoors. We must create a
hierarchy to represent these vendors. We will create the first
part of the VendorH hierarchy using the Hierarchy wizard, and
then manually insert the remaining levels to complete the
hierarchy.
8. Click OK.
The level is added.
9. Repeat steps 2 to 8 to create a Vendor level.
Name the level Vendor, and use GOVVendor as the source table. Add
VendorCode, VendorCodeMR, CompanyName, and VendorTypeCode
to the Chosen attributes list. Make VendorCode the Id, CompanyName
the Caption, and VendorTypeCode the Parent.
10. Click OK, click Next, and then click Finish.
We included a WHERE clause so that the Country level will link back to
the Vendor level.
6. Click OK to close SQL Helper, click the Derivations tab, and then click
Add.
The Derivation Properties window opens.
7. In the Name box, type VendorCountryId, and then click the
Calculation tab.
8. In the right pane, type Concat( ToChar(VendorCode), ' ',
ToChar(VendorCountryCode) )
9. Click OK to close the Derivation Properties window.
10. Click the SQL tab, click the Prepare button to select it, and then click
Refresh to prepare the columns for use in the level.
11. Click OK to close the Data Source Properties window.
5. Click OK to close SQL Helper, click the Prepare button to select it, and
then click Refresh.
The columns are prepared for use in the level.
6. Click the Derivations tab, and then click Add.
The Derivation Properties window opens.
7. In the Name box, type VendorCountryId, and then click the
Calculation tab.
5. Click OK, right-click the Site level, and then click Mapping.
The DataStream Mapping window opens.
6. Click and drag the attributes from the Level Attributes pane to map them
to DataStream items on the left.
The results appear as shown below.
Results:
We created a hierarchy to represent the vendors to which the
Great Outdoors sells its products. We created the first part of
the VendorH hierarchy using the Hierarchy wizard, and then
manually inserted the remaining levels to complete the
hierarchy.
There are some cases when you do not need a multilevel reference structure to
organize the dimension data in your data warehouse.
For example, create a lookup if you want to check data integrity against a single
level, usually the lowest level, of a conformed dimension.
Tables are often created just to aid in data transformations. These tables are not
typically used in dimensional analysis. For example, you can create a table for
currency conversion. This table is used only to translate world currencies into a
standard currency and will never be the subject of dimensional analysis. It would
be more appropriate to base this type of table on a lookup rather than a
hierarchy.
Lookups are widely used for cleaning the incoming data from various
unstructured data sources. These lookups are called optional lookups. This type
of lookup is often used to determine whether records from various databases
match. Optional lookups are discussed later in this course.
Design a Lookup
You must also specify how the members of the lookup are loaded into memory:
in other words, a data access method for the lookup.
You have two choices for data access: you can use template access, or you can
write the SQL yourself. If you write your own SQL SELECT query, you must
map the resulting columns to the items in the lookup's DataStream, and then
map the items in the DataStream to the attributes of the lookup (such as Id).
Build Slide.
3 clicks to complete.
Design a Lookup to Translate Source Data
You may want to use a lookup to translate certain data values to other data
values. The slide example shows how to create a lookup that converts currency
values that exist at various rates in the database.
b. Add attributes to the template. You can specify the attribute behavior
if you want to maintain the resulting table in the future. However,
lookups do not require attribute behavior. You can create the
template attributes manually, or you can add them by importing
columns from a table. Keep only the attributes that you require for
the lookup.
Build Slide. 3 clicks to complete.
Design a Lookup to Translate Source Data (cont’d)
Slide callouts: 3 — Select Use DataStream for data access; 4 — Insert a Data
Source.
4. Insert the data source and create the SQL query and any necessary
derivations.
a. Map the columns from the data source to items in the DataStream.
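A minimal sketch of what the data source query for such a currency lookup might
look like (the table and column names are hypothetical; the slide shows a
currency table with rates):

    SELECT CurrencyCode, CurrencyName, RateToUSD
    FROM Currency

The returned columns are then mapped to DataStream items and on to the
lookup attributes, with CurrencyCode serving as the Id.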
A translation lookup does not usually use surrogate keys. In the slide example, the
currency table is never joined directly to the fact table. It is used to convert values
from one currency into another currency.
The lookup often contains many attributes from the translation or reference
table. As soon as a lookup is referenced by a dimension element, its attributes can
be accessed and used for calculations in derivations and filters.
If a dimension element that references a lookup is only used for calculations and
translations, it does not need to be delivered. Mark the element as Never Output.
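As a hedged illustration, once a dimension element references the currency
lookup, a derivation could convert a source amount to the standard currency.
Amount and RateToUSD are hypothetical names:

    Amount * RateToUSD

The dimension element that supplies RateToUSD exists only to feed this
calculation, so it would be marked Never Output.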
Summary
Workshop 7-1
When the update is complete, create two dimension builds by using the
Dimension Build Wizard. The additional builds will create the necessary
dimension tables that we can use for our data mart.
• Create a dimension build for the StaffD dimension that includes slowly
changing dimension attributes and surrogates.
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
8. Fact Builds
9. History Preservation
13. JobStreams
Objectives
Input Hierarchies
Slide: reference data sources feed the Product hierarchies and lookups in the
dimensional framework; a dimension build delivers the Product dimension to the
D_Product table. Transactional data sources (Sales, with ProductNumber,
Quantity, UnitCost, and the AverageRevenue derivation at the ProductLevel)
feed fact builds, which update the warehouse.
As soon as the reference data is processed and delivered to the data warehouse
using dimension builds, it is time to design one or more fact builds. Fact builds
deliver the fact data to the data warehouse.
As previously stated (see Module 4, "Create Basic Builds"), a single fact build can
deliver fact data, dimension data, and metadata. However, in a production
environment, the main purpose of the fact build is to create one or more fact
tables. Dimension tables are typically created and maintained through dimension
builds.
Slide: Sales uses Profit to mean Gross Profit, while Finance uses Profit to mean
Net Profit; the SalesFact table therefore carries two distinctly named measures,
Gross Profit and Net Profit.
The most common measures that require standardization are Profit, Cost,
Price, and Revenue. If two operational systems both use an identical term but
their definitions differ, the warehouse should use different names to properly
identify each measure. For example, Sales and Finance may both use the term
Profit, but Sales defines it as Gross Profit and Finance defines it as Net Profit.
Using the term Profit in the warehouse would be inaccurate and misleading.
Ensure that both measures have unique names to represent their appropriate
values in the fact table.
The entire enterprise should agree on the definition of all dimensions and all
measures.
Build slide.
4 clicks to complete.
Create a Fact Build Manually
As outlined in Module 4, "Create Basic Builds," you can create a fact build by
using the Fact Build wizard. However, you will often want more flexibility and
power to control the construction and delivery of the fact build. In this case, you
can construct the fact build manually.
To construct a fact build manually, you must complete the following process:
1. Add a new build and provide a name for it.
2. Add the data source(s) to the DataStream using one or more SQL
SELECT statements. A fact build can have more than one data source.
3. Map the columns of the data source(s) to the DataStream items.
Key Information
There are three ways to construct a fact build:
• Use the Fact Build wizard (the simplest option). This wizard creates a
basic build.
• Create the build manually, which gives the designer full control over the
build process.
• Combine the above two methods: use the wizard to prepare the basic
structure, and then refine it manually.
Build slide. 2 clicks to complete.
Add a Dimension Element to a Fact Build
Slide callouts: Insert; Select to use the surrogate, if one is available, as a foreign
key; Select to roll up to the higher level(s); Map the dimension item to a
DataStream item.
When you add a dimension element to a fact build, you are adding a column to a
fact table that links this table to a dimension table. Therefore, before you create a
dimension element, make sure the corresponding dimension with the hierarchy
exists in the DecisionStream library, and the build data source contains the
referencing column.
To add a dimension element to a fact build, follow these steps:
1. Add a dimension element to the transformation model and name it.
2. Associate it with the corresponding hierarchy. In the slide example, the
Product dimension element is associated with the Product hierarchy.
3. Set Output Levels, and select the Dimension boxes for all levels if you
want them to be represented in the dimension delivery.
4. Clear the Aggregate box if you do not want DecisionStream to perform
aggregation for the associated measures and derivations. Aggregation
will be discussed in Module 15, "Aggregation."
5. Select the Use surrogates when available box if you want to reference the
dimension through the existing surrogate key.
6. On the Unmatched Members tab, select the Accept unmatched member
identifiers box if you do not want the unmatched records to be rejected.
Instructional Tips
Rather than creating transformation model elements and then mapping the
DataStream to the transformation model separately, you can perform both tasks
together. You can create attribute, dimension, and measure transformation
model elements using the following method:
1. Right-click the transformation model, and then click Mapping.
2. In the DataStream Item column, click the DataStream item that you
want to map to the transformation model.
3. Drag the selected item to the white space in the Transformation Model
column.
4. From the popup menu, click the relevant element type to create:
dimension, measure, or attribute.
DecisionStream creates the transformation model element and maps the selected
item to the element.
An attribute element holds additional information that is not a dimension or a
measure but that may be of interest. Attributes differ from measures in that they
cannot be aggregated.
Attribute columns are generally either:
• an attribute of one of the dimensions, such as unit weight or size
• a property of the record, such as the name of the operator who entered
the record or the timestamp of record creation
Technical Information
Mathematical merge behaviors (for example, SUM) are always available, even if
the attribute is not of a numeric data type. If they are used on a non-numeric
attribute, they will cause an error when the fact build is executed. The MAX or
MIN options are more appropriate for non-numeric attributes.
1. Add an attribute to the transformation model and provide a name for it.
Note: Attributes only have data values at levels retrieved directly from input
data SQL queries. In summary information, the value of an attribute is
always null.
Demo 8-1
Purpose:
The company wants to know what products are distributed to
which vendors on a daily basis. We need to construct a new
fact build called DemoSales, and we want to add dimension
elements, measures, and an attribute to it. We will create this
fact build manually.
9. Click OK.
10. Repeat steps 1 to 6 to add the VendorSiteCode dimension element to
the DemoSales build, selecting VendorD in the Dimension box and
VendorH (H) in the Structure box.
11. In the Site row, click the Output box to select it.
12. In the VendorType, Vendor, and Country rows, click the Dimension
check boxes to select them.
The result for VendorSiteCode appears as follows.
Results:
We manually constructed a fact build called DemoSales and
added dimension, measure, and attribute elements to the
build's transformation model.
Summary
8. Fact Builds
9. History Preservation
13. JobStreams
Objectives
Although gradual changes to database information are a sign of a well-used data
warehouse, organizations face a challenge when maintaining a cube, or
multidimensional representation of detail and summary data, over time. For
example, sales representatives transfer to a new branch office. If their old sales
figures move to the new branch with them, a report from their original office
suddenly shows historically poor performance.
Other common changes include the addition of a new product, department
reorganization, or changing characteristics of a property (a product is changed or
reformulated).
One of the common problems in maintaining cubes and other user-reporting
data sources is that many dimensions change over time. New members are added
to dimensions, and existing members gradually change over time. In a data
warehouse, dimensions that reflect these types of incremental changes are known
as slowly changing dimensions (SCDs). DecisionStream handles these situations
quite easily.
Key Information
Up to this point, we have mainly dealt with reading data from a data source, and
then writing that data to a target database. Now we want to handle incrementally
added data, and outline how we should deal with this new data.
For example, what happens if an employee is assigned a new employee number?
One solution would be to create two sets of reports for that employee: one set
using the old employee number, and another set using the new employee
number. However, the best practice would be to assign a new surrogate key to
the employee, and use this key to link the old and new employee data.
Situations where a dimension is completely changed quickly are rare. Drastic
changes are usually caused by design changes or, less often, by a complete
reorganization of the relationships among members in the dimension.
Instructional Tips
Deleting from a data warehouse is a rare occurrence, but it can happen. For
example, a company may disband a department and must remove all references
to that department.
Understand Surrogates
Surrogate keys have existed for years in operational systems. For example,
Invoice Number, Order Number, and Employee Number are all operational
surrogate keys.
An entity such as an employee usually has a natural key (for example, the
employee's name) so that application users can easily find the data they want.
However, the employee typically has another meaningless key, such as employee
number. This additional key creates several advantages in an operational system.
Using an internally assigned surrogate key means that the operational system can
ensure uniqueness. We do not have to worry if there are two Mary Smiths. Even
if the employee has some other externally assigned unique key (for example, a
social security number), that key may be missing or incorrect when the employee
data is initially entered into the system. An internally assigned surrogate key
always exists and is guaranteed unique.
Also, the surrogate key is a better choice to tie all employee records together.
What if the employee changes her name? If the surrogate key joins all records,
then the employee name need only be stored in one place (the employee record)
and changed in one place. The surrogate key (the employee number) still
connects all other operational records together.
Technical Information
Students may ask why we would use a surrogate key on a Time dimension.
When dealing with orders, you may have an OrderDate (that has a value) and a
ShippedDate (with a value of NULL). When the product is eventually shipped,
the ShippedDate value changes to show the actual date the product was shipped.
To track this record, surrogates are used.
Example
The Dallas office at Cognos was in Arlington, Texas. Its natural key is ARL.
This key is now meaningless but has not changed even though the office moved
to Dallas.
A final advantage of operational surrogate keys is their size. A natural key may be
many bytes long. If natural keys are used to join tables, the table sizes can be
unnecessarily large.
Although surrogate keys can be passed from an operational system into a data
mart, usually a new surrogate key is assigned inside the data mart. The surrogate
key from the operational system is often used for queries into the data mart,
playing the role of a natural key. There are a number of reasons for assigning a
new data mart-only surrogate key.
Often, several operational systems will have entities merged to form a single data
mart entity. For example, a single customer in a banking data mart may exist as a
checking account, a savings account, and an insurance policy in the operational
systems. Instead of using three account numbers to identify the customer, a new
surrogate key may be assigned, and the three account numbers become alternate,
natural keys.
The sheer size of data marts often makes surrogate keys preferable to natural
keys. Where an operational database may have millions of rows, a data mart may
have billions or even trillions of rows in a single fact table. The small size of
surrogate keys can often save large amounts of space.
However, in spite of the other advantages of using surrogate keys, their single
most important use is tracking changes to dimensional information over time.
Dimensions that track such changes are commonly called SCDs.
Slide: a fact table joined to the Product and Customer dimension tables through
natural keys. Each fact row carries a Prod Code (such as PR X 002) and a Cust
Code alongside its measures.
The slide example illustrates using natural keys to join dimension tables to a fact
table. A fact table can be joined to many dimensions. Using natural keys in the
fact table can take up large amounts of physical space, both in terms of table
structure and indexes. This is one of several reasons to avoid using natural keys.
Slide: the same fact table joined through surrogate keys. Each fact row carries
compact Prod Sur and Cust Sur values (such as 1 and 10) alongside its measures.
The slide example illustrates using two surrogate keys, Prod Sur (in the Product
dimension table) and Cust Sur (in the Customer dimension table). The natural key
is still available to users in the dimension table, but joins are implemented
through these surrogate keys.
Although the surrogate keys are not used in reports, including them in the fact
table instead of natural keys saves space. For example, in the previous slide, each
measure is identified by a unique combination of a product code and a customer
code.
By contrast, in the slide example on this page, each row of fact data is uniquely
identified by a combination of two surrogate keys. If there are several million
rows of data in the fact table, using surrogates makes implementing the joins to
dimension tables more efficient.
You normally use surrogates to link fact tables to dimension tables, but
DecisionStream gives you the option of using natural keys. To join fact tables to
dimension tables using surrogate keys, click the Use surrogates when available box.
The Fact Build wizard assumes that all dimension elements must use
surrogate keys when they are available. If you manually add a new dimension
element to an existing fact build, you must set this option if you intend to
deliver surrogates.
Note: If you use the fact build to deliver dimension tables (which is not the
recommended approach), the resulting dimension tables will not include
surrogate keys.
We often think of data marts as having single-grain fact tables (for example, daily
sales). If this is true, only the lowest level of the hierarchy requires a surrogate
key. However, there are often multiple fact tables with different grains (for
example, monthly budgets). You can use a single dimension for different fact
tables at different grains. In this case, surrogate keys are required for more than
just the lowest level.
DecisionStream lets you determine whether surrogates are available at each level.
You specify this by adding one or more attributes to a template and specifying
their behavior as surrogate key. Then set the value for the business key to the
appropriate level. Or, you can set the surrogate key starting value, as in the slide
example. If you use the Hierarchy wizard, the surrogate key will have a default
name of skey, which you can change.
In the slide example, the D_VendorSurrT template has a surrogate key called
Surrogate. This attribute is mapped to the VendorCode business key attribute. In
other words, in the corresponding dimension table, each value of VendorCode
will have a separate value for Surrogate.
Technical Information
The templates for extracting the source data will not have surrogates. These
templates do not need surrogate attributes. Surrogates are required only when
members are added to the dimension tables in the data mart. Therefore, only the
template associated with the dimension build will contain the surrogate
information.
Instructional Tips
Emphasize that manually creating and maintaining surrogate keys in a data
warehouse can be an onerous task. By using templates, DecisionStream
automates the generation and management of surrogates.
Slide: source records with Product IDs have each ProductNumber replaced with
the corresponding ProductKey surrogate, and the resulting fact table records are
loaded into the DBMS.
Once surrogate keys are added to dimensions, they can be assigned to fact table
records. This process links each row of the fact table to the correct row in the
corresponding dimension table.
1. Each record from the source system has a ProductNumber column that
holds natural key values (such as PR X 002 for Beans).
2. Each ProductNumber value is replaced with the matching ProductKey
surrogate value from the Product dimension table.
3. Each fact row uses the ProductKey column as part of its primary key,
instead of the natural key. The values in the ProductKey column are
linked to the corresponding values in the same column of the Product
dimension table.
4. Each row of fact data is loaded into the target data mart. The original key
(ProductNumber) is no longer used to join the fact table to the Product
dimension table. The more efficient surrogate key (ProductKey) is used
instead.
This diagram is a simplified version of the one used in Ralph Kimball et al.'s The
Data Warehouse Lifecycle Toolkit (1998, Wiley). (See page 634 for a more complex
example of how the natural keys of a fact table can be replaced with surrogate
keys in the data mart.)
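Conceptually, the substitution in steps 1 to 4 is equivalent to joining the staged
fact rows to the dimension on the natural key and keeping only the surrogate.
DecisionStream performs this in memory through the lookup; the SQL below,
with a hypothetical StageSales table, is only an analogy:

    SELECT d.ProductKey, s.Quantity, s.UnitCost
    FROM StageSales s, D_Product d
    WHERE s.ProductNumber = d.ProductNumber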
Demo 9-1
Purpose:
The Great Outdoors has data that tracks their vendors. We
want to store this data in a conformed dimension table that
can potentially be used by multiple fact tables in the data mart.
To accomplish this, we will manually create a dimension build
that is based on the VendorCustomerH hierarchy. We will then
execute the dimension build to deliver the data to a single
conformed dimension table.
Notice that each value of VendorSiteCode (the primary key for the table)
is associated with a separate surrogate key. For example, VendorSiteCode
101 is associated with a value of 4 in the Surrogate column.
8. Close SQLTerm and leave DecisionStream open for the next module.
Results:
We manually created a dimension build that is based on the
VendorCustomerH hierarchy. We then executed the dimension
build to deliver the data to a single conformed dimension
table.
An operational system usually only contains data about the current status of the
business. Therefore, the sales representative record will record the office in which
the sales representative currently works.
By contrast, the data warehouse is expected to hold data for perhaps five or 10
years. Over this time, it may be important to know all the sales offices in which a
sales representative has worked (and when).
Adding time variants to a data structure makes it more complex; however, not
adding them can cause considerable difficulties for the users.
For example, comparing a division's performance this year versus last year may
be impossible if the customer's sales representative has moved divisions. Does
the sales history move to the division that the sales representative is in now, or
does it stay with the sales representative's old division?
Obviously the answer is… it depends. The data warehouse must be able to give
whatever answers the user requires.
The dimensional data in an OLTP system (for example, one tracking customer
orders) is usually static. As noted previously, the only data that matters is that
which reflects the most current state of the business.
Because the data warehouse is expected to hold data for several years, it must
contain the most current dimensional data, in addition to all the changes it has
undergone gradually over time, in tandem with the changing structure of the
business. Slowly changing dimensions let you track these historical changes in the
warehouse.
* (Emp. No + Branch)
** (Emp. No + Branch + Position)
*** (Emp. No + Branch + Position + Salary)
Imagine the effect of having such a large natural key in the fact table.
An SCD is a technique for managing historical data. SCDs are dimensions where
non-key attributes can change over time without corresponding changes in the
business key. For example, employees may change their department without
changing their employee number, or the specification for a product may change
without changing the product code.
In the slide example, the original record changes when Jack changes location. To
keep track of the change, a new record is added. The new record would have the
same Employee Number, which means that the key is no longer unique. To make
it unique, the key must be the combination of Employee Number and Branch.
For the next change, Jack is promoted. Again, a new record is added, and again,
the key is no longer unique. To make it unique, the key must be the combination
of Employee Number, Branch, and Position.
These steps continue for every change to Jack's status within the company. As
you can see, the unique key eventually becomes quite long.
Also consider the effect of having such an inefficient and large key in each fact
record, and consider that this problem is repeated for each dimension that the
fact table references.
SCDs require that the designer has a way of tracking the data warehouse
members and their status without making the issue unnecessarily complex. The
solution is to replace these cumbersome natural keys with efficient surrogate keys.
Ralph Kimball maintained that there are three typical ways to handle changing
dimensional data; he called them Type 1, Type 2, and Type 3. The industry has
adopted these terms.
The choice of method largely depends on the business' need to track changes. By
far, the most commonly accepted methods are:
• Type 1 - where no historical record is required
• Type 2 - where an accurate historical record is required at any stage
Key Information
As noted in Ralph Kimball and Richard Merz's The Data Webhouse Toolkit (Wiley,
2000), "Type 3 SCD occurs when an alternate, simultaneous description of
something is available. In this case an extra 'old value' field is added in the
affected dimension." DecisionStream does not automatically support Type 3
surrogate keys.
A single row may consist of attributes in any combination of the three types.
Often, if a single column is Type 2, the entire dimension is referred to as a Type 2
slowly changing dimension (SCD).
To track Type 2 SCDs, you must use surrogate keys. Creating and maintaining
surrogates is discussed later in the module.
Build slide
1 click to complete.
Type 1: Overwrite the Original Value
In a Type 1 SCD, the data fields are overwritten with the new values. If the only
changes in data are to Type 1 fields, the existing row can be updated in place. No
new rows are inserted into the dimension table.
The original data could be incorrect. In this case, the type of change is just a
correction. There is rarely a legitimate business need to track data that was
originally recorded incorrectly. For this reason, any data field could, in theory, be
eligible for a Type 1 change.
The most common reason for a Type 1 change, however, is lack of relevancy.
Although the original data was correct, there is no business reason to track the
change.
In the slide example, the marital status of Mary Jones has changed from single to
married. Since her previous marital status is no longer relevant in the Sales Rep
dimension table, the previous value (Single) in the Marital Status column is
overwritten with the most current value (Married). This is a good example of a
Type 1 change.
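The effect of a Type 1 change on the dimension table is an in-place overwrite.
DecisionStream applies it automatically, but a minimal SQL analogy (with
hypothetical table and column names) is:

    UPDATE D_SalesRep
    SET MaritalStatus = 'Married'
    WHERE SalesRepKey = 128

No new row is inserted, and the previous value is lost.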
Build slide
2 clicks to complete.
Type 2: Add a New Dimension Record
In a Type 2 SCD, changes are detected in the source data that must be tracked in
the data mart. For example, a Sales Representative has moved to a new office.
From this date on, sales must be reported under the new sales office, but all prior
sales should be reported under the previous office. However, all sales for the
representative are credited to her, regardless of which office she worked in when
the sale was made.
In this case, a new row that has a new surrogate key must be added to the
dimension table. The original row still points to pre-existing sales facts. From this
point on, all new sales will be joined to the new dimension record.
You can use surrogate keys even if you do not track changes. However, you must
use surrogate keys to implement Type 2 SCDs.
The advantage of this technique is that the user can report all combinations of
sales. In the slide example, all of Mary's sales can be found by constraining
(filtering) on Mary's Sales Rep Key (00128). All Dallas sales (including Mary's
while she was there) can be found by filtering on the natural key for the Dallas
office.
Usually, when a Type 2 change is detected, you must find the existing, current
row for the entity and update it as "no longer current," using an Effective End
Date. The new row will then have an Effective Begin Date and a null End Date.
There are, however, other ways to mark the current row.
Instructional Tips
If the Sales Rep table included an extra column, End Date, the first row's record
would have a value of NULL in that column. Once the second record is added,
the first row's value in the End Date column would have a valid date, and the
second row's value for End Date would become NULL. DecisionStream lets you
manually control the characteristics of the End Date column.
Key Information
In this example with Mary Smith moving offices, where Mary's movements have
to be preserved for history, a more rational approach would be to have two
dimensions: one employee dimension and one location dimension.
Ideally, in the data warehouse, dimensions should be kept as atomic as possible.
In the current example, two dimensions have been intersected, which creates the
need to use SCD logic to preserve history. If two dimensions had been used, the
history preservation problem would never have occurred in the first place.
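Again, DecisionStream manages this automatically, but the effect on the
dimension table can be sketched in SQL. All names here are hypothetical; the
demo later in this module shows real end_date and curr_ind columns:

    -- mark the current row as no longer current
    UPDATE D_SalesRep
    SET end_date = CURRENT_DATE
    WHERE SalesRepNo = 'E128' AND end_date IS NULL;

    -- add the new version of the member with a new surrogate key
    INSERT INTO D_SalesRep (SalesRepKey, SalesRepNo, Branch, end_date)
    VALUES (129, 'E128', 'Dallas', NULL);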
The D_StaffH template is referenced by the D_StaffH dimension table. This template is also
referenced by the SalesStaffL lookup, which reads from the D_StaffH dimension table.
Type 1 (No attributes specified). Type 2 (At least one attribute specified).
Type 2 SCDs preserve history. However, this is not always needed or preferred. It
depends on the business requirements. For example, if a product is moved from
one product type to another, the company may want it to appear as if it was always
a member of the new product type. In this case, the row only needs to be updated.
DecisionStream assumes that all attributes of all dimensions are Type 1. You only
have to specify Type 2 to preserve history. You specify Type 2 SCDs in the
Dimension Table Properties window. Clicking the Track changes (Slowly
Changing Dimensions) box enables the Track column. Click the Track box for
each attribute for which you want to preserve historical data.
On the left side of the slide example, we have specified that we do not want to
track the history of any of the dimension's attributes. Therefore, D_ProductH is a
Type 1 dimension.
On the right side of the slide example, we want to track the history of the
ProductName attribute. If the name of a product changes, we do not want to
overwrite the old one; rather, we want to track the entire history of the product,
regardless of its name. Although we do not want to preserve the history of the
remaining attributes, this dimension will be treated as a Type 2 SCD.
Attributes marked as business keys are natural keys from the original operational
system. If you have several levels in your hierarchy, you may have several business
keys. These business keys are IDs from the related hierarchy levels. In the slide
example, the D_ProductH template has four business keys, each of which is the
ID of the associated hierarchical level: Product, ProductType, ProductLine, and
AllProducts.
You can mark only one of these business keys as the primary key of the
dimension table (Value = True). The lowest level business key must be the
primary key. In the slide example, the lowest level is Product, and the business
key for this level is ProductNumber. As a result, ProductNumber is designated as
the primary key.
A business key with a primary key value of True cannot be a Type 2 attribute.
DecisionStream uses this business key to locate existing members in the
dimension. If you change this key, you have effectively created a new dimension
member.
Each surrogate key is related to a particular business key (usually the primary key
of the table). This means that each separate business key value will have a separate
surrogate key value. In the slide example, ProductNumber is a business key that
has values such as 648, 4732, and 1190. This business key is related to the "key"
attribute (which has a behavior of Surrogate). Each value of ProductNumber will
have a corresponding value for key (for example, 1, 2, and 3). As indicated on the
right side of the slide example, we cannot track changes to the ProductNumber
column in the dimension table, since it is the primary key.
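As a rough sketch of the delivered table (attribute names follow the slide
example; the data types and the audit columns are assumptions, based on the
columns seen in SQLTerm during the demos):

    CREATE TABLE D_ProductH (
        "key"         INTEGER NOT NULL,  -- surrogate key: unique on every row
        ProductNumber INTEGER NOT NULL,  -- business key: repeats across Type 2 versions
        ProductName   VARCHAR(40),       -- Type 2 attribute: history is preserved
        eff_date      DATE,              -- effective begin date
        end_date      DATE,              -- effective end date: null on the current row
        curr_ind      CHAR(1),           -- current-row indicator
        PRIMARY KEY ("key")
    );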
Demo 9-2
Purpose:
Five of our employees have been transferred to new locations.
Using SCDs, we can have our data mart automatically updated
when such transfers take place. We will modify the StaffH
hierarchy and run the Staff dimension build to demonstrate.
9. Click OK to close SQL Helper, and then prepare and Refresh the
columns for use in the SalesStaff level.
10. Click OK to close the Data Source Properties window.
11. Under the SalesStaff level, right-click DataStream, and then click
Properties.
The DataStream Properties window opens.
The columns of the modified data source (listed on the left side) may be
in a different order than the one shown. However, each column must be
mapped to the appropriate attribute of the SalesStaff level, as indicated by
the screen capture.
13. Click OK to close the DataStream Properties window.
Task 4. Modify the D_StaffH dimension table so that it
includes Type 2 attributes.
1. Under the Staff dimension build, double-click the D_StaffH dimension
table.
The Dimension Table Properties window opens.
2. Click the Columns tab, and then ensure that the Track changes
(Slowly Changing Dimension) check box is selected.
5. Click the Override build settings check box to select it, click the
Progress, Detail, SQL, and ExecutedSQL check boxes to select them,
and then click OK.
The Staff dimension build runs and applies five SCD changes to the
existing D_StaffH dimension table. The result appears as shown below.
Notice that the changes were only applied to the SalesStaff level, because
that is where we added the new data source.
6. Press Enter to close the DOS window, and then run SQLTerm.
SQLTerm opens.
7. In the Database for SQL Operations box, click TargetConnect.
8. In the Database Objects pane, expand TargetConnect, right-click
D_StaffH, and then click Add table select statement.
A SELECT statement appears in the SQL Query pane.
9. Run the query.
The query runs and returns 107 rows.
This data set includes five new rows to reflect the changed values of the
Type 2 attributes that we set previously. These rows are at the bottom of
the result set. The result appears as shown below.
We can see new values for the SalesBranchCode column, as well as blank
values for the DateHired column. The hire dates for each of these
employees are in their previous records and will not change in the future.
10. Click the right arrow at the lower-right corner of SQLTerm to scroll the
entire data set to the right side of the screen.
By scrolling vertically in the pane, we can see that five rows have today's
date as a value in the end_date column. This indicates that these rows of
data are no longer current.
Other rows have new values in the udt_data column, which indicate
when each row was last updated. They also have values in the curr_ind
column, which indicate whether each row is the most current. The result
appears as shown below.
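To look at only the current rows in SQLTerm, a query along the following lines
could be used (a sketch: the SalesStaffCode column name and the 'Y'/'N'
encoding of curr_ind are assumptions; the course data may differ):

    -- return only the most current row for each employee
    SELECT SalesStaffCode, SalesBranchCode, DateHired, end_date, curr_ind
    FROM   D_StaffH
    WHERE  curr_ind = 'Y';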
Results:
We have applied SCD changes to a dimension table by
modifying a hierarchy and then re-executing the dimension
build that delivers this hierarchy.
Summary
Objectives
Each hierarchy and level in a hierarchy can have multiple data sources, which
means that data can be input from more than one connection. This can be useful
when you want to perform data cleansing using different queries.
When you use multiple data sources, you must access the data using a
DataStream, because this is where the data sources are merged.
When data is merged in a hierarchy, you cannot specify how the merge is
performed. When you merge records in a fact build, you can select first non-null,
maximum, average and many more.
Derivations in Hierarchies
Because merging is different in a hierarchy than in a fact build, derivations are
used differently. In a hierarchy or hierarchy level, you cannot perform a
derivation on different data sources. A derivation that performs a calculation
based on two data sources returns unexpected results.

Instructional Tips
You can resolve this problem by creating a fact build to retrieve and merge the
data to a staging area, and then use the data from the staging area to create the
hierarchy.
In the slide example, the nProduct data source has a DataStream item called
ProductionCost and the nProductMargin data source has a DataStream item
called Margin. A derivation calculated from these two items will produce no
results.
Build slide.
One click to complete.
Type 1 and Type 2 Updates
A dimension that contains Type 1 and Type 2 attributes changes all the records
with the same business key when you perform a Type 1 update. DecisionStream
assumes that when you have a Type 1 attribute, you want to maintain consistency
in all the records.
The previous value of the Type 1 attribute is updated, as well as the values in all
the previous versions of the record. In the example, ProductName changed from
Tea to Chai Tea, so DecisionStream updates all occurrences of Tea to Chai Tea.
To update the Type 1 change, DecisionStream must locate all the affected
records. To speed up this process, correct indexing must be used. For star
schemas, you should index the business key at each level. This is important when
a Type 1 attribute is updated at a higher level, since the number of records
updated is increased.
You can use the build log to see what happens when you execute a build
containing updates. In the slide example, there is one Type 1 update, but no Type
2 updates. Eleven rows were updated because there was a Type 1 change.
To make the updating more efficient, it is recommended that you index your
business key. In the slide example, the ProductID column was indexed.
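In SQL, the recommended index is a one-line statement (a sketch following the
slide example; the index name is illustrative):

    -- index the business key so Type 1 updates can locate all affected rows quickly
    CREATE INDEX D_ProductH_ProductID ON D_ProductH (ProductID);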
Dimensional History
DecisionStream can load the history into a dimension table in the data
warehouse. This is useful if you want to load old data, or when the dimensions
change frequently. When you have loaded the historical data into the dimension,
you can start to load the fact records and take advantage of the late arriving facts
functionality.
Multiple rows with Type 2 attributes may have the same business key, but they
always have unique surrogate keys.
Using effective date attributes solves problems that may occur when loading
historical data in the warehouse. Initial loads are usually required if you want to
load data for a number of years. Periodic runs may only be used once a week or
once a month.
To implement dimensional history, you must define an attribute for effective start
date in the source template.
In the Dimension Table Properties window, you assign a column from the
hierarchy data as the effective start date attribute. DecisionStream derives the
effective start date from the source data, instead of automatically generating the
effective start date.
When you execute the dimension build, the effective start date is initialized from
the source data, and the effective end date is generated automatically.
Specification is a Type 2 attribute that has undergone two changes since
November 1, 1998. It has changed from being a blue pen to a black pen, and
then again to a red pen.

The specification for the product underwent a change on Feb. 10, 2002. If the
specification is going to change again on June 7, 2002, DecisionStream will check
this record, then update the dimension table again as required.
By default, DecisionStream assumes that changes to dimension data that are fed
into the data mart arrive in chronological order. In the slide example, changes to
the specification of product P1 occurred sequentially. Product P1 was a blue pen
on Nov. 1, 1998, was reclassified as a black pen later (Dec. 23, 2001), and was
then reclassified as a red pen even later (Feb. 10, 2002).
DecisionStream detected that the incoming data about product P1 had a different
specification, which became effective at an earlier date (Dec. 23, 2001). As a
result, a new row was added to the dimension table with a new effective date of
Feb. 10, 2002. DecisionStream marked the previous row as no longer current by
adding an end date value to that row. The end date is the date on which the new
row of data became effective, minus either one day or one second (here, Feb. 9,
2002, or Feb. 9, 2002 at 23:59:59).
However, what if the change from a black pen to a red pen had not taken place
on Feb. 10, 2002, but on Feb. 10, 2001, several months before the effective date
of the most current row of data? This type of change is an example of a late
arriving dimension detail.
Type 2 attribute changes to a dimension member that occurred prior to the
effective start date of the most current record for that member cannot be written
to the dimension table.

The data at the bottom of the slide example includes two changes to the
specification of product P1. Both of these changes occurred prior to the effective
date of the current record in the dimension table for product P1. The product
was an orange pen on May 14, 1999, and a mauve pen on April 27, 2001. But on
Feb. 10, 2002 (the date of the most current record for P1), it was a red pen. Do
we accept or reject these late arriving dimension details?

To save the late arriving dimension details to a reject file, specify the name and
location of the file on the Dimension History Options tab of the Dimension
Table Properties window. If you do not specify a reject file, the late arriving
dimension details are lost.

Technical Information
Late arriving dimension details are a procedural problem that is not handled
automatically by DecisionStream. If you write the details to a .rej file, you will
have to manually process them back into the system.
Late arriving dimension details present the following problems:
• Further work is required to insert this dimension history into the existing
dimension table.
• Existing surrogate key and effective date values in the dimension table must be
realigned to accommodate the late arriving dimension details.
• Reassigning surrogate key values on dimension tables to accommodate late
arriving dimension details creates problems for the fact tables that use these
surrogate keys for referential integrity.
For the Dimension History Options tab to be enabled, the following conditions
must be met:
• there must be a column in the source data that supplies the effective date
(for example, Effective_Begin_Date in a Product table)
• in the target dimension table, ensure that you have a column mapped to this
attribute with Effective Start Date behavior; we do not want DecisionStream
to generate the effective start date automatically in the template, but rather to
derive it from the source data
Should the effective start date be the date specified for each
dimension member in the data? Or should the effective date be
set automatically by the template?
In the slide example, there are two dimension members. Product P1 is a blue pen
and product P2 is white paper. These are the first rows that represent each of
these products. Because of this, there is no value in the End Date column: no
new records have arrived that render the old records out of date.
In the Dimension Table Properties window, you can specify from where
DecisionStream should read the effective start date for the initial record for a
specific dimension member. You have two options:
• From source attribute: DecisionStream sets the effective start date to the
date specified for each dimension member in the dimension data. This is
the default setting.
• From the template: DecisionStream sets the effective start date according to
the settings defined in the template, as described on the following pages.
In the slide example, we have overridden the template settings and used the date
specified in the dimension data for individual dimension members as the effective
start date for initial records. As a result, the data for both products became
effective at different times. Product P1 was identified as a blue pen on Nov. 1,
1998, while P2 was identified as white paper on Nov. 1, 2001.
If you use the template to set the effective start date for the
initial records of distinct dimension members, you can further
specify how this date should be set.
You can set the effective start date to the timestamp set by the
dimension build, a variable, a specific date, or a null value.
If you use the template to set the effective date, you can further indicate how
DecisionStream should set the effective start date in the initial record for a
specific dimension member. The following table outlines your options.
For example, if you want to use the date that corresponds to the timestamp set
by the dimension build, select the Use data timestamp value option.
A dimension table must reference a template. The template lists attributes that
represent the columns in the table, as well as the behavior of these columns. For
example, only one attribute in the template can represent a primary key column in
the dimension table.

If you use a template to set the effective start date for the initial record of a
dimension member, you can control how DecisionStream sets these start dates,
as well as the effective end dates. In the Effective Date granularity box, specify the
format to use for both the effective start date and the effective end date columns.

Technical Information
If Date only is selected in the Effective date granularity box at the top, you
cannot specify an explicit date and time, just the date.
The option in the Set previous record Effective Date to box is set to minus one
day, not one second.
You can use either the date and time (the default) or just the date. The option you
choose depends on your reporting requirements.
For example, if you are reporting on electricity rates and are using a date
timestamp, your reports will not be accurate if the rate changes numerous times
in one day. However, using a timestamp that includes just the date may be valid
for reporting on human resources data. For example, it is unlikely that an
employee’s personal information will change twice in the same day.
If you use a timestamp that includes both the date and time, you have more
flexibility when you create reports. For example, you can specify a BETWEEN
clause to retrieve only records that have been modified during a particular time
period in one day.
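For example, with date-and-time granularity, a report query can use BETWEEN
to isolate the records that took effect during part of a single day (a sketch; the
eff_date column name is an assumption):

    SELECT *
    FROM   D_ProductH
    WHERE  eff_date BETWEEN '2002-02-10 09:00:00' AND '2002-02-10 17:00:00';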
In the Effective End Date in current records box, indicate how DecisionStream
should set the effective end date when the template detects a change in the
incoming dimension data. You can either use a null value (the default), a variable,
or an explicit date and time.
In the Set previous record Effective Date to box, indicate how you want to set
the effective end date for the previous row of data when DecisionStream creates
a new row in the dimension table. You can set this end date to be the same as
the effective start date of the new dimension data row, or the same date minus
one day or one second.
Demo 10-1
Purpose:
The Great Outdoors recently acquired a new line of food and
beverage products. The data about the past sales of these
products is stored in a separate database. We will add a
connection to this database, then create a new hierarchy that
shows the entire history of this dimension data, including
changes in product names and prices. Finally, we will
construct and execute a dimension build to deliver this history
to a dimension table and view the table’s contents in
SQLTerm.
3. In the left pane, ensure that ODBC is selected, and in the Data Source
Name box, click BIAS_Northwind, and then click Test Connection.
A message appears indicating that the connection is successful.
4. Click OK, and then click OK again to close the Connection Properties
dialog box.
Task 3. Create a new hierarchy in the Product dimension
and add the Category level.
1. Expand the Dimensions folder, right-click the ProductD dimension,
and then click Insert Hierarchy.
The Hierarchy Properties window opens.
2. In the Name box, type ProductHistory, and then click OK.
3. Right-click the ProductHistory hierarchy, and then click Insert Level.
The Level Properties window opens.
4. In the Name box, type Category, click the Attributes tab, and then click
New.
The Template Properties window opens.
5. In the Name box, type ProductHistory, and then click the Attributes
tab.
6. Click Add, and then type CategoryID.
7. Click the Add button, and then type CategoryName.
8. Click OK.
We return to the Level Properties window.
9. Click Add all attributes to add the attributes to the Chosen
attributes pane.
10. In the CategoryID row, click the Id check box to select it, and then in the
CategoryName row, click the Caption check box to select it.
The result appears as shown below.
11. In the ProductID row, click the Id check box to select it, in the
ProductName row, click the Caption check box to select it, and then in
the CategoryID row, click the Parent check box to select it.
The result appears as shown below.
4. Click the Attributes tab, click the eff_date attribute, and then click
Delete to remove the attribute from the template.
5. In the Behavior column beside EffectiveBeginDate, click Effective
Start Date, and then press Enter on your keyboard.
The result appears as shown below.
7. Click the Dimension History Options tab, ensure that the From source
attribute button is selected in the Effective Start Date in Initial Records
area, and then click OK to close the Dimension Table Properties window.

Instructional Tips
Students may have to close and re-open the Dimension Table Properties dialog
box before the Dimension History Options tab is enabled.
10. Close SQLTerm and leave DecisionStream open for the next module.
Results:
We added a connection to a new database. We then created a
new ProductHistory hierarchy that showed the entire history of
the product data in this database, including changes to
product names and prices. Finally, we constructed and
executed a dimension build to deliver this product history to a
dimension table and viewed the table’s contents in SQLTerm.
Summary
Objectives
Hierarchical data from the data source is loaded into memory and is referenced
multiple times in the fact build process. DecisionStream uses the levels of the
hierarchy to merge the data, to validate incoming data, to aggregate the data, and
to partition and filter the data.
Before processing fact data, DecisionStream:
• checks that the columns in the target table are the same as those being
delivered
• loads the dimensional data into memory (using the hierarchy or lookup
definitions)
To process fact data, DecisionStream acquires and merges the data as specified,
aggregates it, filters and partitions the result, and then delivers the data to the
appropriate tables.
No additional memory is required if you are not merging or rejecting duplicate rows.
Regardless of whether you are using a single data source or multiple data sources,
all the rows are moved into the DataStream. After the rows reach the DataStream
they may be rejected or merged.
The slide example shows what happens if two or more data sources are acquired
and the user does not specify whether to merge or reject duplicate rows. In this
case, DecisionStream reads every row from the first source, and then every row
from the second source, and so on, processing each record separately until all the
records from all the data sources are read.
No additional memory is used if the developer does not specify that they want to
merge or reject duplicate rows.
If two rows have identical dimensional values, then the calculation generates the
same hash number. Because that number determines where to load the pointer in
the hash table, rows that have identical dimensional values will have their pointers
clustered together in the hash table. The clustering is the interleave process that
DecisionStream performs. Do not confuse interleaving rows with sorting or
ordering the rows.
Keep in mind that the interleaving process just groups common dimension
values together. The actual merging of the related measures, attributes, and
derivations, takes place in the transformation model.
Since the hash number is derived from a calculation, it is possible that more than
one set of dimensional values will yield the same hash number. If this happens,
the slot in memory is already occupied. This situation is referred to as a collision.
In this case, another slot is picked in the hash table for the new pointer. The hash
table is discussed further in Module 20, "Troubleshooting and Tuning."
The DataStream is written to memory as a simple list of rows, and a hash table of
pointers is created to find them again for processing.
Allow
Reject
Merge
Duplicate rows occur when the values in all the dimensional columns match. You
can have duplicate rows even if you have only one data source. In DecisionStream,
you must define how duplicates are to be handled. Duplicate behavior is defined
on the Input tab of the Build Properties window because the processing is
performed at the input stage of a transformation.

You can reject, accept without further manipulation, or merge (consolidate)
duplicate rows. The default (and most efficient) behavior is to allow duplicate
rows. No sorting or hashing is required. Use the Allow records with duplicate keys
option when duplicates are not important, or when you know that no duplicate
rows exist in the input data.
Option: Allow records with duplicate keys
Description: DecisionStream accepts the duplicate records. This option is
selected as the default.

Option: Reject records with duplicate keys
Description: For each set of duplicate records, DecisionStream accepts the first
data row and rejects subsequent rows to the reject file. If you do not specify a
reject file to write rejected records to, the rejected rows are lost. You specify the
reject file on the Input tab.

Option: Merge records with duplicate keys
Description: DecisionStream tracks the duplicate records and merges all
non-dimension columns using the merge functions that you specify.
Build slide
One click to complete.
Allow or Reject Duplicate Rows
With the Allow setting, all six source rows pass through to the DataStream
unchanged:

Cust  Date    Qty  Amt
1     199901  1    100
1     199901  1    200
2     199901  2    200
2     199901  3    300
3     199901  4    400
4     199901  5    500

With the Reject setting, only the first row of each set of duplicates reaches the
DataStream; the remaining duplicates (marked X on the slide) are written to the
reject file.
In the slide example, the first two records are duplicate rows. Both have the same
values for the Customer and Date dimensional elements (1 and 199901,
respectively).
With the setting that allows for duplicates, both records are passed to the
DataStream. If you set DecisionStream to reject records with duplicate keys, the
first record is passed, but any subsequent records are not. These duplicate records
are written to the reject file.
In the slide example, DecisionStream is set to merge any records that have
duplicate keys. The keys are the actual dimensional values.

Note the source rows for customer 2. The first two rows referring to this
customer have zero values for Paid and Credit Limit. The second two rows have
zero values for Quantity and Amount. All four records have the same
dimensional values, or keys (in this case, 2 and 199901). Merging the records
creates a merged record that has the same dimension values as all four records,
and merged values for all remaining elements.

This merging process continues for all records that have duplicate dimensional
values until all duplicate records are merged.

Questions
In the table on the right, the last column on the right does not sum the Credit
Limit values, but chooses the last value from the values of each of the third,
fourth, fifth, and sixth records. That value is 450. Ask the students if they notice
anything unusual about this table to see if they spot this, and ask why the Credit
Limit column has this value for the record that has dimension values of
Customer=2, Date=199901.
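The merge itself happens in memory, in the transformation model, but as a
rough SQL analogy (the staging table name is illustrative), merging with SUM
behaves much like a GROUP BY over the dimensional columns:

    -- one output row per distinct (Cust, Date) key; the measures are summed
    SELECT Cust, "Date",
           SUM(Qty)  AS Qty,
           SUM(Amt)  AS Amt,
           SUM(Paid) AS Paid
    FROM   fact_source
    GROUP BY Cust, "Date";

Methods such as LAST or FIRST NON-NULL have no direct equivalent among
the standard SQL aggregates; in the slide example, the Credit Limit column is
merged with a last-value method, which is why the merged record shows 450
rather than a sum.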
Select the Merge Behavior

Instructional Tips
If the element (for example, an attribute called CustomerName) is a character
string, do not select a mathematical merge method (such as SUM). Doing so will
produce an error when the fact build is executed.

You can choose from one of the following merge methods.

Your situation: You want to sum the duplicate child members.
Merge method: SUM

Your situation: You want to use the child member with the maximum value.
Merge method: MAX

Your situation: You want to use the child member with the minimum value.
Merge method: MIN

Your situation: You want a count of all the child members.
Merge method: COUNT

Your situation: You want the average value of the child members.
Merge method: AVG

Your situation: You want to use the first value that occurs (the first answer is
always the correct one).
Merge method: FIRST

Your situation: You want to use the first non-null value that occurs (the first
answer is always correct provided it is present).
Merge method: FIRST NON-NULL

Your situation: You want to use the last value that occurs (the latest information
is always best).
Merge method: LAST

Your situation: You want to use the last non-null value that occurs (the last
record represents the last update, but a null value is never an improvement on a
previous real value).
Merge method: LAST NON-NULL

Your situation: You want to use 1 or 0, depending on whether values are present
or not.
Merge method: ANY
Reject File
If you create a build using the Fact Build wizard, a reject file named
{$DS_BUILD_NAME}.rej is specified. The $DS_BUILD_NAME portion is a
variable that returns the name of the build itself. As shown in the slide example,
you can also rename the reject file and give it a different extension (in this case,
reject.txt).

Instructional Tips
To change the properties of a fact build, right-click the build and then click
Properties. Click the Input tab to specify record rejection options.

When executing a fact build, DecisionStream writes to the reject file all rows
where the value of one or more dimension elements does not exist in a reference
dimension. This process is part of basic data integrity checking.

If you specified that DecisionStream should reject duplicate records (that is, you
selected the Reject records with duplicate keys option), these records are also
written to the reject file.

In the Write Any Rejected Records to box, enter the full directory location and
name of the file. If necessary, click the ellipsis to open the Select Reject File dialog
box. By default, reject files have a .rej filename extension. If you do not specify a
file, the rejected rows are lost.
The reject file is deleted and re-created each time the build is run.
Customer Dimension
Cust  Name  Address
1     Bob   Canada
2     Tom   U.S.A.
3     Mary  England

Fact Data
Cust  Date    Qty  Amt  Paid  Cr L
1     199901  1    100  100   500
2     199901  5    500  150   450
3     199901  4    400  200   350
4     199901  5    500  0     0     (unmatched member: Cust 4 is not in the dimension)
If a fact row with an unmatched member is detected, it is written to the reject file
(assuming one has been specified). The log file indicates whether there are any
rejects and how many. You must, however, check the reject file to see the
rejected records.
Unmatched members occur for many reasons. For example, the fact data may
be incorrect, or a dimensional member may be missing.
If most or all rows are rejected, it is likely because of problems with the definition
of the reference hierarchy or a source query.
Customer Dimension
Cust  Name    Address
1     Bob     Canada
2     Tom     U.S.A.
3     Mary    England
4     <null>  <null>

Fact Data
Cust  Date    Qty  Amt  Paid  Cr L
1     199901  1    100  100   500
2     199901  5    500  150   450
3     199901  4    400  200   350
4     199901  5    500  0     0     (unmatched member, added to the dimension)
You can allow DecisionStream to accept source fact data that does not relate to
any dimension reference data. This is known as including unmatched members.
DecisionStream treats unmatched members as data at the lowest level of the
reference data.

You can also add unmatched members to the dimension reference data so that
the data is not unmatched in subsequent build executions. To allow the addition
of unmatched members to reference data, the dimension element must be
associated with a lookup that uses a template for data access. This is because a
template automatically creates the correct INSERT and SELECT statements
required to include the unmatched members.

You do not require a dimension delivery for the unmatched members to be
written back to the reference structure.

In the slide example, the customer with an Id number of 4 has been added to the
Customer dimension, but the Name and Address attributes have no useful values.
There are no values in the fact data that DecisionStream can place into those
attributes.

It is preferable to resolve data issues in the source system instead of in the data
warehouse.

Instructional Tips
You must specify whether you want to include unmatched members for each
dimension element. Open the Properties dialog box and then click the
Unmatched Members tab. To accept source data that does not relate to any
dimension reference data, select the Accept unmatched member identifiers check
box. To add the unmatched members to the dimension reference data, click the
Save unmatched member details via reference structure check box.

Note: If the conditions for adding unmatched members are not met, a message
appears informing you of this. This check box is then not available. When the
unmatched member is added back to the reference structure, the correct
surrogate key and any date attributes are assigned correctly.
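The statements a template generates are internal to DecisionStream, but
conceptually the write-back resembles the following hypothetical sketch (the
table and column names are illustrative, following the slide example):

    -- the unmatched identifier becomes a new lowest-level member;
    -- descriptive attributes stay null until real reference data arrives
    INSERT INTO D_Customer (Cust, Name, Address)
    VALUES (4, NULL, NULL);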
[Slide: a customer dimension hierarchy with levels Country, Region, City, and
Customer, all drawn from the Customers table, plus an Order level.]

A dimension can be made up of many different types of data from many sources
and can be customized to suit the needs of the user.

The slide example shows an unusual dimension. It is unusual because the lowest
level, Order, is usually part of the fact (transactional) data, not the dimension
data.

The top level contains only one static member. The next four levels are all
derived from data contained in one single table: Customers. The names of these
four levels come from the structural data. The last level is derived from the
Orders table. The name of this level comes from the transactional data.

Questions
Ask the students how they would create this dimension.
1. Use the wizard to create the Country, Region, City, and Customer levels
from one table.
2. Add the Order level manually.
3. Add the ALL level manually (or include it when using the wizard).
Demo 11-1
Merge Data
Purpose:
To streamline the reporting requirements of our managers, we
want to remove any data that is duplicated in the tables. We
must eliminate any unnecessary information by merging all
duplicate data.
8. Under Deliver, clear Dimension and Metadata, and then click OK.
The build runs and delivers ten rows to the F_Merge table.
9. Press Enter to close the DOS window.
Task 3. View the data in the fact table.
1. Open SQLTerm.
2. In the Database for SQL Operations box, click TargetConnect.
3. Under Database Objects, expand TargetConnect.
4. Right-click F_Merge, and then click Add table select statement.
5. Execute the query.
Ten rows are read.
There are two pairs of records that have the same dimension data. We want to
merge these duplicates because duplicate data is unnecessary in reports.
6. Close SQLTerm.
Task 4. Change the merge behavior.
1. Ensure that Transformation Model is expanded, right-click UnitCost,
and then click Properties.
2. Click the Merge tab.
3. In the Merge Behavior box, click AVG, and then click OK.
This setting forces DecisionStream to average the values for the UnitCost
element.

Technical Information
If the students look at SalesTotal, they will see that it does not have a Merge tab
associated with it. This is because the merging is done in the transformation
model after the data is acquired.
4. Repeat steps 1 to 3 for UnitPrice to average the values for UnitPrice.
Task 5. Change the fact build properties.
1. Right-click DemoSales:1, and then click Properties.
2. Click the Input tab.
3. Under Duplicate Key Handling, click Merge records with duplicate
keys to select it, and then click OK.
Task 6. Execute the build and view the data in the fact
table.
1. Click DemoSales:1 to select it, and then on the toolbar, click the
Execute button.
A dialog box appears prompting us to save our changes to the catalog.
2. Click OK.
The build runs and delivers eight rows to the F_Merge table.
3. Press Enter to close the DOS window.
4. Open SQLTerm.
5. In the Database for SQL Operations box, click TargetConnect.
6. Under Database Objects, expand TargetConnect.
7. Right-click F_Merge, and then click Add table select statement.
8. Execute the query.
Eight rows are read. The result appears as shown below:
Result:
To create clear, concise reports, we have merged rows with
duplicate dimension values that are unnecessary for our
reporting needs.
Demo 11-2
Purpose:
We want to further analyze the records that are rejected during
the execution of the DemoSales:1 fact build. We will create a
file that contains the rejected records, and then view the
contents of this file.
Results:
We defined a file for tracking rejected records. We also viewed
the contents of this file so that we can analyze the data and
determine why it was rejected.
Summary
Objectives
A UDF can be used to define business rules. It can be used in output filters,
derivations, JobStream procedure and condition nodes, and build and
JobStream variables.
The UDFs for the open catalog are located in the Functions folder in the
library tree.
Create a UDF
1. First, define the name and description of the function, as well as the type of
value it returns.
2. Next, define any necessary arguments and assign the appropriate data types
to those arguments.
3. You can also define any required variables and their data types for use within
the function.
4. After you complete these tasks, define the syntax of the function and test it to
ensure that it returns the correct values and types.
When you add a UDF to the library, it becomes available to all builds and
JobStreams within that catalog. After you define the function in a catalog, you
can reference it many times, and export or import it across catalogs.

Instructional Tips
To export a component of a catalog (in this case, a UDF), use the Create
Package command under the File menu.
Use the General tab to define the name and description of the function, as well as
the return value type.
On the Interface tab, you can add up to 16 unique arguments for each function.
DecisionStream gives the name Argumentn (where n is a unique sequential
number) to each argument and allocates an argument type of CHAR by default.
When adding an argument, give the argument a more meaningful name.
To change the default data type of the argument, click the Argument Type box
and choose the type you prefer. To delete an argument, click the argument and
click Delete.
If a function has more than one argument, the order of the arguments indicates
the order in which you must enter the values when using the function. You can
change the order of the arguments by clicking the argument and then clicking
Move Up or Move Down.
Implementation Tab
The left pane of the Implementation tab contains a list of functions and other
calculation options. Choose from logical and mathematical operators, built-in and
user-defined functions, various control statements, as well as variables and
arguments. You have access to over 75 built-in functions, 20 operators, and
various control statements (If/Case/Do While).
Test a UDF
[Slide callouts: set the scope of the expression; set the data type for arguments;
set default values for arguments; enter test values; click Calculate to test the
expression.]
It is good practice to test the syntax of a function and ensure that it returns
correct results.
By using the Test button (see the slide on the previous page), you can enter real
values and determine whether the function will operate correctly.
There are a number of options you can set when testing an expression, as shown
in the slide callouts above.
In the slide example, by entering the values of 100 for Price and 50 for Cost and
then clicking the Calculate button, the correct value of 0.5 is returned, verifying
that the function returns the correct result.
On the Variables tab, you can create your own variables for use within a UDF.
All variables must start with an alphabetic character and must contain only
alphabetic characters, numeric characters, and underscores. Names of variables
are case-sensitive; therefore, you can use a combination of characters, including
uppercase and lowercase, to create many different variable names.

You reference a variable in an expression by preceding its name with a dollar
sign ($). For example, the following expression:

$X := $X + 10;

adds 10 to whatever value was previously in the variable X.

Technical Information
When you deal with variables in DecisionStream, a variable with a $ in front of it
is a dynamic variable, whose value can be changed. For example, $X := $X + 10.
However, by adding braces around the variable, the variable becomes static and
its value cannot be changed. For example, {$X}.
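Putting these pieces together, a small scripting sketch (it uses only the
assignment syntax shown above and the Concat and LogMsg functions that
appear in the JobStream demos later in this course):

    $COUNTER := 0;
    $MSG := Concat('Run number ', ($COUNTER + 1));
    LogMsg($MSG);
    $COUNTER := $COUNTER + 1;

This initializes a counter, logs a message that embeds its value, and then
increments it; the AutomationJS JobStream in Module 13 uses the same pattern.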
Build and JobStream variables are available anywhere within the object in which
they are declared.
You can also define variables within the environment of the operating system, or
as a command-line parameter. (We will discuss this topic later in Module 22, "The
Command Line Interface").
Demo 12-1
Purpose:
We want to create a UDF that calculates profit margin. The
UDF will be applicable throughout the GO_Catalog, and can be
used within different builds and JobStreams.
(a - b) / a, where a is the price and b is the cost. For example, with a price of 100
and a cost of 50, the margin is (100 - 50) / 100 = 0.5.
Results:
We have created a UDF that calculates profit margin. We can
use this UDF for any build in the catalog.
Demo 12-2
Purpose:
We want to use a variable to set the location in which reject
files are stored for a fact build.
Results:
We have used a variable to set the location of reject files
created by a fact build.
User-defined functions are implemented either internally or externally. In an
internal implementation, the calculations are coded directly inside the UDF.
These calculations may incorporate existing DecisionStream functions, other
UDFs, or a combination of both.

Technical Information
In Windows, dynamic link library (.DLL) files are used. In UNIX, shared library
files are used.
External functions may have been created by the client or purchased from a third
party. You can use these functions for complex calculations or data cleansing. It is
preferable to re-use existing functions rather than re-write them.
To create an external UDF, follow the same basic steps for creating an internal
UDF. The only major difference is that you must create the external UDF
functions in a run-time library, and they must adhere to specific rules and
conventions.
Specify the library and function name that the UDF will use.
After you create and define the function in a library file, you must declare the
function to DecisionStream. Declare any function you have defined by clicking
the Implementation tab of the Function Properties window.

Click the External option button to select the type of UDF to use. In the
Library Name box, enter the name of the library file that contains the function
you are registering. On Windows the library file is a dynamic link library (.dll)
file, and on UNIX a shared library file. You do not have to enter the full
directory path for the file because the standard rules for locating dynamic
libraries for your platform apply.

Technical Information
Use an external UDF to cover any complex calculations that DS cannot handle
with the built-in features. A benefit of this is that the creator of the UDF can
debug the file without depending on the DecisionStream administrator.
UDF: In a Derivation
You can also use a UDF in a derivation to better implement your organization's
business rules. To assist you when adding a calculation to a derivation,
DecisionStream provides built-in functions as well as the UDFs that you created
previously.
UDF: In an Output Filter
To use a UDF in the output filter for a delivery, choose Output Filter from the
Filter tab of the delivery, and then in the left pane, locate the UDF and add it to
the output filter.
UDF: In a JobStream
You can use a UDF in a JobStream to provide more control over the logic and
flow of the JobStream.
You can add a UDF to a Procedure node or Condition node. This topic will be
covered in detail in Module 13, "JobStreams."
Demo 12-3
Purpose:
Management wants to include data regarding the gross profit
margin of each product sold in the data mart. Therefore, we
must add a new derivation to the DemoSales build that will use
the Margin internal UDF. We then have to execute the build
and view the results.
The values for the GrossMargin derivation are in the last column of the
table and are calculated based on the value of UnitSalePrice and
UnitCost.
9. Close SQLTerm.
10. Save your work and keep DecisionStream open for the next Demo.
Results:
By creating a new derivation and by using an existing internal
function that we created earlier, we have used existing data to
develop results that can help identify the profit margins of
products that we sell.
Summary
Objectives
Automation of:
• Data Extraction
• Data Transformation
• Data Loading
• Exception/Error Handling
• Logging/Notification
Managing a data warehouse requires the coordination of various tasks, such as
build status notification (for example, sending an e-mail if a job fails). After the
data marts are created, these tasks can be automated. They are performed in
JobStreams, either in sequence or in parallel.
What is a JobStream?

Characteristics of JobStreams:
• catalog-specific
• contained in a separate folder
A JobStream is similar to a build; it can be used only in the catalog in which it was
created. As such, it can use other components within the same catalog, such as
builds and user-defined functions (UDFs). However, a JobStream can call any
system command, including DecisionStream commands such as DATABUILD,
which may reference another catalog.
JobStreams are contained in a separate folder in the left pane of the Designer
interface. When you select a JobStream from this folder, a graphical
representation of the component appears in the Visualization pane.
The first step in implementing a job control process within DecisionStream is
adding a JobStream to your catalog, similar to adding any other component.
Click the JobStreams folder, and then from the Insert menu, click JobStream.
You then enter basic information about the JobStream, such as its name and
what logging and audit information you want to track when it is implemented.
You can modify these properties at any time.

Instructional Tips
You can also add a JobStream by right-clicking the JobStreams folder and
clicking Insert JobStream from the shortcut menu.
When you add a JobStream to the catalog, a Start node (a green triangle labeled
Start) appears in the JobStream Visualization pane. This node indicates where the
execution process begins.
A DecisionStream variable is a name and value pair that resides in the memory of
the computer. Variables affect the operation of DecisionStream programs, store
values for use in builds and procedures, and control the flow of JobStreams.

Technical Information
When you reference a variable in a node, you must precede it with a dollar sign
(for example, $COUNTER).
Within the properties of a JobStream, you can add variables. These variables can,
in turn, be referenced within procedure, condition, and SQL nodes.
Variables can be read and assigned values during the execution of JobStreams.
In the slide example, the COUNTER variable is referenced in the Counter Test
condition node as $COUNTER. If the COUNTER variable is less than 2, the
workflow loops back to the previous node in the JobStream.
When you create a JobStream, you add nodes to represent the execution of
internal DecisionStream commands, user-defined commands, or operating
system commands and programs. Each JobStream can include any number of
the following nodes.

Instructional Tips
It may be useful to temporarily exclude a node from processing while testing a
JobStream. You can omit a single node or all the nodes that follow a specified
point. To do this, open the properties of the node and select Exclude this node
from processing and (if necessary) Exclude subsequent nodes in this thread.

Node type: Build
Description: An existing fact or dimension build that you can add to a
JobStream. You can include any of the fact or dimension builds in the current
catalog.

Node type: SQL
Description: Contains SQL statement(s) to be implemented.

Node type: Procedure
Description: Contains one or more DecisionStream functions or variable
references. Operating system commands or other programs may be called by
these functions. Procedure nodes can make use of the same scripting language
that is available for UDFs, variables, and derivations.

Node type: Condition
Description: Provides conditional branches between nodes. In other words, it
sets one or more conditions in place that will determine how the remainder of
the JobStream progresses.

Node type: JobStream
Description: A JobStream within a JobStream, which makes it possible to break
down larger jobs into a series of smaller ones.

Node type: Alert
Description: Writes messages to the audit table that can be used as the basis for
alerts in Cognos NoticeCast. You must create an agent in Cognos NoticeCast
that will make use of these audit table records.

Node type: Email
Description: Sends event notifications to mail systems via SMTP. For example,
you can set up emails to provide notifications when a job has completed or
failed. You can also include attachments with emails.
Build Node
You can automate these tasks by adding build nodes to a JobStream. Each time
the JobStream is executed and a build node is reached, the build it references is
executed.
SQL Node

In the slide example, the Create Indexes SQL node contains four separate SQL
statements. When the JobStream reaches this node, it runs these statements to
create four separate indexes on the SalesFact table.

Instructional Tips
SQL Helper is used here in the same manner as everywhere else. The interface is
covered in Module 2, "Create a Catalog."
Procedure Node

Procedure nodes are useful for coordinating processing around builds for such
activities as checking for input files, sending mail messages and alerts, and
generating custom logging and auditing messages.

In the slide example, the Execution Log procedure writes a message to the
AutomationJS log file after each successful execution of the Sales fact build.

Technical Information
A procedure can include UDFs as well as built-in DecisionStream functions.
These UDFs must exist in the same catalog.
The commands and control statements that are part of DecisionStream are not
intended to serve as a full-fledged programming language. If you have complex
algorithms to create, you may be better off issuing an operating system command
to call an external program or function that issues a return code.
Alert Node
You can use an alert node to send event notifications to Cognos NoticeCast. For
example, you can set up alerts to provide notifications when a JobStream has
completed successfully or failed. For this process to work, you must create an
agent in Cognos NoticeCast that will make use of the audit table entries.
Email Node

Email nodes are used to send event notifications to mail systems using Simple
Mail Transfer Protocol (SMTP).

You enter basic information about the email node, such as its name, and then
enter the email profile and password for the computer you are using, details of
the recipient, and the message itself. You can include attachments with an email.

Technical Information
Email attachments are not supported on UNIX.
By default, each type of node (except the condition node) sets the RESULT
Boolean variable to TRUE or FALSE, depending on whether the node
succeeded or failed.
You can specify a different variable to receive the node execution results. If you
specify another variable, you must add it on the Variables tab of the JobStream
Properties window, and you must declare it as a Boolean data type.
Result variables are often tested in condition nodes to control JobStream flow.
Action on Failure
In the Properties window for each node, you can specify how you want the
JobStream to respond if the node fails to complete successfully.
If you select Terminate, DecisionStream stops processing the current flow, starting
with the failed node. However, any remaining flows will still be processed.
If you select Abort, the JobStream stops processing immediately after the node
fails.
Condition Node

A condition node provides a branching mechanism between nodes for
conditional execution. Each condition node can have many nodes linking to it
but only two output links (True and False). As with a procedure, you have access
to the full range of operators, functions, control statements, and variables
available within DecisionStream.

Technical Information
A condition can include UDFs as well as built-in DecisionStream functions.
These UDFs must exist in the same catalog.
In the slide example, the Counter Test condition node is used to check the value
of a variable called COUNTER. The initial value of $COUNTER is set in the
properties of the JobStream. If the value is less than 2, the DemoSales node is
processed. Otherwise, the condition is False, and the Create Indexes SQL node
executes.
The values of 0, F and f are considered equivalent to False. Any other value is
considered equal to True.
JobStream Node

A JobStream node lets you nest JobStreams, which supports breaking larger jobs
into separate groups of tasks. When DecisionStream encounters a JobStream
node, it processes all the steps within the node. It moves on to the next node in
the sequence only when these steps are completed.

This nesting process can proceed indefinitely. You can nest JobStreams within
JobStreams within JobStreams, theoretically to the point of infinity. It is best,
however, to keep this sort of nesting to a minimum to make the JobStream as
efficient and easy to understand as possible.

Instructional Tips
Any node within a JobStream can be converted to a JobStream. Right-click the
node and click Convert Into JobStream from the shortcut menu. If you
CTRL+click multiple nodes, all of them will be included in the JobStream node.
In the slide example, a separate JobStream control process is initiated when the
FactBuild JobStream node is reached in the first JobStream. When this happens,
all the nodes within this JobStream are completely processed.
Each JobStream node must be linked to at least one other node; otherwise
DecisionStream will not process it. DecisionStream starts processing at the Start
node and progresses through the JobStream following the links that you created.
When DecisionStream encounters a JobStream node, it runs all the nodes within
it before progressing to the next node.
Each node can have one or more nodes linked to it and can, in turn, link to one
or more nodes. The exception to this is a Condition node, which must link to
two nodes (True and False).
Each node can link to any node within the JobStream, whether it precedes or
follows it.
You can link nodes directly within the JobStream Visualization pane. On the
Predecessors and Successors tabs of the Properties window for each node, you
can specify how that node connects to any other nodes. The Predecessors tab
indicates which nodes precede the current one, whereas the Successors tab shows
which nodes follow it. For a node to be processed, all its predecessors must have
finished executing.
DecisionStream does not support links from a Condition node other than True
and False. If you link from a Condition node, DecisionStream allocates a status of
True to the node that you link to first and a status of False to the second node.
You can change the logic of the condition by right-clicking a link and clicking
Reverse Logic. This will switch the value of a True link to False and a False link
to True.
[Slide captions: "These nodes are run in parallel." "These nodes are run
sequentially."]
Execute a JobStream
If the JobStream encounters a problem with a node, the JobStream may fail to
complete. After you resolve the problem, you can instruct DecisionStream to
restart executing a JobStream, starting with the node that failed. You indicate this
by clicking the Restart Last JobStream box to select it.
When you execute a JobStream, the command that the DecisionStream engine
implements is shown in the Command Line box. You can add additional options
to be included in the command line.
As with a fact or dimension build, you can execute a JobStream entirely from the
command line.
• Right-click the JobStream and click Execute from the popup menu.
Demo 13-1
Create a JobStream
Purpose:
Management wants to create a job control process that will
extract, transform, and load the raw data from the Great
Outdoors OLTP system into the data warehouse. To test this
process before fully automating it, we will create a JobStream
that will run the Sales fact build twice.
3. In the Action area, in the right pane, type the following code:

$MSG := Concat('Build execution # ', ($COUNTER+1));
LogMsg($MSG);

This node will write a message to the AutomationJS log file after each
successful execution of the Sales fact build.

Instructional Tips
You can select the MSG and COUNTER user-defined variables from the tree
structure in the left pane.
4. Click OK to close the Procedure Node Properties window.
Task 6. Add a procedure node to increment the counter
variable.
1. Right-click the AutomationJS JobStream, point to Insert Node, and
then click Procedure Node.
The Procedure Node Properties window opens.
2. In the Business name box, type Increment Counter, and then click the
Action tab.
3. In the Action box, type $COUNTER := $COUNTER + 1.
By incrementing the COUNTER variable by 1 after each execution, this
node will track the number of times that the Sales fact build has
completed.
4. Click OK to close the Procedure Node Properties window.
Task 7. Add a condition node to the AutomationJS
JobStream.
1. Right-click the AutomationJS JobStream, point to Insert Node, and
then click Condition Node.
The Condition Node Properties window opens.
2. In the Business name box, type Counter Test, and then click the Action
tab.
3. In the right pane of the Action area, type $COUNTER < 2.
This node will test whether the DemoSales fact build has run less than
two times. If it has, this condition will be True. Once the DemoSales fact
build has run twice, this condition will be False.
4. Click OK to close the Condition Node Properties window.
Task 8. Add an SQL node to create four indexes on the
F_DemoSales fact table.
1. Right-click the AutomationJS JobStream, point to Insert Node, and
then click SQL Node.
The SQL Node Properties window opens.
2. In the Business name box, type Create Indexes, and then click the SQL
tab.
3. In the Database box, click TargetConnect.
5. Save your work and keep DecisionStream open for the upcoming demo.
Results:
We have created and added nodes to a JobStream that will
extract, transform and load the raw data from the Great
Outdoors OLTP system into the data warehouse.
Demo 13-2
Purpose:
To complete the AutomationJS JobStream, we must link all of
its nodes so that they are run in the correct sequence. After we
link the nodes, we will run the JobStream and view its log file.
Results:
We have linked the nodes in the AutomationJS JobStream. We
then ran this JobStream and viewed its log file to evaluate its
progress.
"The ultimate warehouse operation would run the regular load processes in a
lights-out manner, that is, completely unattended. While this is a difficult
outcome to obtain, it is possible to get close." (Kimball et al., 1998)

Ultimately, the purpose of a JobStream is to automate the basic tasks of managing
a data warehouse. If all goes well, the JobStream is able to extract the raw data
into the transformation process and then load it into the fact and dimension
tables of the target data mart. Ideally, this is scheduled to occur after business
hours so that the most up-to-date data is available on a daily basis.

It is also important to consider how to manage the JobStream logs and other
persistent information regarding the success or failure of the job control process.
Who in the organization needs the information? How long will it be kept? What
reporting framework can be used to communicate this information? Answers to
these questions are necessary to continually refine and enhance the job control
process.

Technical Information
A JobStream can be used only within a catalog. If you create it using the
DecisionStream language, you must export it to a catalog before you can execute
it. After it is created, the JobStream can be run as a batch file on a Windows or
UNIX operating system using rundsjob.exe. This will be, essentially, the
backbone of the data warehouse automation process. See Module 22, "The
Command Line Interface," for a discussion of this topic.
Summary
Workshop 13-1
• add dimension and fact build nodes to represent the execution of the
Product conformed dimension build and the JobStreamBuild fact build
• add an Abort procedure to exit the JobStream and log a message if the
Product conformed dimension failed to run
• add a Finish procedure to indicate when the fact data was delivered if the
Product dimension build ran successfully
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
8. Link the JobStream nodes into a coherent process.
   Where: JobStreams toolbar, Create a link between two nodes button.
   • If necessary, right-click the links to the condition node and click
     Reverse Logic.
9. Run the JobStream.
   • Modify the logging to include detail, user and variable messages.
After the ConformedDimension JobStream finishes running (see Task 9), the
result appears as shown below.
8. Fact Builds
9. History Preservation
13. JobStreams
Objectives
Effective dates are required to build the dimension history. In the existing current
row, an effective end date is inserted. In the new row, an effective begin date is
inserted together with a null effective end date.
The dimension table contains a separate row for each change that has been
tracked. As a result, there may be multiple rows that contain the same business
key, but each row will have a unique surrogate key.
Build slide.
Four clicks to complete.
What Are Late Arriving Facts?
In the slide example, the Sales Rep dimension table contains two instances of
business key 00128, with a unique surrogate key for each instance.
The sales fact record dated February 12, 2002 is processed after April 1, 2002. By
default, if you do not enable the late arriving fact processing option, the value of
the surrogate key will be 11112, which is the current record. However, if you do
enable the late arriving fact processing option, the value of the surrogate key will
be 11111.
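Conceptually, the late arriving fact lookup resembles the following query against the dimension table. This is a sketch only; the table name (D_SalesRep) and effective date columns (eff_begin_date, eff_end_date) are assumed for illustration.

-- Pick the surrogate key whose effective date range covers the late fact date.
-- With late arriving fact processing enabled, this returns 11111 for the
-- February 12, 2002 fact, rather than the current record 11112.
SELECT `surrogate_key`
FROM `D_SalesRep`
WHERE `business_key` = '00128'
AND `eff_begin_date` <= '2002-02-12'
AND (`eff_end_date` IS NULL OR `eff_end_date` > '2002-02-12')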
One case in which late arriving facts are likely to occur is credit card transactions.
These transactions (for example, the payment of a bill) may occur after the close
of a month. Each credit card transaction must be correctly associated with the
records in the related dimension tables (for example, those tables containing date
and customer data).
Before you can enable late arriving fact processing for a dimension element, the
conditions stated must be met.
Instructional Tips
If you need to merge data or include a dimension delivery, you can use another
job to merge the data in the staging area or to deliver the dimension.
• Enable late arriving facts for the appropriate dimension element on the
Late Arriving Facts tab.
• Specify what date range limits to impose on processing late arriving facts:
• "All available" caches the entire dimension so that all records are
processed.
Specifying the date range limits for late arriving facts is important if you want to
consider only a specific range of dates for checking late arriving dimension data.
For example, specifying a range ensures that a late arriving fact can only occur up
to three months ago. Any facts older than three months should not be considered
for late arriving fact processing.
Demo 14-1
Purpose:
Some of our incoming product sales data may have either a
null value for the order date or a value that falls outside of an
acceptable range. We will create a lookup in the Product
dimension that references existing values in the
D_ProductHistory dimension table. This lookup will determine
the permissible range of values for each order date.
We will then create a fact build with late arrival fact processing
enabled. In doing so, the dates for any late arriving product
sales or product sales with a null value for their order date will
be replaced with a more appropriate date determined by the
lookup.
We will then execute the fact build and view the results in
SQLTerm.
5. Click the Data Access tab, and then click Use Template for data
access to select it.
6. In the Connection box, click TargetConnect, and then click Browse.
The Select Table dialog box appears.
7. Click D_ProductHistory, and then click OK.
8. Click OK to close the Lookup Properties window.
Task 2. Create the NorthwindOrders fact build.
1. On the toolbar, click the Fact Build wizard button.
2. In the Enter the name of the build box, type NorthwindOrders.
3. In the Select the connection into which the build is to deliver data, click
TargetConnect.
4. Click Perform a full Refresh on the Target Data to select it.
5. Click Next, click Data source, and then click Add.
The Data Source wizard opens.
6. In the Select the Connection from which the Data Source is to read box,
click BIAS_Northwind, and then click Next.
7. In the right pane, type the following SQL code:
SELECT b.`OrderID`,
b.`CustomerID`,
b.`EmployeeID`,
b.`OrderDate`,
b.`RequiredDate`,
b.`ShippedDate`,
b.`ShipVia`,
a.`ProductID`,
a.`UnitPrice`,
a.`Quantity`,
a.`Discount`
FROM `Orders` b, `Order Details` a
WHERE a.`OrderID` = b.`OrderID`
8. Click Finish to close the Data Source wizard.
9. Click Next to accept the DataStream, and then click the ShipVia
measure.
10. Click the Change Type button, click To Attribute, and then click Next.
11. Click Next to accept the properties of the dimensions, click Next to
accept the default fact delivery, and then click Next to accept the default
table and column names.
12. Click Next to accept the summary of the fact build, clear the Deliver
Dimensions check box, and then click Next.
13. Clear the Deliver Metadata check box, click Next, and then click
Finish.
The NorthwindOrders fact build is added to the Builds folder.
Task 3. Modify the properties of the ProductID dimension
element to enable late arriving facts processing.
1. Expand the NorthwindOrders fact build, expand Transformation
Model, and then double-click ProductID.
The Dimension Properties window opens.
2. Click the Reference tab, in the Dimension box, click ProductD, and
then in the Structure box, click ProductHistoryL.
3. Click the Late Arriving Facts tab, and then click the Enable late
arriving fact processing check box to select it.
4. In the Transaction date element box, click OrderDate.
5. In the Transaction date value actions area, beside the When NULL area,
click Use current reference member to select it.
Selecting this option specifies that if an incoming value for the
OrderDate column is null, DecisionStream should use the current record
in the ProductHistoryL lookup.
6. Beside the When out of range section, click the Use closest reference
member button.
Selecting this option specifies that if an incoming value for the
OrderDate column is outside the effective date range specified in the
ProductHistoryL lookup, DecisionStream should use the closest
matching record in the lookup.
The result appears as shown below.
7. Click OK to close the Dimension Properties window, and then save the
catalog.
Task 4. Execute the NorthwindOrders fact build and view
the results in SQLTerm.
1. Right-click the NorthwindOrders fact build, and then click Execute.
The Execute Build dialog box appears.
2. Click the Override build settings check box to select it, and then click
Detail, SQL, and ExecutedSQL.
3. Ensure that the Progress check box is selected, and then click OK.
A command window opens and a log file is created, tracking the progress
of the fact build execution. Notice that 2155 rows are inserted into the
F_NorthwindOrders fact table.
4. When the build has finished executing, press Enter to close the
command window.
5. Open SQLTerm.
SQLTerm opens.
6. In the Database for SQL Operations box, click TargetConnect, and
then in the Database Objects pane, expand TargetConnect.
Results:
We created a lookup in the Product dimension that referenced
existing values in the D_ProductHistory dimension table. This
lookup determined the permissible range of values for each
order date. We also created a fact build with late arrival fact
processing enabled. We then executed the fact build and
viewed the results in SQLTerm.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
Hierarchical data is loaded into memory and is referenced multiple times in the
fact build process. DecisionStream uses the levels of the hierarchy to merge the
data, to validate incoming data, to aggregate the data, and to partition and filter
the data.
• checks that the columns in the target table are the same as those being
delivered
• loads the hierarchy data into memory (using the hierarchy definitions)
For processing fact data, DecisionStream acquires and merges the data as
specified, aggregates it, filters and partitions the result, and then delivers the data
to the appropriate tables.
DecisionStream launches the fact build (by using DataBuild.exe) as a new process
that runs independently of the designer. All the relevant progress information is
written to a log file.
What is Aggregation?
Slide example: product values P1 = 5, P2 = 8, P3 = 9, P4 = 7, P5 = 3, and
P6 = 1 sum to class values C1 = 13, C2 = 16, and C3 = 4, which in turn sum to
the ALL value of 33.
Aggregation is the process of reading data across one level in a reference structure
and summarizing it. In DecisionStream you can perform aggregation on a
measure or a derivation element.
You can:
• exclude detail data from the output to provide compact summary data
collections
For example, you can include every conceivable combination of summary data for
in-depth business analysis, or just a high-level summary for management
reporting.
Note: You cannot perform aggregation for a lookup because it has only one
level.
On the left side of the slide example, we are only delivering aggregated fact data
to the data mart: total sales of all products and total sales of each product line. On
the right side, we are delivering input data arriving at the lowest level (sales for
each product). We are also delivering summarized sales data for all products, each
product line, and each product type.
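In relational terms, the left side of the slide corresponds to delivering only summary queries such as the following sketch. The table and column names (F_Sales, product_line, sales) are assumptions for illustration.

-- Total sales for each product line: one summary row per line, no detail rows.
SELECT `product_line`, SUM(`sales`) AS total_sales
FROM `F_Sales`
GROUP BY `product_line`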
DecisionStream has many available methods to calculate aggregates, such as
average and standard deviation. To specify aggregation, select the measure or
derivation you want to aggregate, and then select the type of aggregation.
Instructional Tips
Aggregation exceptions are covered later in this module.
Slide example:

Fiscal   Value
Q1       <NULL>   <- FIRST
Q2       10       <- FIRST NON-NULL
Q3       20       <- LAST NON-NULL
Q4       <NULL>   <- LAST

The following table outlines other methods for aggregating fact data.

FIRST            You want to use the first value that occurs (the first
                 value is always the correct one).
FIRST NON-NULL   You want to use the first non-null value that occurs (the
                 first value is always correct, provided it is present).
LAST NON-NULL    You want to use the last non-null value that occurs. The
                 last value represents the last update, but a null value is
                 never an improvement on a previous real value.
LAST             You want to use the last value that occurs (the last value
                 is always best).
Inventory Levels
Slide example (FIRST aggregation in the TIME dimension): the twelve monthly
values 1, 3, 2, 4, 1, 5, 3, 2, 8, 6, 7, 3 roll up to quarterly FIRST values of
Q1 = 1, Q2 = 4, Q3 = 3, and Q4 = 6, and to a yearly FIRST value of 1.
Most measures can be aggregated identically across all dimensions, whereas other
measures cannot. This is an issue when summing (adding) measures across time.
Opening and closing balances (for example, inventory levels and bank balances)
should not be summed over time. However, it is reasonable to sum them across
other dimensions (for example, Product or Customer).
Key Information
Measures that have an aggregation exception are what Ralph Kimball refers to
as "semi-additive measures."
In these cases, specify the Time dimension as an exception for aggregation and
choose whether you want to use the first or last value of each time period.
Inventory levels can be aggregated over the levels of the Product dimension to
produce a total inventory of products at one point in time.
However, summing inventory levels across the Time dimension would not make
sense. Instead, as shown in the slide example, an aggregation exception has been
specified for this dimension. Instead of summing the number of products in
inventory over the entire year, we want to use the first value that we encounter at
each level above the lowest one. We also specified an exception indicating that
the last date we want to reference is Aug. 22, 2003.
Note: The exception dimension element must have its domain type set to
Reference, not Dynamic. This is important when dealing with future
values, such as Forecast.
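A relational sketch of this aggregation exception follows. It assumes an F_Inventory table with month, quarter, and inventory_level columns: inventory is summed across products, but only the FIRST month of each quarter is used over time.

-- Sum inventory across products, but take only the first month of each
-- quarter instead of summing over time (assumed table and column names).
SELECT i.`quarter`, SUM(i.`inventory_level`) AS quarterly_inventory
FROM `F_Inventory` i
WHERE i.`month` = (SELECT MIN(m.`month`)
                   FROM `F_Inventory` m
                   WHERE m.`quarter` = i.`quarter`)
GROUP BY i.`quarter`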
Enable Aggregation
To enable aggregation, click the Aggregate box on the Reference tab of the
Dimension Properties window. Then select the level(s) of aggregation that you
require by clicking the box(es) adjacent to the level(s) in the Output column.
The relevant hierarchy icon in the Visualization pane indicates that a dimension
element has been set to aggregate. In the slide example, we specified that we want
to aggregate the Product dimension element at the ProductType level. The
ProductH icon in the Visualization pane has arrows to indicate that the source
data is being rolled up to the ProductType level, in addition to being retrieved at
the lowest level, Products.
Instructional Tips
If you input data at two different levels and also output at those levels, you
would not want to aggregate the data.
Key Information
To access the aggregation option, right-click the dimension element, click
Properties, and then click the Reference tab.
Input rows:
Cust  Date    Qty  Amt   CrL
1     199901  1    100   200
2     199901  5    500   200
1     199902  1    100   300
2     199902  5    500   300
1     199903  1    100   400
2     199903  5    500   400
3     199903  4    400   400

Aggregated rows added (Qty: sum, Amt: sum, CrL: last):
1     19991   3    300   400
2     19991   15   1500  400
3     19991   4    400   400

The last three rows of data are rolled up numbers from the first quarter.
Aggregating input data in the fact build creates additional rows. This can increase
processing time.
In the slide example, the data for each customer is aggregated across the Quarter
level. All the data values for each customer are rolled up to create the summary
data. The resulting three aggregated rows are then added to the existing rows in
the DataStream.
Also, avoid creating an aggregate table until you know there is a process that
requires it.
Aggregating several dimensions over several levels can result in massive numbers
of aggregate tables. Every combination of every level in every dimension is a
potential aggregate table.
The slide example refers to a hypothetical fact build that references three
dimensions with five levels each. If seven such dimensions were used, 78,125
(that is, 5 to the power of 7) tables would be required to support every
combination.
When deciding whether to create additional aggregate tables, keep in mind the
number of tables that may potentially be created. Create aggregates only when
there is a specific need for them.
Demo 15-1
Aggregate Data
Purpose:
We want to report on our sales data at higher levels of detail.
We will use aggregation in our fact build to deliver rolled up
data to our fact table.
Task 1. Set the input rows and view the current data.
1. In the GO_Catalog, expand the DemoSales:1 fact build, right-click
DataStream, and then click Properties.
The DataStream Properties window opens.
2. Click the Input tab.
3. In the Maximum input rows to process box, type 15, and then click OK.
4. Right-click DemoSales:1, and then click Properties.
The Fact Build Properties window opens.
5. Click the Input tab, in the Write any rejected records to box, type
reject.txt, and then click OK to close the Fact Build Properties window.
6. Click DemoSales:1 to select it, and then on the toolbar, click the
Execute button.
A dialog box appears, prompting us to save our changes.
7. Click OK.
Thirteen rows are delivered.
8. Press Enter to close the DOS window when the build is finished executing.
Instructional Tips
You may also want to set the merge duplicate rows option, if it has not been set
already.
9. Run SQLTerm.
10. In the Database for SQL Operations box, click TargetConnect.
11. Under Database Objects, expand TargetConnect, right-click F_Merge,
and then click Add table select statement.
12. Execute the SQL.
The query returns 13 rows of data. Now we will add aggregation to the
build to create summary data for later analysis.
8. Press Enter to close the DOS window when the build is finished
executing.
9. Run SQLTerm.
10. In the Database for SQL Operations box, click TargetConnect.
11. In the Database Objects pane, expand TargetConnect, right-click
F_Merge, and then click Add table select statement.
15. Close SQLTerm, and then leave DecisionStream open for the next
demo.
Results:
We used aggregation in our fact build to deliver rolled up data
to our fact table.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
What is Pivoting?
For example, consider a relational table that has a single column, which holds all
product values. Assume that this table has one column for each business
measure, such as Actual Sales and Forecast Sales. The product column could be
pivoted to produce one column for each product, and the business measures
could be pivoted to produce one row for each measure.
In the slide example, the original table contains three columns: Product, Eastern
Sales and Western Sales. The Sales columns have a double meaning. They identify
the sales Region (Eastern or Western) and the Sales value in that region. As a
result of pivoting, a new column (Region) is created, and all sales are combined in
a single measure column (Sales).
A typical pivot rotates two or more data source columns to two DataStream
items. One DataStream item maps to a transformation model element that
records the data source column from which the value originates. The second
DataStream item maps to a transformation model element that contains the
values from the data source columns.
You can create as many pivot values as you need for each DataStream item. You
typically create one pivot value for each column you are pivoting. For example, a
table has four columns that must be pivoted: Product A, Product B, Product C,
and Product D, where each of the columns holds sales data for each given
product. To pivot the table, create a DataStream item with four pivot values, each
corresponding to a product.
Because pivoting transforms the table structure from horizontal to vertical, the
number of columns decreases, whereas the number of rows increases. For
example, pivoting twelve month columns turns each source row into twelve
output rows. Row limits apply to the output of a DataStream, not to the number
of source data rows processed.
You can perform multiple pivoting. Multiple pivoting implies that you create
more than one DataStream item with pivot values. Each DataStream item has its
own set of pivot values.
For example, four data source columns called Eastern Sales, Eastern Forecast,
Western Sales, and Western Forecast might be pivoted. The following
DataStream items will be created with pivot values:
Pivot values
The slide illustrates the single-pivot technique. Use the single-pivot technique
when you know all values for the pivoted column in advance. In the slide
example, all twelve of the month columns from the source table are replaced with
literal values for the twelve months of the year (Jan, Feb, and so on). These values
are declared as pivot values in the Month DataStream item and are written to a
single column, Month, in the fact table.
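In SQL terms, the single-pivot rotation is equivalent to unpivoting the month columns into rows, along the lines of the following sketch. It assumes the Rolling12Months source table used later in this module, with a Year column and one column per month.

-- Rotate twelve month columns into (Month, Amount) rows.
SELECT `Year`, 'Jan' AS Month, `Jan` AS Amount FROM `Rolling12Months`
UNION ALL
SELECT `Year`, 'Feb', `Feb` FROM `Rolling12Months`
UNION ALL
SELECT `Year`, 'Mar', `Mar` FROM `Rolling12Months`
-- ...and so on for the remaining nine months...
UNION ALL
SELECT `Year`, 'Dec', `Dec` FROM `Rolling12Months`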
Pivot Technique
Map the DataStream item and a pivot value to each pivoted data source column.
Each data source column selected for pivoting must be mapped twice:
• to the DataStream item that will contain the values from the data source
columns
Once the data source columns have been mapped, the DataStream items must be
mapped to transformation model elements in the usual way. In the slide
example, each month column from the data source has double mapping to a
corresponding month pivot value within the Months DataStream item and to the
Amount DataStream item containing information on the number of products for
this month.
The pivot values in the DataStream item are designed for two purposes. The first
purpose is to provide literal values to plug into subsequent fact table rows. The
second purpose is to indicate to DecisionStream how many rows to insert into the
fact table for each row from the data source.
When pivoting, create a new DataStream item to store the pivot values. In the
slide example, a Month DataStream item is created containing twelve pivot values
to represent the twelve months of the year. Also, a DataStream item called
Amount is created for the quantitative values.
In order to map both the pivot value and the Amount DataStream item to the
appropriate columns, map the pivot value first, and then use Ctrl+click to map
the DataStream item representing quantitative values.
Note: The slide example is inappropriate for checking data integrity against the
Month dimension. The keys in the Month dimension represent actual
dates (199901, 199902, and so on) instead of literal values (Jan, Feb, and
so on). Data integrity checking requires a more advanced technique.
Demo 16-1
Purpose:
We want to create a fact build that performs basic pivoting. We
will create a new fact build and deliver a pivot table called
PivotData to the data mart.
2. Close SQLTerm.
Now we need to create a simple fact build to deliver a pivot table.
3. From the Tools menu, click Fact Build Wizard.
The Fact Build Wizard window opens.
4. In the Enter the name of the build box, type PivotData.
5. Make sure that Cognos BI Mart (Star) is selected in the Select the type
of fact build to create box.
6. In the Select the connection into which the build is to deliver data box,
click TargetConnect.
7. Click Perform a full Refresh on the Target Data to select it, and then
click Next.
8. Click Data source, and then click Add.
The Data Source Wizard window opens.
9. In the Enter the Data Source name box, type PivotData.
10. In the Select the Connection from which the Data Source is to read box,
click DS_Sources, and then click Next.
11. In the left pane, click the Rolling12Months check box to select the
entire table.
A SELECT statement appears in the right pane of the Data Source
Wizard window displaying the columns that will be included in the build.
5. Click OK.
Task 6. Add the Month and Amount elements and map the
DataStream items to the transformation model.
1. Under PivotData, right-click the Transformation Model, and then click
Mapping.
2. In the left pane, double-click Month to create and map a Month
element.
3. Double-click Amount to create and map an Amount element.
The mapping appears as follows.
Notice that both elements have been created as attributes. You can
change the element type later if you require.
4. Click OK.
The Add New Elements dialog box appears.
5. Click OK.
6. Right-click the DataStream for the PivotData fact build, and then click Instructional Tips
Properties. Steps 6 to 9 may not be necessary.
However, if you execute the build and a
The DataStream Properties window opens. data type error occurs, ensure that the
7. In the DataStream Items column, click Year, and then click Edit. Year DataStream item is of data type
CHAR with a precision of 4.
The DataStream Item Properties window opens.
8. In the Type box, click CHAR, and then in the Precision box, type 4.
9. Click OK to close the DataStream Item Properties window, and then
click OK to close the DataStream Properties window.
10. Save the catalog, click PivotData, and then execute the build.
The log file indicates that 96 records have been inserted into the
F_PivotData fact table.
11. Press Enter to close the DOS window.
12. Open SQLTerm and examine the data in the F_PivotData table in the
TargetConnect database.
The results appear as shown below.
13. Close SQLTerm and keep DecisionStream open for the upcoming
demo.
Results:
We performed basic pivoting by creating a new fact build and
delivering a pivot table called PivotData to the data mart.
There are instances where pivoting data on one axis will not produce a sufficient
result in the fact table:
• Month and year may be separate and you must concatenate them.
• You may have more than one measure that you want to use in a pivot.
• You have one data source that is already pivoted and another that is not, but
you need to use both to produce a result in your fact table.
Your source data may dictate pivoting on more than one axis to obtain the
desired results in your fact table. In this case, you need to perform multiple pivots
of your data.
For example, your source data may contain multiple measures that are required in
the pivot; or source data that needs to be concatenated to correctly express your
results in a format that makes sense.
You will need to use a multiple pivot technique to obtain results that will best suit
your requirements.
• to the DataStream item that will contain the values from the data source
columns
In the slide example, the sales array is mapped to the new DataStream item
(Amount).
The calculated month array is mapped to the Months DataStream item, which is
then mapped in the transformation model to the first dimension element for data
integrity checking.
Also, both the sales array and the month array are mapped to the dummy
element (RowNumber). This mapping creates twelve output rows for each input
row.
Multi-Pivot Technique
Use the advanced pivoting technique when the required pivot values are derived
from an expression or calculation, rather than literal values.
If data integrity checking is required while you are pivoting, you may need two
transformation elements. This is especially true when you are pivoting date arrays.
The first element must be a dimension element. It is used to check referential
integrity, if necessary. This element is output to the fact table. The second
element is a dummy element that is mapped to the DataStream item containing
the pivot values. This element is not output to the fact table.
Instructional Tips
The dimension element that is set for output can be used for data integrity
checking, if it references an existing dimension.
In the slide example, the new query includes two arrays. The first array contains
the sales values that come from the source table. The second array contains
derivations used to calculate dates for each month.
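A relational sketch of the same idea follows: instead of literal month names, each pivoted column carries a derived date key. The Concat usage and column names here are assumptions for illustration, not the exact slide query.

-- Derive a full date key (for example, 199901) for each pivoted month column.
SELECT `Year`, Concat(`Year`, '01') AS MonthKey, `Jan` AS Amount
FROM `Rolling12Months`
UNION ALL
SELECT `Year`, Concat(`Year`, '02'), `Feb`
FROM `Rolling12Months`
-- ...and so on for the remaining ten months...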
Demo 16-2
Multi-Pivot Technique
Purpose:
We want to pivot the month columns in a fact table so that it
includes both the year and the month. Therefore, we must use a
multi-pivot technique to calculate the month values. We will not
create a new fact build from the beginning. Instead, we will
duplicate the PivotData fact build and modify it.
4. Run the query to check the syntax, and then click OK.
5. Click Prepare, and then click Refresh to prepare the columns for use in
the fact build.
6. Click OK to close the Data Source Properties window.
4. Close SQLTerm.
Results:
We duplicated a fact build and used a multi-pivot technique to
calculate month values.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
Note: PowerPlay requires all the leaf nodes to be at the same level to
aggregate properly. This topic is covered in greater detail in the
following pages.
Parent-Child Relationships
Slide: the Employees and Orders tables, with a Reports To column relating
Employees to itself.
A variant of this schema uses one table for all dimensions and has an additional
column to identify the dimension to which each row relates.
A parent-child schema is a structure data table in which each row contains the Id
of a member and the identifier of its parent. If each structure data row also
identifies its hierarchy level, you can use this information to select, for each
level DataStream, only the structure data that relates to that level; such
hierarchies acquire structure data through the level DataStream. If the levels
are not identified, use an auto-level hierarchy to determine the number of levels.
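Conceptually, deriving the levels from parent-child pairs resembles the following recursive query. This is a sketch only; it assumes a SQL dialect that supports recursive common table expressions, which the course databases may not provide.

-- Derive the level of each member by walking the parent-child links.
WITH RECURSIVE emp_levels (EmployeeID, ReportsTo, level_num) AS (
    SELECT `EmployeeID`, `ReportsTo`, 1
    FROM `Employees`
    WHERE `ReportsTo` IS NULL              -- top of the hierarchy
    UNION ALL
    SELECT e.`EmployeeID`, e.`ReportsTo`, l.level_num + 1
    FROM `Employees` e
    JOIN emp_levels l ON e.`ReportsTo` = l.EmployeeID
)
SELECT EmployeeID, level_num FROM emp_levels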
Ragged Hierarchies
In a fact build the dimension elements that reference a hierarchy will look for a
match only at the lowest level of the dimension. In the slide example, this
means that only the lowest three employees (Suyama, King, and Dodsworth)
will be referenced, but all the other employees will be ignored (Davolio,
Leverling, Peacock, and Callahan).
The most difficult aspect of the problem is that you often do not know how
many values the dimension takes on until you see the data itself. Therefore, the
REJECT and REJECT/WARN options are not available for this feature. If you
set this feature to ERROR, DecisionStream issues an error message and halts
processing if the hierarchy is ragged.
The issue with a ragged hierarchy is not how to manage the dimension, but how
to correctly display to users the results in an understandable manner.
DecisionStream can handle ragged hierarchies by using auto-level hierarchies.
The process for resolving data reporting problems such as ragged hierarchies
usually has many steps. The first step is to use an auto-level hierarchy, which is
defined solely in terms of parent-child relationships, to create the defined levels
required for a new hierarchy.
The auto-level hierarchy is used to create a table that will be referenced in the
creation of a new hierarchy. This new hierarchy will have the defined level
structure necessary to continue resolving the ragged hierarchy problem.
After the levels are known, a dimension build is used to create a table that
will contain each row and a column to identify the level of the row. This step
is necessary only if the original parent-child relationship table does not
identify the levels.
Demo 17-1
Purpose:
We have a problem with recursive relationships in our
Employee table. We must create an auto-level hierarchy to
determine the number of levels that exist in this table. This is
the first step in solving this problem.
3. Click OK.
4. Expand the EmployeesAH hierarchy, right-click DataStream, and then
click Insert Data Source.
5. Click the SQL tab, click BIAS_NorthWind in the Database for SQL
Operations box, and then click SQL Helper.
6. In the SQL Query pane, type the following:

SELECT `EmployeeID`,
       `FirstName`,
       `LastName`,
       `ReportsTo`
FROM `Employees`

7. Execute the query.
The table has nine records.
8. Click OK to close SQL Helper.
9. Click Prepare to select it, and then click Refresh.
10. Click the Derivations tab, and then click Add.
Instructional Tips
This may be a good opportunity to demonstrate to the students how to use the
drag-and-drop feature for adding columns to the SQL code. Under Database
Objects, under the Employees table, left-click+drag a column to the SQL Query
pane.
You will likely not see the items in the far right DataStream Items column until
you click the Auto Map button. In effect, clicking this button both creates the
DataStream items and maps them to the appropriate data source columns in the
left pane.
The Derivation Properties window opens.
11. In the Name box, type EmpName, and then click the Calculation tab.
12. In the right pane, type Concat(FirstName, ' ', LastName)
13. Click OK to close the Derivation Properties window, and then click OK
to close the Data Source Properties window.
3. Click OK.
Task 5. Examine the results in Reference Explorer.
1. Save the catalog.
2. Right-click the EmployeesAH hierarchy, and then click Explore.
The Reference Explorer dialog box appears.
3. Click OK.
The Reference Explorer window opens.
4. Expand Andrew Fuller to view the data.
We can see that Mr. Fuller has five people reporting to him.
Results:
By creating an auto-level hierarchy, we have determined the
number of levels that will have to be created to resolve our
ragged hierarchy problem.
Demo 17-2
Purpose:
Using the Dimension Build Wizard, create a build by using the
EmployeesAH hierarchy. We will create a second hierarchy by
using this new dimension build to help solve our ragged
hierarchy problem.
5. Click Execute.
There are nine employees in the table.
Results:
We will use the D_EmployeesAH table to create our new
hierarchy in the next demo.
After you create the auto-level hierarchy to determine the number of levels, and
the dimension table by using the auto-level hierarchy as a reference, you have to
create a new hierarchy by using the dimension table.
At each level, you must define the SQL to extract the data correctly. The slide
example has three levels of employees. The SQL will extract the lowest level of
data to populate the hierarchy. A similar SQL statement is also used at the first
and second levels.
Note: If metadata is delivered to Transformer, it is important to go to the
Transformer model and suppress blanks in the level properties of the
dimension.
Key Information
Remind the students that we are using these techniques because PowerPlay
requires fixed levels to create cubes suitable for reporting.
We must have the ability to track data about each category (in this case, each
employee) at the lowest level, so that aggregation can be performed. In a ragged
hierarchy, not all of the members go down to the same level. When using this
hierarchy as a data source, Transformer cannot establish a convergence level.
Demo 17-3
Purpose:
We will build a new hierarchy called RaggedEmpH using the
data in the table created by the AutoEmployees dimension
build (D_EmployeesAH). Later, we will deliver the data from this
new hierarchy to our data mart in incremental steps.
5. Click OK.
6. Expand Level1, right-click DataStream, and then click Insert Data
Source.
7. Click the SQL tab, and then in the Database box, click TargetConnect.
8. Click SQL Helper, and then in the SQL Query pane type the following:
SELECT `EmployeeID`,
`Name`
FROM `D_EmployeesAH`
WHERE `level_name` = 'Level1'
9. Click OK to close SQL Helper.
10. Click Prepare, and then click Refresh.
11. Click OK to add the new data source.
Task 3. Create the Level2 level and add the Data Source.
1. Right-click the RaggedEmpH hierarchy, and then click Insert Level.
2. Click the Attributes tab, and in the Template box, click D_EmployeesAH.
3. In the Available attributes box, double-click EmployeeID, Name, and
ReportsTo to add them to the Chosen attributes list.
4. In the Chosen attributes list, click the Id box for EmployeeID, the
Caption box for Name, and the Parent box for ReportsTo to select them.
The Level Properties window for Level2 appears as shown below.
5. Click OK.
6. Expand Level2, right-click DataStream, and then click Insert Data
Source.
7. Click the SQL tab, and then in the Database box, click TargetConnect.
8. Click SQL Helper, and then in the SQL Query pane type the following:
SELECT `EmployeeID` AS Level2ID,
`Name` AS Level2Name,
`ReportsTo` AS ParentID
FROM `D_EmployeesAH`
WHERE `level_name` = 'Level2'
9. Click OK to close SQL Helper.
10. Click Prepare, and then click Refresh.
11. Click OK to add the new data source.
Task 4. Create the Level3 level and add the data source.
1. Right-click the RaggedEmpH hierarchy, and then click Insert Level.
2. Click the Attributes tab, and in the Template box, click D_EmployeesAH.
3. In the Available attributes box, double-click EmployeeID, Name, and
ReportsTo to add them to the Chosen attributes list.
4. In the Chosen attributes box, click the Id box for EmployeeID, the
Caption box for Name, and the Parent box for ReportsTo to select them.
The Level Properties window for Level3 appears as shown below.
5. Click OK.
6. Expand Level3, right-click DataStream, and then click Insert Data
Source.
7. Click the SQL tab, and then in the Database box, click TargetConnect.
8. Click SQL Helper, and then in the SQL Query pane type the following:
SELECT `EmployeeID` AS Level3ID,
`Name` AS Level3Name,
`ReportsTo` AS ParentID
FROM `D_EmployeesAH`
WHERE `level_name` = 'Level3'
9. Click OK to close SQL Helper.
10. Click Prepare, and then click Refresh.
11. Click OK to add the new data source.
Task 5. For each level, map the data source columns and
create DataStream items.
1. Under Level1, right-click DataStream, and then click Properties.
The DataStream Properties window appears.
2. Click Auto Map.
The mapping appears as shown below.
3. Click OK.
4. Under Level2, right-click DataStream, and then click Properties.
The DataStream Properties window appears.
5. Click Auto Map.
The mapping appears as shown below.
6. Click OK.
7. Under Level3, right-click DataStream, and then click Properties.
8. Click Auto Map.
The mapping appears as shown below.
9. Click OK.
Task 6. For each level, map the DataStream items to the
level attributes.
1. Right-click Level1, and then click Mapping.
The DataStream Mapping window appears.
2. Map the EmployeeID level attribute to the EmployeeID DataStream
item, and Name to Name.
The mapping appears as shown below.
3. Click OK.
6. Click OK.
7. Right-click Level3, and then click Mapping.
8. Map EmployeeID to Level3ID, Name to Level3Name, and
ReportsTo to ParentID.
The mapping appears as shown below.
9. Click OK.
Results:
We built a new hierarchy called RaggedEmpH using the data in
the table created by the AutoEmployees dimension build
(D_EmployeesAH). In the next demo, we will deliver the data
from this new hierarchy to our data mart in incremental steps.
Demo 17-4
Purpose:
In the previous demo, we built a new hierarchy that we can use
to resolve our ragged hierarchy problem. We will create a
dimension build that will deliver the data from this hierarchy to
the data mart in three separate passes. Then the instructor will
show the results in a PowerPlay report.
10. In the LowestLevelId row, click in the Value column, and then change
False to True.
The attributes appear as shown below.
8. Click OK.
Task 4. Create another table to populate the third level.
1. Right-click the RaggedEmp dimension build, and then click Insert
Table.
2. In the Table name box, type RaggedEmp, click the Columns tab, and
then in the Use template box, click RaggedEmpT.
3. Under Available Sources, expand RaggedEmpH and then expand
Level1, Level2, and Level3.
4. From Level1, map EmployeeID [Id] to Level1Id and map Name
[Caption] to Level1Name.
5. From Level2, map EmployeeID [Id] to Level2Id and map Name
[Caption] to Level2Name.
6. From Level3, map EmployeeID [Id] to LowestLevelId and map
Name [Caption] to LowestLevelName.
The mapping appears as shown below.
3. Right-click the new middle RaggedEmp table, and then click Move Up.
The table is now at the top.
By performing these steps, the Level3 data will populate the table first,
followed by the Level2 data, and then the Level1 data. This is important
for allocating enough memory for the data in each column.
Task 6. Remove fostering from the build.
1. Right-click the RaggedEmp dimension build, and then click
Properties.
2. Click the Dimension tab.
3. Click the Remove Unused Foster Parents check box to select it, and
then click OK.
4. Save your work.
Task 7. Execute the build and examine the RaggedEmp
table using SQL Term.
1. Execute the RaggedEmp dimension build.
A DOS window opens indicating that nine inserts in total were made.
2. Press Enter to close the DOS window.
3. Run SQL Term.
4. In the Database for SQL Operations box, click TargetConnect.
5. In the Database Objects pane, expand TargetConnect.
6. Right-click the RaggedEmp table, and then click Add table select
statement.
7. Click Execute.
The table appears as shown below.
8. Close SQLTerm and leave DecisionStream open for the next demo.
ID   Caption  Parent
BUF  Buffalo  NY
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
What is Packaging?
Packaging Concepts
Slide callouts: save; select components to import from package.
When you select components to import, DecisionStream notifies you if there are
dependent components. However, it is not compulsory to import dependent
components if they already exist in the target catalog.
Instructional Tips
DecisionStream identifies a component as being identical if it has the same name
and specification as the component in the target catalog.
Demo 18-1
Purpose:
We want to create a package that contains the StaffH
hierarchy. We then want to import the package and view the
results.
Task 2. Save the package.
1. Click OK.
The Package File dialog box appears.
2. In the Save in box, navigate to C:\Edcognos\DS7001.
3. In the File name box, type Staff_Hierarchy, and then click Save.
Task 3. Import the Staff hierarchy.
1. From the File menu, click Import Package.
DecisionStream displays a message prompting you to back up the
catalog.
2. Click Yes and create a backup called Day1Backup in the folder
C:\Edcognos\DS7001.
The Package File dialog box appears.
Results:
We created a package containing the StaffH hierarchy. We
then imported this package and viewed the results.
A large catalog can become difficult to navigate. Using the navigator lets you
quickly search for components and check their dependencies. For example, you
may want to locate an obsolete dimension to delete it, but first you need to check
whether the dimension is used anywhere in the catalog.
Slide callouts: search criteria; search results.
When you perform a search, the navigator displays a list of all components that
match your search criteria. For each matching component, it shows:
To locate a component in the Tree pane, click the component name in the list.
Slide callouts: move backwards or forwards between previously selected
components; click on a component to view its dependencies.
When you select a component in the Tree pane, the navigator lists its
dependencies. For each listed component it shows:
Demo 18-2
Purpose:
We want to search for the StaffD reference dimension and
explore its dependent components using the navigator.
Results:
We have searched for the StaffD reference dimension and
explored its dependent components using the navigator.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
Kimball has defined a set of standards that are used to understand the
characteristics of quality data in data warehouse implementation.
For example, the gender field may contain M/F (uppercase) or m/f (lowercase).
In other cases, some data may be technically legal, such as a Sales Order dated
1899, but not logical for a business that has existed for only twenty years. Also,
legacy systems may have evolved such that the meaning of data fields has
changed over time. These issues must be addressed.
Also, elements from different sources that have the same implied meaning should
have the same key values and captions. For example, I.B.M and IBM should be
defined as the same company.
Note: This is not an exhaustive list of problems, but they are common examples.
You often encounter data problems when executing dimension and fact builds.
Examining log and reject files can uncover many of these quality problems.
Some of the more common problems are listed in the slide. These are examined
in greater detail on the pages that follow.
Instructional Tips
It may be a good idea to reiterate to students that these data characteristics do
not necessarily imply a bad data model.
Rejections are not necessarily a sign of a bad model in DecisionStream. Instead,
they could be an indicator of unexpected input data or incorrect specifications.
Separating the two requires analyzing the data because this is most often the
cause of the data rejections.
By default, DecisionStream handles rejected transactional (fact) data from two
perspectives. It fails either a reference data integrity check or a user logic check.
Technical Information
By default, the reject file is called {$DS_BUILD_NAME}.rej. This can cause
problems if you duplicate a fact build, because the duplicated fact build will have
a colon in its name. The reject file name cannot contain a colon, which is a
reserved character in Windows.
In the first scenario, the rejected data fails to match on some dimensional key (for
example, the state_code is not in the Location hierarchy) and the fact row, by
default, is written to a reject file, not to the fact table.
In the second scenario, the user may code some constraint into an output filter
on the fact table (for example, profit_margin > .30), and only those records that
pass that constraint are written to the table.
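In SQL terms, the user logic check behaves like a row filter; only rows meeting the constraint reach the fact table. A minimal sketch, with the table name assumed and the profit_margin constraint taken from the example above:

-- Only rows passing the output filter are written to the fact table.
SELECT *
FROM `staged_facts`
WHERE `profit_margin` > 0.30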
Understand Fostering
When a member at one hierarchy level does not have a parent, DecisionStream
assigns a foster parent by default. In the slide example, the Dictionary member of
the ProductH hierarchy does not point to any higher-level parent.
When creating a dimension from the original source systems, you may choose to
include orphaned members. These orphan members are assigned a foster parent
with a default name of ?<Level Name>. In the slide example, the foster parent is
called Unknown ProductLine.
Fostering can also result from delivering fact rows that do not
match existing dimension members.
You can choose to deliver both the fact data and the
unmatched members to the data mart.
You can also add unmatched members to the existing dimension data so that the
data is not unmatched in subsequent build executions. To save these members,
click the Save unmatched member details via reference structure box.
To save the unmatched members to the existing dimension data, the dimension
element must be using a reference structure (a hierarchy, auto-level hierarchy, or
lookup) that uses a template for data access. This template must contain an
attribute that has a behavior of business key and a primary key value of True. For
a non-auto-level hierarchy, this applies to the lowest level of the hierarchy at
which input data is mapped.
If the conditions for adding unmatched members are not met, a message appears
informing you of this, and the check box is not available.
When you select the Accept unmatched member identifiers box, the unmatched
fact row is delivered to the fact table in the data mart.
If you do not include a delivery module for the dimension, you will have a fact
row that has no matching member in the dimension table. Avoid having an
unmatched member, because it makes it difficult to locate this problem in the
data mart.
Product
Diet Cola Cola Diet Orange Orange
25 35 15 20
Occasions arise when a member of a hierarchy has more than one parent.
Although this situation occurs in various businesses, it is very difficult to report
and analyze the fact table from this type of structure. Aggregation in particular
can be quite complex, as noted in the slide example.
The slide demonstrates the discrepancy between totals at the Product level and
Product Type level because of multiple parents for the diet products. When the
total of the Product Type level is 135 (40+60+35), the total on the Product level
is 95 (25+35+15+20), which matches the number on the All Products level. This
discrepancy causes the problem for reporting and analysis.
The solution to this and similar problems depends on the company's reporting
needs and requirements.
Although DecisionStream supports multiple parent hierarchies by default, you
should avoid them.
Instructional Tips
This issue usually occurs due to data quality problems or a lack of understanding
about the source data.
Multiple parent hierarchies are difficult to model and support in the data
warehouse and they lead to confusing and possibly inaccurate query results.
Click the Ignore Multiple Parents box if you want to deliver only
the first parent of a dimension member.
Subsequent parents for that member are ignored.
If you cannot solve the problem of multiple parents at the source, you can force
DecisionStream to ignore multiple parents when the dimension build is run.
Click the Ignore Multiple Parents box on the Dimension tab of the Dimension
Build Properties window. DecisionStream will deliver only the first parent of the
dimension member that it encounters in the incoming data. All subsequent
parents for that member are ignored.
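Before resorting to Ignore Multiple Parents, it can help to locate the offending members at the source. A sketch of such a check, assuming a parent-child structure table named ProductStructure with member_id and parent_id columns:

-- Find members that have more than one distinct parent.
SELECT `member_id`, COUNT(DISTINCT `parent_id`) AS parent_count
FROM `ProductStructure`
GROUP BY `member_id`
HAVING COUNT(DISTINCT `parent_id`) > 1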
Demo 19-1
Purpose:
We want to override the default build settings to view
additional information about our build in the log files. This lets
us see some of the flexibility that we have in changing build
settings for troubleshooting purposes.
6. Click OK to close the Hierarchy Properties window, and then save your
work.
3. Click OK.
The DOS window opens and the build is executed.
4. Press Enter to close the DOS window.
We can see that 26 rows of data with non-unique IDs already exist.
3. Close the log window and Notepad.
4. Under the ProductD dimension, right-click the ProductH hierarchy, and
then click Properties.
5. In the Hierarchy Properties window, click the Features tab.
6. For Non-unique Ids change Reject/Warn to Accept.
7. Change the Limit for Non-unique Ids to 0.
8. Click OK.
9. Save the catalog and keep DecisionStream open for the next demo.
Results:
We modified the default build settings for the purpose of
troubleshooting. In doing so, we were able to inspect the
progress of the build and identify rejected data in the form of
Non-Unique Ids.
Rejected records
When handling rejected data, you can use SQLTerm to implement remedies. You
can use techniques such as:
In the slide example, a SELECT statement was run against a table containing data
rejected by a fact build. After careful examination, it was determined that it was
the state_cd column that was causing problems. The F_RejectData table contains
data about states that we did not want to include in the original fact table.
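For example, profiling the rejected rows can confirm which values are at fault. A minimal sketch, reusing the F_RejectData table and state_cd column from the slide example:

-- Count rejected rows by state code to find the values causing rejections.
SELECT `state_cd`, COUNT(*) AS reject_count
FROM `F_RejectData`
GROUP BY `state_cd`
ORDER BY reject_count DESC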
Using the LogMsg function in a derivation, you can insert messages into the build
log file. In the slide example, the LogMsg function is used in a derivation element
called DebugProduct. When the fact build containing this derivation is run, a
message is written to the log file for each member of the Product dimension that
is processed. If something goes wrong during the execution of the build, we can
view the log file to see exactly which member of the Product dimension caused
this problem.
Instructional Tips
Writing messages to the DOS window will degrade performance. Therefore, if
you have many messages that you want to include in your log file, avoid showing
these messages in the DOS window during build execution.
You can also enable various audit functions to write additional information to the
DecisionStream catalog tables.
You must consider many factors to ensure the quality of data in the data mart.
You can cleanse data in an operational data store (ODS) or a data staging area.
You can also create the data mart directly from operational systems without
staging it first.
• Verify that the correct set of data is extracted from the sources.
• Establish sanity checks or guards on derived and aggregated data and run
audit checks for out-of-bound conditions.
• Check the mappings from the DataStream to make sure that the values
are correctly assigned to all hierarchy attributes and build elements.
There are cases where a fact build delivers fewer or more than expected rows into
the data mart. This can happen because:
• data is being rejected due to bad dimension elements or duplicates
• merging of rows with duplicate dimension values is enabled
• the build is suppressing output of detail data
• the delivery has an output or level filter, is missing a filter, or has an
inappropriate filter
• a dimension element property has enabled multiple output levels and
aggregation
In many cases, the best way to see the rows and columns that were passed to the
deliveries is to deliver to scratch tables or flat files, and include all build elements
(especially those referenced in filters that are not output in the normal deliveries).
Additional derivation columns could be added to the fact build that write Y/N
(0/1) flags based on the conditional expressions that are suspected of causing
problems.
Demo 19-2
Purpose:
We want to ensure that data about each product is processed
into the fact table. To accomplish this, we will add a derivation
to the DemoSales build to generate a separate log message for
each product number.
5. Click OK.
6. Under the DemoSales build, right-click the DataStream, and then click
Properties.
The DataStream Properties window opens.
7. Click the Input tab, and then in the Maximum input rows to process
box, type 3000.
8. Click OK to close the DataStream Properties window.
Results:
To ensure that data about each product was processed into
the fact table, we added a derivation to the DemoSales build to
generate a separate log message for each product number.
Build slide.
One click to complete.
Create an Optional Lookup for Source Data
A value for matching records is defined in the lookup, and a value for
non-matching records is defined in the fact build. As shown in the slide, the
fact table has the found column that stores Y for matching records and N for
nonmatching records.
The optional lookup technique requires additional steps in both the design of the
lookup and the fact build. The following two pages describe these steps.
Build Slide.
4 clicks to complete.
Create an Optional Lookup (cont’d)
Slide steps: 1. Add a flag column to the template. 2. Select Use DataStream for
data access and create a SQL query.
In an optional lookup, you create a flag column with values for matching records
(for example, Y or 1).
To design the optional lookup, follow these steps:
1. When creating a template, add an extra attribute for the flag column.
When the lookup records are loaded into memory from the reference
table, this flag will be set to Y for all members.
Technical Information
To find matched and unmatched data, we must include unmatched members (see
the next slide). To include unmatched members, you must use a template to
access the source data, not a DataStream. This is because a template
automatically creates the correct INSERT and SELECT statements required to
include the unmatched members.
table, this flag will be set to Y for all members.
2. On the Data Access tab of Lookup Properties window, click the Use
DataStream for data access box to select it. Make sure to include a data
source.
3. In the Data Source for the lookup, create a literal that indicates a record
exists. In this example, the value is Y (see the sketch after these steps).
4. Map the literal and the data returned from the SQL SELECT query:
a. Map the literal value and the columns from the data source to items
in the DataStream.
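Following these steps, the lookup's data source query might look like the sketch below. The table and descriptive column names are assumptions for illustration; the literal 'Y' supplies the found flag described on the previous page.

-- 'Y' marks every loaded lookup record as found; unmatched fact rows will
-- later surface as NULL in this flag (assumed table and column names).
SELECT `state_cd`,
       `state_name`,
       'Y' AS found
FROM `D_State`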
Build Slide.
2 clicks to complete.
Add an Optional Lookup to a Fact Build
While a value for matching records is assigned in the optional lookup, a value for
nonmatching records is assigned in the fact build.
Using this technique, the fact build must have two elements: a dimension element
to reference the lookup and a derivation to calculate the reference literal value of
the lookup.
1. You want all records to be included in the fact table, even if they do not
match an existing lookup row. As a result, you must click the Accept
unmatched member identifiers box in the properties of the dimension
element.
2. Create a derivation to hold the flag. The derivation references the flag
column; in this example, found. All fact records that match an existing
lookup row will include the value Y.
3. The rows that failed the lookup check will have a null value in the flag
because no matching member row existed in the lookup table. Assign a
value of not found (in this case, N) to the Value if NULL option on the
flag column. This option is under Element Properties in the fact table
delivery module.
Demo 19-3
Purpose:
We want to include a column in a fact table that will convert
product revenues to the local currency. First, we will use the
Fact Build wizard to create a build called GO_Fact. We will
then create the CurrencyD dimension and CurrencyL lookup to
be referenced from the GO_Fact build.
9. Click Next four times, clear the Deliver Dimensions check box, and
then click Next.
10. Clear the Deliver Metadata check box, click Next, and then click
Finish.
The GO_Fact build is added to the tree.
11. In the Builds folder, expand GO_Fact, and then expand
Transformation Model.
The result appears as shown below.
4. Click OK, right-click the CurrencyL lookup, and then click Mapping.
The DataStream Mapping window opens.
5. Drag the attributes from the Level Attributes pane to the Maps To
column so that the mapping appears as shown below.
6. Click OK, and then explore the hierarchy in Reference Explorer, saving
changes if prompted.
7. Close Reference Explorer.
Task 6. Add a new dimension element called
CountryCurrency to the GO_Fact build.
1. Under the GO_Fact build, right-click Transformation Model, and then
click Insert Dimension.
The Dimension Properties window opens.
2. In the Name box, type CountryCurrency.
3. Click the Never Output check box to select it.
This step excludes the dimension element from delivery.
4. Click the Reference tab, and then in the Dimension box, click
CurrencyD.
5. In the Structure box, click CurrencyL (L).
Notice that no levels appear in the Level column.
We want all transaction records delivered to the table, even those records
that do not match the existing reference data.
6. Click the Unmatched Members tab, and then click the Accept
unmatched member identifiers check box to select it.
This option prevents the records from being rejected.
7. Click OK.
8. Click CountryCurrency, and then drag it to the top of the
Transformation Model tree.
The result appears as shown below.
4. Click the Execute SQL Query (limit to 1 return row) button to ensure that the code works.
5. Click OK, click the Prepare button to select it, and then click Refresh to
prepare the columns for use in the fact build.
6. Click the Derivations tab, and then click the Add button.
The Derivation Properties window opens.
7. In the Name box, type CountryCurrency, and then click the
Calculation tab.
8. In the right pane, type Concat(CountryCode, SubStr(DateOrder, 1, 6)). (A sketch of what this expression computes follows this task.)
9. Click OK to close the Derivation Properties window, and then click OK
to close the Data Source Properties window.
If we scroll to the bottom of the result set, we can see that some records
have the revenue values converted, and other records have a value of 0.
12. Close SQLTerm.
We need to change the number of rows to process back to 1000.
13. Under the GO_Fact build, right-click DataStream, and then click
Properties.
The DataStream Properties window opens.
14. Click the Input tab.
15. In the Maximum input rows to process box, type 1000, and then click
OK.
16. Save the catalog.
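The derivation entered in step 8 builds a key from the country code and the year and month of the order date. As an equivalent SQL expression (shown only for orientation; DecisionStream evaluates the derivation itself, and the source table name is hypothetical), assuming DateOrder is stored as a string such as '19970314':

    SELECT CountryCode || SUBSTR(DateOrder, 1, 6) AS CountryCurrency
    FROM OrderSource

For a CountryCode of 'CA' and a DateOrder of '19970314', the result is 'CA199703'.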
Results:
We have created the Currency dimension and Currency
Lookup to be referenced from the Sales fact build. We also
added a dimension element to reference the lookup and a
derivation to calculate revenue in the local currency.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Objectives
The three primary programs in DecisionStream that are used to collect and manage dimension and fact data are dimbuild, databuild, and rundsjob. These programs execute specifications extracted from the DecisionStream catalog. DecisionStream will parse and validate the specifications, and check runtime license keys, before beginning to source and deliver the data.
When these programs execute, they will always produce a log file. Each log file has its own naming convention. The dimbuild.exe file produces a log file with a name of DimBuild_<BuildName>_<sequence number>.log (for example, DimBuild_Sales_0001.log).
It is a good practice to run a build in Check Only mode before you execute it
normally, so that you can inspect the log files for occurrences of unexpected
behavior.
The difference between executing a build in Normal mode and using Check Only
mode is that no data will be delivered in Check Only mode. However, the build
will still perform all internal processing. Check Only mode makes it possible for
you to track the progress of a build without populating or altering any of the
target tables.
Structure Data
DecisionStream stores all the data for hierarchies and lookups used in a fact build and caches it at the start of the build process. Once built, this structure is static and is not subject to application paging, except under extreme circumstances.

Instructional Tips
Emphasize to students at this point that DecisionStream gets its great throughput capabilities because of how effectively it uses the computer's memory.
Hash Function
DecisionStream will resize the hash table if the maximum number of slots is less than the default.
By default, the size of the initial hash table in DecisionStream is set to 200,000
slots. However, DecisionStream precalculates the maximum possible number of
hash table slots and uses the precalculated value when it is smaller than the
specified number of slots.
For any build, you can determine the minimum hash table size by executing the
build in Check Only mode and then examining the log for an entry of the
following format, where nnn is the table size in slots:
Specify the minimum hash table size from the log on the
Memory tab of the Build Properties window.
Where possible, determine the minimum size of the hash table by using the log file, and then specify this value on the Memory tab of the Build Properties window. If you have to specify a lower value, use a value that will result in the minimum hash table size after resizing. You can calculate suitable values by multiplying the minimum hash table size by (2/3)^n, where n is an integer. For example, if the log indicates that the minimum hash table is 10,000 slots, then (rounding up to the nearest integer):

10,000 x (2/3) = 6,667
10,000 x (2/3)^2 = 4,445
10,000 x (2/3)^3 = 2,963

Technical Information
To obtain the information, run the build in Check Only mode and write the data to a log file.

Be careful not to make the hash table too big. Allocating too much memory can take away memory that may be used by other programs. Paging will occur, and eventually the operating system will terminate the program.
Performance tip: Resizing the hash table is the worst possible scenario and will
have a serious impact on performance. The best possible scenario is when the hit
rate in the hash table is closest to 1. The log file will indicate the average hit rate.
If the hit rate is more than 1, more than one attempt was made to find a slot in
the hash table. Hit rate affects performance. The fuller the hash table is, the
greater the likelihood that it will take more than one attempt to find an empty slot
in the table.
You can determine the maximum amount of memory that DecisionStream used
during the build by inspecting the log file. Search for the last entry of this format:
The value of y (the peak value) gives the maximum amount of memory (in MB)
that DecisionStream used. If DecisionStream reaches the limit of available
memory, it creates virtual memory by paging information to disk. You can
determine whether this happens by inspecting the log file. Search for entries of
this format:
The presence of one or more of these lines indicates that DecisionStream has
exhausted available real memory. The value of n3 indicates the number of
times it occurred.
To manage the memory used, you can limit memory during build execution
Technical Information
by using dimension breaks (discussed later in this module) or setting memory Notice that some builds require very little
limits. Setting a memory limit within DecisionStream prevents an operating memory; therefore, even if you specify a
system from terminating an application that requests more memory than the memory limit as small as 1MB, paging may
operating system can provide. It also ensures that DecisionStream uses its not occur.
own paging algorithm instead of relying on the operating system to provide
In short, this tab is where you set the
this function.
amount of memory that DecisionStream
can allocate to execute the fact build. If
DecisionStream exceeds this amount
during the execution process, then
DecisionStream will perform its own
paging, which will greatly reduce
performance.
Dimension Breaking
(Slide: an aggregate tree for a Time hierarchy in which monthly values of 10 roll up to quarterly values of 30, which roll up to a yearly value of 120.)
During data acquisition, DecisionStream stores partially built aggregates and all data that may contribute to future aggregations. When DecisionStream detects a change in a sorted input stream, the completed aggregates can be cleared from memory, releasing the memory that they occupied.

For example, the hierarchy in the slide example contains year, quarter, and month levels. When DecisionStream has processed all the records for March, it clears the March records from memory, checks the hierarchy, and flushes all the records for Q1. DecisionStream continues processing until it has processed all the records through June, and then clears all the records for Q2 from memory.
Breaking can only function correctly if each data source is sorted by the same set of dimensions and in the same sequence. Sorting is often performed in the database (for example, indexing often orders the data). However, if the source data is not in the required order, you can either add an ORDER BY clause to the source SQL or select the Force Sort on Break Dimensions box in the Fact Build Properties window.

Key Information
This method requires that the data is already sorted in the required order.
Check that all data sources are sorted by dimension in the sequence in which they
appear in the Break On list. If not, click the Force Sort on Break Dimensions box
to select it.
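For example, source SQL sorted for breaking might look like the following minimal sketch. The table and column names, and the assumed Break On sequence (DateOrder, then ProductNumber), are illustrative only:

    SELECT ProductNumber, DateOrder, Quantity, UnitCost
    FROM GOSOrderDetail
    ORDER BY DateOrder, ProductNumber  -- must match the Break On sequence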
(Slide callouts: specify an absolute value; specify the value as a number of breaks.)
When you perform dimension breaking, you can choose whether to break on a fixed number of dimension changes or when the hash table reaches a specific percentage of usage:

• Click the Breaks option button to specify a break every n dimension changes, and type the required number in the adjacent box.

• Click the Percentage option button to specify that the break is to be based on the percentage of the hash table that is used.

Type a number in the Perform Break Processing Every box to indicate the required number of changes or the percentage of table usage (as a number between 1 and 100).

Note: A good dimension candidate for breaking is one that is fairly evenly balanced. The Time dimension is usually a good example.

Technical Information
The algorithm for the hash table starts to affect the breaking/flushing process if you set it to approximately 60 percent. What does this mean? You can compare this to looking for a seat in a movie theater. If the theater is less than 60 percent full, it is relatively easy to find an empty seat. However, as the number of empty seats decreases, it becomes progressively harder to find a place to sit. Sixty percent is an arbitrary number, but we generally recommend it as a good benchmark. Using a percentage instead of a literal number of breaks provides more flexibility, because the percentage is in relation to the actual hash table size and changes as the hash table changes.
When you execute a fact build, DecisionStream creates the required fact and
dimension tables if they are not in the target data mart. For most purposes, the
structure of the created tables is acceptable.
However, you can fine-tune the structure of these tables by saving the data
definition language (DDL) statements to a script file or by copying them to
SQLTerm or another suitable program. You can edit this script file, and then
execute the modified DDL statements.
Note: You can copy the statements to the Clipboard by selecting Copy from the
shortcut menu.
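For example, a generated script for a fact table might be edited as follows before being executed in SQLTerm. This is a hedged sketch; the column definitions are assumptions, not DecisionStream's actual output:

    CREATE TABLE F_DemoFact
    (
        ProductNumber INTEGER NOT NULL,
        DateOrder     CHAR(8) NOT NULL,
        Quantity      INTEGER,
        Revenue       NUMERIC(15,2)
    )

    -- A typical manual fine-tuning edit: add a constraint that the
    -- generated script does not include.
    ALTER TABLE F_DemoFact ADD PRIMARY KEY (ProductNumber, DateOrder)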
Dimension Caching
By default, when you execute a fact build, the dimensions are cached in memory first, and then the fact records are processed. For large dimensions, caching uses a lot of memory, which means that the dimension data may take some time to load.

Instructional Tips
Dimension caching is only a concern if you have very large dimensions (for example, over a million members).
You can specify when caching is to be performed, either at the start of build
execution or as rows arrive (caching on demand).
As a general rule, you should cache at the start of a build if most of the dimension
members are required (this is the default). Specify cache on demand if the build
references only a small proportion of the members in a large dimension.
When deciding when to cache, you should consider the portion of the dimension
that is to be loaded during build execution. If the build references most of the
records, then you should cache the dimension.
For example, a car insurance company has many customers, but only 10% of them are invoiced each month. If the company has 120,000 customers, an average of 12,000 invoices are generated each month. Since this uses only 10% of the dimension, caching the Customer dimension on demand may be faster.
Demo 20-1
Purpose:
To more effectively utilize memory, we want to perform
dimension breaking while executing the DemoSales fact build.
We will then execute the build in Check Only mode to evaluate
the results of breaking. Lastly, we will inspect the log file.
5. Click OK to close the Fact Build Properties window, and then save the
catalog.
The INTERNAL messages indicate when breaking occurred during the execution of the build.

Instructional Tips
You can add more messages to the log file by reducing the numbers in the Message frequency (Input) and Message frequency (Output) boxes. This can be done on the Input tab of the Fact Build Properties window or in the Execute Build dialog box.

4. Close Notepad and the log window, and then leave DecisionStream open for the next module.

Results:
To more effectively utilize memory, we performed dimension
breaking while executing the DemoSales fact build. We then
executed the build in Check Only mode to evaluate the results
of breaking. Lastly, we inspected the log file.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Objectives
(Slide: a fact build with dimension delivery, fact delivery, and metadata delivery into the data mart.)
There are various options for data delivery through a fact build.
When you create a stand-alone data mart, the deliveries for fact data, dimension data, and metadata can all be defined within the fact build. However, when the data is delivered into a complex database, the fact build is used only for the fact delivery.
(Slide: reference sources feed the Product dimension build through the Product input hierarchy; transactional data sources feed the Sales fact build, which references the ProductLevel hierarchy/lookup and delivers ProductNumber, Quantity, UnitCost, and AverageRevenue, along with updates, to the F_Sales table.)
When the target database is a data warehouse, the best practice is to deliver and
maintain the reference data through dimension builds. Fact builds are responsible
for the fact data delivery and maintenance.
The slide shows that the Product dimension build delivers the Product dimension
based on the Product input hierarchy.
The Sales fact build, which consists of the ProductNumber dimension element
and a few fact columns, references the Product dimension through
ProductNumber.
In production, a dimension element does not have to reference all columns in the
corresponding dimension table. In most cases, the reference has to check data
integrity on a single level. Therefore, an additional, much simpler hierarchy or
lookup must be created. The new hierarchy or lookup references the dimension
table in the data warehouse. In the slide example, the ProductNumber dimension
element references the D_Product table through the new ProductLevel hierarchy,
which stores the lowest-level attributes of the Product dimension.
As the Product dimension build delivers dimension data, the Sales fact build
delivers fact data into the F_Sales table.
Fact delivery modules deliver the fact data that a fact build produces. DecisionStream provides a number of fact delivery modules that can be separated into two groups:
Table delivery modules deliver data into database tables:
• The Bulk Copy (BCP) delivery module delivers data to Sybase or
Microsoft SQL Server databases using the BCP loader utility.
• The DB2 LOAD delivery module delivers data to IBM DB2 databases
using the DB2 bulk loader.
• The Informix LOAD delivery module delivers data to an Informix
database using the Informix DBACCESS command.
• The ORACLE SQL*Loader delivery module delivers data to an Oracle
database using the Oracle Bulk Load utility.
• The Red Brick Loader (TMU) delivery module delivers data to a Red
Brick database using the Red Brick Bulk Loader utility.
• The Relational Table delivery module is discussed later in this course.
• The Teradata delivery modules deliver data to Teradata databases using the Teradata bulk loader utilities.
• The Microsoft SQL Server BCP delivery module delivers data to
Microsoft SQL Server databases using the Microsoft SQL Server BCP
bulk load utility.
(Slide: elements subscription for delivery; key options; refresh types such as REPLACE, UPDATE/INSERT, and UPDATE; and a commit interval, for example committing after every 100 of 1000 rows.)
The Relational Table delivery module formats the build data into relational tables.
These tables can be accessed by client-server analysis applications, or can form
the basis of standard reporting systems.
On the Table Properties tab of the Table Delivery Properties dialog box for the
fact table delivery:
• create keys for columns and change the column names (optional)
On the Module Properties tab of the Table Delivery Properties dialog box:
• select the method by which rows should be applied to the output table:
APPEND, TRUNCATE, REPLACE, UPDATE/INSERT, and
UPDATE (mandatory)
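As a rough SQL picture of two of these refresh types (illustrative only; DecisionStream generates and manages the actual statements, and the table and column names are assumptions):

    -- TRUNCATE: empty the table, then reload the delivered rows.
    TRUNCATE TABLE F_DemoFact

    -- UPDATE/INSERT: update rows matched on the key columns;
    -- rows with no match are inserted instead.
    UPDATE F_DemoFact
    SET Quantity = 500
    WHERE ProductNumber = 1 AND DateOrder = '19970314'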
Output Filename: the full directory path and file name, or just the file name.
Line/Field Delimiter: NL (new line), TAB, COMMA, SPACE, NONE (no delimiter), or a character string.
The Text File delivery module is used to format build data into simple text files.
These files then can be imported into spreadsheets for cross-tab (pivot table)
analysis or distributed to other systems.
On the Module Properties tab of the Table Delivery Properties dialog box for the
fact delivery:
On the Element Properties tab of the Table Delivery Properties dialog box for
the fact delivery:
As previously stated, you should not deliver dimensional data in a fact build.
However, there are cases when you can define dimension delivery in a fact build.
It may also be convenient to deliver portions of a large data warehouse to local servers; for example, a sales data mart for the Northeast Region of your company. This type of delivery can also be handled through a single fact build.

Sometimes the only source of dimensional data is within the fact data. For example, if a distributor sells your products to new customers, the sales transaction file may be your only source of customer data.

Key Information
Because Cognos recommends maintaining dimension data in dimension builds, and not in fact builds, this module does not contain a demo on dimension delivery as part of a fact build. However, the following pages describe how to add a dimension delivery to a fact build.
You can set as many relational table deliveries as you need for each dimension.
You can then add one or more tables for each dimension delivery.
• Provide the table name (mandatory). You can deliver data into an existing
dimension table or create a dimension table for the data mart.
For the metadata delivery:
• provide a name (on the General tab of the Metadata Delivery Properties dialog box)
• provide the connection and virtual cube name for the SQL Server Analysis Services delivery

To deliver BI metadata, you must declare dimension delivery modules for each dimension, but you do not need to deliver the dimension data. Select a fact table and the columns from the table that you want to deliver. Disable dimension deliveries if you do not want dimension data to be delivered through the fact build.
The next step is to select a fact table and the table columns that you want to
include in the metadata delivery. Information about fact tables and the table
columns subscribed for the fact delivery is on the Fact tab of the Metadata
Delivery Properties dialog box. You can select the whole table or individual
columns for the delivery.
With the dimensional deliveries declared, on the Dimension tab of the Metadata
Delivery Properties dialog box, specify the source of delivery and a dimension
delivery module for each dimension.
Demo 21-1
Purpose:
We want to add a relational table delivery to the DemoSales
fact build. We will ignore dimension delivery in this fact build
because we have already created the dimension tables that we
require by using dimension builds. We will add a new delivery
called F_DemoFact, execute the DemoSales build, and then
view the results in SQLTerm.
6. Click the Module Properties tab, and then ensure that APPEND is
selected in the Refresh Type box.
7. Click OK to close the Table Properties window.
The F_DemoFact table contains the 1000 rows that were inserted during
the build execution process.
9. Close SQLTerm and keep DecisionStream open for the next demo.
Results:
We added a relational table delivery to the DemoSales fact
build called F_DemoFact. We then executed the DemoSales
build and viewed the results in SQLTerm.
Partitioned Delivery
Vertical Partitioning
Horizontal Partitioning
You can choose from three methods to partition your data to meet your delivery requirements:
• Build elements subscribed for delivery. This method involves vertical partitioning.
• Data rows selected by filters. This method involves horizontal partitioning.
• Data rows within the build elements subscribed for delivery. This method involves both vertical and horizontal partitioning.
Vertical Partitioning
(Slide: columns included in the fact delivery versus columns excluded from the fact delivery.)
For each delivery, you can choose the transformation model elements to which the delivery subscribes. This vertical partitioning is performed by subscribing to the elements that you want to deliver. For example, by subscribing to certain elements in two fact deliveries, you can deliver sales data to one fact table and revenue data to another within the same data mart.
Demo 21-2
Purpose:
We want to create a new fact delivery in the DemoSales build
called F_CustomOrders. We will vertically partition the
incoming fact data by subscribing only to the transformation
model elements that we want to see in this table. We will then
execute the DemoSales build and view the results in SQLTerm.
6. Click OK.
7. Right-click F_DemoSales, and then ensure that Enabled is not selected.
This will disable the delivery of the fact table.
8. Right-click F_DemoFact, click Enabled to disable the delivery of the
fact table, and then save the catalog.
Results:
We created the F_CustomOrders fact table with partitioned
data delivery. We set vertical partitioning by subscribing only
to the elements that we want to see in the table. We then
executed the DemoSales build and viewed the results in
SQLTerm.
DecisionStream provides three of the most common index types:

• The unique B-tree index is used for primary key columns. It builds a tree structure of possible values with a list of the unique identifiers of the rows that have the leaf value. A search involves moving up the tree and finding the rows that contain a given value. The index can be created on one or many columns.

• The repeating B-tree index is used for foreign key columns. It is the default index type for DecisionStream. It is built in a similar way to the unique B-tree index.

• The bitmap index is used with dimension tables and fact tables, where the constraint on the table results in a low-cardinality match with the table. This index represents a string of bits for each possible value of the column. Each bit string has one bit for each row. Each bit is set to 1 if the row has the value that the bit string represents, and is set to 0 if the row does not have that value.

Instructional Tips
Stress to students the difference between keys and indexes. Keys are the logical elements, whereas indexes are the physical elements. Usually, indexes are created on columns that already have keys.

Key Information
When you work with a large data warehouse, you probably would not use DecisionStream to initially create indexes. They are usually created through the index plan by the DBA.
The most common index for a fact table is the B-tree index. When we declare a
primary key constraint on a table, a unique index is built automatically on those
columns in the order in which they were declared. A fact table contains not only a
primary key but also foreign keys represented by dimension elements; therefore, it
is important to create repeating B-tree indexes on those columns.
A fact table can also have a composite index: a single index based on the fact table keys represented by dimension elements.
Single-column Index
(Slide callouts: create a repeating B-tree index for a single column in the fact table; select to ignore errors while re-creating the index; create a unique B-tree index on a primary key column in the dimension table.)
A fact table can have a composite index and single-column indexes. It is a good
practice to create a composite index on a group of the table key columns and a
single-column index on individual fact key columns that likely will be used as a
join condition, filter, or group. In the slide example, a repeating index is created
on the SalesStaff dimension element column, because it is a high-cardinality
column and often will be used in join conditions and filters.
All dimension tables must have a single-column primary key and therefore
one unique index on that key. Larger dimension tables have more than one
single-column index. However, small dimension tables seldom benefit from
additional indexing. In a dimension table that has the business and surrogate
key columns, the unique index is created on the surrogate key column, which
is a primary key column. The repeating index is created on the business key
column.
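In generic DDL, the dimension table indexing just described looks roughly like this sketch (DecisionStream generates its own statements; the names follow the demo later in this module):

    -- Unique B-tree index on the surrogate (primary) key column:
    CREATE UNIQUE INDEX prod_pkey ON D_ProductH (skey)

    -- Repeating (non-unique) B-tree index on the business key column:
    CREATE INDEX prod_product ON D_ProductH (ProductNumber)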
Indexes are important for faster data retrieval. However, they consume memory and slow the system's load and maintenance process. While indexes are important for updates, they are not needed for inserts. By selecting the Recreate Index box in the Index Properties, you can drop the index before an insert and re-create it afterward. To ignore errors while DecisionStream is re-creating the index, select the Suppress Errors box.
Indexes are required for a record search in a large amount of data. For example, if
you look for EmployeeCode 1234, the database does not have to scan through all
50,000 employees. It uses the index to find the record. Indexes are especially
useful for updates.
Keys ensure the uniqueness of column values. For example, only one employee can have EmployeeCode 1234.
Because indexes and keys are used for different purposes, they are not
interchangeable. However, it is recommended that you create indexes on key
columns to support both the logical and the physical data structure.
When updating a fact table, the new data must locate the old records. By default, if no explicitly defined keys exist, DecisionStream looks for dimension element keys. DecisionStream modifies the SQL statement to include the key columns in the WHERE clause of the UPDATE statement.
Update Sale
Set Quantity = 500
Where Staff = 50
and Product = 1
and Date = '19970314'

In this example, Staff, Product, and Date are the key columns of the Sale table (Staff, Product, Date, Quantity), and the Quantity column of the matching record is updated.
If you do not want all dimension element keys to be included in the WHERE
clause, you must define them explicitly by creating a key for each dimension
element that you do want to include.
Create keys on
dimension element
columns if you want to
include them in the
WHERE clause
You can create keys on non-dimension columns if you want to use them in the
UPDATE statement. However, as soon as you explicitly define the keys,
DecisionStream overwrites the default key columns with the new ones. To
preserve dimension element column keys, you must define them explicitly.
To perform an update, you must change settings on the Module Properties tab in
the Table Delivery Properties dialog box by selecting UPDATE or
UPDATE/INSERT. You then create keys on columns that you want to include
in the Where clause if they are not created yet.
DecisionStream supports only a single composite index per fact table. It creates
the index on all key columns that are defined either by default or explicitly in the
fact table.
In the slide example, the Index Properties dialog box for the sales_comp
composite index does not specify the segments of the index. DecisionStream
creates the index on all key columns. In this example, the segments of the index
are SalesStaff, Vendor, Product, DateOrder, and OrderCode.
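Expressed as generic DDL, the composite index would look roughly like the following sketch (the fact table name is an assumption; DecisionStream builds the actual statement from the defined key columns):

    CREATE UNIQUE INDEX sales_comp
    ON F_Sales (SalesStaff, Vendor, Product, DateOrder, OrderCode)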
You can create a composite index on the Table Properties tab of the Table
Delivery Properties dialog box:
• Click the Index box to select it, and then click the Index Properties
button.
• In the Index Properties, provide a name for the index, following the
index naming convention.
• In the Index Properties dialog box, select a type for the index. Because
DecisionStream creates only one composite index per table, this index
will be of the UNIQUE type.
Even though DecisionStream offers two options for indexes, multiple single-column indexes and a single composite index, it is often more appropriate to create a single-column index on each fact table key and then let the optimizer combine those indexes as appropriate to resolve the queries.
However, when users try to update a row and the first column of the index is part
of the UPDATE statement, the SQL Query optimizer uses the composite index
to speed up the search process.
Each delivery can include many level filters; however, each delivery cannot have
more than one output filter.
Level filters specify level combinations that the delivery module needs to accept.
The output filter accepts output processed by the level filters and specifies the
data that the delivery module needs to accept.
Filter guidelines:
lev  Cust  Date    Qty
M    1     199901  1
M    2     199901  5
M    1     199902  1
M    2     199902  5     (Monthly delivery)
M    1     199903  1
M    2     199903  5
M    3     199903  4
Q    1     19991   3
Q    2     19991   15    (Qtr delivery)
Q    3     19991   4

When delivering multiple levels, always include a level filter in each delivery.
You can partition the data so that each delivery delivers a subset of the data rows.
Applying filters performs this delivery type, termed horizontal partitioning.
The slide example shows two fact deliveries, monthly sales and quarterly sales. A
level filter has been applied to each. If a filter is not applied to these deliveries,
DecisionStream delivers monthly and quarterly rows to each table of each
delivery.
DecisionStream knows the aggregate level of every row in the DataStream; therefore, the level filters accept only the monthly rows for the monthly table and only the quarterly rows for the quarterly table.
lev  Cust  Date    Qty
M    1     199901  1
M    2     199901  5
M    1     199902  1
M    2     199902  5     (Monthly delivery)
M    1     199903  1
M    2     199903  5
M    3     199903  4
Q    1     19991   3
Q    2     19991   15    (Qtr delivery)
Q    3     19991   4

(In the slide, an X marks the rows that the Exclusive option prevents from being offered to subsequent deliveries.)
By default, DecisionStream offers all rows to all deliveries, although the level filter can reject them. With the "Exclusive" property set on the Monthly table, its rows are not offered to any subsequent delivery tables.
Even though level filters prevent unwanted rows from being delivered to the
aggregate tables, DecisionStream offers all rows to every delivery. Offering all
rows creates performance overhead and can add to processing time.
If rows delivered to one table are not meant to be written to other tables, set the
Exclusive option on that delivery. Setting this option saves internal processing
time because the rows that were written are flushed from memory and do not
have to be checked by each subsequent delivery.
period.year * location.state
(a level combination on two dimensions)

period.year.1997,1998 * location.state
(a level combination that also specifies members of a level)
Based on your data mart requirements, you can determine what type of filters to
apply and to what level(s) they apply.
You can also specify members of a level, as shown in the slide example.
Output Filters
An output filter is an expression that:
• uses the same syntax as derivations
• must return TRUE or FALSE
• can involve build elements and members
When you apply an output filter to the delivery, only those data rows for which
the expression evaluates to TRUE are delivered.
Usually, you would use a delivery's output filter to horizontally partition data
other than by hierarchical levels.
For example, you can partition data by sales. Two deliveries, one having the output filter units_sold > 0 and the other having the output filter units_sold <= 0, would partition the data according to whether the company had sold the product or the fact row contained bad data.
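In SQL terms, the two output filters split the rows roughly as follows (illustrative only; DecisionStream applies the filters in the deliveries, and fact_rows is a hypothetical name for the build output):

    SELECT * FROM fact_rows WHERE units_sold > 0   -- first delivery
    SELECT * FROM fact_rows WHERE units_sold <= 0  -- second delivery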
Demo 21-3
Purpose:
The company wants to run updates to the F_DemoSales table
that was created by the DemoSales build. To facilitate this
process, we must define keys on the OrderCode and
dimension element columns. We must then create single-
column indexes on dimension element columns and a
composite index on all key columns in the F_DemoSales table.
For convenience, we will create a separate build to maintain
updates to the table.
6. Click OK.
7. Click the Module Properties tab.
8. In the Refresh Type box, click TRUNCATE if necessary, and then click
OK to close the Table Delivery Properties window.
8. Click OK.
9. In the Element list, click ProductNumber.
Buttons for Index Properties and Element Properties appear.
5. Press Enter to close the command window, and then open the log file.
The result appears as shown below.
Key Information
You may receive an error at the end of the
fact build due to the dropping of indexes.
This is acceptable, as long as the fact build
completed successfully.
5. Close the log file, and then close the Log window.
Task 7. Create indexes on the surrogate and business key
columns of the D_ProductH dimension table.
1. Under Builds, expand the Product dimension build, and then double-
click D_ProductH.
The Dimension Table Properties window opens.
2. Click the Columns tab, and then in the skey row, click the Index box to
select it.
An ellipsis button appears in the Index Properties column for skey.
3. Click the ellipsis button .
The Index Properties dialog box appears.
4. In the Name box, type prod_pkey.
5. In the Type box, click UNIQUE.
We want the surrogate key to be a primary key for the table.
6. Click the Suppress Errors and Recreate Indexes check boxes to
deselect them.
Because we will be updating the table, we do not want the indexes to be
dropped.
7. Click OK.
8. In the ProductNumber row, click the Index check box to select it.
An ellipsis button appears in the Index Properties column for
ProductNumber.
9. Click the ellipsis button.
The Index Properties dialog box appears.
10. In the Name box, type prod_product.
Leave the Type box blank.
11. Click the Suppress Errors and Recreate Indexes check boxes to
deselect them.
The result appears as follows.
12. Click OK, and then click OK to close the Dimension Table Properties
window.
We want to see how DecisionStream updates the dimension data
through a dimension build. Therefore, we must add extra records as static
members.
Task 8. Add static members to the Products level of the
ProductH hierarchy and execute the Product
dimension build.
1. In the Library folder, expand the Dimensions folder (if necessary), expand the ProductD dimension (if necessary), the ProductH hierarchy, and the Products level.
2. Double-click Static Members.
The Products Static Members window opens.
3. Click Add.
4. In the ProductNumber column, type 110.
5. Double-click the ProductName box, and then type Red Pencils.

Instructional Tips
There are already products in the ProductH hierarchy with ID numbers of 110 (Blue Steel Putter) and 115 (Course Pro Gloves). You may want to explore the hierarchy and show the students which products already use these numbers. Instead of adding static members, you may want to include rows of data in a text file, and then add the text file as a data source in the catalog.
6. Repeat steps 3 to 5 to add another member, using 115 for
ProductNumber and Coffee Tables for ProductName.
7. Click OK to close the Products Static Members window, and then save
the catalog.
8. Right-click the Product build, and then click Execute.
The Execute Build dialog box appears.
9. In the Trace area, if necessary, click the Override build settings check
box to select it, and then click the SQL check box.
Other check boxes may be selected in the Trace area already. This is
acceptable.
10. Click OK, and then press Enter when the build has finished executing to
close the command window.
11. Analyze the log file.
Notice that there are two updates to the table. The surrogate key column is placed in the WHERE clause of the UPDATE statement. Remember that we did not create a key on the surrogate key column; DecisionStream did it by default. (A sketch of these UPDATE statements follows this task.)
12. Close the log file and the Log window.
13. Under the Products level of the ProductH hierarchy, right-click Static
Members, and then click Delete.
A dialog box appears confirming the deletion.
14. Click Yes, save the catalog, and then keep DecisionStream open for the
upcoming workshop.
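The two UPDATE statements noted in step 11 look roughly like this. The surrogate key values in the WHERE clauses are assigned when the members are loaded, so the literals below are purely illustrative:

    UPDATE D_ProductH SET ProductName = 'Red Pencils'
    WHERE skey = 110  -- illustrative surrogate key value

    UPDATE D_ProductH SET ProductName = 'Coffee Tables'
    WHERE skey = 115  -- illustrative surrogate key value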
Results:
We have defined keys on the OrderCode and dimension
element columns. Then we created single-column indexes on
dimension element columns and a composite index on all key
columns in the F_DemoSales table. We also created single-
column indexes on the surrogate and business key columns of
the D_ProductH table.
Summary
Workshop 21-1
• Add the Sales fact build to the build tree and add a data source that uses
data from the GOSOrderDetail and GOSOrderHeader tables in the
SourceConnect database.
• Map the columns from the data source to items in the DataStream, and
then map the DataStream items to the elements of the transformation
model.
• Add a unique composite index on the table delivery. At this point, you
want the index to be recreated. However, you do not need to track any
errors.
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Objectives
DecisionStream provides a command line interface (CLI) for all platforms that it supports. You can therefore use DecisionStream Designer to develop builds on the 32-bit Windows platform and then use the CLI to deploy these builds on supported UNIX or Windows platforms.

Technical Information
You may have to configure the computer before using the CLI. This is dependent on the operating system in question.
You can perform auditing only on catalog-based projects, and can use user-
defined functions and JobStreams only within a catalog.
CLI Programs
When you execute a fact build from within the catalog or the command line
interface, the DecisionStream Engine runs the DATABUILD command.
The advantage of using DATABUILD directly to execute a fact build is that after
you provide the correct parameters, you can use the command in a batch file to
automate the execution process. This makes it possible for you to process a fact
build from outside DecisionStream.
You can modify the behavior of the DATABUILD command by adding options. For example, adding -c to the DATABUILD command specifies that the fact build definition should be retrieved from a catalog. You can also use
DATABUILD to list all the delivery modules to which your DecisionStream
license gives access or to list the properties of an available module (such as a DB2
LOAD delivery).
For information about the syntax of this command, see Chapter 23,
"Commands" of the User Guide.
When you execute a dimension build from within the catalog or from the
command line, the DecisionStream Engine runs the DIMBUILD command. As
with DATABUILD, the advantage of using DIMBUILD directly to execute a
dimension build is that, once you have provided the correct parameters, you can
use the command in a batch file to automate the execution process. This allows
you to process a dimension build from outside DecisionStream.
You can modify the behavior of the DIMBUILD command by adding options. For example, adding -C to the DIMBUILD command executes the dimension build in Check Only mode.
For information about the syntax of this command, see Chapter 23,
"Commands" of the User Guide.
Execute a JobStream
When you execute a JobStream from within the catalog or from the command
line, the DecisionStream Engine runs the RUNDSJOB command. As with
DATABUILD and DIMBUILD, the advantage of using RUNDSJOB directly to
execute a JobStream is that after you provide the correct parameters, you can use
the command in a batch file to automate the execution process. This makes it
possible for you to process a JobStream from outside DecisionStream.
You can modify the behavior of the RUNDSJOB command by adding options. For example, adding -L to the RUNDSJOB command logs JobStream progress to the specified file.
For information about the syntax of this command, see Chapter 23,
"Commands" of the User Guide.
Demo 22-1
Purpose:
Management wants to simplify the extraction, transformation,
and loading of transactional data so that the process contains
fewer steps and can be done outside DecisionStream.
Therefore, we will use the DecisionStream language to create a
batch file to execute the Sales fact build that exists in the DS
GO_Catalog. We will then execute this batch file and view the
results.
For more information about variables, see Chapter 15 of the User Guide.
Summary
Appendix A
Step-by-Step Solutions
3. In the Name box, type sales_comp, and then in the Type box, click
UNIQUE.
The result appears as shown below.
4. Click OK to close the Index Properties dialog box, and then click the
Module Properties tab.
5. In the Refresh Type box, click TRUNCATE, and then click OK to
close the Table Delivery Properties window.
6. Under the Sales fact build, right-click DataStream, and then click Properties.
The DataStream Properties window opens.
7. Click the Input tab, and then in the Maximum input rows to process
box, type 1000.
8. Click OK to close the DataStream Properties window, and then save the
catalog.
Task 9. Execute the build and view the results.
1. Right-click Sales, and then click Execute.
The Execute Build dialog box appears.
2. Click the Override build settings check box to select it, and then click
the SQL and ExecutedSQL check boxes to select them.
3. Ensure that the Progress check box is selected, and then click OK.
A command window opens and a log file is created that tracks the progress of the build execution. Notice that 1000 records have been inserted into the F_Sales table.
4. Press Enter to close the DOS window.
5. Open SQLTerm, and then in the Database for SQL Operations box,
click TargetConnect.
6. Expand TargetConnect, right-click F_Sales, and then click Select
rows.
The SQL statement appears in the SQL Query pane.
7. Execute the query.
The result appears as shown below.
8. Close SQLTerm.
Appendix B
Entity-Relationship Diagram of
the GO_Demo Database
The DS GO_Source Data Source Name (DSN) refers to an Access database called GO_Demo.mdb. This database consists of 32 tables, which are related as indicated by the following entity/relationship diagram.

Legend: 1 = one; ∞ = many
Index

A
aggregation, 4-22, 15-6
    additional rows created by, 15-11
    considerations, 15-12, 15-13
    definition, 15-5
    enabling, 15-10
    exceptions, 15-9
    functions, 15-7–15-8
    impact on derivations, 6-12
    using the AVG function, 15-17
alert nodes, 13-9
architecture
    data warehouses, 1-6
    DecisionStream, 1-14
arguments
    in user-defined functions, 12-7, 12-14
attributes, 4-12
    adding to auto-level hierarchies, 17-11
    adding to fact builds, 8-10, 8-16
    adding to hierarchy levels, 7-6, 7-7
    in hierarchies, 7-5
    mapping, 7-17–7-24
    mapping literals to, 7-26
    mapping to DataStream items, 17-12
    naming conventions for, 7-8
audit tables
    inspecting, 19-24
auto-level hierarchies, 3-8
    adding attributes to, 17-11
    creating, 17-10
    determining number of levels in, 17-10–17-13
    using to resolve ragged hierarchies, 17-8

B
backing up
    catalogs, 2-19, 2-21, 2-32, 18-10
balanced hierarchies, 17-4
batch files
    creating in Notepad, 22-10
    executing, 22-11
    saving, 22-11
bitmap indexes, 21-23
build elements, 8-7
build nodes, 13-9, 13-10, 13-24
build schemas. See schemas
builds. See dimension builds and fact builds
Builds folder, 1-16
business keys, 9-26

C
caching
    dimension data, 20-15, 20-16
calculations
    creating for user-defined functions, 12-14
    functions used in, 6-8
    in derivations, 6-6, 6-12, 6-13, 6-15
catalogs
    adding data sources to, 9-29, 10-21
    adding dimensions to, 7-16
    adding JobStreams to, 13-7
    adding user-defined functions to, 12-6
    backing up, 2-19, 2-21, 2-32, 18-10
    closing, 1-28
    creating, 2-7, 2-16
    creating database schema for, 4-42
    database tables for, 2-5
    definition, 2-4
    documenting, 4-41, 4-44
    exploring, 1-27
    opening, 1-26
    restoring, 2-19, 2-21
    saving, 2-21, 2-32
    searching with Navigator, 18-13, 18-14, 18-16–18-17
    shared library items in, 2-6
    storing, 2-7
    tools for developing, 1-23
    viewing documentation for, 4-44
circular references, 17-30
CLI. See command line interface
CLI commands
    CATBACKUP, 22-5
    CATEXP, 22-5
    CATIMP, 22-5
    CATLIST, 22-5
    CATRESTORE, 22-5
    CATUPGRADE, 22-5
    DATABUILD, 22-5, 22-6
    DIMBUILD, 22-5, 22-7
    RUNDSJOB, 22-5, 22-8
    SHOWREF, 22-5
    SQLTerm, 22-5
command line interface. See also CLI commands
    executing dimension builds from, 22-7
    executing fact builds from, 22-6
    executing JobStreams from, 22-8
    programs. See CLI commands
commands. See CLI commands
composite indexes, 21-29, 21-30, 21-39
condition nodes, 13-9, 13-17, 13-25
dimensions, 1-22, 3-6, 4-12. See also dimension elements and SCDs
    adding to catalogs, 7-16
    creating, 3-36, 19-31
    custom, 11-15
    definition, 3-4
    example, 1-11
    generating surrogate keys for, 9-10
    hierarchies in, 3-8
    lookups in, 3-8
    shared across data marts, 3-5
    shared in a data mart, 1-9, 1-12
    standardized. See conformed dimensions
    templates in, 3-9
documenting
    catalogs, 4-41, 4-44
drilling down
    on dimension data, 4-39
duplicate fact rows, 11-8, 11-9
dynamic members, 7-27

E
effective date attributes
    options, 10-17
    specifying the source, 10-15, 10-16, 10-26
    using to preserve dimensional history, 10-10
email nodes, 13-9, 13-14
executing
    dimension builds, 9-16, 9-32, 21-45
    fact builds, 12-26, 15-15, 15-16, 16-23, 19-24, 21-16, 21-22, 21-41, 21-43
    JobStreams, 13-21, 13-29
execution modes
    Check Only, 20-5
    for fact builds, 4-34
    Normal, 20-5
    Object Creation, 20-5
exploring. See also viewing
    dimension tables, 17-15, 17-28
    hierarchies, 3-32, 3-37
    hierarchy properties, 3-24
    levels, 3-25
    log files, 19-16
    user interface, 1-26–1-28
exporting
    components to packages, 18-6, 18-10
    DDL statements, 20-14
external user-defined functions
    creating, 12-19
    implementing, 12-20

F
Fact Build wizard, 1-23, 4-13, 14-12, 16-10, 19-29
    types of fact builds, 4-14
    using, 4-26–4-28
fact builds, 1-15, 4-4, 8-4
    adding attributes to, 8-10, 8-16
    adding data sources to, 4-17, 4-18, 4-26, 8-12
    adding derivations to, 6-11, 6-15, 12-25, 19-23
    adding dimension elements to, 8-7, 8-14–8-15, 19-34
    adding measures to, 8-9, 8-16
    analyzing the results of executing, 19-37, 21-43
    configuring data delivery for, 4-24
    controlling feedback, 4-35
    creating manually, 8-6, 8-12
    creating to deliver pivot tables, 16-10
    creating using Fact Build wizard, 4-13, 4-26–4-28, 14-12
    data integrity lookup in, 5-20
    delivery modules in, 4-10
    dimension data delivery in, 4-5, 21-10–21-11
    duplicating, 11-17, 16-20
    elements, 4-10, 4-12
    executing, 4-34, 4-37, 11-19, 12-26, 14-14, 15-15, 15-16, 16-23, 19-24, 19-37, 21-16, 21-22, 21-41, 21-43
    executing from a batch file, 22-11
    executing from the CLI, 22-6
    executing in Check Only mode, 20-19
    how data is processed using, 11-4, 11-5
    implementing late arriving facts in, 14-8
    mapping data source columns to elements in, 8-16
    metadata delivery in, 4-11, 21-12–21-13
    modifying properties of, 21-40
    optional lookups in, 19-27
    renaming, 16-20
    setting properties for, 4-26, 11-18
    setting properties to reject data, 11-21
    translation lookups in, 7-52
    types of, 4-14
    using dimension data in, 5-17
    using variables in, 12-17
    viewing log files for, 4-37
    viewing results of execution, 16-23
    visualizations for, 1-17, 4-29
fact data, 4-11
    delivery of, 4-33
    merging duplicate, 11-10–11-11
    processing, 11-4, 11-5, 15-4
    rejecting duplicate, 11-12–11-13
    setting properties for delivery of, 4-27, 13-29, 21-8–21-9
fact deliveries
    visualizations for, 4-32
fact delivery modules
    relational table, 21-8
    text file, 21-9
    types of, 21-7
I
Import Wizard, 2-27–2-28
importing
    components from packages, 18-7, 18-8, 18-11
    table data into definition files, 2-30
indexes
    compared to keys, 21-26
    composite, 21-29, 21-30
    creating composite, 21-39
    creating in dimension tables, 21-25, 21-43
    creating in fact tables, 21-24, 21-29
    creating single-column, 21-25, 21-38
    types of, 21-23
    using to update fact tables, 21-30
interface. See user interface
internal user-defined functions, 12-14

J
JobStream nodes, 13-9, 13-18
JobStreams, 1-22
    action on node failure, 13-16
    adding nodes to, 13-23–13-26
    adding to catalogs, 13-7
    adding variables to, 13-23
    characteristics, 13-6
    creating, 13-23
    definition, 13-5
    executing, 13-21, 13-29
    executing from the CLI, 22-8
    executing nodes, 13-20
    linking nodes in, 13-19, 13-28
    nesting, 13-18
    nodes in, 13-9
    user-defined functions in, 12-23
    variables in, 13-8, 13-15
    visualizations for, 1-20

K
keys
    compared to indexes, 21-26
    creating on fact tables, 4-16, 21-28, 21-37
Kimball, Ralph, 5-7

L
late arriving dimension details
    processing, 10-13
late arriving facts
    definition, 14-5
    implementing, 14-8, 14-13
    necessary conditions for processing, 14-7
    specifying date ranges for, 14-9
    when they can occur, 14-6
leaf nodes, 17-6
level filters, 21-32
    considerations, 21-33
    examples, 21-34
levels, 3-9
    adding attributes to, 7-6, 7-7, 10-21
    adding data sources for, 7-17–7-24
    adding static members to, 7-32, 21-44
    adding to hierarchies, 3-22–3-23, 7-16–7-22, 10-21
    adding to hierarchies manually, 7-31
    changing data sources for, 9-29
    creating dimension tables to populate, 17-25–17-27
    exploring, 3-25
    mapping attributes of, 3-29
    members, 7-27–7-29
    populating, 7-27
    specifying output, 15-16
    viewing attributes of, 3-27
Library. See catalogs or Library folder
Library folder, 1-16
literals
    definition, 7-25
    mapping to attributes, 7-26
log files
    exploring, 19-16
    for troubleshooting, 20-4
    hash table references in, 20-8
    inspecting, 19-24
    using, 4-35
    viewing for JobStream, 13-29
lookups, 1-22, 3-8, 7-48
    creating based on a template, 19-31
    creating data sources for, 19-32
    creating for data integrity, 5-11, 5-18–5-19
    definition, 7-4
    design requirements, 7-49
    in fact builds, 7-52, 19-27
    optional, 7-48, 19-25, 19-26
    to process late arriving facts, 14-11
    translation, 7-48, 7-50, 7-51
M
mapping
    attributes, 7-17–7-24
    data in hierarchies, 3-10
    data source columns to fact build elements, 4-19, 8-16
    data source columns to hierarchy attributes, 10-24
measures, 4-12
    adding to fact builds, 8-9, 8-16
members. See also dynamic members, foster members, and static members
    definition, 3-9
memory
    allocating to members, 20-6
    reducing usage, 20-11. See also dimension breaking
    setting limits, 20-9–20-10
    specifying options, 20-8
menus
    exploring, 1-28
merging
    changing behavior of, 11-18
    duplicate data, 11-10–11-11, 11-17–11-19
metadata, 4-11
    delivery modules, 21-12–21-13
    delivery of, 4-33
    setting properties for, 4-27, 21-12
metrics. See measures
multiple parents, 19-10
    accepting, 19-11
    ignoring, 19-12
multiple pivots, 16-16, 16-18, 16-21
    mapping, 16-17

N
naming conventions
    for hierarchy attributes, 7-8
natural keys, 9-7
Navigator, 18-12
    using to search for components, 18-13, 18-14, 18-17
nodes
    action on failure, 13-16
    alert, 13-9, 13-13
    build, 13-9, 13-10, 13-24
    condition, 13-9, 13-17, 13-25
    email, 13-9, 13-14
    executing in JobStreams, 13-20
    in JobStreams, 13-9
    JobStream, 13-9, 13-18
    linking, 13-19, 13-28
    procedure, 13-9, 13-12, 13-25, 13-26
    SQL, 13-9, 13-11, 13-24, 13-26
non-unique Ids, 19-14
NULL values
    providing a value for, 19-37

O
ODBC Administrator
    using to add data sources, 10-20
OLTP, 1-6
    using versus data marts, 1-10
operational source systems. See OLTP
operators
    for derivations, 6-7
optimal schemas
    snowflake, 5-16
    star, 5-15
optional lookups, 7-48, 19-25
    designing, 19-26
    in fact builds, 19-27
output filters, 21-35

P
packaging, 18-4
    components in packages, 18-5
    exporting components to packages, 18-6, 18-10
    importing components from packages, 18-7, 18-11
    importing identical components, 18-8
page pool, 20-6
page table, 20-6
partitioning
    horizontal, 21-18, 21-31–21-35
    vertical, 21-18, 21-19
pivoting, 16-4
    applying multi-pivot technique, 16-20–16-23
    considerations, 16-5
    creating pivot values, 16-13
    implementing, 16-6, 16-17, 16-18
    mapping to data sources, 16-14
    modifying pivot values, 16-21
    multiple pivots, 16-16, 16-21
    single pivots, 16-7, 16-12, 16-13
PowerCubes
    creating, 4-39
    saving, 4-40
procedure nodes, 13-9, 13-12, 13-25, 13-26
product
    user interface, 1-16–1-22
properties
    providing a value for NULL values, 19-37
    setting for dimension builds, 4-8
    setting for dimension data delivery, 4-27, 21-11
    setting for dimension elements, 4-22
    setting for fact builds, 4-26, 11-18, 21-40
    setting for fact data delivery, 4-27, 13-29, 21-8–21-9
    setting for fact tables, 16-23
    setting for hierarchies, 19-14
    setting for input rows, 19-37
    setting for metadata delivery, 4-27, 21-12
    setting to reject data, 11-21
R
ragged hierarchies
    circular references, 17-30
    creating, 17-19
    creating dimension builds to reference, 17-25
    exploring in PowerPlay, 17-29
    leaf nodes, 17-5, 17-6
    resolving, 17-7, 17-8, 17-17
recursive relationships, 3-16–3-18, 17-5
Reference Explorer, 1-23
    using to explore hierarchies, 3-31, 7-35, 7-47, 17-12
reference structures
    visualizations for, 1-18
reject files, 11-12, 19-6
    analyzing, 11-21
rejected data
    analyzing, 19-6
    saving to a fact delivery, 19-17
relational table delivery module, 21-8
relationships
    between multiple tables, 3-19–3-20
    between rows of one table, 3-16–3-18
    recursive, 17-5
repeating B-tree indexes, 21-23
reporting data. See data warehouses
restoring
    catalogs, 2-19, 2-21
Result variable
    in JobStreams, 13-15
RUNDSJOB command, 22-8
rundsjob.exe, 20-4

S
saving
    catalogs, 2-21, 2-32
    PowerCubes, 4-40
SCDs, 9-18–9-19
    applying to dimension builds, 9-28–9-34
    definition, 9-4
    issues with, 9-20
    managing, 9-25–9-26
    methods of handling, 9-21–9-23
    Type 2, 14-4
    viewing changes to, 9-32
schemas, 5-14
    optimal snowflake, 5-16
    optimal star, 5-15
    parent-child, 17-5
    snowflake, 5-16
    star, 5-15
server engine, 1-14
shared dimensions. See conformed dimensions and dimensions
single pivots, 16-7, 16-13
    mapping, 16-6
slowly changing dimensions. See SCDs
snowflake schema, 4-16, 5-16
SQL columns
    mapping to fact build elements, 4-19, 8-16
SQL Helper
    user interface, 2-13
    using, 7-35
SQL nodes, 13-9, 13-11, 13-24, 13-26
SQL statements
    creating, 2-11
    creating for fact builds, 4-18, 4-26
    modifying, 7-22, 19-35
    running different types of, 2-10
SQLTerm, 1-23
    debugging data with, 19-18
    displaying, 2-10
    exploring, 1-28
    user interface, 2-11
    using to explore dimension tables, 17-15, 17-28
    using to verify SQLTXT specifications, 2-32
    using to view data, 2-17, 9-16, 9-28
SQLTXT
    verifying specifications, 2-32
SQLTXT database
    adding tables to, 2-25, 2-27–2-28
SQLTXT Designer, 1-23, 2-23
    defining columns in, 2-26
    Import Wizard, 2-27–2-28
    user interface, 2-24
standardizing
    incoming data, 5-22
star schema, 4-16, 5-15
static members, 3-9, 7-27, 7-29
    adding to levels, 7-32, 21-44
surrogate keys, 9-5
    adding to dimension tables, 9-15
    adding to dimensions, 9-10
    assigning to fact tables, 9-9, 9-11
    example, 9-8
    for data integrity lookups, 5-18–5-19
    in data marts, 9-6
    in operational systems, 9-6
    substituting in fact tables, 5-18
    value to SCDs, 9-6
    versus natural keys, 9-5
syntax
    of user-defined functions, 12-8