Series 7 Version 2
DecisionStream for Data
Warehouse Developers
Instructor Guide
Printed in Canada (05/03)
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Part 35101
DS7001
Published TBD, 2003
© 2003, Cognos Incorporated

While every attempt has been made to ensure that the information in this document is accurate and complete, some typographical or technical errors may exist. Cognos cannot accept responsibility for any kind of loss resulting from the use of this document.

This page shows the original publication date. The information contained in this book is subject to change without notice. Any improvements or changes to either the product or the course will be documented in subsequent editions.

This guide contains proprietary information, which is protected by copyright. All rights are reserved. No part of this document may be photocopied, reproduced, or translated into another language without the prior written consent of Cognos Incorporated.

Portions copyright (C) Microsoft Corporation, One Microsoft Way, Redmond, Washington 98052-6399 USA. All rights reserved.

Sample product images with the pound symbol (#) in the lower right hand corner are copyright (C) 1998 PhotoDisc, Inc.

Cognos, the Cognos logo, Better Decisions Every Day, Axiant, Cognos Accelerator, COGNOSuite, DecisionStream, Impromptu, NovaView, PowerCube, PowerHouse, PowerPlay, Scenario and 4Thought are trademarks or registered trademarks of Cognos Incorporated in the United States and/or other countries. All other names are trademarks or registered trademarks of their respective companies.
Contents
Contents
Post-Class Agenda
Course Overview
Series 7 Version 2 DecisionStream for Data Warehouse Developers is a five-day, instructor-led course designed for enterprise data mart builders. It teaches users how to move, merge, consolidate, and transform data from a range of data sources to build and maintain subject-area data marts. DecisionStream targets all the major relational databases and is typically used to create best-practice dimensional data mart solutions (star schemas).
Specifically, the course deals with the dimensional framework, builds, and
templates. The dimensional framework is a repository for reference
structures (dimensions, hierarchies, lookups, and dimension templates)
that promotes the reuse of these structures. Builds are units of work that
progress through data acquisition, transformation, and delivery.
Templates are objects that define source and target attributes for
dimension data.
Course Prerequisites
Participants should have:
• knowledge of Windows
• knowledge of database concepts
• knowledge of dimensional analysis concepts
• working knowledge of SQL
Day 1
Start End Description Time (hr:min)
9:00 AM 9:15 AM Introduction 0:15
9:15 AM 10:30 AM Module 1 - Lecture 1:15
10:30 AM 10:45 AM Break 0:15
10:45 AM 11:45 AM Module 2 - Lecture 1:00
11:45 AM 12:00 PM Module 2 - Workshop 0:15
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 2:45 PM Module 3 - Lecture 1:45
2:45 PM 3:00 PM Break 0:15
3:00 PM 3:15 PM Module 3 - Workshop 0:15
3:15 PM 4:45 PM Module 4 - Lecture 1:30
4:45 PM 5:00 PM Wrap Up - Day 1 0:15
Day 2
Start End Description Time (hr:min)
9:00 AM 9:10 AM Review Day 1 0:10
9:10 AM 10:25 AM Module 5 - Lecture 1:15
10:25 AM 10:40 AM Module 6 - Lecture 0:15
10:40 AM 10:50 AM Break 0:10
10:50 AM 12:00 PM Module 6 - Lecture 1:10
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 1:25 PM Module 6 - Workshop 0:25
1:25 PM 2:35 PM Module 7 - Lecture 1:10
2:35 PM 2:45 PM Module 8 - Lecture 0:10
2:45 PM 3:00 PM Break 0:15
3:00 PM 3:45 PM Module 8 - Lecture 0:45
3:45 PM 4:45 PM Module 9 - Lecture 1:00
4:45 PM 5:00 PM Wrap Up - Day 2 0:15
Day 3
Start End Description Time (hr:min)
9:00 AM 9:15 AM Review Day 2 0:15
9:15 AM 10:45 AM Module 10 - Lecture 1:30
10:45 AM 11:00 AM Break 0:15
11:00 AM 12:00 PM Module 11 - Lecture 1:00
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 2:30 PM Module 12 - Lecture 1:30
2:30 PM 2:45 PM Module 13 - Lecture 0:15
2:45 PM 3:00 PM Break 0:15
3:00 PM 4:30 PM Module 13 - Lecture 1:30
4:30 PM 4:45 PM Module 13 - Workshop 0:15
4:45 PM 5:00 PM Wrap Up - Day 3 0:15
Day 4
Start End Description Time (hr:min)
9:00 AM 9:10 AM Review Day 3 0:10
9:10 AM 10:30 AM Module 14 - Lecture 1:20
10:30 AM 10:45 AM Module 15 - Lecture 0:15
10:45 AM 11:00 AM Break 0:15
11:00 AM 12:00 PM Module 15 - Lecture 1:00
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 1:45 PM Module 16 - Lecture 0:45
1:45 PM 3:00 PM Module 17 - Lecture 1:15
3:00 PM 3:15 PM Break 0:15
3:15 PM 4:45 PM Module 18 - Lecture 1:30
4:45 PM 5:00 PM Wrap Up - Day 4 0:15
Day 5
Start End Description Time (hr:min)
9:00 AM 9:15 AM Review Day 4 0:15
9:15 AM 10:30 AM Module 19 - Lecture 1:15
10:30 AM 10:45 AM Break 0:15
10:45 AM 12:00 PM Module 20 - Lecture 1:15
12:00 PM 1:00 PM Lunch 1:00
1:00 PM 2:30 PM Module 21 - Lecture 1:30
2:30 PM 2:45 PM Break 0:15
2:45 PM 3:15 PM Module 22 - Lecture 0:30
3:15 PM 4:15 PM Wrap Up - Course 1:00
Instructional Materials
Student Guide
The Student Guide contains explanations and features of the product,
along with the presentation slides that are presented by the instructor.
Student demos and workshops are incorporated in the course to enrich
the learning experience through hands-on practice.
Demos
Workshops
Instructor Guide
The Instructor Guide contains the same content presented in the Student
Guide, along with additional notes to supplement and add value to the
lecture. The information can be generic, non-technical information, such
as multiple ways to perform the same command or a more in-depth
discussion of a topic. It may also be used to address more technical
questions from participants or as supplementary technical discussion, at
the discretion of the instructor. It helps to provide the appropriate level of
information to a specific audience.
Instructor Installation CD
The Instructor Installation CD contains an executable file that can install
any or all of the following files. By inserting the CD into your computer
and following the prompts as the auto install runs, these files will be
installed in C:\Edcognos\DS7001.
Instructor Slides
These files contain the Microsoft PowerPoint slide presentation for each
module of the course as presented in the Student Guide:
• StartDS7001.ppt
• IntroDS7001.ppt
• Mod1DS7001.ppt
• Mod2DS7001.ppt
• Mod3DS7001.ppt
• Mod4DS7001.ppt
• Mod5DS7001.ppt
• Mod6DS7001.ppt
• Mod7DS7001.ppt
• Mod8DS7001.ppt
• Mod9DS7001.ppt
• Mod10DS7001.ppt
• Mod11DS7001.ppt
• Mod12DS7001.ppt
• Mod13DS7001.ppt
• Mod14DS7001.ppt
• Mod15DS7001.ppt
• Mod16DS7001.ppt
• Mod17DS7001.ppt
• Mod18DS7001.ppt
• Mod19DS7001.ppt
• Mod20DS7001.ppt
• Mod21DS7001.ppt
• Mod22DS7001.ppt
The Instructor Guide Microsoft Word documents are also provided as PDF files.
Student Data
The Student Data files and folders are required for the completion of the
demos and workshops. This is the same data that is contained on the
Student Data CD described earlier and is available with the Instructor
Installation CD. The student data files and folders are installed in
C:\Edcognos\DS7001. They are:
The previous items cannot be accessed directly from the CD. They must
be installed on your computer by using the EXE auto install.
Setup Complete
• Windows 2000
• Viewer for Microsoft PowerPoint 2000 or the full PowerPoint 2000 application on the instructor's computer
• What hours are available for accessing the teaching site, copying
the files to the hard disk, tuning the color on the PC viewer, and
so on?
Prepare to Teach
After you have configured the instructor and student computers, consider
the following:
• Make sure you complete each of the demos before teaching the
course so that you become familiar with each step required.
• Make sure that there is a Student Guide for each participant and
that they have the student data files so that they can practice after
leaving the course.
Document Conventions
Conventions used in this guide follow Microsoft Windows application
standards, where applicable. As well, the following conventions are
observed:
PowerPoint Tips
Here are valuable keyboard commands you can use to improve your
presentation.
Command Key(s)
Help ?
Move between PowerPoint and the product ALT+TAB, or click the application name on the status bar
You can also jump to a specific slide by typing its slide number and pressing the ENTER key. However, the slide number is not the same as the printed page number, because a page may be built from several slides to produce an animation sequence.
Important Tips:
Class Format (slide):
• lecture with slides
• student guides as reference material
• hands-on demos to learn and practice
• independent workshop exercises for more practice
Use this slide to explain the class format and emphasize that participants are encouraged to actively perform the hands-on demos while following along with the instructor.
Mention that the Student Guide contains copies of the slides and further supporting notes for the participants to use as reference material in the future.
Introduction
Module 1: Getting Started
Module 2: Create a Catalog
Module 3: Create Hierarchies
Module 4: Create Basic Builds
Module 5: Conformed Dimensions
Module 6: Derivations
Module 7: Templates, Lookups, and Attributes
Module 8: Fact Builds
Module 9: History Preservation
Module 10: Hierarchical Dimensions
Module 11: Facts in Depth Merging
Module 12: User-Defined Functions and Variables
Module 13: JobStreams
Module 14: Facts and History Preservation
Module 15: Aggregation
Module 16: Pivoting
Module 17: Ragged Hierarchies
Module 18: Packaging and Navigator
Module 19: Resolving Data Quality Issues
Module 20: Troubleshooting and Tuning
Module 21: Delivery in Depth
Module 22: The Command Line Interface
Appendix A: Step-by-Step Solutions
Appendix B: Entity-Relationship Diagram of the
GO_Demo Database
Post-Class Agenda
• Make notes for yourself about what went well during the course
and what needs improvement. When you are preparing for your
next teach, you can refer to these.
Your Feedback
These course materials were designed by a group of instructional
designers in Ottawa, Canada.
Cognos Incorporated
K1G 4K9
Course Objectives
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
7. Templates, Lookups, and Attributes
8. Fact Builds
9. History Preservation
10. Hierarchical Dimensions
11. Facts in Depth Merging
12. User-Defined Functions and Variables
13. JobStreams
14. Facts and History Preservation
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Audience:
Prerequisites:
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Task-oriented online help
Use it when you are working in the product and need specific task-oriented help.
Find it on the Help menu on the DecisionStream main menu bar.

Books for Printing (.pdf)
Use them when you want to use search engines to find information. You can then print out selected pages, a section, or the whole book. Use the Step-by-Step online books (.pdf) if you want to know how to get something done but prefer to read about it in a book. The Step-by-Step online books contain the same information as the online help, although the method of presentation may be different.
Find them in the Cognos Series 7 Version 2 folder. Navigate from Start to Programs/Cognos Series 7 Version 2/Documentation/Cognos DecisionStream.

Documentation Roadmap
Use it when you have to know which type of Help will provide you with answers to your questions.
Find it on the Help menu on the DecisionStream main menu bar.

Cognos on the Web
Use it when you want to access any of the following:
• online support
Find it on the Help menu on the DecisionStream main menu bar.
Task-Oriented Help
Contents
The Help function is always available from the main menu bar.
From the Help menu, click Contents and Index, or press F1.
Index
An index is a tool that points to or leads you to the related topic. Each topic in a
help file has one or more index terms from which that topic can be accessed.
Either type the term you are looking for or scroll through the interactive list of
terms available.
Find
The Find tab accesses a search engine that will search for instances of a term
within the contents of the help file. Use this tab when you cannot find a term in
the index.
For example, node is not an index term. Using the Find tab, type node, and the search engine will find this term. You can then display the help topic, and all instances of the term node will be highlighted.
Click the Find tab to search for words or phrases that do not have an index entry.
You must install Adobe Acrobat Reader to view the .pdf files.
Guide Description
New Features for Series 7 This guide outlines the new features for DecisionStream Series 7.
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is DecisionStream?
Data warehouses are becoming ever larger, with increasing demands for faster warehouse population and refresh, and support for multiple, dependent data marts. DecisionStream is a scalable, high-performance, flexible, multi-platform data transformation tool that addresses these needs. DecisionStream extracts data and transforms it to deliver dimensional data marts.
(Slide diagram: data flows in from source systems, through DecisionStream, and out to users.)
Instructional Tips
Explain the meaning of an ETL tool and discuss how DecisionStream distinguishes itself from the many other ETL tools that exist. Specifically, discuss the integration of conformed data marts and the manner in which DecisionStream produces valuable data, block by block, on a frequent basis.
Data Warehouse
A data warehouse is a database that is used to hold data for reporting and analysis. The data may be accessed directly by users, or it may be used to feed data marts. The data warehouse is used as a source of reporting data for the whole enterprise.
A successful data warehouse must:
• provide an integrated view of the enterprise, across many subject areas
• be implemented in phases, delivering business value at each stage
The data staging area is where the raw operational data is acquired, transformed, cleaned, and combined so that it can be reported on and queried by users. This area lies between the operational source systems and the user database and is typically not accessible to users.
The data warehouse database contains the data that is organized and stored specifically for direct user queries and reports. It differs from an OLTP database in that it is designed primarily for reads, not writes.
Questions
Ask the class, "Why would you not put crude oil into an engine?" It is because the oil has to be refined to be used by an internal combustion engine.
Instructional Tips
The points in the slide are from Kimball et al.'s The Data Warehouse Lifecycle Toolkit (1998).
Implementation Phase
Using DecisionStream, you create data marts that present an integrated view of your business by consolidating the data into a common structure. However, there is no single correct view. How you choose to design your data warehouse depends on your organization's requirements, environment, and readiness for data warehousing.
The user's input into the requirements of the data warehouse is paramount. Although the user will not design or build the database itself, user requirements dictate how changing dimensions are handled, how summarization is tackled, the hierarchical structure of the data in the dimension tables, and so on. It is important to maintain contact with the users during the more technical phases of the project.
In most cases, it is important that the data warehouse and operational system be kept separate because their characteristics are so different. The warehouse must not be a mirror image of the operational system.
Instructional Tips
Data warehousing design has been called an art rather than a science, and students are often frustrated when there are no scientific, clear-cut answers to the problems facing them. Students must understand the various solutions and the pros and cons of each decision they make.
(Slide diagram: a Sales fact table at the centre of shared dimensions such as Branches, Products, Customer, Channels, and Time, alongside other subject areas such as HR, Inventory, and Manufacturing.)
Ralph Kimball et al. (1998) define a data mart as "a logical subset of the complete data warehouse."
Should we build a single comprehensive data warehouse and extract data marts from it, or should we build data marts and combine them into a data warehouse? Building a single data warehouse:
• is difficult to maintain
The danger of data marts is that if they are not integrated, they can become
"islands of information," where each subject area is isolated from the others. The
way to address this situation is to use conformed dimensions. This topic is
discussed further in Module 5, "Conformed Dimensions."
(Slide diagram: normalized OLTP tables for orders — including Product, Customer, Sales Area, Order, and Order Line — contrasted with the Orders data mart. The Time table is constructed from static data; it is not derived from the Orders OLTP system.)
An OLTP system is different from a data mart in that it is designed for writing
transactional data, not querying or reporting. OLTP systems are normalized,
which means that the data to be written is broken down into its simplest form,
removing all redundancy from the data. All of these tables are related through
referential integrity, which makes writing new data to the OLTP database fast and
efficient.
The biggest difference between a typical data mart and the OLTP system that it is
built from is the number of tables. In the slide, the Orders data mart has only one
central table containing mostly numeric data, along with four other tables with
detailed information relating to these numbers. This is a typical "star schema,"
discussed further in Module 4, "Create Basic Builds."
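To make the star schema concrete, here is a minimal sketch of a query against such a data mart; the table and column names (orders_fact, product_dim, time_dim, and so on) are illustrative assumptions, not the actual course schema.

    -- Sum sales by product line and quarter from a star schema:
    -- the central fact table joins directly to each dimension table.
    SELECT p.product_line,
           t.quarter,
           SUM(f.sale_amount) AS total_sales
    FROM orders_fact f
    JOIN product_dim p ON f.product_key = p.product_key
    JOIN time_dim t ON f.time_key = t.time_key
    GROUP BY p.product_line, t.quarter

Answering the same question against the normalized OLTP schema would require joining many more tables, which is why the star schema suits reporting.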
What is a Dimension?
(Slide: aggregating across locations.)
Dimensions provide context for the key performance indicators (KPIs) that a
business uses to measure its performance.
For example, a retail chain-store may categorize its sales data by the products that
it sells, by its retail outlets, and by fiscal periods. This organization has the
business dimensions Product, Location, and Time. The measures of the business,
such as how much it sells, lie at the intersection of these dimensions.
You can derive summary information by aggregating data along one or more
dimensions. The slide example on the right shows the aggregation of data along
the Location dimension to give the total sales of widgets during July.
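As a rough illustration (the table and column names here are assumptions), aggregating along the Location dimension simply means leaving Location out of the query, so every location rolls up into the total:

    -- Total widget sales for July, rolled up across all locations.
    SELECT SUM(f.sale_amount) AS total_sales
    FROM sales_fact f
    JOIN product_dim p ON f.product_key = p.product_key
    JOIN time_dim t ON f.time_key = t.time_key
    WHERE p.product_name = 'Widget'
      AND t.month_name = 'July'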
(Slide diagram: a data warehouse bus matrix. The Sales, Distribution, Marketing, and HR fact tables are rows; shared dimensions such as Products, Branches, Channels, Customer, Time, Promotion, Training, and Shipping are columns; an X marks each dimension that a fact table uses.)
Various business topics become natural data marts. A fully integrated data
warehouse will have data marts that use conformed dimensions and each data
mart will have a set of measures (fact table).
Conformed dimensions are the common elements in each data mart, which, when
combined into the data warehouse, form the overlapping glue. A conformed
dimension is re-useable and identical in every data mart in which it is used.
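At the table level, conformance can be pictured as two fact tables keyed to one physical dimension table; a minimal sketch, with illustrative names:

    -- One conformed Product dimension shared by two data marts.
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(50),
        product_line VARCHAR(50)
    );

    -- Both fact tables reference the same dimension, so results
    -- from the two marts can be combined ("drilled across") by product.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim (product_key),
        time_key    INTEGER,
        sale_amount DECIMAL(12,2)
    );

    CREATE TABLE inventory_fact (
        product_key INTEGER REFERENCES product_dim (product_key),
        time_key    INTEGER,
        stock_count INTEGER
    );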
(Slide diagram: OLTP, ERP, and text data sources flow into DecisionStream, which delivers integrated data marts and metadata.)
DecisionStream provides the extract, transform, and load functions required to restructure operational data into formats suitable for general reporting and OLAP (online analytical processing).
DecisionStream produces a multidimensional model of the data warehouse and can use this framework to populate many OLAP targets. DecisionStream can read from many different relational and text data sources.
DecisionStream can also deliver data to multiple target data stores, including relational databases and flat files.
Lastly, DecisionStream can deliver metadata to PowerPlay Transformer, Impromptu, Architect, and Microsoft SQL Server Analysis Services.
Technical Information
DecisionStream targets Online Transaction Processing (OLTP) to Online Analytical Processing (OLAP) transformations for constructing data marts.
DecisionStream carries out many types of transformations. It can merge data from multiple, heterogeneous data sources. It also has fast data aggregation capabilities and a rich library of built-in functions.
DecisionStream is also capable of delivering reporting data models, including metadata schema for Architect, PowerPlay, and Impromptu.
Build slide.
2 clicks to complete.
DecisionStream Architecture
(Slide diagram: the DecisionStream architecture. The Design Client runs on Windows NT, 2000, or XP; the Server Engine runs on Windows NT or UNIX — HP/UX, Solaris, IBM AIX, and Compaq Tru64. Operational sources include Oracle, SQL Server, DB2, Informix, ODBC, Sybase, Teradata, text files, SAP R/3, and custom sources. Targets include DBMS tables, flat files, DB2 OLAP, Impromptu, Transformer, Architect, ERP systems, and Microsoft SQL Server Analysis Services. API scripts and process metadata drive the server engine.)
The two main parts of the DecisionStream architecture are a design client (the DecisionStream Designer, which runs on the Windows operating system) and a server engine (which runs on UNIX and Windows). In a typical production environment, the two components are deployed on separate machines.
Designs are created using a graphical user interface. The design metadata is stored in any RDBMS. The server engine then reads this metadata at run time.
A wide variety of sources and targets are supported: all popular DBMSs, as well as several MOLAP structures on the target side. Metadata may be integrated with Impromptu, PowerPlay, and Architect. The delivered data may also be partitioned in arbitrary ways across these targets.
A key point is the availability of both an API and a scripting language to drive the server engine. The API and scripting language make it possible for data mart building jobs to be created once through the Designer and then scheduled to run on a regular basis.
DecisionStream has built-in support for data models such as star and snowflake schemas, and its component-based architecture scales to support very large systems. The wide platform coverage of DecisionStream provides unparalleled flexibility of deployment, evolution, and migration.
Technical Information
Starting with DecisionStream Series 7, Windows 95, 98, and ME are not supported.
During the development process, it is common to run both the Designer and the engine on the same machine. In production, you would typically devote one or more machines to perform the "heavy lifting" using the server engine component.
Build slide.
6 clicks to complete.
How DecisionStream Creates Data Marts
(Slide diagram: a fact build. (1) The dimensional framework — Product, Time, Location — supplies reference data. (2) DataStreams acquire data from the data sources. (3) The dimensional keys map back to the dimensional framework. The build then delivers (5a) the fact table, (5b) dimension tables, and (5c) metadata.)
DecisionStream creates data marts through a process called a fact build. The fact build may contain the following steps:
1. The dimensional framework defines the hierarchical structure of the reference data. Most builds use this framework. For example, the fact build uses the Product, Time, and Location dimensions. Dimension data is read in first to provide reference for the fact (transactional) data.
3. The dimensional key elements (Product, Time, and Location) map back to the dimensional framework to check referential integrity. The dimensional framework is also used to identify the grain of the transactional data, to provide rollup levels, and as a source of BI metadata.
5. The main purpose of a fact build is to deliver a fact table (5a) into a data mart. It may optionally deliver dimension tables (5b) and BI metadata (5c).
Technical Information
The term "grain" in step 3 refers to the level of detail of the data being retrieved from the transactional data source. For example, records of individual sales orders are at a lower level of detail than monthly totals of sales orders.
DecisionStream Interface
(Slide: the DecisionStream Designer window, showing the toolbar, the navigation Tree pane, and the Visualization area.)
The DecisionStream Designer provides a graphical interface to use with the Windows environment. When you have opened a catalog, the full functionality of the DecisionStream Designer window is available. The window consists of the following elements:
Technical Information
By default, DecisionStream displays information in the Visualization pane at 100% of the actual size. However, you can change the size of the display by clicking the Change the zoom level on the visualization button on the toolbar.
Element Purpose
Menu Selects an option relevant to the function that you are
performing.
Toolbar Provides quick access to the main components of
DecisionStream. For a description of each of the buttons,
see the online help.
Tree pane Displays the builds, JobStreams, and reference structures in
the current catalog.
Visualization pane Displays information about the selected build or JobStream.
You can double-click any item in this pane to find out more
about it.
Builds folder Contains all the fact builds and dimension builds in the
current catalog.
Library folder Contains all the reference dimensions (including hierarchies,
auto-level hierarchies, lookups and templates), connections,
and user-defined functions in the current catalog.
JobStreams folder Contains the collections of processes used to deliver the
data warehouse.
The fact build visualization provides a generic view of the build that you are
working with. The fact build process consists of the following elements:
Element Function
When you click a reference structure in the Tree pane, DecisionStream shows a
visual representation of the whole structure in the Visualization pane.
(Slide diagram, left to right: the hierarchy in the dimensional framework; the dimension build that processes data from the hierarchy; the template listing the columns in the dimension table, plus their behavior; the dimension table containing data from the hierarchy; and the database that holds the dimension table.)
A dimension build reads reference data from a hierarchy in the dimensional framework and delivers this data to a dimension table. A dimension build consists of the following elements:
Element Function
Key Information
Dimension builds do not use a transformation model.
JobStreams Visualization
The JobStream may include alerts that can be used by NoticeCast, or instructions to send email notifications upon the completion of other tasks. The Visualization pane shows all of the nodes contained in the selected JobStream, as well as the order of their execution.
You can use JobStreams to schedule a set of builds that, once executed, will
create and update a data warehouse. JobStreams are discussed further in Module
13, "JobStreams."
(Slide: the components of a catalog — Build, DataStream, Transformation Model, and Delivery Modules.)
Element Function
JobStream
Dimension
Hierarchy
Templates
Connections
Functions
Hierarchies and Lookups Structures that hold related members organized into levels. A lookup has only one level, while hierarchies can have one or more levels.
DecisionStream Tools
Tool Function
From DecisionStream, you can customize your system environment. From the
Tools menu, click Options to modify the look and feel of DecisionStream. From
the Options dialog box, you can:
• have the transformation model elements and build details appear in the
Visualization pane by default (it is best practice to have these options
selected)
Demo 1-1
Purpose:
We want to open an existing catalog and examine its elements,
as well as become familiar with the DecisionStream and
SQLTerm interfaces.
Instructional Tips
This may be a good time to point out to the class where they can access the online documentation from outside of DecisionStream. Have the students navigate from the Start menu to Programs/Cognos Series 7 Version 2/Documentation/Cognos DecisionStream.
5. From the Help menu, click Contents and Index.
The Help Topics: DecisionStream Help dialog box appears.
6. Examine the available Help topics.
7. Close Help.
12. Right-click the Transformation Model box, and then click Show Build
Details.
The result appears as shown below.
5. In the Database for SQL Operations box, click DS_Sources, and then in the Database Objects pane, click the plus sign (+) to expand the DS_Sources database.
There are seven tables in this data source.
A: The Executed SQL tab shows the last SQL statement that was executed by SQLTerm. This tab only applies to data sources that are accessed through Universal Data Access (UDA).
Today’s Goal
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is a Catalog?
A DecisionStream catalog provides a central repository for the information that defines how DecisionStream extracts, transforms, and delivers data. The catalog stores DecisionStream builds, connection specifications, JobStreams, user-defined functions, and the dimensional framework.
You cannot use DecisionStream unless you first select and open a catalog or create a catalog. Only one catalog can be open at a time by a single instance of DecisionStream Designer (you can start multiple instances of DecisionStream Designer).
After working with the catalog, you save your changes by clicking Save Catalog from the File menu or clicking the Save button on the toolbar.
Note: The catalog does not contain the data that DecisionStream will manipulate and deliver. It holds the configuration details that determine where the source data is coming from and how it will be transformed and loaded into the target data mart(s).
Technical Information
The catalog may take the form of a text file (with a .ctg file extension) if it is a backed-up version of another catalog.
You can produce HTML documentation that summarizes or provides detailed information about the contents of a catalog (builds, connections, and so on).
Catalog Database
(Slide diagram: during development, the catalog tables reside in a catalog database; DecisionStream reads the catalog and uses the source and target connections it defines.)
Each catalog contains a Library, which, in turn, holds the dimensions that make up the dimensional framework, connections to various data sources, and user-defined functions.
Items in the Library may be used throughout the catalog in different fact builds, dimension builds, and JobStreams. The components stored in the Library are used throughout the catalog, enabling the reuse of these components. You can build multiple projects using the same supporting Library components, which shortens the development time for these projects.
Instructional Tips
JobStreams will be discussed in Module 13, "JobStreams." The dimensional framework will be covered in Module 3, "Create Hierarchies." Fact and dimension builds will be discussed in Module 4, "Create Basic Builds." User-defined functions will be covered in Module 12, "User-Defined Functions and Variables."
Create a Catalog
After you create the database, you can create the catalog. Click New Catalog from the File menu, or click the New catalog button on the toolbar. You must then type a name for the new catalog and (if preferred) a business name and description of the catalog. Click Next to finish creating the catalog.
From the left pane, you select the physical database (the one you just created) that will hold the catalog tables. The New Catalog dialog box will show fields that are appropriate to the type of database that you have selected (such as ODBC or SQL Server).
Instructional Tips
If a catalog is already open, DecisionStream displays a message informing you that the current catalog will be disconnected. Click Yes to acknowledge this message.
The shortcut for creating a new catalog is Ctrl+N.
Connect to Sources and Targets
(Slide diagram: connections link DecisionStream to source and target databases and flat files.)
Each connection provides information so that DecisionStream can link to a data source or target. The connection:
• identifies the particular data source or target
• specifies the connection method that must be used to connect to the data
• provides information, such as a user name and password, that the database management system requires when DecisionStream connects to the data
• specifies the dialect of SQL used by the connection (either native SQL or Cognos SQL)
The connections are contained within the DecisionStream catalog and are specific to that catalog.
The source data may come from relational databases or flat files. Flat files are described in definition files (.def) by using the SQLTXT Designer tool and then accessed in the same way as regular relational databases. You can define several different connections within a catalog, including ones to DB2, Oracle, and Microsoft SQL Server data sources. You can deliver the transformed data to various targets, including databases and flat files. The related metadata may be delivered to PowerPlay Transformer models, Architect models, and Impromptu catalogs and reports.
Select the Cognos SQL check box to use Cognos SQL when you construct an SQL statement for components using this connection. If you clear this check box, you must use native SQL for the database you are accessing.
The selection that you make here determines the default for the Cognos SQL check box in other components that use this connection.
Technical Information
Many data sources can be used with DecisionStream, but not every connection method may be available on your computer. If you do not have a specific connection method, DecisionStream will indicate this in the Connection Properties dialog box. The list of available connection methods depends on the scope of your DecisionStream license.
Cognos SQL is an extension of SQL 99. Using Cognos SQL, you have a greater degree of portability between mixed database environments because a common dialect can be used.
By default, a connection accepts any vendor-specific SQL SELECT statement in a data source, including nonstandard SQL extensions and hints. A connection or data source can optionally use Cognos SQL. You cannot use Cognos SQL for SQLTXT connections.
What is SQLTerm?
Once connections to data sources are established within a catalog, you can use
SQLTerm against them to view the data they contain, as well as change the
structure and contents of that data.
SQLTerm is the DecisionStream terminal for SQL. You can run SQL statements
against any data source that DecisionStream can access.
Using SQLTerm, you can compose and run different types of SQL statements by
using:
To display SQLTerm, you can either click SQLTerm from the Tools menu, or
click the Run the SQLTerm Tool button on the toolbar.
(Slide: the SQLTerm toolbar and panes. Toolbar buttons let you interrupt current processing; execute the SQL query and limit results to one row; execute the SQL query; clear the SQL query and results; choose the database for SQL operations; and specify that you want to use Cognos SQL instead of native SQL. The Executed SQL tab displays the SQL statements that were executed (not available for SQLTXT data sources), and the Database Objects pane lists the objects in the selected database.)
Using SQLTerm, you can write and test SQL statements against your data connections. Also, because you can view the data itself, SQLTerm can give you greater insight into what each data source contains and what it can be used for.
The SQLTerm interface shows a list of the database connections. You select a database from this list and then run the SQL operations you need.
Pane Description
Database Objects Displays a "tree view" of the tables within the current database.
Technical Information
The Executed SQL tab is especially useful if you are using Cognos SQL to execute your queries. The Executed SQL tab will show you the actual SQL that has been executed in the database's native dialect.
To create an SQL statement for execution, you can construct the entire statement
by hand. You can also right-click a table or column in the Database Objects pane
and click one of the most commonly used options, such as Select rows, from the
shortcut menu. This will automatically produce a statement in the SQL Query
pane that selects all the rows from a table.
You can also use any of several "click-drag" options to create your SQL
statement. For example, using the "control-click-drag" option on a table object
will create an SQL SELECT statement that explicitly includes all columns from
the table.
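For example, control-click-dragging a hypothetical ds_sales table might generate a statement of this form (the column names here are assumed for illustration):

    SELECT ds_sales.product_no,
           ds_sales.sale_date,
           ds_sales.quantity,
           ds_sales.amount
    FROM ds_sales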
Select the Cognos SQL check box to specify that you want to use Cognos SQL
when you construct the SQL statement. If you clear this check box, you must use
native SQL for the database you are accessing.
Note: The default for the Cognos SQL check box is determined by whether
you selected the Cognos SQL check box in the Connection Properties
dialog box. You can modify this setting if necessary.
When you add an SQL statement in a window or dialog box, DecisionStream displays an SQL Helper button, which opens the SQL Helper window. This interface is similar to that of SQLTerm. Again, this interface makes it easier to specify the tables and columns that the catalog will use in the builds and the dimensional framework. You can write and test SQL statements before running builds or exploring hierarchies.
The only difference between SQLTerm and SQL Helper is that SQL Helper has OK and Cancel buttons. Because SQL Helper is usually passed an SQL statement, which you can modify, you can keep changes by clicking the OK button or discard them by clicking the Cancel button.
Instructional Tips
Emphasize that you can use SQL Helper to help create hierarchies, lookups, and fact builds. Essentially, any catalog component that uses a DataStream lets you access the SQL Helper tool.
Demo 2-1
Purpose:
We must develop a DecisionStream catalog to hold a simple
data mart that we will later analyze by using PowerPlay. First,
we must create a data source to hold the catalog. Then, we
must create the catalog itself. Finally, we must add a
connection within the catalog to a database that contains
transactional data.
Task 1. Create a data source to hold the catalog.
1. From the Tools menu, click ODBC Administrator.
The ODBC Data Source Administrator dialog box appears.
2. Click the System DSN tab, and then click Add.
The Create a New Data Source dialog box appears.
Instructional Tips
You can also access the ODBC Data Source Administrator in Windows 2000 through the Start menu. Click Settings/Control Panel/Administrative Tools/Data Sources (ODBC).
5. Click dssales.mdb in the Database Name list, and then click OK.
6. In the Data Source Name box, type DS Sales.
7. Click OK.
We have successfully created a data source that can be used within the
catalog.
8. Repeat steps 2 to 7 to add three data sources named DS Reference, DS
Stock and DS Output.
The DS Reference data source will be based on the dsref.mdb database.
The DS Stock data source will be based on the dsstock.mdb database.
The DS Output data source will be based on the dsout.mdb database. All
three of these databases are in the C:\Edcognos\DS7001 folder.
9. Click OK to close the ODBC Data Source Administrator.
Task 4. Define a connection to the Sales data source.
1. Right-click Connections, and then click Insert Connection.
The Connection Properties dialog box appears.
2. In the Alias box, type Sales.
3. Click the Connection Details tab, and then click ODBC in the list of
databases on the left side.
4. In the Data Source Name box, click DS Sales, and then click Test
Connection.
A dialog box appears, indicating that the connection is OK.
5. Click OK to close the DecisionStream Designer dialog box, and then
click OK to close the Connection Properties dialog box.
We have successfully connected to a transactional data source.
Task 5. Use SQLTerm to view the data in the Sales data source.
Instructional Tips
You can also type any valid SQL statement in the SQL Query window and run it in the same fashion.
1. From the Tools menu, click SQLTerm.
SQLTerm opens.
2. Maximize the window if necessary, and in the Database for SQL
Operations box, beside the toolbar, click Sales.
3. In the Database Objects pane, double-click Sales.
The tables ds_forecast and ds_sales are now available for analysis.
6. Click the Clear SQL Query and Results button to clear both
panes.
7. Repeat Steps 4 through 6 to view the data in the ds_forecast table, and
then close SQLTerm.
8. From the File menu, click Save Catalog.
Results:
We have created a DecisionStream catalog and data sources,
and then added a connection within the catalog to one of these
data sources. We then viewed the data by using SQLTerm.
This transactional data will be used to populate the data mart.
It is a good practice to back up your catalogs, which involves writing a text version of the catalog tables to a file (with a .ctg file extension). The backup process is useful for recovery and emergency situations. For example, changes may have been made to a catalog, but these changes later caused problems. The catalog can then be restored (using the backed-up text file) to an earlier version.
Backups are invaluable if a catalog has been permanently corrupted and needs to be replaced. Backups can also help you move from one DBMS environment to another, or send copies of the catalog to others without sending the entire database.
When restoring a catalog from a .ctg file, first create another .ctg file of the current catalog that you want to replace, because DecisionStream deletes all data from the current catalog before performing the restoration. If no copy of the catalog is created, and the .ctg file is defective in any way, the contents of the current catalog will be lost.
Instructional Tips
You can back up to an existing .ctg file or create a new file.
The restoration process may take a long time, especially if the catalog has a lot of components that must be reproduced.
Demo 2-2
Purpose:
It is possible that Day1Catalog will become unusable. Also, we
may have to move this catalog to a different system later on.
As a result, we must back up Day1Catalog to a Catalog Backup
(.ctg) file. We will then restore the current catalog by using this
file.
5. Expand Connections.
The result appears as shown below.
Results:
We have backed up Day1Catalog to a text file and then
restored the catalog by using the text file.
SQLTXT is an implementation of SQL over text files. With the SQLTXT DBMS driver, you can access data in delimited text format through SQL. This is especially important because, in many cases, much of the data to be stored in the warehouse must be obtained from individual flat files instead of relational databases.
SQL usually can be used only against relational databases. However, DecisionStream has its own SQL parser that makes SQL access on a flat file possible.
The SQLTXT component creates a definition file that you can use to specify one or more flat file definitions, ASCII or EBCDIC. The definitions are stored in a file that has a .def file extension. The definition file (.def) can be maintained through the SQLTXT Designer or can be edited manually in a regular text editor such as Notepad.
SQLTXT restrictions are:
• single-table SELECTs (no table joins)
• no updates
• no sorting or grouping
Instructional Tips
You cannot use the JOIN clause within SQLTXT; therefore, use single-table SELECT statements. Also, you cannot use the GROUP BY or ORDER BY clauses. Use the SELECT DISTINCT clause instead.
Technical Information
DecisionStream cannot be used to directly access mainframe files (such as those existing on an MVS computer). However, this is not necessarily a bad thing, because we usually do not want users to have such direct access anyway. Mainframe files can be exported to text and transferred elsewhere by using File Transfer Protocol. They can then be accessed with SQLTXT like other text files.
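The sketch below shows what these restrictions mean in practice, using a hypothetical flat-file table named stock_txt with illustrative columns:

    -- Allowed: a single-table SELECT with a WHERE clause.
    SELECT product_no, warehouse, quantity
    FROM stock_txt
    WHERE quantity > 0

    -- Allowed: SELECT DISTINCT in place of GROUP BY.
    SELECT DISTINCT warehouse
    FROM stock_txt

    -- Not allowed against SQLTXT: JOIN, GROUP BY, ORDER BY, and UPDATE.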
SQLTXT has a user interface to maintain specifications showing how text files are interpreted as database tables. This interface includes table and column maintenance facilities.
The records of text files can be delimited (typically by a carriage return) or of a fixed length.
Fields within each record may also be delimited (typically by a comma or a tab) or of a fixed length.
To add a table manually, select the file that holds the data, and then add the required columns.
Technical Information
You can also use the STREAM database to access text files directly without using SQLTXT Designer to configure them first. See the User Guide for more information.
To add columns to the table, select the table and then select Insert Column. You must then provide the values in the following table:
Field Description
Data Type Select from BOOLEAN, CHARACTER, DATE, FLOAT, INTEGER, and NUMBER.
Length Type the length of the data type if you selected INTEGER, CHARACTER, or FLOAT.
Format This field is available for date formats if you selected a date data type.
Nullable Select the Nullable check box if the column can accept null values.
Key Select the Key check box if the column forms part of the primary key of the table. Flat files typically do not have keys.
Tables and columns can be edited, renamed, or deleted.
Technical Information
SQLTXT data types have various characteristics:
• BOOLEAN: represents TRUE or FALSE in one or more characters.
• CHAR: holds string values, the length of which cannot exceed the column length specification. For delimited files, you must enclose the string value in quotation marks. For fixed-width files, it left-justifies the string value and right-pads it with spaces to the column length.
• DATE: represents date and time values.
• FLOAT: represents floating point numbers as strings. If the record is fixed length, the number is right-justified and left-padded with spaces to the column width.
• NUMBER: can be any number of any size.
• PACKED: stores decimal numbers with two digits per byte. This data type is available only for fixed-width files.
• ZONED: stores decimal numbers with one digit per byte. This data type is available only for fixed-width files.
Step Description
Select whether records are delimited If the records are delimited, choose the delimiter (for example, CR - carriage return). If the records are of a fixed length, specify the record size.
Preview rows Select the number of rows that you want SQLTXT to sample.
Header rows Select the number of header rows that the file contains.
Specify whether columns are fixed or delimited If the columns are delimited, select the delimiter used to separate the columns in the table. If the columns are of a fixed width, you will specify the width of the columns in the next step.
Name and format The wizard prompts you to name and format each column.
The first screen in the Import Wizard prompts for file type and file header
information.
If the columns are delimited, you can specify the delimiter type in the second
screen of the Import Wizard. If the columns are of a fixed width, you can specify
the width of each column.
In the third screen of the Import Wizard, you may change the column headings
and data types for each column.
Note: If you change the data type for a column, that type will change only for
the number of preview rows set in the first screen. To apply the change to
all the rows, click the Process All Records button.
Demo 2-3
Purpose:
We have transactional data that we want to incorporate into
our data mart. This data consists of sales figures within a flat
file, which must be configured to produce a definition file. We
then must access this file within Day1Catalog through a data
source connection. Finally, we want to view the contents of
this file by using SQLTerm.
5. Close SQLTerm.
Task 4. Save and back up the catalog.
1. From the File menu, click Save Catalog.
2. From the File menu, click Backup Catalog.
The Backup Catalog dialog box appears.
3. In the Save in list, navigate to C:\Edcognos\DS7001.
4. In the File name box, type Demo 2-3, and then click Save.
Results:
We have configured a flat file to produce a definition file,
accessed this file within Day1Catalog through a data source
connection, and then viewed its contents by using SQLTerm.
Summary
Workshop 2-1
Workshop Format
The following workshops have been designed to allow you to work at your own
pace. The workshops are structured as outlined in the following sections.
The Business Question Section
The first page of each workshop presents a business-type question followed by a
series of steps. These steps provide additional information to help guide the
student through the workshop. Within each step, there may be numbered
questions relating to the step. Solve the tasks by using the skills you learned in this
module and in the previous ones. If you need more assistance, you can refer to
the Task Table section that provides more detailed instruction.
The Task Table Section
The second page of the workshop is a Task Table that presents the question as a
series of numbered tasks to be accomplished. The first column in the table states
the task to be accomplished. The second column, Where to Work, indicates the
area of the product to work in. Finally, the third column provides some hints that
may help you complete the workshop. If you need more assistance to complete
the workshop, please refer to the Step-by-Step section in Appendix A.
The Workshop Results Section
This section contains one or more screen captures of an interim or final report, and/or answers to the questions asked in the Business Question section.
The Step-by-Step Section
The Step-by-Step instructions for completing all the tasks are in Appendix A of
the Student Guide. Each task in the Task Table is expanded into numbered steps,
scripted like the demos.
The first data source contains inventory information about each product that the
company has in stock. The second data source holds reference data (including a
standard list of the 16 products that the company sells), and the third data source
will contain the completed data mart.
• use SQLTerm to view the data in the Stock and Reference data sources.
This will give us some idea of what we have to work with. Make sure you
save your catalog when you are finished.
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is a Dimension?
Dimensions provide context for the key performance indicators (KPIs) that a
business uses to measure its performance.
For example, a retail chain-store may categorize its sales data by the products that
it sells, by its retail outlets, and by fiscal periods. This organization has the
business dimensions Product, Location, and Time. The measures of the business,
such as how much it sells, lie at the intersection of these dimensions.
You can derive summary information by aggregating data along one or more
dimensions. The slide example on the right shows the aggregation of data along
the Location dimension to give the total sales of widgets during July.
Data marts are dimensional. The dimensional framework that can be designed
using DecisionStream, as shown in the slide example, permits the reuse of
common dimensions. The slide diagram demonstrates a data mart that is not a
"stovepipe" but one that can be used for more detailed analysis. Users can drill
across from one subject area to another through the shared dimensions.
You may use any approach suited to your organization in developing your data marts. We recommend that you create dimensions that can be shared across data marts. When done correctly, sharing makes it easier to link separate fact tables and lets users develop reports across the enterprise, not just on a single area of the business. This is the result of a strong dimensional model. The conformed dimensions approach is outlined in greater detail in Ralph Kimball's writings.
Build slide.
6 clicks to complete.
How DecisionStream Creates the Data Warehouse
(Slide diagram: the fact build process, as before. (1) The dimensional framework — Product, Time, Location — supplies reference data. (2) DataStreams acquire data from the data sources. (3) The dimensional keys map back to the framework. The build delivers (5a) the fact table, (5b) dimension tables, and (5c) metadata.)
After the required data sources are identified and the catalog is created, the first step in developing a conformed data mart is to develop the dimensional framework.
The dimensional framework consists of multiple hierarchies that represent the logical structure of the data, independent of any physical data source. The dimensional framework is defined in the catalog. It represents the core components of your business. The dimensional framework represents the way the organization thinks about its data, instead of the way it is physically stored.
Instructional Tips
It is important to orient your students on a continual basis so that they are aware of where they currently are in the development lifecycle of their data mart.
Within its catalogs, DecisionStream uses hierarchies to define business dimensions. This is indicated at the top of the slide.
Dimensional Framework
The dimensional framework shown in the slide is made up of the Date, Fiscal, Location, and Product dimensions. The Hierarchy wizard can help you construct a dimensional framework of this sort.
DecisionStream uses SQL to retrieve the required data from the source database. The DataStream gathers together the data sources and defines where the data is assigned in the hierarchy. For example, the Product dimension requires information about product types. The product data resides in the source database. The DataStream assigns the product type information to the Product Type level in the Product hierarchy.
Instructional Tips
Remind students that connecting to the required data sources is the first step in the process of creating the dimensional framework. You could pose it as a question.
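The SQL behind such a DataStream might look like the following single-table query; the table and column names are assumptions for illustration only:

    -- One row per product, carrying the product type columns that the
    -- DataStream maps to the Product Type level of the Product hierarchy.
    SELECT product_no,
           product_name,
           product_type_code,
           product_type_name
    FROM products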
Dimension Components
Component Function
Hierarchy Components
Dimension
Hierarchy
Level
DataStream
Data Source
Static Members
Template
To enable the flow of data from a data source to a reference structure, you must map the data source columns to the DataStream items, and then map the DataStream items to the level attributes.
Mapping lets you incorporate multiple data sources and transfer identical data from these sources into the same attributes of a hierarchy level. As shown in the slide example, you might have two tables that contain similar information but in different languages, such as English and French. In this case, you will want to map the data coming from the columns 'state_name' (in the English source) and 'nom_du_province' (in the French source) into the same level attribute (in this case, "state_name").
Instructional Tips
Note that if you want to modify the SQL (for example, to add more columns to the SELECT statement), you must go back and re-map the added or modified columns to the level attributes within the DataStream.
Hierarchy Wizard
The Hierarchy wizard helps you create hierarchies and lets you choose a table structure that most accurately describes the reference data source. Using the Hierarchy wizard, you can select tables and columns and use expressions for member Ids, captions, and parents. The Hierarchy wizard creates an ALL level at the top of the hierarchy by default. It also assists in creating levels and defining additional attributes, and it generates a template to define the properties of the data attributes in the data source.
Instructional Tips
You might want to inform the students at this point that constructing a hierarchy from the rows of one table will usually result in ragged hierarchies. Creating a hierarchy from multiple tables (snowflake schema) usually implies that the source data is very well normalized.
Type Usage
Hierarchy from the Use this option when your source data is in a single
columns of one table table with specific columns representing levels in the
hierarchy.
Hierarchy from the rows Use this option when your source data is in a single
of one table table with sets of rows for each level in the hierarchy,
each related by a parent Id column.
Hierarchy from multiple Use this option when your source data comes from
tables multiple tables, with each table representing a single
level in the hierarchy. Each row has a parent Id that
relates to another row in its parent table.
Note: You must be connected to your reference data sources to use the
Hierarchy wizard.
Demo 3-1
Purpose:
As mentioned earlier, we are creating a data mart that the
instructor can use to analyze the company's product inventory
in the U.S. by using PowerPlay. Therefore, we must look at our
data sources to determine the types of hierarchies that we
must create.
Results:
We have looked at the reference data sources and have
determined the three different types of hierarchies we must
create. We will create one hierarchy from the columns of a
table, one from the rows of a table, and one from multiple
tables.
(Slide: a fiscal hierarchy with levels Year, Quarter, and Period, built from this source table:)
Year   Quarter   Period
1995   1995Q1    199501
1995   1995Q1    199502
1995   1995Q1    199503
1995   1995Q2    199504
The slide example shows a fiscal hierarchy based on the relationship between
columns in the same data row. The source table includes year, quarter, and period
columns. Each year contains quarters, and each quarter contains periods. The
ascending hierarchical order is therefore period→quarter→year. Each data row
identifies the year, quarter, and period to which it relates.
Note: The example in the slide is not standard for an operational system.
However, an earlier attempt at a data model might present the data in this
manner. This is also the structure you might see in the result set of
complex SQL queries that join multiple tables.
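A data source query for this kind of hierarchy simply selects the level columns from the one table. A minimal sketch, borrowing the ds_fiscal table and column names that appear later in Workshop 3-1:

    SELECT fiscal_yr,  fiscal_yr_desc,
           fiscal_qtr, fiscal_qtr_desc,
           period_no,  period_no_desc
    FROM   ds_fiscal

Each Id and caption column pair (for example, fiscal_yr and fiscal_yr_desc) supplies one level of the hierarchy.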
Using the Hierarchy wizard, follow these steps to define a hierarchy from the
columns of one table.
• Name the hierarchy: provide a descriptive and meaningful name for the hierarchy. You can also provide a caption and notes.
• Select the source table: DecisionStream queries the list of tables in the connection once. If you cannot connect to the data source, the query will fail.
• Define a static top level: this step is optional. You can create an artificial ALL level, and DecisionStream automatically supplies a name, caption, and Id value for the ALL level.
• Insert each level: you can add, delete, edit, or re-order the levels in a hierarchy. The wizard then creates the level definitions and generates the data sources for the levels.
• Add attributes to the structure: you can add additional attributes to each level in the hierarchy.
This type of hierarchy is based on relationships between rows of the same table. In relational terms, these are recursive relationships.

In the slide example, the source table includes Type, Parent, and Location. Within the Parent column, each row refers to the Location value of its parent.

Note: This type of recursive data source relationship produces a fixed number of levels. The levels are identified and named by a column. In the slide example, the Type column identifies levels. If there is no fixed number of levels (that is, ragged hierarchies), you need an auto-level hierarchy to identify the levels.

Instructional Tips
Ragged hierarchies are discussed in Module 17, "Ragged Hierarchies."
Using the Hierarchy wizard, follow these steps to define a hierarchy from the rows of one table.

• Name the hierarchy: provide a descriptive and meaningful name for the hierarchy. You can also provide a caption and notes.
• Select the hierarchy source: DecisionStream queries the list of tables in the connection once. If you cannot connect to the data source, the query will fail.
• Define the source columns: the Id column is the unique identifier for a level. Caption is a column containing a description associated with the Id; this information is often used for display and presentation purposes rather than the Id or name. Parent is the column that identifies the parent. The Level Name is the column that names the levels in the hierarchy, or it is Auto-Level.
• Define a static ALL level: this step is optional. You can create an artificial ALL level and DecisionStream automatically supplies a name, caption, and Id value for the ALL level.
• Assign remaining attributes: you can add attributes to each level of the hierarchy.

Instructional Tips
If you leave "Select the column for the level name" as "Auto-Level," then the hierarchy will automatically be created as an auto-level hierarchy, even if you do not specify a column for the top parent Id.

Key Information
There is a bug in Series 7 Version 2 (version 7.1.60.0 on the About dialog box) where the product crashes when this wizard option is used. A run-time error is generated when pressing Next after selecting columns for the Id, Caption, Parent, and level name attributes. See problem number 398291.0 in Trakker for more information.
Slide example (clothing source table, with the resulting hierarchy shown beside it):

ID    Type       Description    Parent_ID
1     CLASS      MEN
2     CLASS      WOMEN
101   CATEGORY   FORMAL         1
102   CATEGORY   FORMAL         2
103   CATEGORY   CASUAL         1
S86   PRODUCT    DRESS SHIRT    101
S87   PRODUCT    BLUE JEANS     103

Level details:
Name: ID
Caption: Description
Parent: Parent_ID
Level Name: Type
Instructional Tips
Expand on the example shown in the slide with the text below.

In the slide example, the data source contains information on clothing. A hierarchy will be based on the rows of this data source.

The data source has two top-level categories, a Men's class and a Women's class. Each of these classes has two children, a formal category and a casual category. Below each of these categories are the products that reside in the Men's formal and casual categories and the Women's formal and casual categories.
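A data source query for a row-based hierarchy returns the Id, caption, parent, and level-name columns for every row. A minimal sketch using the column names from the level details above (the table name ds_clothing is hypothetical):

    SELECT ID, Description, Parent_ID, Type
    FROM   ds_clothing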
Slide: a hierarchy from multiple tables — a Class table (for example, Household), a Family table with a parent Class column (for example, Books under Household), and a Product table with a parent Family column (for example, Dictionary under Books).
This type of hierarchy is based on relationships between multiple data tables. The
hierarchy follows one-to-many relationships between the tables.
In the slide example, one family of products may consist of many products, but a
product must belong to only one family. A class of products may consist of many
product families, but a product family may belong to only one product class.
Three tables contribute to this hierarchy.
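Because each level resides in its own table, a hierarchy from multiple tables typically uses one query per level, with each child table carrying the Id of its parent row. A minimal sketch with hypothetical table and column names:

    SELECT class_id, class_name
    FROM   class

    SELECT family_id, family_name, class_id      -- class_id identifies the parent row
    FROM   family

    SELECT product_id, product_name, family_id   -- family_id identifies the parent row
    FROM   product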
Using the Hierarchy wizard, follow these steps to define a hierarchy from
multiple tables:
• Name the hierarchy: provide a descriptive and meaningful name for the hierarchy. You can also provide a caption and notes.
• Select the hierarchy source: DecisionStream queries the list of tables in the connection once. If you cannot connect to the data source, the query will fail.
• Define a static ALL level: this step is optional. You can create an artificial ALL level and DecisionStream automatically supplies a name, caption, and Id value for the ALL level.
• Insert each level: you can add, delete, edit, or re-order the levels in the hierarchy.
Demo 3-2
Purpose:
We must develop a hierarchy so that we can look at our
business from a geographical point of view. The Location
hierarchy will organize this area of our business into time
zones and states.
Note: The next time you open the catalog, DecisionStream displays the
hierarchies in alphabetical order.
Level Attributes
• map the columns that the SQL statement returns, as well as any literal
values, to DataStream items.
In the slide example, the Location hierarchy contains the State level.
Mapping is used to specify the relationship between the columns of source data and the DataStream items.

To view the mapping of a DataStream, right-click the DataStream, and then click Properties.

• Data Source: shows the column(s) that the SQL data source(s) returns.
• Maps To: shows the DataStream items to which the data source columns are mapped.

Technical Information
When you create a data source query for a hierarchy, lookup, or fact build, the SQL statement can be parsed or prepared. Parsing does not send the SQL statement to the database, so it can be used when you are unable to connect to the database. DecisionStream parses the SQL statement and returns the result set of columns as written in the SQL statement. When you use parse, the SQL statement must begin with SELECT. Parse may not evaluate database-specific syntax correctly. Parsing is quicker than using prepare; however, it can fail if the SQL is too complex.
Mapping is also used to specify the relationship between the DataStream items and the attributes of a level.

To view the mapping of a level, right-click the level, and then click Properties.
The attributes for a specific level contain the Id, caption, and parent attributes, as well as any other attributes that you must include for that level (such as Price). In the slide example, the level attributes contain the timezone_cd and state_name attributes, which serve as the Id and caption, respectively, for this level.
A great deal of flexibility is available as to where you can insert data sources. You
can insert them at several different levels in a hierarchy, including above the top
level of the hierarchy. However, you must follow these rules:
• If all the columns from a query only provide information to a single level,
then the data source may reside at that level. The columns in this query
will not be visible at any other level.
Use the Reference Explorer to run the SQL to temporarily populate the hierarchy
for testing. Testing the hierarchy tells you:
You can use the Reference Explorer to view a hierarchy in two ways. In the
hierarchy view, the reference data is displayed in true hierarchical structure with
members linked to their parent. In the level view, the hierarchical members are
linked under the levels they belong to with no specific reference to the parent.
Note: The members are not stored. They are only loaded into memory as
needed for processing.
Build slide.
3 clicks to complete.
Date Hierarchy Wizard
Often, there is no database table that contains a full range of dates for use in a date hierarchy. To assist you in creating a date hierarchy, DecisionStream provides the Date Hierarchy wizard. You can choose to include levels for year, quarter, month, week, and day.

When you use the Date Hierarchy wizard, the hierarchy is not based on source data. Therefore, all members are static members and are physically stored in the catalog.

Instructional Tips
Discuss the inherent difference that exists between using the Date Hierarchy wizard and constructing a hierarchy based on time. Point out that using the Date Hierarchy wizard will generate static data only.
When you provide the level details you must type or select relevant values.
Detail Description
Level Name You can change the default level name.
Id Format From the list, select the format for the level. Alternatively,
you can type a format. The Id Sample box shows the
current date and time in the selected format.
Caption Format (Optional) From the list, select the caption for the level.
The Caption Sample box shows an example of the
selected format.
Week Parent Relationship This option is available only for a week level. From the
list, select the rollup option to use.
Days to Include This option is available only for a day level. From the list,
select the days for which you want to display data. You
can select from Every Day, Weekdays Only, or Weekdays
and Saturdays.
Handle Weeks
Slide: a calendar showing August and September, with Week 34 through Week 38 straddling the month boundary.
A week may straddle the start and end of its parent, perhaps a month or year.
To help you resolve this problem, you can use the Date Hierarchy wizard to
specify how you want DecisionStream to handle weeks that cross a parental
boundary. You can choose from the following options:
• Weeks roll to parent start: weeks roll to the parent month in which the week began.
• Weeks roll to parent end: weeks roll to the parent month in which the week ended.
• Weeks roll to parent start and end: this option splits the week between the two parent months.
• Weeks on same level as parent: weeks do not roll to the month level.
Demo 3-3
Purpose:
We will use the Date Hierarchy wizard to create a static Date
hierarchy that we can use in the analysis of our business. This
hierarchy will represent the time dimension for the first three
years of our business.
Results:
We have used the Date Hierarchy wizard to create a static date
hierarchy that will represent the time dimension for the first
three years of our business.
Summary
Workshop 3-1
• Use the ds_fiscal table to create a hierarchy called Fiscal. You do not
require a static ALL level.
Fiscal:
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
2. Create the Fiscal hierarchy. (Hierarchy wizard)
   • Create the hierarchy from the columns of one table (Star Schema).
3. Create the Year level for the hierarchy. (Level Details window)
   • Use the fiscal_yr column for the Id and the fiscal_yr_desc column as the caption.
4. Create the Quarter level. (Level Details window)
   • Use the fiscal_qtr column for the Id and the fiscal_qtr_desc column as the caption.
5. Create the Month level and complete the hierarchy. (Level Details window)
   • Use the period_no column as the Id and the period_no_desc column as the caption.
6. Examine the Fiscal hierarchy. (Build tree; Data Source Properties window, SQL tab; DataStream Properties window)
   • Expand and collapse the Fiscal hierarchy to review the components.
   • Review the Fiscal data source SQL.
   • Review the properties of the DataStream.
7. Use the Reference Explorer to examine the Fiscal hierarchy. (Reference Explorer)
   • Click the plus sign (+) to view the members within 1999.
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
5. Conformed Dimensions
6. Derivations
Objectives
What is a Build?
Builds are the foundation of the data warehouse, which is designed to provide
structured and clean data to users. Ralph Kimball (1998) noted that good data
access is the foundation for excellent decision making.
There are two types of builds in DecisionStream. Fact builds deliver transactional
data. Optionally, they may also deliver dimension data and metadata. Dimension
builds deliver only dimension data.
DecisionStream offers considerable build flexibility. Within the same build, it can
acquire, merge, and aggregate data from different data sources. It can also deliver
fact data, dimension data, and metadata to multiple targets.
Slide: a dimension build delivers a reference structure to a dimension table created in the target data mart, guided by a template outlining the columns of the dimension table and the behavior of these columns.
A dimension build delivers data that describes a single dimension of the business,
such as Product or Customer. It acquires dimension data from the hierarchy that
you specify and loads it into the data mart in the form of one or more dimension
tables.
Because a fact build can also deliver dimension data, it may seem unusual to do
this in a separate process. However, there are several instances when you want to
use a dimension build instead of a fact build to deliver dimension data. For
example, you may have a number of fact tables that share the same set of
dimension tables (conformed dimensions). You may also want to deliver all the
dimension data, although there may not be supporting fact data at the moment.
Finally, you may want to prepare the dimension tables prior to loading the fact
data (for example, you have to make gradual changes to dimension attributes).
You can use the Dimension Build wizard to quickly create a dimension build.
The wizard involves the following steps:
2. Define the schema that you want to use and the dimension (and
associated reference structure) that you want to deliver.
3. Define the schema naming conventions (how the table(s) and column(s)
will be named).
5. Define the build properties (such as how to handle multiple parents).
6. Define attributes for slowly changing dimensions (SCDs).

Instructional Tips
SCDs are covered in Module 9, "History Preservation."
Demo 4-1
Purpose:
We want to create a simple dimension build based on the
existing Product hierarchy so that we can report on our
company's product inventory. This task will result in one
dimension table that utilizes a star schema.
7. Click OK.
The Product dimension build runs in a separate DOS window and
delivers 16 rows to the D_Product table of the data mart.
8. Press Enter to close the DOS window.
Results:
We have created a simple dimension build based on the
existing Product hierarchy. Implementing this build produced
one dimension table that used a star schema.
Build slide.
6 clicks to complete.
How DecisionStream Creates Data Marts
Slide diagram: data sources feed DataStreams; the dimensional framework (Product, Time, Location) supplies reference data; the transformed data flows to the dimension delivery, fact delivery, and metadata delivery modules (numbered steps 1 through 5c).
In a fact build, the columns extracted from the transactional data (through SQL)
map to elements in the build's transformation model. DecisionStream uses build
elements as a transformation area in memory to manipulate the source data
before it is delivered to the target.
After the data has been transformed, DecisionStream organizes its delivery to
three types of modules: fact, dimension, and metadata. Each delivery, in turn,
may subscribe to some or all of the build elements.
Slide diagram: two data sources feed DataStreams into the transformation model and on to delivery.
1. Populate hierarchy in memory.
2. Read fact source data.
3. Deliver fact data.
DecisionStream can merge data from multiple sources, partition this data, and
deliver it to multiple targets.
First, the hierarchies are populated with reference data, which is then stored in memory. This reference data is used to perform data integrity checks on the incoming fact data; for example, to determine if a sales record refers to a legitimate product.
Fact data consists of the transactional information acquired through SQL queries
usually from OLTP systems. For example, the fact data for a department store
chain includes sales figures, such as the number of products sold by location
during the month of July. These figures are obtained mainly from the transactions
recorded by the cash registers in each store. However, additional sales records
may exist in separate data stores on other systems.
Once DecisionStream reads the fact data, it can be transformed, merged, and
aggregated as necessary, and then delivered to the data mart.
Slide: the element types of the transformation model, including Dimension and Derivation (a calculation or transformation).
Four types of elements can be part of a delivered fact table. During the build
definition process, these elements form the transformation model.
You can create a basic fact build quickly by using the Fact Build wizard.
The wizard involves the following steps:

• Define the purpose of the build: name the new build, select the build style (for example, Cognos BI Mart), and indicate the target connection.
• Create the DataStream: select the source tables and columns. The wizard will create an SQL query based on these selections. A build element will be created for each query column.
• Assign the element types: if necessary, modify the type of each element and the order of the elements in the transformation model.
• Define the dimensions: select a hierarchy or lookup for each dimension element of the transformation model.
• Define the fact delivery: select the physical structures into which the fact data will be delivered. You can also specify naming conventions for these structures.
• Define the dimension delivery: select the schema that you want to use for the dimension delivery modules and how the tables and columns of these modules will be named.

Instructional Tips
You can access the Fact Build wizard from the toolbar or by selecting Tools/Fact Build Wizard. You typically only deliver to an Architect model if you have access to Impromptu or PowerPlay, or both, as well.

Key Information
There are essentially three ways to create a fact build:
1. Use the Fact Build wizard to guide you through all the steps required.
2. Create the entire build manually.
3. Use the wizard to prepare the basic structure of the build (such as the DataStream and elements of the transformation model) and then refine it manually.
The last option is probably the most efficient, because the basic structure of a build is likely to change at least somewhat.
Note: If you want the build to give you the option of creating aggregate fact
tables, you must select the Allow Aggregation on Build Dimensions box.
Use a data transfer fact build to move data from one place to
another quickly and efficiently.
A data transfer fact build provides the default settings for copying data from one
DBMS to a single fact table in another DBMS. It is recommended that you use
this option when you just want to move data from one place to another, rather
than creating a data mart.
By default, when the data transfer option is selected, the Fact Build wizard creates
all transformation model elements as attributes. It creates neither dimension data
nor metadata deliveries.
In the slide example, a data transfer fact build was created using the Fact Build
wizard. When this fact build was executed, data was transferred from the
SourceConnect database to a single fact table (F_DataTransfer) in the
TargetConnect database.
Types of BI Marts
Star:
creates a star schema
creates one detail fact table and (optionally) a number of
aggregate fact tables, as well as a separate table for each
dimension
Snowflake:
creates a snowflake schema
creates one detail fact table and (optionally) a number of
aggregate fact tables, as well as a separate table for each
level in each dimension
The BI Mart created by the wizard, in either a star or snowflake design, can
deliver fact data, dimension data, and metadata in a form suitable for Impromptu,
PowerPlay Transformer, and Architect.
In a star schema, a single fact table is created, and all the data from each
dimension is stored in its own separate table. The primary key of this table is the
key (either business or surrogate) of the lowest level in the dimension. For
example, a Product dimension table has a primary key of Product Id, not Product
Type Id (because that is the unique Id of the next-highest level). Columns in the
table represent hierarchical levels in the dimension. The schema can therefore be
viewed as a fully denormalized representation of the dimension.
A data source for the fact build is chosen from the available
connections. This source contains the transactional data to be
transformed.
You must first select or add a data source for the incoming data.
Give the data source a name and select a connection from within the catalog.
In the slide example, a Stock data source will read from the Stock connection that
has already been created within the current catalog. In the next step, the Fact
Build wizard will create an SQL statement that actually reads the data.
After you identify your connection, you create the source SQL from within the Data Source wizard. Using this wizard, you can browse and select tables and columns for your query. As you select tables and columns, the query is built automatically.

If you select more than one table, DecisionStream joins them where possible (it issues an error if the join is not possible). In the right pane of the window, DecisionStream inserts the SQL statement that corresponds to your selections. You may edit this statement manually if preferred.

Alternatively, you can type the SQL statement directly in the right pane. Click the SQL Helper button to display SQL Helper, which can assist you in testing the statement.

When you click the Rebuild SQL Statement button, DecisionStream re-creates the SQL statement by using the selections that you initially made in the left pane. This is useful if you have edited the SQL statement and want to return to the original statement that DecisionStream created.

Instructional Tips
You can modify the SQL statement after you create the build by opening the Properties window of the data source and clicking the SQL tab. This window also gives you access to SQL Helper, which you can use to test any changes that you have made.
The Fact Build wizard creates a transformation model (internal) element for each column in the SQL query. The source data columns in the SQL statement are mapped to the corresponding model elements, which makes it possible for DecisionStream to transform the source data as necessary before loading it into the target data mart.

The mapping shown in the slide example is performed automatically by the wizard. Later, you can modify the mappings by right-clicking the DataStream icon and then clicking Properties. This is necessary if changes are made to the DataStream; for example, if new columns are added to the query, a new data source is inserted manually, or literals are added to an existing data source.

Instructional Tips
Derivations can be used to enforce business rules across the enterprise. For example, it may be worthwhile to have a calculation that does not already exist in the source data, such as Gross Profit Margin. Calculating this figure from the existing source data may provide the additional information needed to make better business decisions. Derivations are covered further in Module 6, "Derivations."
You cannot create derivations by using the Fact Build wizard because these
columns do not exist in your source data. You must add derivations manually.
Once you have queried the source table, the Fact Build wizard automatically
creates the elements of the transformation model based on the type of data in
each column. The types of these elements may be modified if necessary.
Build slide.
2 clicks to complete.
How DecisionStream Creates Data Marts
Slide diagram: two data sources feed DataStreams; the dimensional framework contains the Product, Time, and Location dimensions.
So far, by using the Fact Build wizard, we have created the data source (the SQL
query) and the module elements, mapped them, and declared the element types.
The next step is to link the dimension elements to the dimensional framework.
The dimension element properties that you can set by using the Fact Build wizard
are Use Reference and Aggregate.
You use the Use Reference property to select the reference item that you want to use for each dimension element. You can choose from hierarchies, auto-level hierarchies, and lookups. Each dimension element (such as Product Number) can be associated with only one reference item (in this case, the Product hierarchy).
If you select the Aggregate box for a dimension element, DecisionStream creates
an aggregate fact table for each level of the dimension. If you do not select the
Allow Aggregation on Build Dimensions box on the first screen of the Fact Build
wizard, you will not have this option.
Note: Selecting the Aggregate box for any dimension element can potentially
create a large number of aggregate fact tables.
Build slide.
3 clicks to complete.
How DecisionStream Creates Data Marts
Slide diagram: data sources feed DataStreams; the dimensional framework (Product, Time, Location) supplies reference data; the transformed data flows to the dimension delivery, fact delivery, and metadata delivery modules (numbered steps 1 through 5c).
Now we can use the Fact Build wizard to define the deliveries. There are three
types of deliveries: fact, dimension, and metadata.
In the first step of the Fact Build wizard, you can select a target connection to determine where DecisionStream will deliver the transformed data.

Now you can further configure the delivery of the fact data by determining what type of delivery module will hold it. DecisionStream has a variety of choices. For example, you can select a simple relational table (the default) or an Oracle SQL Loader.

Technical Information
Usually, the indexing strategy will be "No indexes." Indexing the data warehouse tables is usually left to the DBA.
You can then set the properties of the selected delivery module type. For
example, if you choose to deliver the fact data to a relational table, you can
choose how you want the data to be refreshed and the interval at which the data
will be committed.
Finally, you can determine how the tables and columns of the delivery module
will be named. For a relational table, this determines what names the fact table
and its columns will have after it has been delivered to the target data mart.
Note that the data will not actually be transformed and loaded into any delivery
modules until the fact build is implemented. A build can be implemented by
selecting Actions/Execute or by clicking the Execute the current item button on
the toolbar.
Demo 4-2
Purpose:
As noted previously, we are creating a data mart so that the
instructor can analyze product inventory by using PowerPlay.
Using a star schema, we want to create a simple fact build
based on the columns of the ds_stock table. The result will be
one fact table and three dimension tables.
5. In the left pane, click the ds_stock check box to select it.
A SELECT statement appears in the right pane of the Data Source
wizard showing the columns that will be included in the build.
The result appears as shown below.
Technical Information
The SQL created by the Data Source wizard is native to the database in question. In this case, because our source data resides in an Access database, each column is delineated by back accents. When working with source or target databases, you must use the variety of SQL that is native to the database.

6. Click Finish to close the Data Source wizard.
7. Click Next to accept the default data source mapping.
8. Click Next to accept the default transformation model.
Task 3. Relate the reference structures (hierarchies) to the
dimensional elements of the transformation model.
1. Next to period_no, click (no reference), and then click the Browse button.
2. In the list, double-click Fiscal, and then click the Fiscal hierarchy.
3. Repeat steps 1 and 2 for the state_cd and product_cd elements.
These will use the Location and Product hierarchies, respectively, as the
reference structures.
The result appears as shown below.
7. Click the Architect and Impromptu check boxes to clear them, and then click Next.
8. Click Next to accept the default values for the properties of the PowerPlay Transformer model.
9. Click Finish to accept the summary of the fact build.
10. In the left pane, under the Builds folder, click Stock.
If necessary, in the Visualization pane, right-click Transformation Model, and then click Show Build Elements.
If necessary, in the Visualization pane, right-click Transformation Model, and then click Show Build Details.
The result appears as shown below.

Key Information
If you are using the cer2 release of DecisionStream, you may encounter a series of "DS-HANDLE-E100: Handle is null or invalid" error messages when you click the Back button at any step of the wizard. You may encounter the message again when you click the Next button after dismissing the error message. This is a bug.

To have the build elements and build details appear automatically in the Visualization pane, from the Tools menu, click Options, and then select the appropriate check boxes.
Results:
We have used the Fact Build wizard to create a basic fact build
by using a star schema template based on the columns of the
ds_stock table and the reference structures in the dimensional
framework.
Slide callouts: transactional data sources; dimensions associated with the build; fact and dimension tables; metadata delivery to Impromptu, PowerPlay, and Architect.
After a fact build has been created, we can view a graphical representation of
everything it contains. By clicking the first tab of the Build Visualization pane, we
can view the build's components:
• The connection from which DecisionStream acquires the source data.
• The data source that DecisionStream uses to extract data from the source database. This data source contains the actual SQL query.
• The DataStream that indicates the mapping of source columns in the data source(s) (SQL queries) to the fact build elements.
• The transformation model, where the processing of the fact build data occurs. Here the transactional data is transformed and prepared for loading into the delivery module.
• A fact data delivery. If the delivery has a level or an output filter, the filter icon (a funnel) is added.
• A dimension delivery.
• A metadata delivery.
By clicking the DataStream tab on the Build Visualization pane, you can view the
data source mapping that the Fact Build wizard performs. As noted earlier, you
can open the build's DataStream Properties sheet to modify this mapping.
Keep in mind that derivations in the transformation model do not map to any
literal values or columns that SQL returns.
Mapped Reference
Structures
By clicking the Transformation Model tab of the Build Visualization pane, you
can view how each dimension element in the transformation model links to the
dimensional framework. In the slide example, the period_no dimension element
maps to the Fiscal hierarchy, the state_cd dimension element maps to the
Location hierarchy, and the product_cd dimension element maps to the Product
hierarchy.
The check marks in the dimensions indicate the granularity of the transactional
data coming in (input) and the fact data being delivered (output).
By clicking the Fact Delivery tab, you can view the mapping
of the elements to the target fact table.
This slide diagram shows how transformation model elements in the fact
build map to columns of the target fact table.
Delivery Modules
Dimension delivery modules deliver data that describes a single dimension (such as Product) to the target database. DecisionStream lets you send the data to a single table (star schema) or multiple tables (snowflake schema). There can also be more than one dimension delivery per dimension, for example, two star schemas.

Fact delivery modules deliver the fact data that a build produces. A build may have multiple fact deliveries, and each fact delivery may subscribe to some or all of the build elements, which makes it possible for you to perform vertical partitioning. You can also perform horizontal partitioning by using output and level filters.

Metadata delivery modules deliver information about fact or dimension data, or both, to specific applications such as Impromptu, PowerPlay Transformer, Architect, and Microsoft SQL Server Analysis Services. This information forms the backbone of BI.

When you run a build, fact delivery modules are delivered first, followed by dimension delivery modules and then metadata modules.

Instructional Tips
Partitioning refers to the delivery of data to different targets according to specified criteria. Vertical partitioning involves delivering the elements of the build to different delivery modules. For example, you may want to deliver the Product dimension element and related measures to a relational table and the other elements to a text file. Horizontal partitioning involves adding filters that determine which data rows DecisionStream will deliver to which areas of the data mart. Level filters configure delivery of only specific dimensions and hierarchy levels. Output filters are expressions that result in either TRUE or FALSE when applied to each data row. For example, you may want to include only those products that have a cost of more than $25.00 (Price > 25.00). Each delivery module may have several level filters but only one output filter.
Running a fact build actually delivers data to the target. You can access the Execute Build dialog box by first selecting the build and then clicking Actions/Execute. From here you can modify the default options.

To bypass the dialog box, click the Execute Build button.

You can also run fact builds entirely from the command line, where you can include additional options. The command that the DecisionStream engine implements is shown in the Command Line box.

You can choose from three execution modes.

• Normal: runs a build stored in the DecisionStream catalog. DecisionStream processes the whole build and delivers the required data.
• Object Creation: creates the delivery modules but does not implement the fact build. This mode will create the physical delivery structures but will not deliver the data into those structures. This mode is generally used when developing a new build.
• Check Only: used for performance testing and resource estimating. Check Only does not create the physical delivery structures and does not deliver data or metadata. With this style of implementation, you do not even need a target database.

Instructional Tips
Almost everything that is done within the DecisionStream Designer GUI can be done from the command line in Windows or UNIX. This will be covered further in Module 22, "The Command Line Interface."

Technical Information
The command-line code is typically used in a batch file that is run at regular intervals to keep the data warehouse up to date. In this case, you must remove the -P option from the command; this ensures that DecisionStream does not prompt the user to press Enter when the build finishes running, which would halt the entire process.
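For example, the command shown in the Command Line box might be captured in a batch file similar to the following sketch (hypothetical; only the databuild executable and the -P option are named in this guide, so the remaining arguments are placeholders):

    rem nightly_refresh.bat -- run the build on a schedule
    rem the -P option is omitted so that no prompt halts the batch run
    databuild <catalog and build arguments>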
When you run a fact build, the DataBuild executable file opens a new process in a
command window. This window writes trace information that is saved
automatically as a log file. By consulting these files, you can fine-tune the build so
that it efficiently meets your specific requirements.
You can control the feedback information contained in the log file in the Fact Build Properties window. By default, the only logging property set is Progress. You can specify the level of detail to include in the log file by selecting the appropriate boxes in the Trace list.
Type of Information Description
Progress Details the overall progress of the build
implementation.
Detail Includes more detailed progress messages. This option also displays additional progress messages as each given number of rows is processed (by default, every 5000 rows).
Internal Includes internal DecisionStream activity messages,
such as resource usage (memory usage, paging
information, and so on). This is useful for performance
testing and resource estimating.
SQL Includes all SQL statements that DecisionStream uses
at each stage of executing the build. This information is
useful in resolving database errors.
ExecutedSQL Includes the executed SQL for SELECT statements.
User Includes all application messages written to the log file
by using the LogMsg() function.
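For example, a derivation or filter expression could write its own message to the log by calling LogMsg() (a sketch; the exact signature of LogMsg() is an assumption here):

    LogMsg('Price exceeded the expected range')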
Demo 4-3
Purpose:
We have to run the Stock fact build so that we can load data
into the target data mart. After the build is implemented, we
want to view the log file to evaluate the progress of the build.
Task 1. Run the Stock fact build and view its log file.
1. In the left pane, under the Builds folder, click the Stock fact build if it is
not already selected.
2. In the toolbar, click Execute.
Databuild.exe runs in DOS and 10861 rows of transactional data are
loaded into the F_Stock table of the data mart database.
3. Press Enter to close the DOS window.
4. From the Tools menu, click Browse Log Files.
The log window opens, displaying the log files that have been created for builds that have been run.

Instructional Tips
The log file may have a different number, but it will still be prefaced with "Build_Stock."
5. Double-click the Build_Stock_0001.log file.
The log file created for the Stock build opens in Notepad.
This file shows the progress details that were displayed in the DOS
window as databuild.exe ran. It indicates that data was loaded into the
F_Stock fact table and three dimension tables, and that a Transformer
exported model was created for this build.
6. Close the Build_Stock_0001.log file, and then close the log window.
Results:
We have run a fact build and viewed the resulting log file to
evaluate its progress.
Demo 4-4
Purpose:
We want to use a Cognos BI tool to open our data mart. We
will use PowerPlay to view product inventory data by creating
a PowerCube and a report.
Result:
We have generated a PowerPlay cube and report for the Stock
fact build to view product inventory data.
Document a Catalog
Slide callout: HTML template to use (if applicable).
When you create a catalog, DecisionStream automatically creates the eight tables it requires. However, for some installations, where a database administrator must set up the tables manually, you can use the database schema function to specify the tables that have to be created.

You can use the buttons at the bottom of the Database Schema window to add the SQL statements for the function that you want to perform. You can view and edit the statements in the SQL window.

• Create: creates the required tables. You can copy these statements to the Clipboard and paste them into other applications to inform your database administrators of the tables to create.
• Grant: grants all permissions to all users for all tables in the schema. However, this may not apply to all databases.
• Drop: drops all tables of the schema, thereby removing the schema from the database.
Demo 4-5
Document a Catalog
Purpose:
We have to create full HTML documentation for the
Day1Catalog. Using this document, we can view detailed
information about the contents of the catalog.
Results:
We have created an HTML document that contains a detailed
description of the contents of the Day1Catalog.
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
5. Conformed Dimensions
6. Derivations
Objectives
It is possible to deliver a single, private data mart using a fact build. To deliver
this, you must include a fact delivery and one or more dimension deliveries. In
Demo 4-2, we used the Fact Build wizard to create one fact delivery and three
dimension deliveries. When we executed the fact build, the result was one fact
table and three dimension tables in the data mart.
However, what if we want to track a new fact (such as sales) that will use the
same dimension tables? Or what if we want to make a change to the
D_Stock_Location table? Any changes that we make will be overwritten when
the fact build is executed again.
To deal with these issues, we must create our dimension tables using a separate
process. We must design our dimension tables so that several fact tables can
reference them. When multiple fact tables can use a dimension, we say that the
dimension is conformed.
Conformed Dimensions
Because most companies have integrated business operations, they have a strong
requirement for an integrated enterprise data warehouse that mimics the business
structure. This large data warehouse must be flexible, fast processing, easy to
maintain, secure, and complete.
In the slide example, three fact tables retrieve reference information from six
dimension tables: Location, Customer, Product, Time, Distributor, and
Promotion. However, each dimension table is created only once, and data inside
the tables is standardized.
Conformed dimensions are the elements that make the data warehouse an
integrated whole. They also reduce overall time for data warehouse development,
because each dimension is analyzed, designed, and created only once.
Slide: a bus matrix of six dimensions (Distributor, Promotion, Customer, Location, Product, Time) against three fact tables — the Sales Fact references five of the dimensions, the Distribution Fact four, and the Order Fact three.
You have to decide from the beginning which dimensions will be referenced by a
single fact table. If the same dimension is referenced in several tables, it becomes
the conformed dimension.
Note: In the architecture phase that precedes the implementation of any of the data marts, the goals are to produce a master suite of conformed dimensions and to standardize the definitions of facts. The resulting set of standards is called the Data Warehouse Bus Architecture. For more information, see Ralph Kimball et al.'s The Data Warehouse Lifecycle Toolkit.
It is also important to identify the dimension and fact attributes. Data often
comes from different operational sources. These separate sources may have
different names for identical entities. For example, one application may refer to a
client as a customer, whereas another application refers to the client as a prospect.
Establishing standards for dimension and fact table attributes ensures that
attributes with different interpretations get different names and, equally
important, that attributes with the same interpretation get the same name.
D_Customer
Customer Id
Last Name
First Name
Address
To provide maximum flexibility for business users, populate the fact tables with atomic-level or transaction-level data. Therefore, you should also define the conformed dimension tables at the lowest level of granularity. The grain of the Product dimension is a single product, the grain of the Customer dimension is an individual customer, and the grain of the Time dimension is a single day.

This absolute lowest level of granularity ensures that all business questions can be answered at any level of summarization. It provides flexibility for users, who can add new reference data and extract information on each individual product, customer, or location.

Instructional Tips
Point out to the students that, in this example, even the Product_Id may not be the lowest level of granularity. A product tracked by this dimension table may be made up of individual components that must be tracked in a separate Components dimension table.
The low level of granularity does not eliminate the rolled-up fact and dimension
tables. On the contrary, it is good practice for any data warehouse to have both
transaction and snapshot fact tables. For example, when a company wants to
have a snapshot of fact data on monthly orders, it is better to create a summary
table on orders, which references the month level of the Time dimension.
Slide: transaction and snapshot fact tables sharing conformed dimensions —
Customer (Customer Id, Last Name, First Name, Address)
Order Fact (Day Id, Product Id, Customer Id, Cost, NumberOrdered)
Time(Day) dimension (Day Id, Day, Month Id, Period)
Sales Fact (Month Id, Product Id, Customer Id, AmountSold, Revenue)
Product (Product Id, Description, Product Type, Product Line)
Time(Month), shown as a view (Month Id, Month, Period)
The best practice is to use a star schema for data marts joined by conformed
dimensions, where each dimension is represented by a single table that has all
possible levels of granularity. Because this data structure is so simple, even
non-technical analysts can easily understand and maintain the data model.
The combination of atomic-level fact and dimensional data supports queries that
can report and summarize across any combination of dimensional attributes.
Most contemporary reporting and analysis tools are designed for exactly this type
of database.
However, as shown in the slide example, some situations require multiple levels
within a single dimension. For example, if it is known that the Time dimension
will be queried by Month and Day on a regular basis, a designer has two options:
Star and snowflake schema designs will be discussed later in this module.
Slide: the Product hierarchy, Product(H), delivers the D_Product dimension table (ProductNumber, ProductKey) through a dimension build; a Product lookup, Product(L), references D_Product, and the Sales fact build uses the lookup to deliver the F_Sales fact table (ProductKey).
A fact table often references a single level in each dimension. When checking data integrity, only that single level needs to be referenced. For performance reasons, it is often best to use a lookup instead of a hierarchy in these cases. Lookups usually require fewer columns in the data source and less memory to process.

Unless you want the reference attributes to be available for calculations, the data integrity lookup usually needs no more than two attributes: business key and surrogate key. Surrogate keys will be covered further in Module 9, "History Preservation."

In the slide example, data from the Product hierarchy is delivered to the D_Product conformed dimension table through a dimension build. This dimension table references a template, which lists the columns in the table and how they behave.

The D_Product table is then referenced by a lookup. This lookup only contains the data necessary to perform data integrity checking. The Sales fact build, in turn, references this lookup. When the Sales fact build processes incoming transactional data, the lookup is used to ensure that each transaction refers to a product that already exists in the D_Product table.

Templates will be covered further in Module 7, "Templates, Lookups, and Attributes."

Instructional Tips
Point out to the students that there are usually two hierarchies for every dimension. One hierarchy points to the operational system, and another hierarchy or lookup points to the dimension table in the warehouse, as shown in the slide. The hierarchy that points to the dimension table should use the same template that was created in the dimension build; this helps to reduce the number of templates. This is indicated in the slide by showing only one template.

Key Information
This slide is one of the most important in the entire course. It gives a high-level outline of why conformed dimensions and templates are so important and how they are used for checking the validity of incoming fact data. It is recommended that you refer to this slide as often as necessary to continually emphasize the "best practice" way of using conformed dimensions.
Slide: reference data sources feed the Product and Customer input hierarchies; dimension builds (Product, Customer) deliver and update the conformed dimension tables, such as D_Customer.
Using DecisionStream, users can create conformed dimensions. The process involves creating a dimensional framework, including the dimensions and the hierarchies to be used in the data warehouse. DecisionStream delivers the dimension structure through dimension builds. One build is created for each dimension. The build creates a conformed dimension table in the data warehouse.

Specify that you want to include surrogate keys in the resulting dimension tables. Do not specify surrogate keys for the input data, as these will be ignored when the data is processed. You are not maintaining surrogate keys in the operational system, only in the data warehouse.

Note: If you have determined that you require a snowflaked table (as noted on the previous page), you can deliver more than one physical table in a single dimension build.

Instructional Tips
Do not get sidetracked on snowflaking. Students only need to know that it is possible to deliver multiple tables in a single dimension build if they require different levels of granularity in the dimension build.
Build slide.
3 clicks to complete.
Conformed Dimensions: Use
Slide: sale transactions, returns, and updates (carrying ProductNumber and CustomerCode) flow through fact builds into the data warehouse, where the same ProductNumber and CustomerCode columns appear.
To use the dimensions created in the previous slide, you first create hierarchies or
lookups that reference the dimension tables in the data warehouse.
When you created the dimension tables, you specified all levels of your hierarchies.
This permits you to deliver all relevant attributes and levels to the dimension
tables. You may need all these attributes for reporting.
When you reference dimension data in a fact build through dimension elements,
you rarely need all these attributes. You need to create a smaller hierarchy or a
lookup containing all necessary attributes that will be referenced by this particular
fact build.
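For example, a data integrity lookup on the Product dimension usually needs only the business key and surrogate key columns of the dimension table. A minimal sketch, using the column names shown in the earlier D_Product slide:

    SELECT ProductNumber, ProductKey
    FROM   D_Product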
When you use the Dimension Build wizard, you have the option of delivering dimension data using a specific schema. The most common schema options are star and snowflake.

A star schema will deliver the data to one dimension table.

A snowflake schema will deliver the data to several dimension tables.

DecisionStream also offers other schemas that are outside the scope of this course.

When you use the Dimension Build wizard, you choose one of five schemas to organize the result. The star and parent-child schemas organize each dimension in a single dimension table, whereas the snowflake, optimal snowflake, and optimal star schemas create more than one dimension table for each dimension.
Star Schema
Slide: a star-schema Product dimension table (Product Cd, Product Name, Product Type Cd, Product Type Desc, Product Line Cd, Product Line Name) and a Date dimension table (Order Date, Week, Month, Year).
A star schema represents a dimension in a single table with the levels of the
associated hierarchy represented as columns within that table. The primary key is
the member Id of the lowest level of the hierarchy.
A variant of the star schema is the optimal star schema, which is similar to a
star schema; however, the optimal star schema removes the descriptive
attributes from all non-base levels of the hierarchy and puts them in their
own tables, along with the Id attributes related to these attributes. This
structure saves storage space, and optimizes reporting performance.
The slide diagram depicts a star schema with four dimensions. The Product,
Sales_Staff, Customer and Date dimension tables were created in separate
dimension build processes. The primary key of each dimension table is linked to
one of the four dimension element columns (Customer_Cd, Sales_Rep_Cd,
Product_Cd and Order_Date) in the Order_Fact table. "Collapsing" the data
integrity legs of the appropriate tables in the original OLTP data source (Product
Line, Product Type and Product) produced the Product hierarchy, which was
then used to create the Product dimension table.
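As a sketch, the star schema tables from the slide might be created as follows (the data types are assumptions; the table and column names follow the slide):

    CREATE TABLE Product (
        Product_Cd         VARCHAR(10) PRIMARY KEY,  -- key of the lowest level
        Product_Name       VARCHAR(50),
        Product_Type_Cd    VARCHAR(10),
        Product_Type_Desc  VARCHAR(50),
        Product_Line_Cd    VARCHAR(10),
        Product_Line_Name  VARCHAR(50)
    );

    CREATE TABLE Order_Fact (
        Customer_Cd   VARCHAR(10),  -- links to the Customer dimension table
        Sales_Rep_Cd  VARCHAR(10),  -- links to the Sales_Staff dimension table
        Product_Cd    VARCHAR(10),  -- links to the Product dimension table
        Order_Date    DATE          -- links to the Date dimension table
    );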
Snowflake Schema
Slide: a snowflake schema. A time-based dimension is typically not snowflaked.
Date (Order Date, Week, Month, Year)
Product Line (Product Line Cd, Product Line Name)
Product Type (Product Type Cd, Product Type Desc, Product_Line Cd)
Customer Type (Customer Type Cd, Customer Type Desc)
Customer (Customer Cd, Customer Name, Customer Type Cd)
Product (Product Cd, Product Name, Product Type Cd)
Order Fact (Customer Cd, Sales Rep Cd, Product Cd, Order Date, Order_Qty, Order_Line_Value)
The slide diagram depicts a snowflake schema with four dimensions. Three of the
dimensions (for example, Product) have a separate dimension table for each of
their levels. The lowest level dimension table links to the fact table (Order_Fact)
through its primary key (in this case, Product_Cd). A time-based dimension
(represented in the slide example by the Date dimension table) is typically not
snowflaked.
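A corresponding sketch of the snowflaked Product tables (again, the data types are assumptions; the names follow the slide):

    CREATE TABLE Product_Line (
        Product_Line_Cd    VARCHAR(10) PRIMARY KEY,
        Product_Line_Name  VARCHAR(50)
    );

    CREATE TABLE Product_Type (
        Product_Type_Cd    VARCHAR(10) PRIMARY KEY,
        Product_Type_Desc  VARCHAR(50),
        Product_Line_Cd    VARCHAR(10)  -- links to Product_Line
    );

    CREATE TABLE Product (
        Product_Cd       VARCHAR(10) PRIMARY KEY,
        Product_Name     VARCHAR(50),
        Product_Type_Cd  VARCHAR(10)   -- links to Product_Type
    );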
In a fact build, you do not necessarily need all dimension table attributes. Fact
builds use dimension data in four ways:
• Surrogate key substitution. Replace the natural or business key with the
surrogate key before inserting it into the fact table.
If you are performing only data integrity checking or surrogate key substitution,
you only need the atomic-level business key and surrogate from the dimension
table. A lookup is often sufficient for this purpose.
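Conceptually, surrogate key substitution has the same effect as the following join (a sketch only; DecisionStream performs the substitution in memory through the lookup, and the staging table name incoming_facts is hypothetical):

    INSERT INTO F_Sales (ProductKey)
    SELECT d.ProductKey
    FROM   incoming_facts f
    JOIN   D_Product d ON d.ProductNumber = f.ProductNumber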
Slide: fact table records arrive with product Ids (ProductNumber); the lookup replaces ProductNumber with the surrogate ProductKey before the fact records are loaded into the DBMS.
When creating the data integrity lookup, you must use surrogate keys instead of business (production) keys. There is an option in the lookup properties that you can use to replace the business key with the surrogate key. However, for the Time dimension, you have two options: if you want to see the real dates in the fact table, it is better to keep the business key for reference rather than the meaningless surrogate key.

If the surrogate key becomes a primary key in the dimension table, the fact table must reference this key to preserve the data integrity. By selecting the Use surrogates when available box, you opt to reference the dimension through the surrogate key.
Build slide.
3 clicks to complete.

Design Data Integrity Lookups Based on Conformed Dimensions
The process of creating a lookup is similar to creating a hierarchy within a dimension.

The slide example shows the data integrity lookup. The Product conformed dimension table already exists in the data mart. The lookup uses an existing template to access that table.

Instructional Tips
In general, anytime you access data from a dimension table in the data warehouse, you should use a template rather than a DataStream to access the data.
1. Select the dimension in which you want to create a lookup. The
dimension will already exist because it was used to create the data mart
dimension table.
2. Create and name the lookup.
3. On the Attribute tab, select the template used to deliver the data mart
dimension table. Also, in the Available Attributes box, select only
those attributes necessary for data integrity checking. These keys will
usually be the business key and the surrogate key of the dimension
level that you are checking.
4. On the Data Access tab, click the Use template for the data access option
button to select it, and specify the database connection and the table to be
used as a source.
The attributes of the template used to create the data mart dimension table also
named the columns in that table. Because the attributes and column names
match, DecisionStream automatically maps the table columns to the template
attributes.
Note: If the attribute or column names have been changed, you must use data source access and map the columns manually.
In the fact build, declare a dimension element and select the lookup on the
Reference tab.
Because Product is a dimension, ensure that the Use surrogates when available
box is selected. The new fact rows being inserted into the data mart must use the
surrogate keys of the existing dimension rows.
In order for this to work, there must be an attribute in the template that has a
behavior of surrogate key.
D_ProductH
D_StaffH
D_TimeH
D_VendorH
After you establish the conformed dimensions and create the dimensional framework, deliver the dimension data into the dimension tables. Each dimension build references and delivers one dimension.

When delivering into a data warehouse, where the dimensions are conformed, the dimension builds become the important component in the dimension delivery. In the slide, four dimension builds are created: Product, Staff, Time, and Vendor. Each dimension is represented by a table: D_ProductH, D_StaffH, D_TimeH, and D_VendorH accordingly.

Technical Information
It is very important to stress to the students that Cognos recommends that users create and maintain dimensions through dimension builds and not as a part of the fact build.
Slide: the same product in two source systems — in one, the Product table holds ProductCode P_112, ProdName Tent, and ProdDesc Canvas 2 man pup; in the other, the product is identified by ProdKey 1212 — merged into the D_Product table with ProductSID 120.
For example, the Product table exists in two data sources, Oracle and Sybase.
However, the structure of the two tables is different. In the Oracle database, the
table Product has ProductCode, ProdName, and ProdDesc as attributes. In the
Sybase database, the Product table has ProdKey, Name, and Desc as the
attributes. When you merge these two tables, you must decide on common
column names.
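A minimal sketch of how the two sources might be reconciled: each source gets
its own data source SELECT, and both result sets are mapped to one common set
of DataStream items. The table and column names are taken from the example
above; the queries themselves are illustrative, not product-generated SQL.

    -- Oracle data source (column names as given in the example)
    SELECT ProductCode, ProdName, ProdDesc
    FROM Product

    -- Sybase data source (column names as given in the example)
    SELECT ProdKey, Name, Desc
    FROM Product

Mapping both SELECTs to common DataStream items (for example, ProductCode,
ProductName, and ProductDescription) lets the build see one consistent set of
columns regardless of the source.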
A more significant problem is that the data values between the source tables may
differ. In the slide example, a single product is identified by different product
codes and even different descriptions in the two separate systems. To merge
these together as a single member in the product hierarchy, the data must be
"cleansed." Data cleansing is a complicated and time-consuming problem in data
warehousing.
Technical Information
Cleansing data is not an easy task. This topic will not be covered to the extent
required in this course.
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
What is a Derivation?
You use data source or DataStream derivations when you perform calculations
on source data in a fact build or dimension build. A data source derivation lets
you perform a calculation on data that is accessed from a single data source. A
DataStream derivation lets you perform a calculation on data that is accessed
from multiple data sources.
Instructional Tips
You can use derivations in an output filter. Output filters are discussed in
Module 21, "Delivery in Depth."
A derivation is a calculated value that can contain numeric and character
constants, operators, functions, and the names of other DecisionStream objects.
The simplest expression is either a literal value or the name of a DecisionStream
object. DecisionStream has a rich library of built-in functions to assist you with
calculating derivations.
Key Information
Using DecisionStream, a user can create the KPIs on which the enterprise can
gauge the success of its critical areas.
Calculations stored in the data mart assist in standardization: because every user
applies the same stored formula, the organization uses the data mart
consistently. For example, you may need to concatenate a customer's first name
and last name to produce a full name for use in reports and queries.
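A minimal sketch of such a derivation expression, using the Concat function that
also appears in a later demo in this course (FirstName and LastName are
hypothetical attribute names):

    Concat( FirstName, ' ', LastName )

The expression is entered on the Calculation tab of the Derivation Properties
window.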
Mathematical: ( ), &, +, -, *, /
Logical: =, !=, >=, <=, >, <, <>, IS, AND, OR, [NOT] LIKE, [NOT] BETWEEN,
[NOT] IN, NOT ( ... )
Binary logical operators compare two values. Unary logical operators operate on a
single value. The result of a logical operation is either TRUE or FALSE.
Expressions can contain more than one operator. In such cases, DecisionStream
applies the operators in order of precedence.
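For example, in a hypothetical filter expression such as

    Quantity * UnitCost > 100 AND ProductLine = 'Golf'

the multiplication is applied first, then the comparisons, and finally the AND.
Parentheses can be used to override the default precedence.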
You can use the functions in the slide in calculations to provide values for
derivations, for filters, and in the SQLTXT Designer.
Build slide. 3 clicks to complete.
How to Define a Data Source Derivation
Add
Calculate
Test
Because a derivation does not come from the data source directly, you do not
have to modify the data source SQL statement. However, the fact build or
dimension build must contain the DecisionStream objects that are included in the
calculation.
• Provide a name for the derivation, and a calculated expression that can be
built using operators, functions, or control statements.
• Test the expression. If you get no result, or an incorrect one, the
expression is invalid.
Build slide.
3 clicks to complete.
How to Define a DataStream Derivation
Add
Calculate
Test
Where there are multiple data sources set up for a build, a DataStream derivation
can include DecisionStream objects from any of the data sources.
• Provide a name for the derivation, and a calculated expression that can be
built using operators, functions, or control statements.
• Test the expression. If you get no result, or an incorrect one, the
expression is invalid.
Build slide. 2 clicks to complete.
How to Define a Transformation Model Derivation
Insert
Calculate
Test
• Test the expression. If you get no result, or an incorrect one, the
expression is invalid.
Slide: Revenue = Quantity * UnitCost; AverageRevenue = AVG(Quantity * UnitCost).
• Perform the calculation first, and then aggregate (summarize) the
calculated results. By selecting the first option, you can eliminate
rounding errors in the summary data.
• Aggregate the data first, and then calculate the derivations.
Key Information
The second option makes it possible to process data faster.
Aggregation is a process of taking data across one hierarchical level and
summarizing it to the higher level.
To aggregate the derivation, you must select the Calculate at Source check box to
activate the Aggregation tab, and then select a function from the list on that tab.
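A worked sketch of why the order matters, using two hypothetical rows with
(Quantity, UnitCost) values of (2, 10) and (4, 5), and a Revenue derivation of
Quantity * UnitCost:

    Calculate first: per-row Revenue = 2 * 10 = 20 and 4 * 5 = 20, so SUM(Revenue) = 40.
    Aggregate first: if Quantity aggregates with SUM and UnitCost with AVG, then
    SUM(Quantity) = 6, AVG(UnitCost) = 7.5, and 6 * 7.5 = 45.

The two orders can produce different summary values, which is why the choice
between them is a design decision and not only a performance setting.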
Where you create a derivation determines when the derivation is calculated in the
fact build process.
A derivation created in the data source is calculated as each row is retrieved from
the data source.
Demo 6-1
Purpose:
We want to see the best-selling products. Therefore, we must
create a derivation that calculates the total sales for each
product.
7. Click Calculate.
The result appears as follows:
Results:
We have added a derivation element called SalesTotal to the
transformation model of the DemoSales build. The derivation
calculates the total sales for each product.
Summary
1. Getting Started
2. Create a Catalog
3. Create Hierarchies
4. Create Basic Builds
5. Conformed Dimensions
6. Derivations
Objectives
Hierarchy Lookup
Slide: a hierarchy with levels AllProducts, Product Line, Product Type, and
Product.
There are two types of reference structure:
• hierarchies
• lookups
Performance is the same for both types of reference structure. They are treated
the same internally.
DataStreams gather together data source(s), each of which can contain SQL
statements, literals, and mapping information. DecisionStream uses a data source
to obtain the data from the database tables. You specify a data source by entering
an SQL SELECT statement to extract data from the database, and then adding
any literal(s) that may be required.
Once you have defined the data sources, you must create DataStream items to
map the data source columns returned by the SELECT statement(s) to the
attributes in each level of the hierarchy.
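A minimal sketch of such a data source query for a Vendor level (the GOVVendor
table and its columns appear in Demo 7-4 later in this module; the SELECT
itself is illustrative):

    SELECT VendorCode, CompanyName, VendorTypeCode
    FROM GOVVendor

The three returned columns would then be mapped, through DataStream items,
to the Id, caption, and parent attributes of the Vendor level.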
Members are unique instances of data at one level. Each member is defined by
level attributes.
In the slide example, the Vendor level has four attributes: VendorCode,
CompanyName, VendorTypeCode, and VendorCodeMR. The VendorCode
attribute serves as the ID for that level. Each vendor at that level has a different
value for the ID attribute, and this value uniquely identifies that vendor from all
the others.
Each vendor also has a caption of CompanyName, which may or may not be
unique. The VendorTypeCode attribute is the parent for the Vendor level. This
attribute links each vendor to the type it belongs to at the next level, which is
VendorType.
Questions
Ask the students why you would want to add an attribute. Explain that a table
may require modification if additional source data can be added that would
increase value to the dimension table.
When you define a hierarchy using the Hierarchy wizard, the wizard only
generates those attributes that are fundamental building blocks in the hierarchy
definition. These attributes include ID, caption, and parent.
After you have created the hierarchy and its levels with the Hierarchy wizard, you
can enhance data analysis by manually adding other attributes. You define the
properties of these additional attributes in the template for that level. All the
attributes on one level are part of that level's dataset.
You can add attributes to a single level of a hierarchy. For example, in the
Product hierarchy, only members at the Product level will have a Color attribute.
You can also add attributes to multiple levels in the same hierarchy. For example,
in the Location hierarchy, the members at the Country, Region, and City levels
may have a Population attribute. The attributes that you choose to include in your
hierarchies will reflect your specific data analysis requirements.
In the slide example, we want to add a new attribute to the Products level of the
ProductH hierarchy. Since this attribute does not exist in the underlying
ProductT template, we must first add it to this template's list of attributes.
Templates will be discussed later in this module.
Build slide. 2 clicks to complete.
Add Attributes to a Hierarchy (cont’d)
In the slide example, an attribute called MyNewAttribute has been added to the
ProductT template. This adds MyNewAttribute to the list of available attributes
for the Products level.
Technical Information
Sometimes SQL columns can play more than one role in a hierarchy. A column
returned by a SELECT statement can simultaneously be the ID and the caption
for a level.
MyNewAttribute can then be added to the list of attributes in this level's dataset.
This level already has attributes specified as ID, caption, and parent. As a result,
MyNewAttribute does not have to be designated as having one of these three
roles.
After you add attributes to a hierarchy, you map them to items in the level or
hierarchy DataStream. The DataStream items are, in turn, mapped to columns
from tables in the source database. Once this mapping process is complete, data
can be delivered to the hierarchy. Mapping will be covered later in this module.
Hierarchies consist of levels, which in turn are made up of attributes, such as the
ID number of each product. Before developing the hierarchies in the dimensional
framework, the organization should decide on how these attributes will be named
and what attribute each will represent. If the hierarchy attributes have clearly
defined, standardized names, then it is easier to develop conformed dimension
tables from these hierarchies.
The slide example refers to a proposed hierarchy structure that will contain
product data. A naming convention has been decided upon for each of the
important attributes of the lowest level, product. For example, all the surrogate
key values will be tracked by the product_skey attribute.
When you create a hierarchy, you must define a template. If you use the
Hierarchy wizard, the templates are defined dynamically as you create the
various levels.
If you create a hierarchy manually, you must specify a template in each level of
the hierarchy before you can add the level to the hierarchy. You can choose an
existing template, or create one.
If you choose an existing template, it must reside in the Templates folder of the
dimension to which the hierarchy level is being added.
Create a Template
You can:
• create the template before you add the level to the hierarchy
• insert the level and create the template at the same time
When creating a hierarchy with multiple levels, you have two options regarding
templates:
• Create the template first, add the level to the hierarchy, and then use the
template that you just created to define the attributes for that level. When
using this method, you do not have to add any attributes before you save
the template.
• Insert the level into the hierarchy and create the template that will
hold the required attributes for that level at the same time. The
template must contain at least one attribute. You must define at least
one attribute from the list of attributes in the template as the ID
before you can create the level.
The best practice, whenever possible, is to define all attributes within one
template, and then use that template when adding each level.
Slide: the Product(H) hierarchy and its Product(L) level reference a template that
lists attributes such as ProductNumber. The D_Product dimension table (with
columns ProductNumber and ProductKey) references a second template. The
F_Sales fact table holds the Sales measure and joins to D_Product through
ProductKey.
In the slide example, the ProductH hierarchy references a template, which lists
the attributes used by the hierarchy, such as ProductNumber and ProductName.
This ProductH hierarchy will be delivered to the data warehouse through the
D_Product dimension table.
The dimension table, in turn, references a second template that lists the columns
of the table as well as their behavior (such as surrogate key). This second template
automatically generates surrogate key values (in this case, ProductKey).
The second template is also important for slowly changing dimensions, which are
discussed in Module 9, "History Preservation."
A DataStream gathers together a number of data sources. Each data source
contains an SQL SELECT statement, and may contain literal values.
DecisionStream uses data sources to extract data from database tables.
A hierarchy may get its data from a single table that contains data for all the
levels. A hierarchy may also get its data from multiple tables. In the latter case,
you may have to define multiple data sources to extract data for the entire
hierarchy, or separate data sources at each hierarchical level. Whether you use one
data source or multiple data sources will be determined by the complexity of the
hierarchy and its levels.
The ProductH hierarchy on the left side of the slide example gets its data on a
level-by-level basis: each level of the hierarchy has its own data source, each of
which contains a separate SQL SELECT statement. By contrast, the
VendorCustomerH hierarchy on the right side has one data source at the top.
This data source contains a single SQL SELECT statement that retrieves data for
all the levels in the hierarchy.
Technical Information
You can use an unlimited number of tables in the data sources of a DataStream.
The tables must be related in some way to make the data in the hierarchy
relevant.
When you use the Hierarchy wizard, creating a hierarchy from multiple tables
necessarily creates one SQL statement per level. In more complex situations, one
SQL statement provides attributes for only some of the levels. In this case,
several SQL statements could be required to populate a single level.
Each SQL statement runs against one single connection. However, not all SQL
statements within a hierarchy have to originate from the same connection.
Therefore, each level has the potential to run against many data sources to map
to that level's attributes.
You can access data for each level of a hierarchy using either:
• a template
• a DataStream
A template creates its own SQL when it accesses data, which is very powerful
when maintaining slowly changing dimensions (SCDs). However, the designer has
no direct control over the SQL. If you want to write custom SQL statements to
acquire data for the hierarchy level, then you cannot use template data access.
If you want to acquire data from operational source systems, you will likely
require custom SQL. Therefore, template data access is rarely applicable against
operational data, unless you can use a simple query. A basic SELECT statement
only contains column names from a single table. If you use template access, you
cannot join tables, calculate fields, or specify a WHERE, ORDER BY, or
GROUP BY clause.
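To illustrate the limitation, compare the kind of single-table query a template
can produce with the custom SQL a DataStream data source allows. The table and
column names below are hypothetical:

    -- Template-style access: one table, no joins or clauses
    SELECT ProductNumber, ProductName
    FROM Product

    -- Custom SQL in a DataStream data source
    SELECT p.ProductNumber, p.ProductName, t.ProductTypeName
    FROM Product p, ProductType t
    WHERE p.ProductTypeCode = t.ProductTypeCode
    ORDER BY p.ProductNumber

Only the second form can join tables, calculate fields, or use WHERE, ORDER BY,
or GROUP BY.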
On the Data Access tab of the Level Properties dialog box, indicate how you
want to obtain the source data. If you select template data access, you must
further indicate the data source and table that contain the data you require for
that level.
Key Information
The SourceConnect data source indicated on the left side of the slide example is
listed in the catalog's Connections folder. The Connections folder is, in turn,
stored in the Library folder.
In the top slide example, each level of the ProductH hierarchy uses a DataStream
for data access. Each DataStream contains one data source, and each data source
contains a single SQL SELECT statement that retrieves data for one level.
In the bottom slide example, all the levels of the Product hierarchy use a template
for data access. As indicated by the Level Properties dialog box, the ProductLine
level gets its data from the ProductLine table contained in the GOSales data
source.
Demo 7-1
Purpose:
Managers want to create reports about the sales staff for the Great
Outdoors. To do so, we must create a dimension that will deliver
the necessary levels of information.
6. Click the Prepare button to select it, and then click Refresh.
The columns referenced in the SELECT statement are prepared for use
in the hierarchy.
7. Click OK to accept the SQL code.
The data source is added to the DataStream.
We now must map the columns to the DataStream items.
8. In the SalesCountry level, right-click DataStream, and then click
Properties.
The DataStream Properties window opens.
9. Click the Auto Map button.
The columns in the data source are mapped to items in the DataStream.
The mapping appears as shown below.
12. In the Level Attributes pane, click and drag SalesBranchCode to the
Maps To column beside SalesBranchCode in the DataStream Item
column.
13. Repeat step 12 for the remaining level attributes.
The result appears as shown below.
14. Click OK to close the DataStream Mapping window, save the catalog,
and then keep DecisionStream open for the next demo.
Results:
By creating a hierarchy that has various levels, managers can
create detailed reports about the sales staff for the company.
Create Literals
Slide: the Products level draws on two data sources, Golf and Other; a Focus
Group attribute records the literal (Golf or Other) for each row.
Literal values are static pieces of data that the DataStream can return. The literal
values remain constant for every row that is returned. Use a literal value to flag a
piece of data that is returned from the data source.
You can also use a literal value when accessing two data sources in the
DataStream to provide data for a hierarchy level. An example of a literal value
might be the letter C for Current and H for Historical. In this case, use the literal
to represent whether a value comes from current data or historical data.
In the slide example, we want to find information regarding a range of sports
equipment. The DataStream references two different data sources: Golf
Equipment and Other Equipment. Each data source returns values that fall within
a certain range, which is determined by the SQL code. We create a literal called
Golf to flag any equipment that the first data source returns. We flag any
equipment that the second data source returns with another literal called Other.
The literals indicate which data source the values are being returned from. The
two literals are included under a new attribute, such as Focus Group, for that
level.
Technical Information
The slide example shows two different data sources being used for one level
called Products. The level obtains other data, as well as a fourth attribute called
Focus Group, that indicates which source the data is coming from. Source 1 is a
table called Golf, and Source 2 is a table called Other. The literal returned from
each source will indicate which table was accessed for the data.
As another example, two different data sources are used to produce dimension
data. One source uses French data, while the other uses German data. You define
a literal that indicates which source the data came from: the French data source
or the German data source.
DecisionStream adds these literal values to each row. You can achieve the same
result by inserting a constant in the SQL SELECT statement. Using a constant,
however, is less efficient because the database adds the constant to the row
before sending it to DecisionStream. For a million rows of data, adding a
constant such as "Golf" to the SQL would equate to four million additional bytes
of data that would have to be transmitted across the network.
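For contrast, the less efficient alternative embeds the constant in each data
source query. A sketch (GolfEquipment and its columns are hypothetical names
based on the slide):

    SELECT ProductCode, ProductName, 'Golf' AS FocusGroup
    FROM GolfEquipment

Here the database materializes and transmits the 'Golf' string with every row,
whereas a DataStream literal is added by DecisionStream after the rows arrive.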
Build slide. 2 clicks to complete.
Map Literals to DataStreams and Attributes
Map the literals for each data source to the new attribute for the level in the
hierarchy.
To return a literal value along with the other source data acquired through SQL,
you must perform mapping in two places:
• in the DataStream Properties dialog box, from the literal to one or more
DataStream items
• in the Level Properties dialog box, from the resulting DataStream item to
one or more level attributes
In the slide example, the Products level gets its data from two sources: one
returning golf product data (Golf), and the other returning data about other
products (OtherProducts). Each data source includes a literal that indicates
whether each row is derived from the Golf or OtherProducts data source. The
Golf and Other literals are mapped from the data source to the appropriate items
in the DataStream (Golf or Other).
"ALL"
Static
member
Product Class
Classes table
Product
Family
Dynamic Families table
members
Products Product
table
Instructional Tips
DecisionStream can use both dynamic members and static members to populate Static members are members that are not
each hierarchy level. already in the source data.
Slide: a Continent level containing Europe and ?UnknownContinent, and a
Country level containing England, France, and ?UnknownCountry.
By default, DecisionStream provides a foster parent for any member that has a
missing or unknown parent. This mechanism is known as fostering.
In a typical hierarchy, each level contains a set of members. At every level except
the highest, each member is related to a member at the next-highest level: the
parent of the member. Each member is also related to members at the
next-lowest level: the children of the member.
A foster parent is an artificially introduced member that acts as a parent for
members that either have no defined parent or whose defined parent cannot be
found at the next highest hierarchy level.
Instructional Tips
In the slide, there are three levels: Region, Country, and Rep. The example shows
Tom Green being fostered. When Tom Green is added to the hierarchy, he has
no parent level. He is therefore fostered under the ?UnknownCountry level,
which in turn is fostered under the ?UnknownContinent level, and finally under
the ALL level.
The default name for a foster parent is the name of the level,
prefixed with Unknown.
You can rename the foster parent of a level by using a static
member.
You can rename a foster member, substituting the default name assigned by
DecisionStream with a name of your choice. You rename foster members by
using static members.
For each level of a hierarchy, you can set one static member as the foster
member. You can either create a static member that serves specifically as the
foster member, or use an already existing static member.
If you want to use an existing static member as the foster member, select the
Foster box adjacent to the required static member. Otherwise, create a static
member and then select the adjacent box in the Foster column.
In the slide example, the product types that did not roll up into an existing
product line were assigned a default foster parent with a caption of Unknown
ProductLine. To replace this, a static member was created that has an Id of
00000000000000 and a caption of Default ProductLine. This new static member
was assigned the foster parent role.
Demo 7-2
Purpose:
We have been asked to add an additional level to an existing
hierarchy that will provide further details for reporting and will
group records that have no parent into a separate group for
later analysis.
Results:
By adding a top level to the hierarchy, any record without a
parent will be grouped into the level and can then be analyzed
further.
Demo 7-3
Purpose:
Certain records in the VendorCustomerH hierarchy have no
associated region. We have been asked to update the
VendorCustomerH hierarchy to accommodate these records
by using City as the region.
9. Repeat step 8 to map all of the level attributes to the DataStream items.
The result appears as shown below.
Instructional Tips
Each DataStream item must be mapped to
the correct level attribute, as shown in step
9. Otherwise, you may receive an error
when you attempt to explore the hierarchy.
Results:
By adding the additional SQL code, the records that did not
have a region will now show the City value as the region.
Demo 7-4
Purpose:
Vendors are customers of the Great Outdoors and sell
products distributed by the Great Outdoors. We must create a
hierarchy to represent these vendors. We will create the first
part of the VendorH hierarchy using the Hierarchy wizard, and
then manually insert the remaining levels to complete the
hierarchy.
8. Click OK.
The level is added.
9. Repeat steps 2 to 8 to create a Vendor level.
Name the level Vendor, and use GOVVendor as the source table. Add
VendorCode, VendorCodeMR, CompanyName, and VendorTypeCode
to the Chosen attributes list. Make VendorCode the Id, CompanyName
the Caption, and VendorTypeCode the Parent.
10. Click OK, click Next, and then click Finish.
We included a WHERE clause so that the Country level will link back to
the Vendor level.
6. Click OK to close SQL Helper, click the Derivations tab, and then click
Add.
The Derivation Properties window opens.
7. In the Name box, type VendorCountryId, and then click the
Calculation tab.
8. In the right pane, type Concat( ToChar(VendorCode), ' ',
ToChar(VendorCountryCode) )
9. Click OK to close the Derivation Properties window.
10. Click the SQL tab, click the Prepare button to select it, and then click
Refresh to prepare the columns for use in the level.
11. Click OK to close the Data Source Properties window.
5. Click OK to close SQL Helper, click the Prepare button to select it, and
then click Refresh.
The columns are prepared for use in the level.
6. Click the Derivations tab, and then click Add.
The Derivation Properties window opens.
7. In the Name box, type VendorCountryId, and then click the
Calculation tab.
5. Click OK, right-click the Site level, and then click Mapping.
The DataStream Mapping window opens.
6. Click and drag the attributes from the Level Attributes pane to map them
to DataStream items on the left.
The results appear as shown below.
Results:
We created a hierarchy to represent the vendors to which the
Great Outdoors sells its products. We created the first part of
the VendorH hierarchy using the Hierarchy wizard, and then
manually inserted the remaining levels to complete the
hierarchy.
There are some cases when you do not need a multilevel reference structure to
organize the dimension data in your data warehouse.
For example, create a lookup if you want to check data integrity against a single
level, usually the lowest level, of a conformed dimension.
Tables are often created just to aid in data transformations. These tables are not
typically used in dimensional analysis. For example, you can create a table for
currency conversion. This table is used only to translate world currencies into a
standard currency and will never be the subject of dimensional analysis. It would
be more appropriate to base this type of table on a lookup rather than a
hierarchy.
Lookups are widely used for cleaning the incoming data from various
unstructured data sources. These lookups are called optional lookups. This type
of lookup is often used to determine whether records from various databases
match. Optional lookups are discussed later in this course.
Design a Lookup
You must also specify how the members of the lookup are loaded into memory:
in other words, a data access method for the lookup.
You have two choices for data access: you can use template access, or you can
write the SQL yourself. If you write your own SQL SELECT query, you must
map the resulting columns to the items in the lookup's DataStream, and then
map the items in the DataStream to the attributes of the lookup (such as Id).
Build Slide.
3 clicks to complete.
Design a Lookup to Translate Source Data
You may want to use a lookup to translate certain data values to other data
values. The slide example shows how to create a lookup that converts currency
values that exist at various rates in the database.
b. Add attributes to the template. You can specify the attribute behavior
if you want to maintain the resulting table in the future. However,
lookups do not require attribute behavior. You can create the
template attributes manually, or you can add them by importing
columns from a table. Keep only the attributes that you require for
the lookup.
Build Slide. 3 clicks to complete.
Design a Lookup to Translate Source Data (cont’d)
Slide callouts: 3 — Select Use DataStream for data access; 4 — Insert a Data
Source.
4. Insert the data source and create the SQL query and any necessary
derivations.
a. Map the columns from the data source to items in the DataStream.
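A minimal sketch of what the data source query for such a currency lookup might
look like (the table and column names are hypothetical; the slide shows a
currency table with rates):

    SELECT CurrencyCode, CurrencyName, RateToUSD
    FROM Currency

The returned columns are then mapped to DataStream items and on to the
lookup attributes, with CurrencyCode serving as the Id.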
A translation lookup does not usually use surrogate keys. In the slide example, the
currency table is never joined directly to the fact table. It is used to convert values
from one currency into another currency.
The lookup often contains many attributes from the translation or reference
table. As soon as a lookup is referenced by a dimension element, its attributes can
be accessed and used for calculations in derivations and filters.
If a dimension element that references a lookup is only used for calculations and
translations, it does not need to be delivered. Mark the element as Never Output.
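As a hedged illustration, once a dimension element references the currency
lookup, a derivation could convert a source amount to the standard currency.
Amount and RateToUSD are hypothetical names:

    Amount * RateToUSD

The dimension element that supplies RateToUSD exists only to feed this
calculation, so it would be marked Never Output.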
Summary
Workshop 7-1
When the update is complete, create two dimension builds by using the
Dimension Build Wizard. The additional builds will create the necessary
dimension tables that we can use for our data mart.
• Create a dimension build for the StaffD dimension that includes slowly
changing dimension attributes and surrogates.
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
8. Fact Builds
9. History Preservation
13. JobStreams
Objectives
Input Hierarchies
Slide: reference data sources feed the Product hierarchies and lookups in the
dimensional framework; a dimension build delivers the Product dimension to the
D_Product table. Transactional data sources (Sales, with ProductNumber,
Quantity, UnitCost, and the AverageRevenue derivation at the ProductLevel)
feed fact builds, which update the warehouse.
As soon as the reference data is processed and delivered to the data warehouse
using dimension builds, it is time to design one or more fact builds. Fact builds
deliver the fact data to the data warehouse.
As previously stated (see Module 4, "Create Basic Builds"), a single fact build can
deliver fact data, dimension data, and metadata. However, in a production
environment, the main purpose of the fact build is to create one or more fact
tables. Dimension tables are typically created and maintained through dimension
builds.
Slide: Sales uses Profit to mean Gross Profit, while Finance uses Profit to mean
Net Profit; the SalesFact table therefore carries two distinctly named measures,
Gross Profit and Net Profit.
The most common measures that require standardization are Profit, Cost,
Price, and Revenue. If two operational systems both use an identical term but
their definitions differ, the warehouse should use different names to properly
identify each measure. For example, Sales and Finance may both use the term
Profit, but Sales defines it as Gross Profit and Finance defines it as Net Profit.
Using the term Profit in the warehouse would be inaccurate and misleading.
Ensure that both measures have unique names to represent their appropriate
values in the fact table.
The entire enterprise should agree on the definition of all dimensions and all
measures.
Build slide.
4 clicks to complete.
Create a Fact Build Manually
As outlined in Module 4, "Create Basic Builds," you can create a fact build by
using the Fact Build wizard. However, you will often want more flexibility and
power to control the construction and delivery of the fact build. In this case, you
can construct the fact build manually.
To construct a fact build manually, you must complete the following process:
1. Add a new build and provide a name for it.
2. Add the data source(s) to the DataStream using one or more SQL
SELECT statements. A fact build can have more than one data source.
3. Map the columns of the data source(s) to the DataStream items.
Key Information
There are three ways to construct a fact build:
• Use the Fact Build wizard (the simplest option). This wizard creates a
basic build.
• Create the build manually, which gives the designer full control over the
build process.
• Combine the above two methods: use the wizard to prepare the basic
structure, and then refine it manually.
Build slide. 2 clicks to complete.
Add a Dimension Element to a Fact Build
Slide callouts: Insert; Select to use the surrogate, if one is available, as a foreign
key; Select to roll up to the higher level(s); Map the dimension item to a
DataStream item.
When you add a dimension element to a fact build, you are adding a column to a
fact table that links this table to a dimension table. Therefore, before you create a
dimension element, make sure the corresponding dimension with the hierarchy
exists in the DecisionStream library, and the build data source contains the
referencing column.
To add a dimension element to a fact build, follow these steps:
1. Add a dimension element to the transformation model and name it.
2. Associate it with the corresponding hierarchy. In the slide example, the
Product dimension element is associated with the Product hierarchy.
3. Set Output Levels, and select the Dimension boxes for all levels if you
want them to be represented in the dimension delivery.
4. Clear the Aggregate box if you do not want DecisionStream to perform
aggregation for the associated measures and derivations. Aggregation
will be discussed in Module 15, "Aggregation."
5. Select the Use surrogates when available box if you want to reference the
dimension through the existing surrogate key.
6. On the Unmatched Members tab, select the Accept unmatched member
identifiers box if you do not want the unmatched records to be rejected.
Instructional Tips
Rather than creating transformation model elements and then mapping the
DataStream to the transformation model separately, you can perform both tasks
together. You can create attribute, dimension, and measure transformation
model elements using the following method:
1. Right-click the transformation model, and then click Mapping.
2. In the DataStream Item column, click the DataStream item that you
want to map to the transformation model.
3. Drag the selected item to the white space in the Transformation Model
column.
4. From the popup menu, click the relevant element type to create:
dimension, measure, or attribute.
DecisionStream creates the transformation model element and maps the selected
item to the element.
An attribute element holds additional information that is not a dimension or a
measure but that may be of interest. Attributes differ from measures in that they
cannot be aggregated.
Attribute columns are generally either:
• an attribute of one of the dimensions, such as unit weight or size
• a property of the record, such as the name of the operator who entered
the record or the timestamp of record creation
Technical Information
Mathematical merge behaviors (for example, SUM) are always available, even if
the attribute is not of a numeric data type. If they are used on a non-numeric
attribute, they will cause an error when the fact build is executed. The MAX or
MIN options are more appropriate for non-numeric attributes.
1. Add an attribute to the transformation model and provide a name for it.
Note: Attributes only have data values at levels retrieved directly from input
data SQL queries. In summary information, the value of an attribute is
always null.
Demo 8-1
Purpose:
The company wants to know what products are distributed to
which vendors on a daily basis. We need to construct a new
fact build called DemoSales, and we want to add dimension
elements, measures, and an attribute to it. We will create this
fact build manually.
9. Click OK.
10. Repeat steps 1 to 6 to add the VendorSiteCode dimension element to
the DemoSales build, selecting VendorD in the Dimension box and
VendorH (H) in the Structure box.
11. In the Site row, click the Output box to select it.
12. In the VendorType, Vendor, and Country rows, click the Dimension
check boxes to select them.
The result for VendorSiteCode appears as follows.
Results:
We manually constructed a fact build called DemoSales and
added dimension, measure, and attribute elements to the
build's transformation model.
Summary
8. Fact Builds
9. History Preservation
13. JobStreams
Objectives
Although gradual changes to database information are a sign of a well-used data
warehouse, organizations face a challenge when maintaining a cube, or
multidimensional representation of detail and summary data, over time. For
example, sales representatives transfer to a new branch office. If their old sales
figures move to the new branch with them, a report from their original office
suddenly shows historically poor performance.
Other common changes include the addition of a new product, department
reorganization, or changing characteristics of a property (a product is changed or
reformulated).
One of the common problems in maintaining cubes and other user-reporting
data sources is that many dimensions change over time. New members are added
to dimensions, and existing members gradually change over time. In a data
warehouse, dimensions that reflect these types of incremental changes are known
as slowly changing dimensions (SCDs). DecisionStream handles these situations
quite easily.
Key Information
Up to this point, we have mainly dealt with reading data from a data source, and
then writing that data to a target database. Now we want to handle incrementally
added data, and outline how we should deal with this new data.
For example, what happens if an employee is assigned a new employee number?
One solution would be to create two sets of reports for that employee: one set
using the old employee number, and another set using the new employee
number. However, the best practice would be to assign a new surrogate key to
the employee, and use this key to link the old and new employee data.
Situations where a dimension is completely changed quickly are rare. Drastic
changes are usually caused by design changes or, less often, by a complete
reorganization of the relationships among members in the dimension.
Instructional Tips
Deleting from a data warehouse is a rare occurrence, but it can happen. For
example, a company may disband a department and must remove all references
to that department.
Understand Surrogates
Surrogate keys have existed for years in operational systems. For example,
Invoice Number, Order Number, and Employee Number are all operational
surrogate keys.
An entity such as an employee usually has a natural key (for example, the
employee's name) so that application users can easily find the data they want.
However, the employee typically has another meaningless key, such as employee
number. This additional key creates several advantages in an operational system.
Using an internally assigned surrogate key means that the operational system can
ensure uniqueness. We do not have to worry if there are two Mary Smiths. Even
if the employee has some other externally assigned unique key (for example, a
social security number), that key may be missing or incorrect when the employee
data is initially entered into the system. An internally assigned surrogate key
always exists and is guaranteed unique.
Also, the surrogate key is a better choice to tie all employee records together.
What if the employee changes her name? If the surrogate key joins all records,
then the employee name need only be stored in one place (the employee record)
and changed in one place. The surrogate key (the employee number) still
connects all other operational records together.
Technical Information
Students may ask why we would use a surrogate key on a Time dimension.
When dealing with orders, you may have an OrderDate (that has a value) and a
ShippedDate (with a value of NULL). When the product is eventually shipped,
the ShippedDate value changes to show the actual date the product was shipped.
To track this record, surrogates are used.
Example
The Dallas office at Cognos was in Arlington, Texas. Its natural key is ARL.
This key is now meaningless but has not changed even though the office moved
to Dallas.
A final advantage of operational surrogate keys is their size. A natural key may be
many bytes long. If natural keys are used to join tables, the table sizes can be
unnecessarily large.
Although surrogate keys can be passed from an operational system into a data
mart, usually a new surrogate key is assigned inside the data mart. The surrogate
key from the operational system is often used for queries into the data mart,
playing the role of a natural key. There are a number of reasons for assigning a
new data mart-only surrogate key.
Often, several operational systems will have entities merged to form a single data
mart entity. For example, a single customer in a banking data mart may exist as a
checking account, a savings account, and an insurance policy in the operational
systems. Instead of using three account numbers to identify the customer, a new
surrogate key may be assigned, and the three account numbers become alternate,
natural keys.
The sheer size of data marts often makes surrogate keys preferable to natural
keys. Where an operational database may have millions of rows, a data mart may
have billions or even trillions of rows in a single fact table. The small size of
surrogate keys can often save large amounts of space.
However, in spite of the other advantages of using surrogate keys, their single
most important use is tracking changes to dimensional information over time.
Dimensions that track such changes are commonly called SCDs.
Slide: a fact table joined to the Product and Customer dimension tables through
natural keys. Each fact row carries a Prod Code (such as PR X 002) and a Cust
Code alongside its measures.
The slide example illustrates using natural keys to join dimension tables to a fact
table. A fact table can be joined to many dimensions. Using natural keys in the
fact table can take up large amounts of physical space, both in terms of table
structure and indexes. This is one of several reasons to avoid using natural keys.
Slide: the same fact table joined through surrogate keys. Each fact row carries
compact Prod Sur and Cust Sur values (such as 1 and 10) alongside its measures.
The slide example illustrates using two surrogate keys, Prod Sur (in the Product
dimension table) and Cust Sur (in the Customer dimension table). The natural key
is still available to users in the dimension table, but joins are implemented
through these surrogate keys.
Although the surrogate keys are not used in reports, including them in the fact
table instead of natural keys saves space. For example, in the previous slide, each
measure is identified by a unique combination of a product code and a customer
code.
By contrast, in the slide example on this page, each row of fact data is uniquely
identified by a combination of two surrogate keys. If there are several million
rows of data in the fact table, using surrogates makes implementing the joins to
dimension tables more efficient.
You normally use surrogates to link fact tables to dimension tables, but
DecisionStream gives you the option of using natural keys. To join fact tables to
dimension tables using surrogate keys, click the Use surrogates when available box.
The Fact Build wizard assumes that all dimension elements must use
surrogate keys when they are available. If you manually add a new dimension
element to an existing fact build, you must set this option if you intend to
deliver surrogates.
Note: If you use the fact build to deliver dimension tables (which is not the
recommended approach), the resulting dimension tables will not include
surrogate keys.
We often think of data marts as having single-grain fact tables (for example, daily
sales). If this is true, only the lowest level of the hierarchy requires a surrogate
key. However, there are often multiple fact tables with different grains (for
example, monthly budgets). You can use a single dimension for different fact
tables at different grains. In this case, surrogate keys are required for more than
just the lowest level.
DecisionStream lets you determine whether surrogates are available at each level.
You specify this by adding one or more attributes to a template and specifying
their behavior as surrogate key. Then set the value for the business key to the
appropriate level. Or, you can set the surrogate key starting value, as in the slide
example. If you use the Hierarchy wizard, the surrogate key will have a default
name of skey, which you can change.
In the slide example, the D_VendorSurrT template has a surrogate key called
Surrogate. This attribute is mapped to the VendorCode business key attribute. In
other words, in the corresponding dimension table, each value of VendorCode
will have a separate value for Surrogate.
Technical Information
The templates for extracting the source data will not have surrogates. These
templates do not need surrogate attributes. Surrogates are required only when
members are added to the dimension tables in the data mart. Therefore, only the
template associated with the dimension build will contain the surrogate
information.
Instructional Tips
Emphasize that manually creating and maintaining surrogate keys in a data
warehouse can be an onerous task. By using templates, DecisionStream
automates the generation and management of surrogates.
Slide: source records with Product IDs have each ProductNumber replaced with
the corresponding ProductKey surrogate, and the resulting fact table records are
loaded into the DBMS.
Once surrogate keys are added to dimensions, they can be assigned to fact table
records. This process links each row of the fact table to the correct row in the
corresponding dimension table.
1. Each record from the source system has a ProductNumber column that
holds natural key values (such as PR X 002 for Beans).
2. Each ProductNumber value is replaced with the matching ProductKey
surrogate value from the Product dimension table.
3. Each fact row uses the ProductKey column as part of its primary key,
instead of the natural key. The values in the ProductKey column are
linked to the corresponding values in the same column of the Product
dimension table.
4. Each row of fact data is loaded into the target data mart. The original key
(ProductNumber) is no longer used to join the fact table to the Product
dimension table. The more efficient surrogate key (ProductKey) is used
instead.
This diagram is a simplified version of the one used in Ralph Kimball et al.'s The
Data Warehouse Lifecycle Toolkit (1998, Wiley). (See page 634 for a more complex
example of how the natural keys of a fact table can be replaced with surrogate
keys in the data mart.)
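Conceptually, the substitution in steps 1 to 4 is equivalent to joining the staged
fact rows to the dimension on the natural key and keeping only the surrogate.
DecisionStream performs this in memory through the lookup; the SQL below,
with a hypothetical StageSales table, is only an analogy:

    SELECT d.ProductKey, s.Quantity, s.UnitCost
    FROM StageSales s, D_Product d
    WHERE s.ProductNumber = d.ProductNumber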
Demo 9-1
Purpose:
The Great Outdoors has data that tracks their vendors. We
want to store this data in a conformed dimension table that
can potentially be used by multiple fact tables in the data mart.
To accomplish this, we will manually create a dimension build
that is based on the VendorCustomerH hierarchy. We will then
execute the dimension build to deliver the data to a single
conformed dimension table.
Notice that each value of VendorSiteCode (the primary key for the table)
is associated with a separate surrogate key. For example, VendorSiteCode
101 is associated with a value of 4 in the Surrogate column.
8. Close SQLTerm and leave DecisionStream open for the next module.
Results:
We manually created a dimension build that is based on the
VendorCustomerH hierarchy. We then executed the dimension
build to deliver the data to a single conformed dimension
table.
An operational system usually only contains data about the current status of the
business. Therefore, the sales representative record will record the office in which
the sales representative currently works.
By contrast, the data warehouse is expected to hold data for perhaps five or 10
years. Over this time, it may be important to know all the sales offices in which a
sales representative has worked (and when).
Adding time variants to a data structure makes it more complex; however, not
adding them can cause considerable difficulties for the users.
For example, comparing a division's performance this year versus last year may
be impossible if the customer's sales representative has moved divisions. Does
the sales history move to the division that the sales representative is in now, or
does it stay with the sales representative's old division?
Obviously the answer is… it depends. The data warehouse must be able to give
whatever answers the user requires.
The dimensional data in an OLTP system (for example, one tracking customer
orders) is usually static. As noted previously, the only data that matters is that
which reflects the most current state of the business.
Because the data warehouse is expected to hold data for several years, it must
contain the most current dimensional data, in addition to all the changes it has
undergone gradually over time, in tandem with the changing structure of the
business. Slowly changing dimensions let you track these historical changes in the
warehouse.
* (Emp. No + Branch)
** (Emp. No + Branch + Position)
*** (Emp. No + Branch + Position + Salary)
Imagine the effect of having such a large natural key in the fact table.
An SCD is a technique for managing historical data. SCDs are dimensions where
non-key attributes can change over time without corresponding changes in the
business key. For example, employees may change their department without
changing their employee number, or the specification for a product may change
without changing the product code.
In the slide example, the original record changes when Jack changes location. To
keep track of the change, a new record is added. The new record would have the
same Employee Number, which means that the key is no longer unique. To make
it unique, the key must be the combination of Employee Number and Branch.
For the next change, Jack is promoted. Again, a new record is added, and again,
the key is no longer unique. To make it unique, the key must be the combination
of Employee Number, Branch, and Position.
These steps continue for every change to Jack's status within the company. As
you can see, the unique key eventually becomes quite long.
Also consider the effect of having such an inefficient and large key in each fact
record, and consider that this problem is repeated for each dimension that the
fact table references.
SCDs require that the designer has a way of tracking the data warehouse
members and their status without making the issue unnecessarily complex. The
solution is to replace these cumbersome natural keys with efficient surrogate keys.
Ralph Kimball maintained that there are three typical ways to handle changing
dimensional data; he called them Type 1, Type 2, and Type 3. The industry has
adopted these terms.
The choice of method largely depends on the business' need to track changes. By
far, the most commonly accepted methods are:
• Type 1 - where no historical record is required
• Type 2 - where an accurate historical record is required at any stage
Key Information
As noted in Ralph Kimball and Richard Merz's The Data Webhouse Toolkit (Wiley,
2000), "Type 3 SCD occurs when an alternate, simultaneous description of
something is available. In this case an extra 'old value' field is added in the
affected dimension." DecisionStream does not automatically support Type 3
surrogate keys.
A single row may consist of attributes in any combination of the three types.
Often, if a single column is Type 2, the entire dimension is referred to as a Type 2
slowly changing dimension (SCD).
To track Type 2 SCDs, you must use surrogate keys. Creating and maintaining
surrogates is discussed later in the module.
Build slide
1 click to complete.
Type 1: Overwrite the Original Value
In a Type 1 SCD, the data fields are overwritten with the new values. If the only
changes in data are to Type 1 fields, the existing row can be updated in place. No
new rows are inserted into the dimension table.
The original data could be incorrect. In this case, the type of change is just a
correction. There is rarely a legitimate business need to track data that was
originally recorded incorrectly. For this reason, any data field could, in theory, be
eligible for a Type 1 change.
The most common reason for a Type 1 change, however, is lack of relevancy.
Although the original data was correct, there is no business reason to track the
change.
In the slide example, the marital status of Mary Jones has changed from single to
married. Since her previous marital status is no longer relevant in the Sales Rep
dimension table, the previous value (Single) in the Marital Status column is
overwritten with the most current value (Married). This is a good example of a
Type 1 change.
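The effect of a Type 1 change on the dimension table is an in-place overwrite.
DecisionStream applies it automatically, but a minimal SQL analogy (with
hypothetical table and column names) is:

    UPDATE D_SalesRep
    SET MaritalStatus = 'Married'
    WHERE SalesRepKey = 128

No new row is inserted, and the previous value is lost.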
Build slide
2 clicks to complete.
Type 2: Add a New Dimension Record
In a Type 2 SCD, changes are detected in the source data that must be tracked in
the data mart. For example, a Sales Representative has moved to a new office.
From this date on, sales must be reported under the new sales office, but all prior
sales should be reported under the previous office. However, all sales for the
representative are credited to her, regardless of which office she worked in when
the sale was made.
In this case, a new row that has a new surrogate key must be added to the
dimension table. The original row still points to pre-existing sales facts. From this
point on, all new sales will be joined to the new dimension record.
You can use surrogate keys even if you do not track changes. However, you must
use surrogate keys to implement Type 2 SCDs.
The advantage of this technique is that the user can report all combinations of
sales. In the slide example, all of Mary's sales can be found by constraining
(filtering) on Mary's Sales Rep Key (00128). All Dallas sales (including Mary's
while she was there) can be found by filtering on the natural key for the Dallas
office.
Usually, when a Type 2 change is detected, you must find the existing, current
row for the entity and update it as "no longer current," using an Effective End
Date. The new row will then have an Effective Begin Date and a null End Date.
There are, however, other ways to mark the current row.
Instructional Tips
If the Sales Rep table included an extra column, End Date, the first row's record
would have a value of NULL in that column. Once the second record is added,
the first row's value in the End Date column would have a valid date, and the
second row's value for End Date would become NULL. DecisionStream lets you
manually control the characteristics of the End Date column.
Key Information
In this example with Mary Smith moving offices, where Mary's movements have
to be preserved for history, a more rational approach would be to have two
dimensions: one employee dimension and one location dimension.
Ideally, in the data warehouse, dimensions should be kept as atomic as possible.
In the current example, two dimensions have been intersected, which creates the
need to use SCD logic to preserve history. If two dimensions had been used, the
history preservation problem would never have occurred in the first place.
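Again, DecisionStream manages this automatically, but the effect on the
dimension table can be sketched in SQL. All names here are hypothetical; the
demo later in this module shows real end_date and curr_ind columns:

    -- mark the current row as no longer current
    UPDATE D_SalesRep
    SET end_date = CURRENT_DATE
    WHERE SalesRepNo = 'E128' AND end_date IS NULL;

    -- add the new version of the member with a new surrogate key
    INSERT INTO D_SalesRep (SalesRepKey, SalesRepNo, Branch, end_date)
    VALUES (129, 'E128', 'Dallas', NULL);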
The D_StaffH template is referenced by the D_StaffH dimension table. This template is also
referenced by the SalesStaffL lookup, which reads from the D_StaffH dimension table.
Type 1 (No attributes specified). Type 2 (At least one attribute specified).
Type 2 SCDs preserve history. However, this is not always needed or preferred. It
depends on the business requirements. For example, if a product is moved from
one product type to another, the company may want it to appear as if it was always
a member of the new product type. In this case, the row only needs to be updated.
DecisionStream assumes that all attributes of all dimensions are Type 1. You only
have to specify Type 2 to preserve history. You specify Type 2 SCDs in the
Dimension Table Properties window. Clicking the Track changes (Slowly
Changing Dimensions) box enables the Track column. Click the Track box for
each attribute for which you want to preserve historical data.
On the left side of the slide example, we have specified that we do not want to
track the history of any of the dimension's attributes. Therefore, D_ProductH is a
Type 1 dimension.
On the right side of the slide example, we want to track the history of the
ProductName attribute. If the name of a product changes, we do not want to
overwrite the old one; rather, we want to track the entire history of the product,
regardless of its name. Although we do not want to preserve the history of the
remaining attributes, this dimension will be treated as a Type 2 SCD.
Attributes marked as business keys are natural keys from the original operational
system. If you have several levels in your hierarchy, you may have several business
keys. These business keys are IDs from the related hierarchy levels. In the slide
example, the D_ProductH template has four business keys, each of which is the
ID of the associated hierarchical level: Product, ProductType, ProductLine, and
AllProducts.
You can mark only one of these business keys as the primary key of the
dimension table (Value = True). The lowest level business key must be the
primary key. In the slide example, the lowest level is Product, and the business
key for this level is ProductNumber. As a result, ProductNumber is designated as
the primary key.
A business key with a primary key value of True cannot be a Type 2 attribute.
DecisionStream uses this business key to locate existing members in the
dimension. If you change this key, you have effectively created a new dimension
member.
Each surrogate key is related to a particular business key (usually the primary key
of the table). This means that each separate business key value will have a separate
surrogate key value. In the slide example, ProductNumber is a business key that
has values such as 648, 4732, and 1190. This business key is related to the "key"
attribute (which has a behavior of Surrogate). Each value of ProductNumber will
have a corresponding value for key (for example, 1, 2, and 3). As indicated on the
right side of the slide example, we cannot track changes to the ProductNumber
column in the dimension table, since it is the primary key.
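As a rough sketch of the delivered table (attribute names follow the slide
example; the data types and the audit columns are assumptions, based on the
columns seen in SQLTerm during the demos):

    CREATE TABLE D_ProductH (
        "key"         INTEGER NOT NULL,  -- surrogate key: unique on every row
        ProductNumber INTEGER NOT NULL,  -- business key: repeats across Type 2 versions
        ProductName   VARCHAR(40),       -- Type 2 attribute: history is preserved
        eff_date      DATE,              -- effective begin date
        end_date      DATE,              -- effective end date: null on the current row
        curr_ind      CHAR(1),           -- current-row indicator
        PRIMARY KEY ("key")
    );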
Demo 9-2
Purpose:
Five of our employees have been transferred to new locations.
Using SCDs, we can have our data mart automatically updated
when such transfers take place. We will modify the StaffH
hierarchy and run the Staff dimension build to demonstrate.
9. Click OK to close SQL Helper, and then prepare and Refresh the
columns for use in the SalesStaff level.
10. Click OK to close the Data Source Properties window.
11. Under the SalesStaff level, right-click DataStream, and then click
Properties.
The DataStream Properties window opens.
The columns of the modified data source (listed on the left side) may be
in a different order than the one shown. However, each column must be
mapped to the appropriate attribute of the SalesStaff level, as indicated by
the screen capture.
13. Click OK to close the DataStream Properties window.
Task 4. Modify the D_StaffH dimension table so that it
includes Type 2 attributes.
1. Under the Staff dimension build, double-click the D_StaffH dimension
table.
The Dimension Table Properties window opens.
2. Click the Columns tab, and then ensure that the Track changes
(Slowly Changing Dimension) check box is selected.
5. Click the Override build settings check box to select it, click the
Progress, Detail, SQL, and ExecutedSQL check boxes to select them,
and then click OK.
The Staff dimension build runs and applies five SCD changes to the
existing D_StaffH dimension table. The result appears as shown below.
Notice that the changes were only applied to the SalesStaff level, because
that is where we added the new data source.
6. Press Enter to close the DOS window, and then run SQLTerm.
SQLTerm opens.
7. In the Database for SQL Operations box, click TargetConnect.
8. In the Database Objects pane, expand TargetConnect, right-click
D_StaffH, and then click Add table select statement.
A SELECT statement appears in the SQL Query pane.
9. Run the query.
The query runs and returns 107 rows.
This data set includes five new rows to reflect the changed values of the
Type 2 attributes that we set previously. These rows are at the bottom of
the result set. The result appears as shown below.
We can see new values for the SalesBranchCode column, as well as blank
values for the DateHired column. The hire dates for each of these
employees are in their previous records and will not change in the future.
10. Click the right arrow at the lower-right corner of SQLTerm to scroll the
entire data set to the right side of the screen.
By scrolling vertically in the pane, we can see that five rows have today's
date as a value in the end_date column. This indicates that these rows of
data are no longer current.
Other rows have new values in the udt_data column, which indicate
when each row was last updated. They also have values in the curr_ind
column, which indicate whether each row is the most current. The result
appears as shown below.
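To look at only the current rows in SQLTerm, a query along the following lines
could be used (a sketch: the SalesStaffCode column name and the 'Y'/'N'
encoding of curr_ind are assumptions; the course data may differ):

    -- return only the most current row for each employee
    SELECT SalesStaffCode, SalesBranchCode, DateHired, end_date, curr_ind
    FROM   D_StaffH
    WHERE  curr_ind = 'Y';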
Results:
We have applied SCD changes to a dimension table by
modifying a hierarchy and then re-executing the dimension
build that delivers this hierarchy.
Summary
Objectives
Each hierarchy and level in a hierarchy can have multiple data sources, which
means that data can be input from more than one connection. This can be useful
when you want to perform data cleansing using different queries.
When you use multiple data sources, you must access the data using a
DataStream, because this is where the data sources are merged.
When data is merged in a hierarchy, you cannot specify how the merge is
performed. When you merge records in a fact build, you can select first non-null,
maximum, average and many more.
Derivations in Hierarchies
Because merging is different in a hierarchy than in a fact build, derivations are
used differently. In a hierarchy or hierarchy level, you cannot perform a
derivation on different data sources. A derivation that performs a calculation
based on two data sources returns unexpected results.

Instructional Tips
You can resolve this problem by creating a fact build to retrieve and merge the
data to a staging area, and then use the data from the staging area to create the
hierarchy.
In the slide example, the nProduct data source has a DataStream item called
ProductionCost and the nProductMargin data source has a DataStream item
called Margin. A derivation calculated from these two items will produce no
results.
Build slide.
One click to complete.
Type 1 and Type 2 Updates
A dimension that contains Type 1 and Type 2 attributes changes all the records
with the same business key when you perform a Type 1 update. DecisionStream
assumes that when you have a Type 1 attribute, you want to maintain consistency
in all the records.
The previous value of the Type 1 attribute is updated, as well as the values in all
the previous versions of the record. In the example, ProductName changed from
Tea to Chai Tea, so DecisionStream updates all occurrences of Tea to Chai Tea.
To update the Type 1 change, DecisionStream must locate all the affected
records. To speed up this process, correct indexing must be used. For star
schemas, you should index the business key at each level. This is important when
a Type 1 attribute is updated at a higher level, since the number of records
updated is increased.
You can use the build log to see what happens when you execute a build
containing updates. In the slide example, there is one Type 1 update, but no Type
2 updates. Eleven rows were updated because there was a Type 1 change.
To make the updating more efficient, it is recommended that you index your
business key. In the slide example, the ProductID column was indexed.
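In SQL, the recommended index is a one-line statement (a sketch following the
slide example; the index name is illustrative):

    -- index the business key so Type 1 updates can locate all affected rows quickly
    CREATE INDEX D_ProductH_ProductID ON D_ProductH (ProductID);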
Dimensional History
DecisionStream can load the history into a dimension table in the data
warehouse. This is useful if you want to load old data, or when the dimensions
change frequently. When you have loaded the historical data into the dimension,
you can start to load the fact records and take advantage of the late arriving facts
functionality.
Multiple rows with Type 2 attributes may have the same business key, but they
always have unique surrogate keys.
Using effective date attributes solves problems that may occur when loading
historical data in the warehouse. Initial loads are usually required if you want to
load data for a number of years. Periodic runs may only be used once a week or
once a month.
To implement dimensional history, you must define an attribute for effective start
date in the source template.
In the Dimension Table Properties window, you assign a column from the
hierarchy data as the effective start date attribute. DecisionStream derives the
effective start date from the source data, instead of automatically generating the
effective start date.
When you execute the dimension build, the effective start date is initialized from
the source data, and the effective end date is generated automatically.
Specification is a Type 2 attribute that has undergone two changes since
November 1, 1998. It has changed from being a blue pen to a black pen, and
then again to a red pen.

The specification for the product underwent a change on Feb. 10, 2002. If the
specification is going to change again on June 7, 2002, DecisionStream will check
this record, then update the dimension table again as required.
By default, DecisionStream assumes that changes to dimension data that are fed
into the data mart arrive in chronological order. In the slide example, changes to
the specification of product P1 occurred sequentially. Product P1 was a blue pen
on Nov. 1, 1998, was reclassified as a black pen later (Dec. 23, 2001), and was
then reclassified as a red pen even later (Feb. 10, 2002).
DecisionStream detected that the incoming data about product P1 had a different
specification, which became effective at an earlier date (Dec. 23, 2001). As a
result, a new row was added to the dimension table with a new effective date of
Feb. 10, 2002. DecisionStream marked the previous row as no longer current by
adding an end date value to that row. The end date is the date on which the new
row of data became effective, minus either one day or one second (here, Feb. 9,
2002, or Feb. 9, 2002 at 23:59:59).
However, what if the change from a black pen to a red pen had not taken place
on Feb. 10, 2002, but on Feb. 10, 2001, several months before the effective date
of the most current row of data? This type of change is an example of a late
arriving dimension detail.
Type 2 attribute changes to a dimension member that occurred prior to the
effective start date of the most current record for that member cannot be written
to the dimension table.

The data at the bottom of the slide example includes two changes to the
specification of product P1. Both of these changes occurred prior to the effective
date of the current record in the dimension table for product P1. The product
was an orange pen on May 14, 1999, and a mauve pen on April 27, 2001. But on
Feb. 10, 2002 (the date of the most current record for P1), it was a red pen. Do
we accept or reject these late arriving dimension details?

To save the late arriving dimension details to a reject file, specify the name and
location of the file on the Dimension History Options tab of the Dimension
Table Properties window. If you do not specify a reject file, the late arriving
dimension details are lost.

Technical Information
Late arriving dimension details are a procedural problem that is not handled
automatically by DecisionStream. If you write the details to a .rej file, you will
have to manually process them back into the system.
Late arriving dimension details present the following problems:
• Further work is required to insert this dimension history into the existing
dimension table.
• Existing surrogate key and effective date values in the dimension table must be
realigned to accommodate the late arriving dimension details.
• Reassigning surrogate key values on dimension tables to accommodate late
arriving dimension details creates problems for the fact tables that use these
surrogate keys for referential integrity.
For the Dimension History Options tab to be enabled, the following conditions
must be met:
• there must be a column in the source data that supplies the effective date
(for example, Effective_Begin_Date in a Product table)
• in the target dimension table, ensure that you have a column mapped to this
attribute with Effective Start Date behavior; we do not want DecisionStream
to generate the effective start date automatically in the template, but rather to
derive it from the source data
Should the effective start date be the date specified for each
dimension member in the data? Or should the effective date be
set automatically by the template?
In the slide example, there are two dimension members. Product P1 is a blue pen
and product P2 is white paper. These are the first rows that represent each of
these products. Because of this, there is no value in the End Date column: no
new records have arrived that render the old records out of date.
In the Dimension Table Properties window, you can specify from where
DecisionStream should read the effective start date for the initial record for a
specific dimension member. You have two options:
• From source attribute: DecisionStream sets the effective start date to the
date specified for each dimension member in the dimension data. This is
the default setting.
• From the template: DecisionStream sets the effective start date according to
the settings defined in the template, as described on the following pages.
In the slide example, we have overridden the template settings and used the date
specified in the dimension data for individual dimension members as the effective
start date for initial records. As a result, the data for both products became
effective at different times. Product P1 was identified as a blue pen on Nov. 1,
1998, while P2 was identified as white paper on Nov. 1, 2001.
If you use the template to set the effective start date for the
initial records of distinct dimension members, you can further
specify how this date should be set.
You can set the effective start date to the timestamp set by the
dimension build, a variable, a specific date, or a null value.
If you use the template to set the effective date, you can further indicate how
DecisionStream should set the effective start date in the initial record for a
specific dimension member. The following table outlines your options.
For example, if you want to use the date that corresponds to the timestamp set
by the dimension build, select the Use data timestamp value option.
A dimension table must reference a template. The template lists attributes that
represent the columns in the table, as well as the behavior of these columns. For
example, only one attribute in the template can represent a primary key column in
the dimension table.

If you use a template to set the effective start date for the initial record of a
dimension member, you can control how DecisionStream sets these start dates,
as well as the effective end dates. In the Effective Date granularity box, specify the
format to use for both the effective start date and the effective end date columns.

Technical Information
If Date only is selected in the Effective date granularity box at the top, you
cannot specify an explicit date and time, just the date.
The option in the Set previous record Effective Date to box is set to minus one
day, not one second.
You can use either the date and time (the default) or just the date. The option you
choose depends on your reporting requirements.
For example, if you are reporting on electricity rates and are using a date
timestamp, your reports will not be accurate if the rate changes numerous times
in one day. However, using a timestamp that includes just the date may be valid
for reporting on human resources data. For example, it is unlikely that an
employee’s personal information will change twice in the same day.
If you use a timestamp that includes both the date and time, you have more
flexibility when you create reports. For example, you can specify a BETWEEN
clause to retrieve only records that have been modified during a particular time
period in one day.
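For example, with date-and-time granularity, a report query can use BETWEEN
to isolate the records that took effect during part of a single day (a sketch; the
eff_date column name is an assumption):

    SELECT *
    FROM   D_ProductH
    WHERE  eff_date BETWEEN '2002-02-10 09:00:00' AND '2002-02-10 17:00:00';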
In the Effective End Date in current records box, indicate how DecisionStream
should set the effective end date when the template detects a change in the
incoming dimension data. You can either use a null value (the default), a variable,
or an explicit date and time.
In the Set previous record Effective Date to box, indicate how you want to set
the effective end date for the previous row of data when DecisionStream creates
a new row in the dimension table. You can set this end date to be the same as
the effective start date of the new dimension data row, or the same date minus
one day or one second.
Demo 10-1
Purpose:
The Great Outdoors recently acquired a new line of food and
beverage products. The data about the past sales of these
products is stored in a separate database. We will add a
connection to this database, then create a new hierarchy that
shows the entire history of this dimension data, including
changes in product names and prices. Finally, we will
construct and execute a dimension build to deliver this history
to a dimension table and view the table’s contents in
SQLTerm.
3. In the left pane, ensure that ODBC is selected, and in the Data Source
Name box, click BIAS_Northwind, and then click Test Connection.
A message appears indicating that the connection is successful.
4. Click OK, and then click OK again to close the Connection Properties
dialog box.
Task 3. Create a new hierarchy in the Product dimension
and add the Category level.
1. Expand the Dimensions folder, right-click the ProductD dimension,
and then click Insert Hierarchy.
The Hierarchy Properties window opens.
2. In the Name box, type ProductHistory, and then click OK.
3. Right-click the ProductHistory hierarchy, and then click Insert Level.
The Level Properties window opens.
4. In the Name box, type Category, click the Attributes tab, and then click
New.
The Template Properties window opens.
5. In the Name box, type ProductHistory, and then click the Attributes
tab.
6. Click Add, and then type CategoryID.
7. Click the Add button, and then type CategoryName.
8. Click OK.
We return to the Level Properties window.
9. Click Add all attributes to add the attributes to the Chosen
attributes pane.
10. In the CategoryID row, click the Id check box to select it, and then in the
CategoryName row, click the Caption check box to select it.
The result appears as shown below.
11. In the ProductID row, click the Id check box to select it, in the
ProductName row, click the Caption check box to select it, and then in
the CategoryID row, click the Parent check box to select it.
The result appears as shown below.
4. Click the Attributes tab, click the eff_date attribute, and then click
Delete to remove the attribute from the template.
5. In the Behavior column beside EffectiveBeginDate, click Effective
Start Date, and then press Enter on your keyboard.
The result appears as shown below.
7. Click the Dimension History Options tab, ensure that the From source
attribute button is selected in the Effective Start Date in Initial Records
area, and then click OK to close the Dimension Table Properties window.

Instructional Tips
Students may have to close and re-open the Dimension Table Properties dialog
box before the Dimension History Options tab is enabled.
10. Close SQLTerm and leave DecisionStream open for the next module.
Results:
We added a connection to a new database. We then created a
new ProductHistory hierarchy that showed the entire history of
the product data in this database, including changes to
product names and prices. Finally, we constructed and
executed a dimension build to deliver this product history to a
dimension table and viewed the table’s contents in SQLTerm.
Summary
Objectives
Hierarchical data from the data source is loaded into memory and is referenced
multiple times in the fact build process. DecisionStream uses the levels of the
hierarchy to merge the data, to validate incoming data, to aggregate the data, and
to partition and filter the data.
Before processing fact data, DecisionStream:
• checks that the columns in the target table are the same as those being
delivered
• loads the dimensional data into memory (using the hierarchy or lookup
definitions)
To process fact data, DecisionStream acquires and merges the data as specified,
aggregates it, filters and partitions the result, and then delivers the data to the
appropriate tables.
No additional memory is required if you are not merging or rejecting duplicate rows.
Regardless of whether you are using a single data source or multiple data sources,
all the rows are moved into the DataStream. After the rows reach the DataStream
they may be rejected or merged.
The slide example shows what happens if two or more data sources are acquired
and the user does not specify whether to merge or reject duplicate rows. In this
case, DecisionStream reads every row from the first source, and then every row
from the second source, and so on, processing each record separately until all the
records from all the data sources are read.
No additional memory is used if the developer does not specify that they want to
merge or reject duplicate rows.
If two rows have identical dimensional values, then the calculation generates the
same hash number. Because that number determines where to load the pointer in
the hash table, rows that have identical dimensional values will have their pointers
clustered together in the hash table. The clustering is the interleave process that
DecisionStream performs. Do not confuse interleaving rows with sorting or
ordering the rows.
Keep in mind that the interleaving process just groups common dimension
values together. The actual merging of the related measures, attributes, and
derivations, takes place in the transformation model.
Since the hash number is derived from a calculation, it is possible that more than
one set of dimensional values will yield the same hash number. If this happens,
the slot in memory is already occupied. This situation is referred to as a collision.
In this case, another slot is picked in the hash table for the new pointer. The hash
table is discussed further in Module 20, "Troubleshooting and Tuning."
The DataStream is written to memory as a simple list of rows, and a hash table of
pointers is created to find them again for processing.
Allow
Reject
Merge
Duplicate rows occur when the values in all the dimensional columns match. You
can have duplicate rows even if you have only one data source. In DecisionStream,
you must define how duplicates are to be handled. Duplicate behavior is defined
on the Input tab of the Build Properties window because the processing is
performed at the input stage of a transformation.

You can reject, accept without further manipulation, or merge (consolidate)
duplicate rows. The default (and most efficient) behavior is to allow duplicate
rows. No sorting or hashing is required. Use the Allow records with duplicate keys
option when duplicates are not important, or when you know that no duplicate
rows exist in the input data.
Option: Allow records with duplicate keys
Description: DecisionStream accepts the duplicate records. This option is
selected as the default.

Option: Reject records with duplicate keys
Description: For each set of duplicate records, DecisionStream accepts the first
data row and rejects subsequent rows to the reject file. If you do not specify a
reject file to write rejected records to, the rejected rows are lost. You specify the
reject file on the Input tab.

Option: Merge records with duplicate keys
Description: DecisionStream tracks the duplicate records and merges all
non-dimension columns using the merge functions that you specify.
Build slide
One click to complete.
Allow or Reject Duplicate Rows
With the Allow setting, all six source rows pass through to the DataStream
unchanged:

Cust  Date    Qty  Amt
1     199901  1    100
1     199901  1    200
2     199901  2    200
2     199901  3    300
3     199901  4    400
4     199901  5    500

With the Reject setting, only the first row of each set of duplicates reaches the
DataStream; the remaining duplicates (marked X on the slide) are written to the
reject file.
In the slide example, the first two records are duplicate rows. Both have the same
values for the Customer and Date dimensional elements (1 and 199901,
respectively).
With the setting that allows for duplicates, both records are passed to the
DataStream. If you set DecisionStream to reject records with duplicate keys, the
first record is passed, but any subsequent records are not. These duplicate records
are written to the reject file.
In the slide example, DecisionStream is set to merge any records that have
duplicate keys. The keys are the actual dimensional values.

Note the source rows for customer 2. The first two rows referring to this
customer have zero values for Paid and Credit Limit. The second two rows have
zero values for Quantity and Amount. All four records have the same
dimensional values, or keys (in this case, 2 and 199901). Merging the records
creates a merged record that has the same dimension values as all four records,
and merged values for all remaining elements.

This merging process continues for all records that have duplicate dimensional
values until all duplicate records are merged.

Questions
In the table on the right, the last column on the right does not sum the Credit
Limit values, but chooses the last value from the values of each of the third,
fourth, fifth, and sixth records. That value is 450. Ask the students if they notice
anything unusual about this table to see if they spot this, and ask why the Credit
Limit column has this value for the record that has dimension values of
Customer=2, Date=199901.
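The merge itself happens in memory, in the transformation model, but as a
rough SQL analogy (the staging table name is illustrative), merging with SUM
behaves much like a GROUP BY over the dimensional columns:

    -- one output row per distinct (Cust, Date) key; the measures are summed
    SELECT Cust, "Date",
           SUM(Qty)  AS Qty,
           SUM(Amt)  AS Amt,
           SUM(Paid) AS Paid
    FROM   fact_source
    GROUP BY Cust, "Date";

Methods such as LAST or FIRST NON-NULL have no direct equivalent among
the standard SQL aggregates; in the slide example, the Credit Limit column is
merged with a last-value method, which is why the merged record shows 450
rather than a sum.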
Select the Merge Behavior

Instructional Tips
If the element (for example, an attribute called CustomerName) is a character
string, do not select a mathematical merge method (such as SUM). Doing so will
produce an error when the fact build is executed.

You can choose from one of the following merge methods.

Your situation: You want to sum the duplicate child members.
Merge method: SUM

Your situation: You want to use the child member with the maximum value.
Merge method: MAX

Your situation: You want to use the child member with the minimum value.
Merge method: MIN

Your situation: You want a count of all the child members.
Merge method: COUNT

Your situation: You want the average value of the child members.
Merge method: AVG

Your situation: You want to use the first value that occurs (the first answer is
always the correct one).
Merge method: FIRST

Your situation: You want to use the first non-null value that occurs (the first
answer is always correct provided it is present).
Merge method: FIRST NON-NULL

Your situation: You want to use the last value that occurs (the latest information
is always best).
Merge method: LAST

Your situation: You want to use the last non-null value that occurs (the last
record represents the last update, but a null value is never an improvement on a
previous real value).
Merge method: LAST NON-NULL

Your situation: You want to use 1 or 0, depending on whether values are present
or not.
Merge method: ANY
Reject File
If you create a build using the Fact Build wizard, a reject file named
{$DS_BUILD_NAME}.rej is specified. The $DS_BUILD_NAME portion is a
variable that returns the name of the build itself. As shown in the slide example,
you can also rename the reject file and give it a different extension (in this case,
reject.txt).

Instructional Tips
To change the properties of a fact build, right-click the build and then click
Properties. Click the Input tab to specify record rejection options.

When executing a fact build, DecisionStream writes to the reject file all rows
where the value of one or more dimension elements does not exist in a reference
dimension. This process is part of basic data integrity checking.

If you specified that DecisionStream should reject duplicate records (that is, you
selected the Reject records with duplicate keys option), these records are also
written to the reject file.

In the Write Any Rejected Records to box, enter the full directory location and
name of the file. If necessary, click the ellipsis to open the Select Reject File dialog
box. By default, reject files have a .rej filename extension. If you do not specify a
file, the rejected rows are lost.
The reject file is deleted and re-created each time the build is run.
Customer Dimension
Cust  Name  Address
1     Bob   Canada
2     Tom   U.S.A.
3     Mary  England

Fact Data
Cust  Date    Qty  Amt  Paid  Cr L
1     199901  1    100  100   500
2     199901  5    500  150   450
3     199901  4    400  200   350
4     199901  5    500  0     0     (unmatched member: Cust 4 is not in the dimension)
If a fact row with an unmatched member is detected, it is written to the reject file
(assuming one has been specified). The log file indicates whether there are any
rejects and how many. You must, however, check the reject file to see the
rejected records.
Unmatched members occur for many reasons. For example, the fact data may
be incorrect, or a dimensional member may be missing.
If most or all rows are rejected, it is likely because of problems with the definition
of the reference hierarchy or a source query.
Customer Dimension
Cust  Name    Address
1     Bob     Canada
2     Tom     U.S.A.
3     Mary    England
4     <null>  <null>

Fact Data
Cust  Date    Qty  Amt  Paid  Cr L
1     199901  1    100  100   500
2     199901  5    500  150   450
3     199901  4    400  200   350
4     199901  5    500  0     0     (unmatched member, added to the dimension)
You can allow DecisionStream to accept source fact data that does not relate to
any dimension reference data. This is known as including unmatched members.
DecisionStream treats unmatched members as data at the lowest level of the
reference data.

You can also add unmatched members to the dimension reference data so that
the data is not unmatched in subsequent build executions. To allow the addition
of unmatched members to reference data, the dimension element must be
associated with a lookup that uses a template for data access. This is because a
template automatically creates the correct INSERT and SELECT statements
required to include the unmatched members.

You do not require a dimension delivery for the unmatched members to be
written back to the reference structure.

In the slide example, the customer with an Id number of 4 has been added to the
Customer dimension, but the Name and Address attributes have no useful values.
There are no values in the fact data that DecisionStream can place into those
attributes.

It is preferable to resolve data issues in the source system instead of in the data
warehouse.

Instructional Tips
You must specify whether you want to include unmatched members for each
dimension element. Open the Properties dialog box and then click the
Unmatched Members tab. To accept source data that does not relate to any
dimension reference data, select the Accept unmatched member identifiers check
box. To add the unmatched members to the dimension reference data, click the
Save unmatched member details via reference structure check box.

Note: If the conditions for adding unmatched members are not met, a message
appears informing you of this. This check box is then not available. When the
unmatched member is added back to the reference structure, the correct
surrogate key and any date attributes are assigned correctly.
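The statements a template generates are internal to DecisionStream, but
conceptually the write-back resembles the following hypothetical sketch (the
table and column names are illustrative, following the slide example):

    -- the unmatched identifier becomes a new lowest-level member;
    -- descriptive attributes stay null until real reference data arrives
    INSERT INTO D_Customer (Cust, Name, Address)
    VALUES (4, NULL, NULL);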
[Slide: a customer dimension hierarchy with levels Country, Region, City, and
Customer, all drawn from the Customers table, plus an Order level.]

A dimension can be made up of many different types of data from many sources
and can be customized to suit the needs of the user.

The slide example shows an unusual dimension. It is unusual because the lowest
level, Order, is usually part of the fact (transactional) data, not the dimension
data.

The top level contains only one static member. The next four levels are all
derived from data contained in one single table: Customers. The names of these
four levels come from the structural data. The last level is derived from the
Orders table. The name of this level comes from the transactional data.

Questions
Ask the students how they would create this dimension.
1. Use the wizard to create the Country, Region, City, and Customer levels
from one table.
2. Add the Order level manually.
3. Add the ALL level manually (or include it when using the wizard).
Demo 11-1
Merge Data
Purpose:
To streamline the reporting requirements of our managers, we
want to remove any data that is duplicated in the tables. We
must eliminate any unnecessary information by merging all
duplicate data.
8. Under Deliver, clear Dimension and Metadata, and then click OK.
The build runs and delivers ten rows to the F_Merge table.
9. Press Enter to close the DOS window.
Task 3. View the data in the fact table.
1. Open SQLTerm.
2. In the Database for SQL Operations box, click TargetConnect.
3. Under Database Objects, expand TargetConnect.
4. Right-click F_Merge, and then click Add table select statement.
5. Execute the query.
Ten rows are read.
There are two pairs of records that have the same dimension data. We want to
merge these duplicates because duplicate data is unnecessary in reports.
6. Close SQLTerm.
Task 4. Change the merge behavior.
1. Ensure that Transformation Model is expanded, right-click UnitCost,
and then click Properties.
2. Click the Merge tab.
3. In the Merge Behavior box, click AVG, and then click OK.
This setting forces DecisionStream to average the values for the UnitCost
element.

Technical Information
If the students look at SalesTotal, they will see that it does not have a Merge tab
associated with it. This is because the merging is done in the transformation
model after the data is acquired.
4. Repeat steps 1 to 3 for UnitPrice to average the values for UnitPrice.
Task 5. Change the fact build properties.
1. Right-click DemoSales:1, and then click Properties.
2. Click the Input tab.
3. Under Duplicate Key Handling, click Merge records with duplicate
keys to select it, and then click OK.
Task 6. Execute the build and view the data in the fact
table.
1. Click DemoSales:1 to select it, and then on the toolbar, click the
Execute button.
A dialog box appears prompting us to save our changes to the catalog.
2. Click OK.
The build runs and delivers eight rows to the F_Merge table.
3. Press Enter to close the DOS window.
4. Open SQLTerm.
5. In the Database for SQL Operations box, click TargetConnect.
6. Under Database Objects, expand TargetConnect.
7. Right-click F_Merge, and then click Add table select statement.
8. Execute the query.
Eight rows are read. The result appears as shown below:
Result:
To create clear, concise reports, we have merged rows with
duplicate dimension values that are unnecessary for our
reporting needs.
Demo 11-2
Purpose:
We want to further analyze the records that are rejected during
the execution of the DemoSales:1 fact build. We will create a
file that contains the rejected records, and then view the
contents of this file.
Results:
We defined a file for tracking rejected records. We also viewed
the contents of this file so that we can analyze the data and
determine why it was rejected.
Summary
Objectives
A UDF can be used to define business rules. It can be used in output filters,
derivations, JobStream procedure and condition nodes, and build and
JobStream variables.
The UDFs for the open catalog are located in the Functions folder in the
library tree.
Create a UDF
1. First, define the name and description of the function, as well as the type of
value it returns.
2. Next, define any necessary arguments and assign the appropriate data types
to those arguments.
3. You can also define any required variables and their data types for use within
the function.
4. After you complete these tasks, define the syntax of the function and test it to
ensure that it returns the correct values and types.
When you add a UDF to the library, it becomes available to all builds and
JobStreams within that catalog. After you define the function in a catalog, you
can reference it many times, and export or import it across catalogs.

Instructional Tips
To export a component of a catalog (in this case, a UDF), use the Create
Package command under the File menu.
Use the General tab to define the name and description of the function, as well as
the return value type.
On the Interface tab, you can add up to 16 unique arguments for each function.
DecisionStream gives the name Argumentn (where n is a unique sequential
number) to each argument and allocates an argument type of CHAR by default.
When adding an argument, give the argument a more meaningful name.
To change the default data type of the argument, click the Argument Type box
and choose the type you prefer. To delete an argument, click the argument and
click Delete.
If a function has more than one argument, the order of the arguments indicates
the order in which you must enter the values when using the function. You can
change the order of the arguments by clicking the argument and then clicking
Move Up or Move Down.
Implementation Tab
The left pane of the Implementation tab contains a list of functions and other
calculation options. Choose from logical and mathematical operators, built-in and
user-defined functions, various control statements, as well as variables and
arguments. You have access to over 75 built-in functions, 20 operators, and
various control statements (If/Case/Do While).
Test a UDF
[Slide callouts: set the scope of the expression; set the data type for arguments;
set default values for arguments; enter test values; click Calculate to test the
expression.]
It is good practice to test the syntax of a function and ensure that it returns
correct results.
By using the Test button (see the slide on the previous page), you can enter real
values and determine whether the function will operate correctly.
There are a number of options you can set when testing an expression, as shown
in the slide callouts above.
In the slide example, by entering the values of 100 for Price and 50 for Cost and
then clicking the Calculate button, the correct value of 0.5 is returned, verifying
that the function returns the correct result.
On the Variables tab, you can create your own variables for use within a UDF.
All variables must start with an alphabetic character and must contain only
alphabetic characters, numeric characters, and underscores. Names of variables
are case-sensitive; therefore, you can use a combination of characters, including
uppercase and lowercase, to create many different variable names.

You reference a variable in an expression by preceding its name with a dollar
sign ($). For example, the following expression:

$X := $X + 10;

adds 10 to whatever value was previously in the variable X.

Technical Information
When you deal with variables in DecisionStream, a variable with a $ in front of it
is a dynamic variable, whose value can be changed. For example, $X := $X + 10.
However, by adding braces around the variable, the variable becomes static and
its value cannot be changed. For example, {$X}.
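Putting these pieces together, a small scripting sketch (it uses only the
assignment syntax shown above and the Concat and LogMsg functions that
appear in the JobStream demos later in this course):

    $COUNTER := 0;
    $MSG := Concat('Run number ', ($COUNTER + 1));
    LogMsg($MSG);
    $COUNTER := $COUNTER + 1;

This initializes a counter, logs a message that embeds its value, and then
increments it; the AutomationJS JobStream in Module 13 uses the same pattern.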
Build and JobStream variables are available anywhere within the object in which
they are declared.
You can also define variables within the environment of the operating system, or
as a command-line parameter. (We will discuss this topic later in Module 22, "The
Command Line Interface").
Demo 12-1
Purpose:
We want to create a UDF that calculates profit margin. The
UDF will be applicable throughout the GO_Catalog, and can be
used within different builds and JobStreams.
(a - b) / a, where a is the price and b is the cost. For example, with a price of 100
and a cost of 50, the margin is (100 - 50) / 100 = 0.5.
Results:
We have created a UDF that calculates profit margin. We can
use this UDF for any build in the catalog.
Demo 12-2
Purpose:
We want to use a variable to set the location in which reject
files are stored for a fact build.
Results:
We have used a variable to set the location of reject files
created by a fact build.
User-defined functions are implemented either internally or externally. In an
internal implementation, the calculations are coded directly inside the UDF.
These calculations may incorporate existing DecisionStream functions, other
UDFs, or a combination of both.

Technical Information
In Windows, dynamic link library (.DLL) files are used. In UNIX, shared library
files are used.
External functions may have been created by the client or purchased from a third
party. You can use these functions for complex calculations or data cleansing. It is
preferable to re-use existing functions rather than re-write them.
To create an external UDF, follow the same basic steps for creating an internal
UDF. The only major difference is that you must create the external UDF
functions in a run-time library, and they must adhere to specific rules and
conventions.
Specify the library and function name that the UDF will use.
After you create and define the function in a library file, you must declare the
function to DecisionStream. Declare any function you have defined by clicking
the Implementation tab of the Function Properties window.

Click the External option button to select the type of UDF to use. In the
Library Name box, enter the name of the library file that contains the function
you are registering. On Windows the library file is a dynamic link library (.dll)
file, and on UNIX a shared library file. You do not have to enter the full
directory path for the file because the standard rules for locating dynamic
libraries for your platform apply.

Technical Information
Use an external UDF to cover any complex calculations that DS cannot handle
with the built-in features. A benefit of this is that the creator of the UDF can
debug the file without depending on the DecisionStream administrator.
UDF: In a Derivation
You can also use a UDF in a derivation to better implement your organization's
business rules. To assist you when adding a calculation to a derivation,
DecisionStream provides built-in functions as well as the UDFs that you created
previously.
UDF: In an Output Filter
To use a UDF in the output filter for a delivery, choose Output Filter from the
Filter tab of the delivery, and then in the left pane, locate the UDF and add it to
the output filter.
UDF: In a JobStream
You can use a UDF in a JobStream to provide more control over the logic and
flow of the JobStream.
You can add a UDF to a Procedure node or Condition node. This topic will be
covered in detail in Module 13, "JobStreams."
Demo 12-3
Purpose:
Management wants to include data regarding the gross profit
margin of each product sold in the data mart. Therefore, we
must add a new derivation to the DemoSales build that will use
the Margin internal UDF. We then have to execute the build
and view the results.
The values for the GrossMargin derivation are in the last column of the
table and are calculated based on the value of UnitSalePrice and
UnitCost.
9. Close SQLTerm.
10. Save your work and keep DecisionStream open for the next Demo.
Results:
By creating a new derivation and by using an existing internal
function that we created earlier, we have used existing data to
develop results that can help identify the profit margins of
products that we sell.
Summary
Objectives
Automation of:
• Data Extraction
• Data Transformation
• Data Loading
• Exception/Error Handling
• Logging/Notification
Managing a data warehouse requires the coordination of various tasks, such as
build status notification (for example, sending an e-mail if a job fails). After the
data marts are created, these tasks can be automated. They are performed in
JobStreams, either in sequence or in parallel.
What is a JobStream?

Characteristics of JobStreams:
• catalog-specific
• contained in a separate folder
A JobStream is similar to a build; it can be used only in the catalog in which it was
created. As such, it can use other components within the same catalog, such as
builds and user-defined functions (UDFs). However, a JobStream can call any
system command, including DecisionStream commands such as DATABUILD,
which may reference another catalog.
JobStreams are contained in a separate folder in the left pane of the Designer
interface. When you select a JobStream from this folder, a graphical
representation of the component appears in the Visualization pane.
The first step in implementing a job control process within DecisionStream is
adding a JobStream to your catalog, similar to adding any other component.
Click the JobStreams folder, and then from the Insert menu, click JobStream.
You then enter basic information about the JobStream, such as its name and
what logging and audit information you want to track when it is implemented.
You can modify these properties at any time.

Instructional Tips
You can also add a JobStream by right-clicking the JobStreams folder and
clicking Insert JobStream from the shortcut menu.
When you add a JobStream to the catalog, a Start node (a green triangle labeled
Start) appears in the JobStream Visualization pane. This node indicates where the
execution process begins.
A DecisionStream variable is a name and value pair that resides in the memory of
the computer. Variables affect the operation of DecisionStream programs, store
values for use in builds and procedures, and control the flow of JobStreams.

Technical Information
When you reference a variable in a node, you must precede it with a dollar sign
(for example, $COUNTER).
Within the properties of a JobStream, you can add variables. These variables can,
in turn, be referenced within procedure, condition, and SQL nodes.
Variables can be read and assigned values during the execution of JobStreams.
In the slide example, the COUNTER variable is referenced in the Counter Test
condition node as $COUNTER. If the COUNTER variable is less than 2, the
workflow loops back to the previous node in the JobStream.
When you create a JobStream, you add nodes to represent the execution of
internal DecisionStream commands, user-defined commands, or operating
system commands and programs. Each JobStream can include any number of
the following nodes.

Instructional Tips
It may be useful to temporarily exclude a node from processing while testing a
JobStream. You can omit a single node or all the nodes that follow a specified
point. To do this, open the properties of the node and select Exclude this node
from processing and (if necessary) Exclude subsequent nodes in this thread.

Node type: Build
Description: An existing fact or dimension build that you can add to a
JobStream. You can include any of the fact or dimension builds in the current
catalog.

Node type: SQL
Description: Contains SQL statement(s) to be implemented.

Node type: Procedure
Description: Contains one or more DecisionStream functions or variable
references. Operating system commands or other programs may be called by
these functions. Procedure nodes can make use of the same scripting language
that is available for UDFs, variables, and derivations.

Node type: Condition
Description: Provides conditional branches between nodes. In other words, it
sets one or more conditions in place that will determine how the remainder of
the JobStream progresses.

Node type: JobStream
Description: A JobStream within a JobStream, which makes it possible to break
down larger jobs into a series of smaller ones.

Node type: Alert
Description: Writes messages to the audit table that can be used as the basis for
alerts in Cognos NoticeCast. You must create an agent in Cognos NoticeCast
that will make use of these audit table records.

Node type: Email
Description: Sends event notifications to mail systems via SMTP. For example,
you can set up emails to provide notifications when a job has completed or
failed. You can also include attachments with emails.
Build Node
You can automate these tasks by adding build nodes to a JobStream. Each time
the JobStream is executed and a build node is reached, the build it references is
executed.
SQL Node

In the slide example, the Create Indexes SQL node contains four separate SQL
statements. When the JobStream reaches this node, it runs these statements to
create four separate indexes on the SalesFact table.

Instructional Tips
SQL Helper is used here in the same manner as everywhere else. The interface is
covered in Module 2, "Create a Catalog."
Procedure Node

Procedure nodes are useful for coordinating processing around builds for such
activities as checking for input files, sending mail messages and alerts, and
generating custom logging and auditing messages.

In the slide example, the Execution Log procedure writes a message to the
AutomationJS log file after each successful execution of the Sales fact build.

Technical Information
A procedure can include UDFs as well as built-in DecisionStream functions.
These UDFs must exist in the same catalog.
The commands and control statements that are part of DecisionStream are not
intended to serve as a full-fledged programming language. If you have complex
algorithms to create, you may be better off issuing an operating system command
to call an external program or function that issues a return code.
Alert Node
You can use an alert node to send event notifications to Cognos NoticeCast. For
example, you can set up alerts to provide notifications when a JobStream has
completed successfully or failed. For this process to work, you must create an
agent in Cognos NoticeCast that will make use of the audit table entries.
Email Node

Email nodes are used to send event notifications to mail systems using Simple
Mail Transfer Protocol (SMTP).

You enter basic information about the email node, such as its name, and then
enter the email profile and password for the computer you are using, details of
the recipient, and the message itself. You can include attachments with an email.

Technical Information
Email attachments are not supported on UNIX.
By default, each type of node (except the condition node) sets the RESULT
Boolean variable to TRUE or FALSE, depending on whether the node
succeeded or failed.
You can specify a different variable to receive the node execution results. If you
specify another variable, you must add it on the Variables tab of the JobStream
Properties window, and you must declare it as a Boolean data type.
Result variables are often tested in condition nodes to control JobStream flow.
Action on Failure
In the Properties window for each node, you can specify how you want the
JobStream to respond if the node fails to complete successfully.
If you select Terminate, DecisionStream stops processing the current flow, starting
with the failed node. However, any remaining flows will still be processed.
If you select Abort, the JobStream stops processing immediately after the node
fails.
Condition Node

A condition node provides a branching mechanism between nodes for
conditional execution. Each condition node can have many nodes linking to it
but only two output links (True and False). As with a procedure, you have access
to the full range of operators, functions, control statements, and variables
available within DecisionStream.

Technical Information
A condition can include UDFs as well as built-in DecisionStream functions.
These UDFs must exist in the same catalog.
In the slide example, the Counter Test condition node is used to check the value
of a variable called COUNTER. The initial value of $COUNTER is set in the
properties of the JobStream. If the value is less than 2, the DemoSales node is
processed. Otherwise, the condition is False, and the Create Indexes SQL node
executes.
The values of 0, F and f are considered equivalent to False. Any other value is
considered equal to True.
JobStream Node

A JobStream node lets you nest JobStreams, which supports breaking larger jobs
into separate groups of tasks. When DecisionStream encounters a JobStream
node, it processes all the steps within the node. It moves on to the next node in
the sequence only when these steps are completed.

This nesting process can proceed indefinitely. You can nest JobStreams within
JobStreams within JobStreams, theoretically to the point of infinity. It is best,
however, to keep this sort of nesting to a minimum to make the JobStream as
efficient and easy to understand as possible.

Instructional Tips
Any node within a JobStream can be converted to a JobStream. Right-click the
node and click Convert Into JobStream from the shortcut menu. If you
CTRL+click multiple nodes, all of them will be included in the JobStream node.
In the slide example, a separate JobStream control process is initiated when the
FactBuild JobStream node is reached in the first JobStream. When this happens,
all the nodes within this JobStream are completely processed.
Each JobStream node must be linked to at least one other node; otherwise
DecisionStream will not process it. DecisionStream starts processing at the Start
node and progresses through the JobStream following the links that you created.
When DecisionStream encounters a JobStream node, it runs all the nodes within
it before progressing to the next node.
Each node can have one or more nodes linked to it and can, in turn, link to one
or more nodes. The exception to this is a Condition node, which must link to
two nodes (True and False).
Each node can link to any node within the JobStream, whether it precedes or
follows it.
You can link nodes directly within the JobStream Visualization pane. On the
Predecessors and Successors tabs of the Properties window for each node, you
can specify how that node connects to any other nodes. The Predecessors tab
indicates which nodes precede the current one, whereas the Successors tab shows
which nodes follow it. For a node to be processed, all its predecessors must have
finished executing.
DecisionStream does not support links from a Condition node other than True
and False. If you link from a Condition node, DecisionStream allocates a status of
True to the node that you link to first and a status of False to the second node.
You can change the logic of the condition by right-clicking a link and clicking
Reverse Logic. This will switch the value of a True link to False and a False link
to True.
[Slide captions: "These nodes are run in parallel." "These nodes are run
sequentially."]
Execute a JobStream
If the JobStream encounters a problem with a node, the JobStream may fail to
complete. After you resolve the problem, you can instruct DecisionStream to
restart executing a JobStream, starting with the node that failed. You indicate this
by clicking the Restart Last JobStream box to select it.
When you execute a JobStream, the command that the DecisionStream engine
implements is shown in the Command Line box. You can add additional options
to be included in the command line.
As with a fact or dimension build, you can execute a JobStream entirely from the
command line.
• Right-click the JobStream and click Execute from the popup menu.
Demo 13-1
Create a JobStream
Purpose:
Management wants to create a job control process that will
extract, transform, and load the raw data from the Great
Outdoors OLTP system into the data warehouse. To test this
process before fully automating it, we will create a JobStream
that will run the Sales fact build twice.
3. In the Action area, in the right pane, type the following code:

$MSG := Concat('Build execution # ', ($COUNTER+1));
LogMsg($MSG);

This node will write a message to the AutomationJS log file after each
successful execution of the Sales fact build.

Instructional Tips
You can select the MSG and COUNTER user-defined variables from the tree
structure in the left pane.
4. Click OK to close the Procedure Node Properties window.
Task 6. Add a procedure node to increment the counter
variable.
1. Right-click the AutomationJS JobStream, point to Insert Node, and
then click Procedure Node.
The Procedure Node Properties window opens.
2. In the Business name box, type Increment Counter, and then click the
Action tab.
3. In the Action box, type $COUNTER := $COUNTER + 1.
By incrementing the COUNTER variable by 1 after each execution, this
node will track the number of times that the Sales fact build has
completed.
4. Click OK to close the Procedure Node Properties window.
Task 7. Add a condition node to the AutomationJS
JobStream.
1. Right-click the AutomationJS JobStream, point to Insert Node, and
then click Condition Node.
The Condition Node Properties window opens.
2. In the Business name box, type Counter Test, and then click the Action
tab.
3. In the right pane of the Action area, type $COUNTER < 2.
This node will test whether the DemoSales fact build has run less than
two times. If it has, this condition will be True. Once the DemoSales fact
build has run twice, this condition will be False.
4. Click OK to close the Condition Node Properties window.
Task 8. Add an SQL node to create four indexes on the
F_DemoSales fact table.
1. Right-click the AutomationJS JobStream, point to Insert Node, and
then click SQL Node.
The SQL Node Properties window opens.
2. In the Business name box, type Create Indexes, and then click the SQL
tab.
3. In the Database box, click TargetConnect.
5. Save your work and keep DecisionStream open for the upcoming demo.
Results:
We have created and added nodes to a JobStream that will
extract, transform and load the raw data from the Great
Outdoors OLTP system into the data warehouse.
Demo 13-2
Purpose:
To complete the AutomationJS JobStream, we must link all of
its nodes so that they are run in the correct sequence. After we
link the nodes, we will run the JobStream and view its log file.
Results:
We have linked the nodes in the AutomationJS JobStream. We
then ran this JobStream and viewed its log file to evaluate its
progress.
"The ultimate warehouse operation would run the regular load processes in a
lights-out manner, that is, completely unattended. While this is a difficult
outcome to obtain, it is possible to get close." (Kimball et al., 1998)

Ultimately, the purpose of a JobStream is to automate the basic tasks of managing
a data warehouse. If all goes well, the JobStream is able to extract the raw data
into the transformation process and then load it into the fact and dimension
tables of the target data mart. Ideally, this is scheduled to occur after business
hours so that the most up-to-date data is available on a daily basis.

It is also important to consider how to manage the JobStream logs and other
persistent information regarding the success or failure of the job control process.
Who in the organization needs the information? How long will it be kept? What
reporting framework can be used to communicate this information? Answers to
these questions are necessary to continually refine and enhance the job control
process.

Technical Information
A JobStream can be used only within a catalog. If you create it using the
DecisionStream language, you must export it to a catalog before you can execute
it. After it is created, the JobStream can be run as a batch file on a Windows or
UNIX operating system using rundsjob.exe. This will be, essentially, the
backbone of the data warehouse automation process. See Module 22, "The
Command Line Interface," for a discussion of this topic.
Summary
Workshop 13-1
• add dimension and fact build nodes to represent the execution of the
Product conformed dimension build and the JobStreamBuild fact build
• add an Abort procedure to exit the JobStream and log a message if the
Product conformed dimension failed to run
• add a Finish procedure to indicate when the fact data was delivered if the
Product dimension build ran successfully
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
8. Link the JobStream nodes into a coherent process.
   Where: JobStreams toolbar, Create a link between two nodes button.
   • If necessary, right-click the links to the condition node and click
     Reverse Logic.
9. Run the JobStream.
   • Modify the logging to include detail, user and variable messages.
After the ConformedDimension JobStream finishes running (see Task 9), the
result appears as shown below.
8. Fact Builds
9. History Preservation
13. JobStreams
Objectives
Effective dates are required to build the dimension history. In the existing current
row, an effective end date is inserted. In the new row, an effective begin date is
inserted together with a null effective end date.
The dimension table contains a separate row for each change that has been
tracked. As a result, there may be multiple rows that contain the same business
key, but each row will have a unique surrogate key.
Build slide.
Four clicks to complete.
What Are Late Arriving Facts?
In the slide example, the Sales Rep dimension table contains two instances of
business key 00128, with a unique surrogate key for each instance.
The sales fact record dated February 12, 2002 is processed after April 1, 2002. By
default, if you do not enable the late arriving fact processing option, the value of
the surrogate key will be 11112, which is the current record. However, if you do
enable the late arriving fact processing option, the value of the surrogate key will
be 11111.
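Conceptually, the late arriving fact lookup resembles the following query against the dimension table. This is a sketch only; the table name (D_SalesRep) and effective date columns (eff_begin_date, eff_end_date) are assumed for illustration.

-- Pick the surrogate key whose effective date range covers the late fact date.
-- With late arriving fact processing enabled, this returns 11111 for the
-- February 12, 2002 fact, rather than the current record 11112.
SELECT `surrogate_key`
FROM `D_SalesRep`
WHERE `business_key` = '00128'
AND `eff_begin_date` <= '2002-02-12'
AND (`eff_end_date` IS NULL OR `eff_end_date` > '2002-02-12')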
One case in which late arriving facts are likely to occur is credit card transactions.
These transactions (for example, the payment of a bill) may occur after the close
of a month. Each credit card transaction must be correctly associated with the
records in the related dimension tables (for example, those tables containing date
and customer data).
Before you can enable late arriving fact processing for a dimension element, the
conditions stated must be met.
Instructional Tips
If you need to merge data or include a dimension delivery, you can use another
job to merge the data in the staging area or to deliver the dimension.
• Enable late arriving facts for the appropriate dimension element on the
Late Arriving Facts tab.
• Specify what date range limits to impose on processing late arriving facts:
• "All available" caches the entire dimension so that all records are
processed.
Specifying the date range limits for late arriving facts is important if you want to
consider only a specific range of dates for checking late arriving dimension data.
For example, specifying a range ensures that a late arriving fact can only occur up
to three months ago. Any facts older than three months should not be considered
for late arriving fact processing.
Demo 14-1
Purpose:
Some of our incoming product sales data may have either a
null value for the order date or a value that falls outside of an
acceptable range. We will create a lookup in the Product
dimension that references existing values in the
D_ProductHistory dimension table. This lookup will determine
the permissible range of values for each order date.
We will then create a fact build with late arrival fact processing
enabled. In doing so, the dates for any late arriving product
sales or product sales with a null value for their order date will
be replaced with a more appropriate date determined by the
lookup.
We will then execute the fact build and view the results in
SQLTerm.
5. Click the Data Access tab, and then click Use Template for data
access to select it.
6. In the Connection box, click TargetConnect, and then click Browse.
The Select Table dialog box appears.
7. Click D_ProductHistory, and then click OK.
8. Click OK to close the Lookup Properties window.
Task 2. Create the NorthwindOrders fact build.
1. On the toolbar, click the Fact Build wizard button.
2. In the Enter the name of the build box, type NorthwindOrders.
3. In the Select the connection into which the build is to deliver data, click
TargetConnect.
4. Click Perform a full Refresh on the Target Data to select it.
5. Click Next, click Data source, and then click Add.
The Data Source wizard opens.
6. In the Select the Connection from which the Data Source is to read box,
click BIAS_Northwind, and then click Next.
7. In the right pane, type the following SQL code:
SELECT b.`OrderID`,
b.`CustomerID`,
b.`EmployeeID`,
b.`OrderDate`,
b.`RequiredDate`,
b.`ShippedDate`,
b.`ShipVia`,
a.`ProductID`,
a.`UnitPrice`,
a.`Quantity`,
a.`Discount`
FROM `Orders` b, `Order Details` a
WHERE a.`OrderID` = b.`OrderID`
8. Click Finish to close the Data Source wizard.
9. Click Next to accept the DataStream, and then click the ShipVia
measure.
10. Click the Change Type button, click To Attribute, and then click Next.
11. Click Next to accept the properties of the dimensions, click Next to
accept the default fact delivery, and then click Next to accept the default
table and column names.
12. Click Next to accept the summary of the fact build, clear the Deliver
Dimensions check box, and then click Next.
13. Clear the Deliver Metadata check box, click Next, and then click
Finish.
The NorthwindOrders fact build is added to the Builds folder.
Task 3. Modify the properties of the ProductID dimension
element to enable late arriving facts processing.
1. Expand the NorthwindOrders fact build, expand Transformation
Model, and then double-click ProductID.
The Dimension Properties window opens.
2. Click the Reference tab, in the Dimension box, click ProductD, and
then in the Structure box, click ProductHistoryL.
3. Click the Late Arriving Facts tab, and then click the Enable late
arriving fact processing check box to select it.
4. In the Transaction date element box, click OrderDate.
5. In the Transaction date value actions area, beside the When NULL area,
click Use current reference member to select it.
Selecting this option specifies that if an incoming value for the
OrderDate column is null, DecisionStream should use the current record
in the ProductHistoryL lookup.
6. Beside the When out of range section, click the Use closest reference
member button.
Selecting this option specifies that if an incoming value for the
OrderDate column is outside the effective date range specified in the
ProductHistoryL lookup, DecisionStream should use the closest
matching record in the lookup.
The result appears as shown below.
7. Click OK to close the Dimension Properties window, and then save the
catalog.
Task 4. Execute the NorthwindOrders fact build and view
the results in SQLTerm.
1. Right-click the NorthwindOrders fact build, and then click Execute.
The Execute Build dialog box appears.
2. Click the Override build settings check box to select it, and then click
Detail, SQL, and ExecutedSQL.
3. Ensure that the Progress check box is selected, and then click OK.
A command window opens and a log file is created, tracking the progress
of the fact build execution. Notice that 2155 rows are inserted into the
F_NorthwindOrders fact table.
4. When the build has finished executing, press Enter to close the
command window.
5. Open SQLTerm.
SQLTerm opens.
6. In the Database for SQL Operations box, click TargetConnect, and
then in the Database Objects pane, expand TargetConnect.
Results:
We created a lookup in the Product dimension that referenced
existing values in the D_ProductHistory dimension table. This
lookup determined the permissible range of values for each
order date. We also created a fact build with late arrival fact
processing enabled. We then executed the fact build and
viewed the results in SQLTerm.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
Hierarchical data is loaded into memory and is referenced multiple times in the
fact build process. DecisionStream uses the levels of the hierarchy to merge the
data, to validate incoming data, to aggregate the data, and to partition and filter
the data.
• checks that the columns in the target table are the same as those being
delivered
• loads the hierarchy data into memory (using the hierarchy definitions)
For processing fact data, DecisionStream acquires and merges the data as
specified, aggregates it, filters and partitions the result, and then delivers the data
to the appropriate tables.
DecisionStream launches the fact build (by using DataBuild.exe) as a new process
that runs independently of the designer. All the relevant progress information is
written to a log file.
What is Aggregation?
Slide example: product values P1 = 5, P2 = 8, P3 = 9, P4 = 7, P5 = 3, and
P6 = 1 sum to class values C1 = 13, C2 = 16, and C3 = 4, which in turn sum to
the ALL value of 33.
Aggregation is the process of reading data across one level in a reference structure
and summarizing it. In DecisionStream you can perform aggregation on a
measure or a derivation element.
You can:
• exclude detail data from the output to provide compact summary data
collections
For example, you can include every conceivable combination of summary data for
in-depth business analysis, or just a high-level summary for management
reporting.
Note: You cannot perform aggregation for a lookup because it has only one
level.
On the left side of the slide example, we are only delivering aggregated fact data
to the data mart: total sales of all products and total sales of each product line. On
the right side, we are delivering input data arriving at the lowest level (sales for
each product). We are also delivering summarized sales data for all products, each
product line, and each product type.
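In relational terms, the left side of the slide corresponds to delivering only summary queries such as the following sketch. The table and column names (F_Sales, product_line, sales) are assumptions for illustration.

-- Total sales for each product line: one summary row per line, no detail rows.
SELECT `product_line`, SUM(`sales`) AS total_sales
FROM `F_Sales`
GROUP BY `product_line`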
DecisionStream has many available methods to calculate aggregates, such as
average and standard deviation. To specify aggregation, select the measure or
derivation you want to aggregate, and then select the type of aggregation.
Instructional Tips
Aggregation exceptions are covered later in this module.
Slide example:

Fiscal   Value
Q1       <NULL>   <- FIRST
Q2       10       <- FIRST NON-NULL
Q3       20       <- LAST NON-NULL
Q4       <NULL>   <- LAST

The following table outlines other methods for aggregating fact data.

FIRST            You want to use the first value that occurs (the first
                 value is always the correct one).
FIRST NON-NULL   You want to use the first non-null value that occurs (the
                 first value is always correct, provided it is present).
LAST NON-NULL    You want to use the last non-null value that occurs. The
                 last value represents the last update, but a null value is
                 never an improvement on a previous real value.
LAST             You want to use the last value that occurs (the last value
                 is always best).
Inventory Levels
Slide example (FIRST aggregation in the TIME dimension): the twelve monthly
values 1, 3, 2, 4, 1, 5, 3, 2, 8, 6, 7, 3 roll up to quarterly FIRST values of
Q1 = 1, Q2 = 4, Q3 = 3, and Q4 = 6, and to a yearly FIRST value of 1.
Most measures can be aggregated identically across all dimensions, whereas other
measures cannot. This is an issue when summing (adding) measures across time.
Opening and closing balances (for example, inventory levels and bank balances)
should not be summed over time. However, it is reasonable to sum them across
other dimensions (for example, Product or Customer).
Key Information
Measures that have an aggregation exception are what Ralph Kimball refers to
as "semi-additive measures."
In these cases, specify the Time dimension as an exception for aggregation and
choose whether you want to use the first or last value of each time period.
Inventory levels can be aggregated over the levels of the Product dimension to
produce a total inventory of products at one point in time.
However, summing inventory levels across the Time dimension would not make
sense. Instead, as shown in the slide example, an aggregation exception has been
specified for this dimension. Instead of summing the number of products in
inventory over the entire year, we want to use the first value that we encounter at
each level above the lowest one. We also specified an exception indicating that
the last date we want to reference is Aug. 22, 2003.
Note: The exception dimension element must have its domain type set to
Reference, not Dynamic. This is important when dealing with future
values, such as Forecast.
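A relational sketch of this aggregation exception follows. It assumes an F_Inventory table with month, quarter, and inventory_level columns: inventory is summed across products, but only the FIRST month of each quarter is used over time.

-- Sum inventory across products, but take only the first month of each
-- quarter instead of summing over time (assumed table and column names).
SELECT i.`quarter`, SUM(i.`inventory_level`) AS quarterly_inventory
FROM `F_Inventory` i
WHERE i.`month` = (SELECT MIN(m.`month`)
                   FROM `F_Inventory` m
                   WHERE m.`quarter` = i.`quarter`)
GROUP BY i.`quarter`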
Enable Aggregation
To enable aggregation, click the Aggregate box on the Reference tab of the
Dimension Properties window. Then select the level(s) of aggregation that you
require by clicking the box(es) adjacent to the level(s) in the Output column.
The relevant hierarchy icon in the Visualization pane indicates that a dimension
element has been set to aggregate. In the slide example, we specified that we want
to aggregate the Product dimension element at the ProductType level. The
ProductH icon in the Visualization pane has arrows to indicate that the source
data is being rolled up to the ProductType level, in addition to being retrieved at
the lowest level, Products.
Instructional Tips
If you input data at two different levels and also output at those levels, you
would not want to aggregate the data.
Key Information
To access the aggregation option, right-click the dimension element, click
Properties, and then click the Reference tab.
Input rows:
Cust  Date    Qty  Amt   CrL
1     199901  1    100   200
2     199901  5    500   200
1     199902  1    100   300
2     199902  5    500   300
1     199903  1    100   400
2     199903  5    500   400
3     199903  4    400   400

Aggregated rows added (Qty: sum, Amt: sum, CrL: last):
1     19991   3    300   400
2     19991   15   1500  400
3     19991   4    400   400

The last three rows of data are rolled up numbers from the first quarter.
Aggregating input data in the fact build creates additional rows. This can increase
processing time.
In the slide example, the data for each customer is aggregated across the Quarter
level. All the data values for each customer are rolled up to create the summary
data. The resulting three aggregated rows are then added to the existing rows in
the DataStream.
Also, avoid creating an aggregate table until you know there is a process that
requires it.
Aggregating several dimensions over several levels can result in massive numbers
of aggregate tables. Every combination of every level in every dimension is a
potential aggregate table.
The slide example refers to a hypothetical fact build that references three
dimensions with five levels each. If seven such dimensions were used, 78,125
(that is, 5 to the power of 7) tables would be required to support every
combination.
When deciding whether to create additional aggregate tables, keep in mind the
number of tables that may potentially be created. Create aggregates only when
there is a specific need for them.
Demo 15-1
Aggregate Data
Purpose:
We want to report on our sales data at higher levels of detail.
We will use aggregation in our fact build to deliver rolled up
data to our fact table.
Task 1. Set the input rows and view the current data.
1. In the GO_Catalog, expand the DemoSales:1 fact build, right-click
DataStream, and then click Properties.
The DataStream Properties window opens.
2. Click the Input tab.
3. In the Maximum input rows to process box, type 15, and then click OK.
4. Right-click DemoSales:1, and then click Properties.
The Fact Build Properties window opens.
5. Click the Input tab, in the Write any rejected records to box, type
reject.txt, and then click OK to close the Fact Build Properties window.
6. Click DemoSales:1 to select it, and then on the toolbar, click the
Execute button.
A dialog box appears, prompting us to save our changes.
7. Click OK.
Thirteen rows are delivered.
8. Press Enter to close the DOS window when the build is finished executing.
Instructional Tips
You may also want to set the merge duplicate rows option, if it has not been set
already.
9. Run SQLTerm.
10. In the Database for SQL Operations box, click TargetConnect.
11. Under Database Objects, expand TargetConnect, right-click F_Merge,
and then click Add table select statement.
12. Execute the SQL.
The query returns 13 rows of data. Now we will add aggregation to the
build to create summary data for later analysis.
8. Press Enter to close the DOS window when the build is finished
executing.
9. Run SQLTerm.
10. In the Database for SQL Operations box, click TargetConnect.
11. In the Database Objects pane, expand TargetConnect, right-click
F_Merge, and then click Add table select statement.
15. Close SQLTerm, and then leave DecisionStream open for the next
demo.
Results:
We used aggregation in our fact build to deliver rolled up data
to our fact table.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
What is Pivoting?
For example, consider a relational table that has a single column, which holds all
product values. Assume that this table has one column for each business
measure, such as Actual Sales and Forecast Sales. The product column could be
pivoted to produce one column for each product, and the business measures
could be pivoted to produce one row for each measure.
In the slide example, the original table contains three columns: Product, Eastern
Sales and Western Sales. The Sales columns have a double meaning. They identify
the sales Region (Eastern or Western) and the Sales value in that region. As a
result of pivoting, a new column (Region) is created, and all sales are combined in
a single measure column (Sales).
A typical pivot rotates two or more data source columns to two DataStream
items. One DataStream item maps to a transformation model element that
records the data source column from which the value originates. The second
DataStream item maps to a transformation model element that contains the
values from the data source columns.
You can create as many pivot values as you need for each DataStream item. You
typically create one pivot value for each column you are pivoting. For example, a
table has four columns that must be pivoted: Product A, Product B, Product C,
and Product D, where each of the columns holds sales data for each given
product. To pivot the table, create a DataStream item with four pivot values, each
corresponding to a product.
Because pivoting transforms the table structure from horizontal to vertical, the
number of columns decreases, whereas the number of rows increases. For
example, pivoting twelve month columns turns each source row into twelve
output rows. Row limits apply to the output of a DataStream, not to the number
of source data rows processed.
You can perform multiple pivoting. Multiple pivoting implies that you create
more than one DataStream item with pivot values. Each DataStream item has its
own set of pivot values.
For example, four data source columns called Eastern Sales, Eastern Forecast,
Western Sales, and Western Forecast might be pivoted. The following
DataStream items will be created with pivot values:
Pivot values
The slide illustrates the single-pivot technique. Use the single-pivot technique
when you know all values for the pivoted column in advance. In the slide
example, all twelve of the month columns from the source table are replaced with
literal values for the twelve months of the year (Jan, Feb, and so on). These values
are declared as pivot values in the Month DataStream item and are written to a
single column, Month, in the fact table.
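In SQL terms, the single-pivot rotation is equivalent to unpivoting the month columns into rows, along the lines of the following sketch. It assumes the Rolling12Months source table used later in this module, with a Year column and one column per month.

-- Rotate twelve month columns into (Month, Amount) rows.
SELECT `Year`, 'Jan' AS Month, `Jan` AS Amount FROM `Rolling12Months`
UNION ALL
SELECT `Year`, 'Feb', `Feb` FROM `Rolling12Months`
UNION ALL
SELECT `Year`, 'Mar', `Mar` FROM `Rolling12Months`
-- ...and so on for the remaining nine months...
UNION ALL
SELECT `Year`, 'Dec', `Dec` FROM `Rolling12Months`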
Pivot Technique
Map the DataStream item and a pivot value to each pivoted data source column.
Each data source column selected for pivoting must be mapped twice:
• to the DataStream item that will contain the values from the data source
columns
Once the data source columns have been mapped, the DataStream items must be
mapped to transformation model elements in the usual way. In the slide
example, each month column from the data source has double mapping to a
corresponding month pivot value within the Months DataStream item and to the
Amount DataStream item containing information on the number of products for
this month.
The pivot values in the DataStream item are designed for two purposes. The first
purpose is to provide literal values to plug into subsequent fact table rows. The
second purpose is to indicate to DecisionStream how many rows to insert into the
fact table for each row from the data source.
When pivoting, create a new DataStream item to store the pivot values. In the
slide example, a Month DataStream item is created containing twelve pivot values
to represent the twelve months of the year. Also, a DataStream item called
Amount is created for the quantitative values.
In order to map both the pivot value and the Amount DataStream item to the
appropriate columns, map the pivot value first, and then use Ctrl+click to map
the DataStream item representing quantitative values.
Note: The slide example is inappropriate for checking data integrity against the
Month dimension. The keys in the Month dimension represent actual
dates (199901, 199902, and so on) instead of literal values (Jan, Feb, and
so on). Data integrity checking requires a more advanced technique.
Demo 16-1
Purpose:
We want to create a fact build that performs basic pivoting. We
will create a new fact build and deliver a pivot table called
PivotData to the data mart.
2. Close SQLTerm.
Now we need to create a simple fact build to deliver a pivot table.
3. From the Tools menu, click Fact Build Wizard.
The Fact Build Wizard window opens.
4. In the Enter the name of the build box, type PivotData.
5. Make sure that Cognos BI Mart (Star) is selected in the Select the type
of fact build to create box.
6. In the Select the connection into which the build is to deliver data box,
click TargetConnect.
7. Click Perform a full Refresh on the Target Data to select it, and then
click Next.
8. Click Data source, and then click Add.
The Data Source Wizard window opens.
9. In the Enter the Data Source name box, type PivotData.
10. In the Select the Connection from which the Data Source is to read box,
click DS_Sources, and then click Next.
11. In the left pane, click the Rolling12Months check box to select the
entire table.
A SELECT statement appears in the right pane of the Data Source
Wizard window displaying the columns that will be included in the build.
5. Click OK.
Task 6. Add the Month and Amount elements and map the
DataStream items to the transformation model.
1. Under PivotData, right-click the Transformation Model, and then click
Mapping.
2. In the left pane, double-click Month to create and map a Month
element.
3. Double-click Amount to create and map an Amount element.
The mapping appears as follows.
Notice that both elements have been created as attributes. You can
change the element type later if you require.
4. Click OK.
The Add New Elements dialog box appears.
5. Click OK.
6. Right-click the DataStream for the PivotData fact build, and then click Instructional Tips
Properties. Steps 6 to 9 may not be necessary.
However, if you execute the build and a
The DataStream Properties window opens. data type error occurs, ensure that the
7. In the DataStream Items column, click Year, and then click Edit. Year DataStream item is of data type
CHAR with a precision of 4.
The DataStream Item Properties window opens.
8. In the Type box, click CHAR, and then in the Precision box, type 4.
9. Click OK to close the DataStream Item Properties window, and then
click OK to close the DataStream Properties window.
10. Save the catalog, click PivotData, and then execute the build.
The log file indicates that 96 records have been inserted into the
F_PivotData fact table.
11. Press Enter to close the DOS window.
12. Open SQLTerm and examine the data in the F_PivotData table in the
TargetConnect database.
The results appear as shown below.
13. Close SQLTerm and keep DecisionStream open for the upcoming
demo.
Results:
We performed basic pivoting by creating a new fact build and
delivering a pivot table called PivotData to the data mart.
There are instances where pivoting data on one axis will not produce a sufficient
result in the fact table:
• Month and year may be separate and you must concatenate them.
• You may have more than one measure that you want to use in a pivot.
• You have one data source that is already pivoted and another that is not, but
you need to use both to produce a result in your fact table.
Your source data may dictate pivoting on more than one axis to obtain the
desired results in your fact table. In this case, you need to perform multiple pivots
of your data.
For example, your source data may contain multiple measures that are required in
the pivot; or source data that needs to be concatenated to correctly express your
results in a format that makes sense.
You will need to use a multiple pivot technique to obtain results that will best suit
your requirements.
• to the DataStream item that will contain the values from the data source
columns
In the slide example, the sales array is mapped to the new DataStream item
(Amount).
The calculated month array is mapped to the Months DataStream item, which is
then mapped in the transformation model to the first dimension element for data
integrity checking.
Also, both the sales array and the month array are mapped to the dummy
element (RowNumber). This mapping creates twelve output rows for each input
row.
Multi-Pivot Technique
Use the advanced pivoting technique when the required pivot values are derived
from an expression or calculation, rather than literal values.
If data integrity checking is required while you are pivoting, you may need two
transformation elements. This is especially true when you are pivoting date arrays.
The first element must be a dimension element. It is used to check referential
integrity, if necessary. This element is output to the fact table. The second
element is a dummy element that is mapped to the DataStream item containing
the pivot values. This element is not output to the fact table.
Instructional Tips
The dimension element that is set for output can be used for data integrity
checking, if it references an existing dimension.
In the slide example, the new query includes two arrays. The first array contains
the sales values that come from the source table. The second array contains
derivations used to calculate dates for each month.
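A relational sketch of the same idea follows: instead of literal month names, each pivoted column carries a derived date key. The Concat usage and column names here are assumptions for illustration, not the exact slide query.

-- Derive a full date key (for example, 199901) for each pivoted month column.
SELECT `Year`, Concat(`Year`, '01') AS MonthKey, `Jan` AS Amount
FROM `Rolling12Months`
UNION ALL
SELECT `Year`, Concat(`Year`, '02'), `Feb`
FROM `Rolling12Months`
-- ...and so on for the remaining ten months...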
Demo 16-2
Multi-Pivot Technique
Purpose:
We want to pivot the month columns in a fact table so that it
includes both the year and the month. Therefore, we must use a
multi-pivot technique to calculate the month values. We will not
create a new fact build from the beginning. Instead, we will
duplicate the PivotData fact build and modify it.
4. Run the query to check the syntax, and then click OK.
5. Click Prepare, and then click Refresh to prepare the columns for use in
the fact build.
6. Click OK to close the Data Source Properties window.
4. Close SQLTerm.
Results:
We duplicated a fact build and used a multi-pivot technique to
calculate month values.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
Note: PowerPlay requires all the leaf nodes to be at the same level to
aggregate properly. This topic is covered in greater detail in the
following pages.
Parent-Child Relationships
Slide: the Employees and Orders tables, with a Reports To column relating
Employees to itself.
A variant of this schema uses one table for all dimensions and has an additional
column to identify the dimension to which each row relates.
A parent-child schema is a structure data table in which each row contains the Id
of a member and the identifier of its parent. If each structure data row also
identifies its hierarchy level, you can use this information to select, for each
level DataStream, only the structure data that relates to that level; such
hierarchies acquire structure data through the level DataStream. If the levels
are not identified, use an auto-level hierarchy to determine the number of levels.
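Conceptually, deriving the levels from parent-child pairs resembles the following recursive query. This is a sketch only; it assumes a SQL dialect that supports recursive common table expressions, which the course databases may not provide.

-- Derive the level of each member by walking the parent-child links.
WITH RECURSIVE emp_levels (EmployeeID, ReportsTo, level_num) AS (
    SELECT `EmployeeID`, `ReportsTo`, 1
    FROM `Employees`
    WHERE `ReportsTo` IS NULL              -- top of the hierarchy
    UNION ALL
    SELECT e.`EmployeeID`, e.`ReportsTo`, l.level_num + 1
    FROM `Employees` e
    JOIN emp_levels l ON e.`ReportsTo` = l.EmployeeID
)
SELECT EmployeeID, level_num FROM emp_levels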
Ragged Hierarchies
In a fact build the dimension elements that reference a hierarchy will look for a
match only at the lowest level of the dimension. In the slide example, this
means that only the lowest three employees (Suyama, King, and Dodsworth)
will be referenced, but all the other employees will be ignored (Davolio,
Leverling, Peacock, and Callahan).
The most difficult aspect of the problem is that you often do not know how
many values the dimension takes on until you see the data itself. Therefore, the
REJECT and REJECT/WARN options are not available for this feature. If you
set this feature to ERROR, DecisionStream issues an error message and halts
processing if the hierarchy is ragged.
The issue with a ragged hierarchy is not how to manage the dimension, but how
to correctly display to users the results in an understandable manner.
DecisionStream can handle ragged hierarchies by using auto-level hierarchies.
The process for resolving data reporting problems such as ragged hierarchies
usually has many steps. The first step is to use an auto-level hierarchy, which is
defined solely in terms of parent-child relationships, to create the defined levels
required for a new hierarchy.
The auto-level hierarchy is used to create a table that will be referenced in the
creation of a new hierarchy. This new hierarchy will have the defined level
structure necessary to continue resolving the ragged hierarchy problem.
After the levels are known, a dimension build is used to create a table that
will contain each row and a column to identify the level of the row. This step
is necessary only if the original parent-child relationship table does not
identify the levels.
Demo 17-1
Purpose:
We have a problem with recursive relationships in our
Employee table. We must create an auto-level hierarchy to
determine the number of levels that exist in this table. This is
the first step in solving this problem.
3. Click OK.
4. Expand the EmployeesAH hierarchy, right-click DataStream, and then
click Insert Data Source.
5. Click the SQL tab, click BIAS_NorthWind in the Database for SQL
Operations box, and then click SQL Helper.
6. In the SQL Query pane, type the following:

SELECT `EmployeeID`,
       `FirstName`,
       `LastName`,
       `ReportsTo`
FROM `Employees`

7. Execute the query.
The table has nine records.
8. Click OK to close SQL Helper.
9. Click Prepare to select it, and then click Refresh.
10. Click the Derivations tab, and then click Add.
Instructional Tips
This may be a good opportunity to demonstrate to the students how to use the
drag-and-drop feature for adding columns to the SQL code. Under Database
Objects, under the Employees table, left-click+drag a column to the SQL Query
pane.
You will likely not see the items in the far right DataStream Items column until
you click the Auto Map button. In effect, clicking this button both creates the
DataStream items and maps them to the appropriate data source columns in the
left pane.
The Derivation Properties window opens.
11. In the Name box, type EmpName, and then click the Calculation tab.
12. In the right pane, type Concat(FirstName, ' ', LastName)
13. Click OK to close the Derivation Properties window, and then click OK
to close the Data Source Properties window.
3. Click OK.
Task 5. Examine the results in Reference Explorer.
1. Save the catalog.
2. Right-click the EmployeesAH hierarchy, and then click Explore.
The Reference Explorer dialog box appears.
3. Click OK.
The Reference Explorer window opens.
4. Expand Andrew Fuller to view the data.
We can see that Mr. Fuller has five people reporting to him.
Results:
By creating an auto-level hierarchy, we have determined the
number of levels that will have to be created to resolve our
ragged hierarchy problem.
Demo 17-2
Purpose:
Using the Dimension Build Wizard, create a build by using the
EmployeesAH hierarchy. We will create a second hierarchy by
using this new dimension build to help solve our ragged
hierarchy problem.
5. Click Execute.
There are nine employees in the table.
Results:
We will use the D_EmployeesAH table to create our new
hierarchy in the next demo.
After you create the auto-level hierarchy to determine the number of levels, and
the dimension table by using the auto-level hierarchy as a reference, you have to
create a new hierarchy by using the dimension table.
At each level, you must define the SQL to extract the data correctly. The slide
example has three levels of employees. The SQL will extract the lowest level of
data to populate the hierarchy. A similar SQL statement is also used at the first
and second levels.
Note: If metadata is delivered to Transformer, it is important to go to the
Transformer model and suppress blanks in the level properties of the
dimension.
Key Information
Remind the students that we are using these techniques because PowerPlay
requires fixed levels to create cubes suitable for reporting.
We must have the ability to track data about each category (in this case, each
employee) at the lowest level, so that aggregation can be performed. In a ragged
hierarchy, not all of the members go down to the same level. When using this
hierarchy as a data source, Transformer cannot establish a convergence level.
Demo 17-3
Purpose:
We will build a new hierarchy called RaggedEmpH using the
data in the table created by the AutoEmployees dimension
build (D_EmployeesAH). Later, we will deliver the data from this
new hierarchy to our data mart in incremental steps.
5. Click OK.
6. Expand Level1, right-click DataStream, and then click Insert Data
Source.
7. Click the SQL tab, and then in the Database box, click TargetConnect.
8. Click SQL Helper, and then in the SQL Query pane type the following:
SELECT `EmployeeID`,
`Name`
FROM `D_EmployeesAH`
WHERE `level_name` = 'Level1'
9. Click OK to close SQL Helper.
10. Click Prepare, and then click Refresh.
11. Click OK to add the new data source.
Task 3. Create the Level2 level and add the Data Source.
1. Right-click the RaggedEmpH hierarchy, and then click Insert Level.
2. Click the Attributes tab, and in the Template box, click D_EmployeesAH.
3. In the Available attributes box, double-click EmployeeID, Name, and
ReportsTo to add them to the Chosen attributes list.
4. In the Chosen attributes list, click the Id box for EmployeeID, the
Caption box for Name, and the Parent box for ReportsTo to select them.
The Level Properties window for Level2 appears as shown below.
5. Click OK.
6. Expand Level2, right-click DataStream, and then click Insert Data
Source.
7. Click the SQL tab, and then in the Database box, click TargetConnect.
8. Click SQL Helper, and then in the SQL Query pane type the following:
SELECT `EmployeeID` AS Level2ID,
`Name` AS Level2Name,
`ReportsTo` AS ParentID
FROM `D_EmployeesAH`
WHERE `level_name` = 'Level2'
9. Click OK to close SQL Helper.
10. Click Prepare, and then click Refresh.
11. Click OK to add the new data source.
Task 4. Create the Level3 level and add the data source.
1. Right-click the RaggedEmpH hierarchy, and then click Insert Level.
2. Click the Attributes tab, and in the Template box, click D_EmployeesAH.
3. In the Available attributes box, double-click EmployeeID, Name, and
ReportsTo to add them to the Chosen attributes list.
4. In the Chosen attributes box, click the Id box for EmployeeID, the
Caption box for Name, and the Parent box for ReportsTo to select them.
The Level Properties window for Level3 appears as shown below.
5. Click OK.
6. Expand Level3, right-click DataStream, and then click Insert Data
Source.
7. Click the SQL tab, and then in the Database box, click TargetConnect.
8. Click SQL Helper, and then in the SQL Query pane type the following:
SELECT `EmployeeID` AS Level3ID,
`Name` AS Level3Name,
`ReportsTo` AS ParentID
FROM `D_EmployeesAH`
WHERE `level_name` = 'Level3'
9. Click OK to close SQL Helper.
10. Click Prepare, and then click Refresh.
11. Click OK to add the new data source.
Task 5. For each level, map the data source columns and
create DataStream items.
1. Under Level1, right-click DataStream, and then click Properties.
The DataStream Properties window appears.
2. Click Auto Map.
The mapping appears as shown below.
3. Click OK.
4. Under Level2, right-click DataStream, and then click Properties.
The DataStream Properties window appears.
5. Click Auto Map.
The mapping appears as shown below.
6. Click OK.
7. Under Level3, right-click DataStream, and then click Properties.
8. Click Auto Map.
The mapping appears as shown below.
9. Click OK.
Task 6. For each level, map the DataStream items to the
level attributes.
1. Right-click Level1, and then click Mapping.
The DataStream Mapping window appears.
2. Map the EmployeeID level attribute to the EmployeeID DataStream
item, and Name to Name.
The mapping appears as shown below.
3. Click OK.
6. Click OK.
7. Right-click Level3, and then click Mapping.
8. Map EmployeeID to Level3ID, Name to Level3Name, and
ReportsTo to ParentID.
The mapping appears as shown below.
9. Click OK.
Results:
We built a new hierarchy called RaggedEmpH using the data in
the table created by the AutoEmployees dimension build
(D_EmployeesAH). In the next demo, we will deliver the data
from this new hierarchy to our data mart in incremental steps.
Demo 17-4
Purpose:
In the previous demo, we built a new hierarchy that we can use
to resolve our ragged hierarchy problem. We will create a
dimension build that will deliver the data from this hierarchy to
the data mart in three separate passes. Then the instructor will
show the results in a PowerPlay report.
10. In the LowestLevelId row, click in the Value column, and then change
False to True.
The attributes appear as shown below.
8. Click OK.
Task 4. Create another table to populate the third level.
1. Right-click the RaggedEmp dimension build, and then click Insert
Table.
2. In the Table name box, type RaggedEmp, click the Columns tab, and
then in the Use template box, click RaggedEmpT.
3. Under Available Sources, expand RaggedEmpH and then expand
Level1, Level2, and Level3.
4. From Level1, map EmployeeID [Id] to Level1Id and map Name
[Caption] to Level1Name.
5. From Level2, map EmployeeID [Id] to Level2Id and map Name
[Caption] to Level2Name.
6. From Level3, map EmployeeID [Id] to LowestLevelId and map
Name [Caption] to LowestLevelName.
The mapping appears as shown below.
3. Right-click the new middle RaggedEmp table, and then click Move Up.
The table is now at the top.
By performing these steps, the Level3 data will populate the table first,
followed by the Level2 data, and then the Level1 data. This is important
for allocating enough memory for the data in each column.
Task 6. Remove fostering from the build.
1. Right-click the RaggedEmp dimension build, and then click
Properties.
2. Click the Dimension tab.
3. Click the Remove Unused Foster Parents check box to select it, and
then click OK.
4. Save your work.
Task 7. Execute the build and examine the RaggedEmp
table using SQL Term.
1. Execute the RaggedEmp dimension build.
A DOS window opens indicating that nine inserts in total were made.
2. Press Enter to close the DOS window.
3. Run SQL Term.
4. In the Database for SQL Operations box, click TargetConnect.
5. In the Database Objects pane, expand TargetConnect.
6. Right-click the RaggedEmp table, and then click Add table select
statement.
7. Click Execute.
The table appears as shown below.
8. Close SQLTerm and leave DecisionStream open for the next demo.
ID   Caption  Parent
BUF  Buffalo  NY
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
What is Packaging?
Packaging Concepts
Slide callouts: save; select components to import from package.
When you select components to import, DecisionStream notifies you if there are
dependent components. However, it is not compulsory to import dependent
components if they already exist in the target catalog.
Instructional Tips
DecisionStream identifies a component as being identical if it has the same name
and specification as the component in the target catalog.
Demo 18-1
Purpose:
We want to create a package that contains the StaffH
hierarchy. We then want to import the package and view the
results.
Task 2. Save the package.
1. Click OK.
The Package File dialog box appears.
2. In the Save in box, navigate to C:\Edcognos\DS7001.
3. In the File name box, type Staff_Hierarchy, and then click Save.
Task 3. Import the Staff hierarchy.
1. From the File menu, click Import Package.
DecisionStream displays a message prompting you to back up the
catalog.
2. Click Yes and create a backup called Day1Backup in the folder
C:\Edcognos\DS7001.
The Package File dialog box appears.
Results:
We created a package containing the StaffH hierarchy. We
then imported this package and viewed the results.
A large catalog can become difficult to navigate. Using the navigator lets you
quickly search for components and check their dependencies. For example, you
may want to locate an obsolete dimension to delete it, but first you need to check
whether the dimension is used anywhere in the catalog.
Slide callouts: search criteria; search results.
When you perform a search, the navigator displays a list of all components that
match your search criteria. For each matching component, it shows:
To locate a component in the Tree pane, click the component name in the list.
Slide callouts: move backwards or forwards between previously selected
components; click on a component to view its dependencies.
When you select a component in the Tree pane, the navigator lists its
dependencies. For each listed component it shows:
Demo 18-2
Purpose:
We want to search for the StaffD reference dimension and
explore its dependent components using the navigator.
Results:
We have searched for the StaffD reference dimension and
explored its dependent components using the navigator.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
SERIES 7 VERSION 2 DECISIONSTREAM FOR DATA WAREHOUSE DEVELOPERS
Objectives
Kimball has defined a set of standards that are used to understand the
characteristics of quality data in data warehouse implementation.
For example, the gender field may contain M/F (uppercase) or m/f (lowercase).
In other cases, some data may be technically legal, such as a Sales Order dated
1899, but not logical for a business that has existed for only twenty years. Also,
legacy systems may have evolved such that the meaning of data fields has
changed over time. These issues must be addressed.
Also, elements from different sources that have the same implied meaning should
have the same key values and captions. For example, I.B.M and IBM should be
defined as the same company.
Note: This is not an exhaustive list of problems, but they are common examples.
You often encounter data problems when executing dimension and fact builds.
Examining log and reject files can uncover many of these quality problems.
Some of the more common problems are listed in the slide. These are examined
in greater detail on the pages that follow.
Instructional Tips
It may be a good idea to reiterate to students that these data characteristics do
not necessarily imply a bad data model.
Rejections are not necessarily a sign of a bad model in DecisionStream. Instead,
they could be an indicator of unexpected input data or incorrect specifications.
Separating the two requires analyzing the data because this is most often the
cause of the data rejections.
By default, DecisionStream handles rejected transactional (fact) data from two
perspectives. It fails either a reference data integrity check or a user logic check.
Technical Information
By default, the reject file is called {$DS_BUILD_NAME}.rej. This can cause
problems if you duplicate a fact build, because the duplicated fact build will have
a colon in its name. The reject file name cannot contain a colon, which is a
reserved character in Windows.
In the first scenario, the rejected data fails to match on some dimensional key (for
example, the state_code is not in the Location hierarchy) and the fact row, by
default, is written to a reject file, not to the fact table.
In the second scenario, the user may code some constraint into an output filter
on the fact table (for example, profit_margin > .30), and only those records that
pass that constraint are written to the table.
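In SQL terms, the user logic check behaves like a row filter; only rows meeting the constraint reach the fact table. A minimal sketch, with the table name assumed and the profit_margin constraint taken from the example above:

-- Only rows passing the output filter are written to the fact table.
SELECT *
FROM `staged_facts`
WHERE `profit_margin` > 0.30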
Understand Fostering
When a member at one hierarchy level does not have a parent, DecisionStream
assigns a foster parent by default. In the slide example, the Dictionary member of
the ProductH hierarchy does not point to any higher-level parent.
When creating a dimension from the original source systems, you may choose to
include orphaned members. These orphan members are assigned a foster parent
with a default name of ?<Level Name>. In the slide example, the foster parent is
called Unknown ProductLine.
Fostering can also result from delivering fact rows that do not
match existing dimension members.
You can choose to deliver both the fact data and the
unmatched members to the data mart.
You can also add unmatched members to the existing dimension data so that the
data is not unmatched in subsequent build executions. To save these members,
click the Save unmatched member details via reference structure box.
To save the unmatched members to the existing dimension data, the dimension
element must be using a reference structure (a hierarchy, auto-level hierarchy, or
lookup) that uses a template for data access. This template must contain an
attribute that has a behavior of business key and a primary key value of True. For
a non-auto-level hierarchy, this applies to the lowest level of the hierarchy at
which input data is mapped.
If the conditions for adding unmatched members are not met, a message appears
informing you of this, and the check box is not available.
When you select the Accept unmatched member identifiers box, the unmatched
fact row is delivered to the fact table in the data mart.
If you do not include a delivery module for the dimension, you will have a fact
row that has no matching member in the dimension table. Avoid having an
unmatched member, because it makes it difficult to locate this problem in the
data mart.
Product
Diet Cola Cola Diet Orange Orange
25 35 15 20
Occasions arise when a member of a hierarchy has more than one parent.
Although this situation occurs in various businesses, it is very difficult to report
and analyze the fact table from this type of structure. Aggregation in particular
can be quite complex, as noted in the slide example.
The slide demonstrates the discrepancy between totals at the Product level and
Product Type level because of multiple parents for the diet products. When the
total of the Product Type level is 135 (40+60+35), the total on the Product level
is 95 (25+35+15+20), which matches the number on the All Products level. This
discrepancy causes the problem for reporting and analysis.
The solution to this and similar problems depends on the company's reporting
needs and requirements.
Although DecisionStream supports multiple parent hierarchies by default, you
should avoid them.
Instructional Tips
This issue usually occurs due to data quality problems or a lack of understanding
about the source data.
Multiple parent hierarchies are difficult to model and support in the data
warehouse and they lead to confusing and possibly inaccurate query results.
Click the Ignore Multiple Parents box if you want to deliver only
the first parent of a dimension member.
Subsequent parents for that member are ignored.
If you cannot solve the problem of multiple parents at the source, you can force
DecisionStream to ignore multiple parents when the dimension build is run.
Click the Ignore Multiple Parents box on the Dimension tab of the Dimension
Build Properties window. DecisionStream will deliver only the first parent of the
dimension member that it encounters in the incoming data. All subsequent
parents for that member are ignored.
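Before resorting to Ignore Multiple Parents, it can help to locate the offending members at the source. A sketch of such a check, assuming a parent-child structure table named ProductStructure with member_id and parent_id columns:

-- Find members that have more than one distinct parent.
SELECT `member_id`, COUNT(DISTINCT `parent_id`) AS parent_count
FROM `ProductStructure`
GROUP BY `member_id`
HAVING COUNT(DISTINCT `parent_id`) > 1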
Demo 19-1
Purpose:
We want to override the default build settings to view
additional information about our build in the log files. This lets
us see some of the flexibility that we have in changing build
settings for troubleshooting purposes.
6. Click OK to close the Hierarchy Properties window, and then save your
work.
3. Click OK.
The DOS window opens and the build is executed.
4. Press Enter to close the DOS window.
We can see that 26 rows of data with non-unique IDs already exist.
3. Close the log window and Notepad.
4. Under the ProductD dimension, right-click the ProductH hierarchy, and
then click Properties.
5. In the Hierarchy Properties window, click the Features tab.
6. For Non-unique Ids change Reject/Warn to Accept.
7. Change the Limit for Non-unique Ids to 0.
8. Click OK.
9. Save the catalog and keep DecisionStream open for the next demo.
Results:
We modified the default build settings for the purpose of
troubleshooting. In doing so, we were able to inspect the
progress of the build and identify rejected data in the form of
Non-Unique Ids.
Rejected records
When handling rejected data, you can use SQLTerm to implement remedies. You
can use techniques such as:
In the slide example, a SELECT statement was run against a table containing data
rejected by a fact build. After careful examination, it was determined that it was
the state_cd column that was causing problems. The F_RejectData table contains
data about states that we did not want to include in the original fact table.
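For example, profiling the rejected rows can confirm which values are at fault. A minimal sketch, reusing the F_RejectData table and state_cd column from the slide example:

-- Count rejected rows by state code to find the values causing rejections.
SELECT `state_cd`, COUNT(*) AS reject_count
FROM `F_RejectData`
GROUP BY `state_cd`
ORDER BY reject_count DESC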
Using the LogMsg function in a derivation, you can insert messages into the build
log file. In the slide example, the LogMsg function is used in a derivation element
called DebugProduct. When the fact build containing this derivation is run, a
message is written to the log file for each member of the Product dimension that
is processed. If something goes wrong during the execution of the build, we can
view the log file to see exactly which member of the Product dimension caused
this problem.
Instructional Tips
Writing messages to the DOS window will degrade performance. Therefore, if
you have many messages that you want to include in your log file, avoid showing
these messages in the DOS window during build execution.
You can also enable various audit functions to write additional information to the
DecisionStream catalog tables.
You must consider many factors to ensure the quality of data in the data mart.
You can cleanse data in an operational data store (ODS) or a data staging area.
You can also create the data mart directly from operational systems without
staging it first.
• Verify that the correct set of data is extracted from the sources.
• Establish sanity checks or guards on derived and aggregated data and run
audit checks for out-of-bound conditions.
• Check the mappings from the DataStream to make sure that the values
are correctly assigned to all hierarchy attributes and build elements.
There are cases where a fact build delivers fewer or more than expected rows into
the data mart. This can happen because:
• data is being rejected due to bad dimension elements or duplicates
• merging of rows with duplicate dimension values is enabled
• the build is suppressing output of detail data
• the delivery has an output or level filter, is missing a filter, or has an
inappropriate filter
• a dimension element property has enabled multiple output levels and
aggregation
In many cases, the best way to see the rows and columns that were passed to the
deliveries is to deliver to scratch tables or flat files, and include all build elements
(especially those referenced in filters that are not output in the normal deliveries).
Additional derivation columns could be added to the fact build that write Y/N
(0/1) flags based on the conditional expressions that are suspected of causing
problems.
Demo 19-2
Purpose:
We want to ensure that data about each product is processed
into the fact table. To accomplish this, we will add a derivation
to the DemoSales build to generate a separate log message for
each product number.
5. Click OK.
6. Under the DemoSales build, right-click the DataStream, and then click
Properties.
The DataStream Properties window opens.
7. Click the Input tab, and then in the Maximum input rows to process
box, type 3000.
8. Click OK to close the DataStream Properties window.
Results:
To ensure that data about each product was processed into
the fact table, we added a derivation to the DemoSales build to
generate a separate log message for each product number.
Build slide.
One click to complete.
Create an Optional Lookup for Source Data
A value for matching records is defined in the lookup, and a value for
non-matching records is defined in the fact build. As shown in the slide, the
fact table has the found column that stores Y for matching records and N for
nonmatching records.
The optional lookup technique requires additional steps in both the design of the
lookup and the fact build. The following two pages describe these steps.
Build Slide.
4 clicks to complete.
Create an Optional Lookup (cont’d)
Slide steps: 1. Add a flag column to the template. 2. Select Use DataStream for
data access and create a SQL query.
In an optional lookup, you create a flag column with values for matching records
(for example, Y or 1).
To design the optional lookup, follow these steps:
1. When creating a template, add an extra attribute for the flag column.
When the lookup records are loaded into memory from the reference
table, this flag will be set to Y for all members.
Technical Information
To find matched and unmatched data, we must include unmatched members (see
the next slide). To include unmatched members, you must use a template to
access the source data, not a DataStream. This is because a template
automatically creates the correct INSERT and SELECT statements required to
include the unmatched members.
table, this flag will be set to Y for all members.
2. On the Data Access tab of Lookup Properties window, click the Use
DataStream for data access box to select it. Make sure to include a data
source.
3. In the Data Source for the lookup, create a literal that indicates a record
exists. In this example, the value is Y (see the sketch after these steps).
4. Map the literal and the data returned from the SQL SELECT query:
a. Map the literal value and the columns from the data source to items
in the DataStream.
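Following these steps, the lookup's data source query might look like the sketch below. The table and descriptive column names are assumptions for illustration; the literal 'Y' supplies the found flag described on the previous page.

-- 'Y' marks every loaded lookup record as found; unmatched fact rows will
-- later surface as NULL in this flag (assumed table and column names).
SELECT `state_cd`,
       `state_name`,
       'Y' AS found
FROM `D_State`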
Build Slide.
2 clicks to complete.
Add an Optional Lookup to a Fact Build
While a value for matching records is assigned in the optional lookup, a value for
nonmatching records is assigned in the fact build.
Using this technique, the fact build must have two elements: a dimension element
to reference the lookup and a derivation to calculate the reference literal value of
the lookup.
1. You want all records to be included in the fact table, even if they do not
match an existing lookup row. As a result, you must click the Accept
unmatched member identifiers box in the properties of the dimension
element.
2. Create a derivation to hold the flag. The derivation references the flag
column; in this example, found. All fact records that match an existing
lookup row will include the value Y.
3. The rows that failed the lookup check will have a null value in the flag
because no matching member row existed in the lookup table. Assign a
value of not found (in this case, N) to the Value if NULL option on the
flag column. This option is under Element Properties in the fact table
delivery module.
Demo 19-3
Purpose:
We want to include a column in a fact table that will convert
product revenues to the local currency. First, we will use the
Fact Build wizard to create a build called GO_Fact. We will
then create the CurrencyD dimension and CurrencyL lookup to
be referenced from the GO_Fact build.
9. Click Next four times, clear the Deliver Dimensions check box, and
then click Next.
10. Clear the Deliver Metadata check box, click Next, and then click
Finish.
The GO_Fact build is added to the tree.
11. In the Builds folder, expand GO_Fact, and then expand
Transformation Model.
The result appears as shown below.
4. Click OK, right-click the CurrencyL lookup, and then click Mapping.
The DataStream Mapping window opens.
5. Drag the attributes from the Level Attributes pane to the Maps To
column so that the mapping appears as shown below.
6. Click OK, and then explore the hierarchy in Reference Explorer, saving
changes if prompted.
7. Close Reference Explorer.
Task 6. Add a new dimension element called
CountryCurrency to the GO_Fact build.
1. Under the GO_Fact build, right-click Transformation Model, and then
click Insert Dimension.
The Dimension Properties window opens.
2. In the Name box, type CountryCurrency.
3. Click the Never Output check box to select it.
This step excludes the dimension element from delivery.
4. Click the Reference tab, and then in the Dimension box, click
CurrencyD.
5. In the Structure box, click CurrencyL (L).
Notice that no levels appear in the Level column.
We want all transaction records delivered to the table, even those records
that do not match the existing reference data.
6. Click the Unmatched Members tab, and then click the Accept
unmatched member identifiers check box to select it.
This option prevents the records from being rejected.
7. Click OK.
8. Click CountryCurrency, and then drag it to the top of the
Transformation Model tree.
The result appears as shown below.
4. Click the Execute SQL Query (limit to 1 return row) button to ensure that the code works.
5. Click OK, click the Prepare button to select it, and then click Refresh to
prepare the columns for use in the fact build.
6. Click the Derivations tab, and then click the Add button.
The Derivation Properties window opens.
7. In the Name box, type CountryCurrency, and then click the
Calculation tab.
8. In the right pane, type Concat(CountryCode, SubStr(DateOrder, 1, 6)). (A sketch of what this expression computes follows this task.)
9. Click OK to close the Derivation Properties window, and then click OK
to close the Data Source Properties window.
If we scroll to the bottom of the result set, we can see that some records
have the revenue values converted, and other records have a value of 0.
12. Close SQLTerm.
We need to change the number of rows to process back to 1000.
13. Under the GO_Fact build, right-click DataStream, and then click
Properties.
The DataStream Properties window opens.
14. Click the Input tab.
15. In the Maximum input rows to process box, type 1000, and then click
OK.
16. Save the catalog.
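The derivation entered in step 8 builds a key from the country code and the year and month of the order date. As an equivalent SQL expression (shown only for orientation; DecisionStream evaluates the derivation itself, and the source table name is hypothetical), assuming DateOrder is stored as a string such as '19970314':

    SELECT CountryCode || SUBSTR(DateOrder, 1, 6) AS CountryCurrency
    FROM OrderSource

For a CountryCode of 'CA' and a DateOrder of '19970314', the result is 'CA199703'.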
Results:
We have created the Currency dimension and Currency
Lookup to be referenced from the Sales fact build. We also
added a dimension element to reference the lookup and a
derivation to calculate revenue in the local currency.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Objectives
The three primary programs in DecisionStream that are used to collect and manage dimension and fact data are dimbuild, databuild, and rundsjob. These programs execute specifications extracted from the DecisionStream catalog. DecisionStream will parse and validate the specifications, and check runtime license keys, before beginning to source and deliver the data.
When these programs execute, they will always produce a log file. Each log file has its own naming convention. The dimbuild.exe file produces a log file with a name of DimBuild_<BuildName>_<sequence number>.log (for example, DimBuild_Sales_0001.log).
It is a good practice to run a build in Check Only mode before you execute it
normally, so that you can inspect the log files for occurrences of unexpected
behavior.
The difference between executing a build in Normal mode and using Check Only
mode is that no data will be delivered in Check Only mode. However, the build
will still perform all internal processing. Check Only mode makes it possible for
you to track the progress of a build without populating or altering any of the
target tables.
Structure Data
DecisionStream stores all the data for hierarchies and lookups used in a fact build and caches it at the start of the build process. Once built, this structure is static and is not subject to application paging, except under extreme circumstances.

Instructional Tips
Emphasize to students at this point that DecisionStream gets its great throughput capabilities because of how effectively it uses the computer's memory.
Hash Function
DecisionStream will resize the hash table if the maximum number of slots is less than the default.
By default, the size of the initial hash table in DecisionStream is set to 200,000
slots. However, DecisionStream precalculates the maximum possible number of
hash table slots and uses the precalculated value when it is smaller than the
specified number of slots.
For any build, you can determine the minimum hash table size by executing the
build in Check Only mode and then examining the log for an entry of the
following format, where nnn is the table size in slots:
Specify the minimum hash table size from the log on the
Memory tab of the Build Properties window.
Where possible, determine the minimum size of the hash table by using the log file, and then specify this value on the Memory tab of the Build Properties window. If you have to specify a lower value, use a value that will result in the minimum hash table size after resizing. You can calculate suitable values by multiplying the minimum hash table size by (2/3)^n, where n is an integer. For example, if the log indicates that the minimum hash table is 10,000 slots, then (rounding up to the nearest integer):

10,000 x (2/3) = 6,667
10,000 x (2/3)^2 = 4,445
10,000 x (2/3)^3 = 2,963

Technical Information
To obtain the information, run the build in Check Only mode and write the data to a log file.

Be careful not to make the hash table too big. Allocating too much memory can take away memory that may be used by other programs. Paging will occur, and eventually the operating system will terminate the program.
Performance tip: Resizing the hash table is the worst possible scenario and will
have a serious impact on performance. The best possible scenario is when the hit
rate in the hash table is closest to 1. The log file will indicate the average hit rate.
If the hit rate is more than 1, more than one attempt was made to find a slot in
the hash table. Hit rate affects performance. The fuller the hash table is, the
greater the likelihood that it will take more than one attempt to find an empty slot
in the table.
You can determine the maximum amount of memory that DecisionStream used
during the build by inspecting the log file. Search for the last entry of this format:
The value of y (the peak value) gives the maximum amount of memory (in MB)
that DecisionStream used. If DecisionStream reaches the limit of available
memory, it creates virtual memory by paging information to disk. You can
determine whether this happens by inspecting the log file. Search for entries of
this format:
The presence of one or more of these lines indicates that DecisionStream has
exhausted available real memory. The value of n3 indicates the number of
times it occurred.
To manage the memory used, you can limit memory during build execution
Technical Information
by using dimension breaks (discussed later in this module) or setting memory Notice that some builds require very little
limits. Setting a memory limit within DecisionStream prevents an operating memory; therefore, even if you specify a
system from terminating an application that requests more memory than the memory limit as small as 1MB, paging may
operating system can provide. It also ensures that DecisionStream uses its not occur.
own paging algorithm instead of relying on the operating system to provide
In short, this tab is where you set the
this function.
amount of memory that DecisionStream
can allocate to execute the fact build. If
DecisionStream exceeds this amount
during the execution process, then
DecisionStream will perform its own
paging, which will greatly reduce
performance.
Dimension Breaking
(Slide: an aggregate tree for a Time hierarchy in which monthly values of 10 roll up to quarterly values of 30, which roll up to a yearly value of 120.)
During data acquisition, DecisionStream stores partially built aggregates and all data that may contribute to future aggregations. When DecisionStream detects a change in a sorted input stream, the completed aggregates can be cleared from memory, releasing the memory that they occupied.

For example, the hierarchy in the slide example contains year, quarter, and month levels. When DecisionStream has processed all the records for March, it clears the March records from memory, checks the hierarchy, and flushes all the records for Q1. DecisionStream continues processing until it has processed all the records through June, and then clears all the records for Q2 from memory.
Breaking can only function correctly if each data source is sorted by the same set of dimensions and in the same sequence. Sorting is often performed in the database (for example, indexing often orders the data). However, if the source data is not in the required order, you can either add an ORDER BY clause to the source SQL or select the Force Sort on Break Dimensions box in the Fact Build Properties window.

Key Information
This method requires that the data is already sorted in the required order.
Check that all data sources are sorted by dimension in the sequence in which they
appear in the Break On list. If not, click the Force Sort on Break Dimensions box
to select it.
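For example, source SQL sorted for breaking might look like the following minimal sketch. The table and column names, and the assumed Break On sequence (DateOrder, then ProductNumber), are illustrative only:

    SELECT ProductNumber, DateOrder, Quantity, UnitCost
    FROM GOSOrderDetail
    ORDER BY DateOrder, ProductNumber  -- must match the Break On sequence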
(Slide callouts: specify an absolute value; specify the value as a number of breaks.)
When you perform dimension breaking, you can choose whether to break on a fixed number of dimension changes or when the hash table reaches a specific percentage of usage:

• Click the Breaks option button to specify a break every n dimension changes, and type the required number in the adjacent box.

• Click the Percentage option button to specify that the break is to be based on the percentage of the hash table that is used.

Type a number in the Perform Break Processing Every box to indicate the required number of changes or the percentage of table usage (as a number between 1 and 100).

Note: A good dimension candidate for breaking is one that is fairly evenly balanced. The Time dimension is usually a good example.

Technical Information
The algorithm for the hash table starts to affect the breaking/flushing process if you set it to approximately 60 percent. What does this mean? You can compare this to looking for a seat in a movie theater. If the theater is less than 60 percent full, it is relatively easy to find an empty seat. However, as the number of empty seats decreases, it becomes progressively harder to find a place to sit. Sixty percent is an arbitrary number, but we generally recommend it as a good benchmark. Using a percentage instead of a literal number of breaks provides more flexibility, because the percentage is in relation to the actual hash table size and changes as the hash table changes.
When you execute a fact build, DecisionStream creates the required fact and
dimension tables if they are not in the target data mart. For most purposes, the
structure of the created tables is acceptable.
However, you can fine-tune the structure of these tables by saving the data
definition language (DDL) statements to a script file or by copying them to
SQLTerm or another suitable program. You can edit this script file, and then
execute the modified DDL statements.
Note: You can copy the statements to the Clipboard by selecting Copy from the
shortcut menu.
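For example, a generated script for a fact table might be edited as follows before being executed in SQLTerm. This is a hedged sketch; the column definitions are assumptions, not DecisionStream's actual output:

    CREATE TABLE F_DemoFact
    (
        ProductNumber INTEGER NOT NULL,
        DateOrder     CHAR(8) NOT NULL,
        Quantity      INTEGER,
        Revenue       NUMERIC(15,2)
    )

    -- A typical manual fine-tuning edit: add a constraint that the
    -- generated script does not include.
    ALTER TABLE F_DemoFact ADD PRIMARY KEY (ProductNumber, DateOrder)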
Dimension Caching
By default, when you execute a fact build, the dimensions are cached in memory first, and then the fact records are processed. For large dimensions, caching uses a lot of memory, which means that the dimension data may take some time to load.

Instructional Tips
Dimension caching is only a concern if you have very large dimensions (for example, over a million members).
You can specify when caching is to be performed, either at the start of build
execution or as rows arrive (caching on demand).
As a general rule, you should cache at the start of a build if most of the dimension
members are required (this is the default). Specify cache on demand if the build
references only a small proportion of the members in a large dimension.
When deciding when to cache, you should consider the portion of the dimension
that is to be loaded during build execution. If the build references most of the
records, then you should cache the dimension.
For example, a car insurance company has many customers, but only 10% of them are invoiced each month. If the company has 120,000 customers, an average of 12,000 invoices are generated each month. Since this uses only 10% of the dimension, caching the Customer dimension on demand may be faster.
Demo 20-1
Purpose:
To more effectively utilize memory, we want to perform
dimension breaking while executing the DemoSales fact build.
We will then execute the build in Check Only mode to evaluate
the results of breaking. Lastly, we will inspect the log file.
5. Click OK to close the Fact Build Properties window, and then save the
catalog.
The INTERNAL messages indicate when breaking occurred during the execution of the build.

Instructional Tips
You can add more messages to the log file by reducing the numbers in the Message frequency (Input) and Message frequency (Output) boxes. This can be done on the Input tab of the Fact Build Properties window or in the Execute Build dialog box.

4. Close Notepad and the log window, and then leave DecisionStream open for the next module.

Results:
To more effectively utilize memory, we performed dimension
breaking while executing the DemoSales fact build. We then
executed the build in Check Only mode to evaluate the results
of breaking. Lastly, we inspected the log file.
Summary
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Objectives
(Slide: a fact build with dimension delivery, fact delivery, and metadata delivery into the data mart.)
There are various options for data delivery through a fact build.
When you create a stand-alone data mart, the deliveries for fact data, dimension data, and metadata can all be defined within the fact build. However, when the data is delivered into a complex database, the fact build is used only for the fact delivery.
(Slide: reference sources feed the Product dimension build through the Product input hierarchy; transactional data sources feed the Sales fact build, which references the ProductLevel hierarchy/lookup and delivers ProductNumber, Quantity, UnitCost, and AverageRevenue, along with updates, to the F_Sales table.)
When the target database is a data warehouse, the best practice is to deliver and
maintain the reference data through dimension builds. Fact builds are responsible
for the fact data delivery and maintenance.
The slide shows that the Product dimension build delivers the Product dimension
based on the Product input hierarchy.
The Sales fact build, which consists of the ProductNumber dimension element
and a few fact columns, references the Product dimension through
ProductNumber.
In production, a dimension element does not have to reference all columns in the
corresponding dimension table. In most cases, the reference has to check data
integrity on a single level. Therefore, an additional, much simpler hierarchy or
lookup must be created. The new hierarchy or lookup references the dimension
table in the data warehouse. In the slide example, the ProductNumber dimension
element references the D_Product table through the new ProductLevel hierarchy,
which stores the lowest-level attributes of the Product dimension.
As the Product dimension build delivers dimension data, the Sales fact build
delivers fact data into the F_Sales table.
Fact delivery modules deliver the fact data that a fact build produces. DecisionStream provides a number of fact delivery modules that can be separated into two groups:
Table delivery modules deliver data into database tables:
• The Bulk Copy (BCP) delivery module delivers data to Sybase or
Microsoft SQL Server databases using the BCP loader utility.
• The DB2 LOAD delivery module delivers data to IBM DB2 databases
using the DB2 bulk loader.
• The Informix LOAD delivery module delivers data to an Informix
database using the Informix DBACCESS command.
• The ORACLE SQL*Loader delivery module delivers data to an Oracle
database using the Oracle Bulk Load utility.
• The Red Brick Loader (TMU) delivery module delivers data to a Red
Brick database using the Red Brick Bulk Loader utility.
• The Relational Table delivery module is discussed later in this course.
• The Teradata delivery modules deliver data to Teradata databases using the Teradata bulk loader utilities.
• The Microsoft SQL Server BCP delivery module delivers data to
Microsoft SQL Server databases using the Microsoft SQL Server BCP
bulk load utility.
(Slide: elements subscription for delivery; key options; refresh types such as REPLACE, UPDATE/INSERT, and UPDATE; and a commit interval, for example committing after every 100 of 1000 rows.)
The Relational Table delivery module formats the build data into relational tables.
These tables can be accessed by client-server analysis applications, or can form
the basis of standard reporting systems.
On the Table Properties tab of the Table Delivery Properties dialog box for the
fact table delivery:
• create keys for columns and change the column names (optional)
On the Module Properties tab of the Table Delivery Properties dialog box:
• select the method by which rows should be applied to the output table:
APPEND, TRUNCATE, REPLACE, UPDATE/INSERT, and
UPDATE (mandatory)
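As a rough SQL picture of two of these refresh types (illustrative only; DecisionStream generates and manages the actual statements, and the table and column names are assumptions):

    -- TRUNCATE: empty the table, then reload the delivered rows.
    TRUNCATE TABLE F_DemoFact

    -- UPDATE/INSERT: update rows matched on the key columns;
    -- rows with no match are inserted instead.
    UPDATE F_DemoFact
    SET Quantity = 500
    WHERE ProductNumber = 1 AND DateOrder = '19970314'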
Output Filename: the full directory path and file name, or just the file name.
Line/Field Delimiter: NL (new line), TAB, COMMA, SPACE, NONE (no delimiter), or a character string.
The Text File delivery module is used to format build data into simple text files.
These files then can be imported into spreadsheets for cross-tab (pivot table)
analysis or distributed to other systems.
On the Module Properties tab of the Table Delivery Properties dialog box for the
fact delivery:
On the Element Properties tab of the Table Delivery Properties dialog box for
the fact delivery:
As previously stated, you should not deliver dimensional data in a fact build.
However, there are cases when you can define dimension delivery in a fact build.
It may also be convenient to deliver portions of a large data warehouse to local servers; for example, a sales data mart for the Northeast Region of your company. This type of delivery can also be handled through a single fact build.

Sometimes the only source of dimensional data is within the fact data. For example, if a distributor sells your products to new customers, the sales transaction file may be your only source of customer data.

Key Information
Because Cognos recommends maintaining dimension data in dimension builds, and not in fact builds, this module does not contain a demo on dimension delivery as part of a fact build. However, the following pages describe how to add a dimension delivery to a fact build.
You can set as many relational table deliveries as you need for each dimension.
You can then add one or more tables for each dimension delivery.
• Provide the table name (mandatory). You can deliver data into an existing
dimension table or create a dimension table for the data mart.
For the metadata delivery:
• provide a name (on the General tab of the Metadata Delivery Properties dialog box)
• provide the connection and virtual cube name for the SQL Server Analysis Services delivery

To deliver BI metadata, you must declare dimension delivery modules for each dimension, but you do not need to deliver the dimension data. Select a fact table and the columns from the table that you want to deliver. Disable dimension deliveries if you do not want dimension data to be delivered through the fact build.
The next step is to select a fact table and the table columns that you want to
include in the metadata delivery. Information about fact tables and the table
columns subscribed for the fact delivery is on the Fact tab of the Metadata
Delivery Properties dialog box. You can select the whole table or individual
columns for the delivery.
With the dimensional deliveries declared, on the Dimension tab of the Metadata
Delivery Properties dialog box, specify the source of delivery and a dimension
delivery module for each dimension.
Demo 21-1
Purpose:
We want to add a relational table delivery to the DemoSales
fact build. We will ignore dimension delivery in this fact build
because we have already created the dimension tables that we
require by using dimension builds. We will add a new delivery
called F_DemoFact, execute the DemoSales build, and then
view the results in SQLTerm.
6. Click the Module Properties tab, and then ensure that APPEND is
selected in the Refresh Type box.
7. Click OK to close the Table Properties window.
The F_DemoFact table contains the 1000 rows that were inserted during
the build execution process.
9. Close SQLTerm and keep DecisionStream open for the next demo.
Results:
We added a relational table delivery to the DemoSales fact
build called F_DemoFact. We then executed the DemoSales
build and viewed the results in SQLTerm.
Partitioned Delivery
Vertical Partitioning
Horizontal Partitioning
You can choose from three methods to partition your data to meet your delivery requirements:
• Build elements subscribed for delivery. This method involves vertical partitioning.
• Data rows selected by filters. This method involves horizontal partitioning.
• Data rows within the build elements subscribed for delivery. This method involves both vertical and horizontal partitioning.
Vertical Partitioning
(Slide: columns included in the fact delivery versus columns excluded from the fact delivery.)
For each delivery, you can choose the transformation model elements to which the delivery subscribes. This vertical partitioning is performed by subscribing to the elements that you want to deliver. For example, by subscribing to certain elements in two fact deliveries, you can deliver sales data to one fact table and revenue data to another within the same data mart.
Demo 21-2
Purpose:
We want to create a new fact delivery in the DemoSales build
called F_CustomOrders. We will vertically partition the
incoming fact data by subscribing only to the transformation
model elements that we want to see in this table. We will then
execute the DemoSales build and view the results in SQLTerm.
6. Click OK.
7. Right-click F_DemoSales, and then ensure that Enabled is not selected.
This will disable the delivery of the fact table.
8. Right-click F_DemoFact, click Enabled to disable the delivery of the
fact table, and then save the catalog.
Results:
We created the F_CustomOrders fact table with partitioned
data delivery. We set vertical partitioning by subscribing only
to the elements that we want to see in the table. We then
executed the DemoSales build and viewed the results in
SQLTerm.
DecisionStream provides three of the most common index types:

• The unique B-tree index is used for primary key columns. It builds a tree structure of possible values with a list of the unique identifiers of the rows that have the leaf value. A search involves moving up the tree and finding the rows that contain a given value. The index can be created on one or many columns.

• The repeating B-tree index is used for foreign key columns. It is the default index type for DecisionStream. It is built in a similar way to the unique B-tree index.

• The bitmap index is used with dimension tables and fact tables, where the constraint on the table results in a low-cardinality match with the table. This index represents a string of bits for each possible value of the column. Each bit string has one bit for each row. Each bit is set to 1 if the row has the value that the bit string represents, and is set to 0 if the row does not have that value.

Instructional Tips
Stress to students the difference between keys and indexes. Keys are the logical elements, whereas indexes are the physical elements. Usually, indexes are created on columns that already have keys.

Key Information
When you work with a large data warehouse, you probably would not use DecisionStream to initially create indexes. They are usually created through the index plan by the DBA.
The most common index for a fact table is the B-tree index. When we declare a
primary key constraint on a table, a unique index is built automatically on those
columns in the order in which they were declared. A fact table contains not only a
primary key but also foreign keys represented by dimension elements; therefore, it
is important to create repeating B-tree indexes on those columns.
A fact table can also have a composite index: a single index based on the fact table keys represented by dimension elements.
Single-column Index
(Slide callouts: create a repeating B-tree index for a single column in the fact table; select to ignore errors while re-creating the index; create a unique B-tree index on a primary key column in the dimension table.)
A fact table can have a composite index and single-column indexes. It is a good
practice to create a composite index on a group of the table key columns and a
single-column index on individual fact key columns that likely will be used as a
join condition, filter, or group. In the slide example, a repeating index is created
on the SalesStaff dimension element column, because it is a high-cardinality
column and often will be used in join conditions and filters.
All dimension tables must have a single-column primary key and therefore
one unique index on that key. Larger dimension tables have more than one
single-column index. However, small dimension tables seldom benefit from
additional indexing. In a dimension table that has the business and surrogate
key columns, the unique index is created on the surrogate key column, which
is a primary key column. The repeating index is created on the business key
column.
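In generic DDL, the dimension table indexing just described looks roughly like this sketch (DecisionStream generates its own statements; the names follow the demo later in this module):

    -- Unique B-tree index on the surrogate (primary) key column:
    CREATE UNIQUE INDEX prod_pkey ON D_ProductH (skey)

    -- Repeating (non-unique) B-tree index on the business key column:
    CREATE INDEX prod_product ON D_ProductH (ProductNumber)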
Indexes are important for faster data retrieval. However, they consume memory and slow the system's load and maintenance process. While indexes are important for updates, they are not needed for inserts. By selecting the Recreate Index box in the Index Properties, you can drop the index before an insert and re-create it afterward. To ignore errors while DecisionStream is re-creating the index, select the Suppress Errors box.
Indexes are required for a record search in a large amount of data. For example, if
you look for EmployeeCode 1234, the database does not have to scan through all
50,000 employees. It uses the index to find the record. Indexes are especially
useful for updates.
Keys ensure the uniqueness of column values. For example, only one employee can have EmployeeCode 1234.
Because indexes and keys are used for different purposes, they are not
interchangeable. However, it is recommended that you create indexes on key
columns to support both the logical and the physical data structure.
When updating a fact table, the new data must locate the old records. By default, if no explicitly defined keys exist, DecisionStream looks for dimension element keys. DecisionStream modifies the SQL statement to include the key columns in the WHERE clause of the UPDATE statement.
Update Sale
Set Quantity = 500
Where Staff = 50
and Product = 1
and Date = '19970314'

In this example, Staff, Product, and Date are the key columns of the Sale table (Staff, Product, Date, Quantity), and the Quantity column of the matching record is updated.
If you do not want all dimension element keys to be included in the WHERE
clause, you must define them explicitly by creating a key for each dimension
element that you do want to include.
Create keys on
dimension element
columns if you want to
include them in the
WHERE clause
You can create keys on non-dimension columns if you want to use them in the
UPDATE statement. However, as soon as you explicitly define the keys,
DecisionStream overwrites the default key columns with the new ones. To
preserve dimension element column keys, you must define them explicitly.
To perform an update, you must change settings on the Module Properties tab in
the Table Delivery Properties dialog box by selecting UPDATE or
UPDATE/INSERT. You then create keys on columns that you want to include
in the Where clause if they are not created yet.
DecisionStream supports only a single composite index per fact table. It creates
the index on all key columns that are defined either by default or explicitly in the
fact table.
In the slide example, the Index Properties dialog box for the sales_comp
composite index does not specify the segments of the index. DecisionStream
creates the index on all key columns. In this example, the segments of the index
are SalesStaff, Vendor, Product, DateOrder, and OrderCode.
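Expressed as generic DDL, the composite index would look roughly like the following sketch (the fact table name is an assumption; DecisionStream builds the actual statement from the defined key columns):

    CREATE UNIQUE INDEX sales_comp
    ON F_Sales (SalesStaff, Vendor, Product, DateOrder, OrderCode)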
You can create a composite index on the Table Properties tab of the Table
Delivery Properties dialog box:
• Click the Index box to select it, and then click the Index Properties
button.
• In the Index Properties, provide a name for the index, following the
index naming convention.
• In the Index Properties dialog box, select a type for the index. Because
DecisionStream creates only one composite index per table, this index
will be of the UNIQUE type.
Even though DecisionStream offers two options for indexes, multiple single-column indexes and a single composite index, it is often more appropriate to create a single-column index on each fact table key and then let the optimizer combine those indexes as appropriate to resolve the queries.
However, when users try to update a row and the first column of the index is part
of the UPDATE statement, the SQL Query optimizer uses the composite index
to speed up the search process.
Each delivery can include many level filters; however, each delivery cannot have
more than one output filter.
Level filters specify level combinations that the delivery module needs to accept.
The output filter accepts output processed by the level filters and specifies the
data that the delivery module needs to accept.
Filter guidelines:
lev  Cust  Date    Qty
M    1     199901  1
M    2     199901  5
M    1     199902  1
M    2     199902  5     (Monthly delivery)
M    1     199903  1
M    2     199903  5
M    3     199903  4
Q    1     19991   3
Q    2     19991   15    (Qtr delivery)
Q    3     19991   4

When delivering multiple levels, always include a level filter in each delivery.
You can partition the data so that each delivery delivers a subset of the data rows.
Applying filters performs this delivery type, termed horizontal partitioning.
The slide example shows two fact deliveries, monthly sales and quarterly sales. A
level filter has been applied to each. If a filter is not applied to these deliveries,
DecisionStream delivers monthly and quarterly rows to each table of each
delivery.
DecisionStream knows the aggregate level of every row in the DataStream; therefore, the level filters accept only the monthly rows for the monthly table and only the quarterly rows for the quarterly table.
lev  Cust  Date    Qty
M    1     199901  1
M    2     199901  5
M    1     199902  1
M    2     199902  5     (Monthly delivery)
M    1     199903  1
M    2     199903  5
M    3     199903  4
Q    1     19991   3
Q    2     19991   15    (Qtr delivery)
Q    3     19991   4

(In the slide, an X marks the rows that the Exclusive option prevents from being offered to subsequent deliveries.)
By default, DecisionStream offers all rows to all deliveries, although the level filter can reject them. With the "Exclusive" property set on the Monthly table, its rows are not offered to any subsequent delivery tables.
Even though level filters prevent unwanted rows from being delivered to the
aggregate tables, DecisionStream offers all rows to every delivery. Offering all
rows creates performance overhead and can add to processing time.
If rows delivered to one table are not meant to be written to other tables, set the
Exclusive option on that delivery. Setting this option saves internal processing
time because the rows that were written are flushed from memory and do not
have to be checked by each subsequent delivery.
period.year * location.state
(a level combination on two dimensions)

period.year.1997,1998 * location.state
(a level combination that also specifies members of a level)
Based on your data mart requirements, you can determine what type of filters to
apply and to what level(s) they apply.
You can also specify members of a level, as shown in the slide example.
Output Filters
An output filter is an expression that:
• uses the same syntax as derivations
• must return TRUE or FALSE
• can involve build elements and members
When you apply an output filter to the delivery, only those data rows for which
the expression evaluates to TRUE are delivered.
Usually, you would use a delivery's output filter to horizontally partition data
other than by hierarchical levels.
For example, you can partition data by sales. Two deliveries, one having the output filter units_sold > 0 and the other having the output filter units_sold <= 0, would partition the data according to whether the company had sold the product or the fact row contained bad data.
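In SQL terms, the two output filters split the rows roughly as follows (illustrative only; DecisionStream applies the filters in the deliveries, and fact_rows is a hypothetical name for the build output):

    SELECT * FROM fact_rows WHERE units_sold > 0   -- first delivery
    SELECT * FROM fact_rows WHERE units_sold <= 0  -- second delivery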
Demo 21-3
Purpose:
The company wants to run updates to the F_DemoSales table
that was created by the DemoSales build. To facilitate this
process, we must define keys on the OrderCode and
dimension element columns. We must then create single-
column indexes on dimension element columns and a
composite index on all key columns in the F_DemoSales table.
For convenience, we will create a separate build to maintain
updates to the table.
6. Click OK.
7. Click the Module Properties tab.
8. In the Refresh Type box, click TRUNCATE if necessary, and then click
OK to close the Table Delivery Properties window.
8. Click OK.
9. In the Element list, click ProductNumber.
Buttons for Index Properties and Element Properties appear.
5. Press Enter to close the command window, and then open the log file.
The result appears as shown below.
Key Information
You may receive an error at the end of the
fact build due to the dropping of indexes.
This is acceptable, as long as the fact build
completed successfully.
5. Close the log file, and then close the Log window.
Task 7. Create indexes on the surrogate and business key
columns of the D_ProductH dimension table.
1. Under Builds, expand the Product dimension build, and then double-
click D_ProductH.
The Dimension Table Properties window opens.
2. Click the Columns tab, and then in the skey row, click the Index box to
select it.
An ellipsis button appears in the Index Properties column for skey.
3. Click the ellipsis button .
The Index Properties dialog box appears.
4. In the Name box, type prod_pkey.
5. In the Type box, click UNIQUE.
We want the surrogate key to be a primary key for the table.
6. Click the Suppress Errors and Recreate Indexes check boxes to
deselect them.
Because we will be updating the table, we do not want the indexes to be
dropped.
7. Click OK.
8. In the ProductNumber row, click the Index check box to select it.
An ellipsis button appears in the Index Properties column for
ProductNumber.
9. Click the ellipsis button.
The Index Properties dialog box appears.
10. In the Name box, type prod_product.
Leave the Type box blank.
11. Click the Suppress Errors and Recreate Indexes check boxes to
deselect them.
The result appears as follows.
12. Click OK, and then click OK to close the Dimension Table Properties
window.
We want to see how DecisionStream updates the dimension data
through a dimension build. Therefore, we must add extra records as static
members.
Task 8. Add static members to the Products level of the
ProductH hierarchy and execute the Product
dimension build.
1. In the Library folder, expand the Dimensions folder (if necessary), expand the ProductD dimension (if necessary), the ProductH hierarchy, and the Products level.
2. Double-click Static Members.
The Products Static Members window opens.
3. Click Add.
4. In the ProductNumber column, type 110.
5. Double-click the ProductName box, and then type Red Pencils.

Instructional Tips
There are already products in the ProductH hierarchy with ID numbers of 110 (Blue Steel Putter) and 115 (Course Pro Gloves). You may want to explore the hierarchy and show the students which products already use these numbers. Instead of adding static members, you may want to include rows of data in a text file, and then add the text file as a data source in the catalog.
6. Repeat steps 3 to 5 to add another member, using 115 for
ProductNumber and Coffee Tables for ProductName.
7. Click OK to close the Products Static Members window, and then save
the catalog.
8. Right-click the Product build, and then click Execute.
The Execute Build dialog box appears.
9. In the Trace area, if necessary, click the Override build settings check
box to select it, and then click the SQL check box.
Other check boxes may be selected in the Trace area already. This is
acceptable.
10. Click OK, and then press Enter when the build has finished executing to
close the command window.
11. Analyze the log file.
Notice that there are two updates to the table. The surrogate key column is placed in the WHERE clause of the UPDATE statement. Remember that we did not create a key on the surrogate key column; DecisionStream did it by default. (A sketch of these UPDATE statements follows this task.)
12. Close the log file and the Log window.
13. Under the Products level of the ProductH hierarchy, right-click Static
Members, and then click Delete.
A dialog box appears confirming the deletion.
14. Click Yes, save the catalog, and then keep DecisionStream open for the
upcoming workshop.
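The two UPDATE statements noted in step 11 look roughly like this. The surrogate key values in the WHERE clauses are assigned when the members are loaded, so the literals below are purely illustrative:

    UPDATE D_ProductH SET ProductName = 'Red Pencils'
    WHERE skey = 110  -- illustrative surrogate key value

    UPDATE D_ProductH SET ProductName = 'Coffee Tables'
    WHERE skey = 115  -- illustrative surrogate key value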
Results:
We have defined keys on the OrderCode and dimension
element columns. Then we created single-column indexes on
dimension element columns and a composite index on all key
columns in the F_DemoSales table. We also created single-
column indexes on the surrogate and business key columns of
the D_ProductH table.
Summary
Workshop 21-1
• Add the Sales fact build to the build tree and add a data source that uses
data from the GOSOrderDetail and GOSOrderHeader tables in the
SourceConnect database.
• Map the columns from the data source to items in the DataStream, and
then map the DataStream items to the elements of the transformation
model.
• Add a unique composite index on the table delivery. At this point, you
want the index to be recreated. However, you do not need to track any
errors.
For more detailed information outlined as tasks, see the Task Table on the next
page.
For the final result, see the Workshop Results section that follows the Task
Table.
15. Aggregation
16. Pivoting
17. Ragged Hierarchies
18. Packaging and Navigator
19. Resolving Data Quality Issues
20. Troubleshooting and Tuning
21. Delivery in Depth
22. The Command Line Interface
Series 7 Version 2 DecisionStream for Data Warehouse
Developers
Objectives
DecisionStream provides a command line interface (CLI) for all platforms that it supports. You can therefore use DecisionStream Designer to develop builds on the 32-bit Windows platform and then use the CLI to deploy these builds on supported UNIX or Windows platforms.

Technical Information
You may have to configure the computer before using the CLI. This is dependent on the operating system in question.
You can perform auditing only on catalog-based projects, and can use user-
defined functions and JobStreams only within a catalog.
CLI Programs
When you execute a fact build from within the catalog or the command line
interface, the DecisionStream Engine runs the DATABUILD command.
The advantage of using DATABUILD directly to execute a fact build is that after
you provide the correct parameters, you can use the command in a batch file to
automate the execution process. This makes it possible for you to process a fact
build from outside DecisionStream.
You can modify the behavior of the DATABUILD command by adding options. For example, adding -c to the DATABUILD command specifies that the fact build definition should be retrieved from a catalog. You can also use
DATABUILD to list all the delivery modules to which your DecisionStream
license gives access or to list the properties of an available module (such as a DB2
LOAD delivery).
For information about the syntax of this command, see Chapter 23,
"Commands" of the User Guide.
When you execute a dimension build from within the catalog or from the
command line, the DecisionStream Engine runs the DIMBUILD command. As
with DATABUILD, the advantage of using DIMBUILD directly to execute a
dimension build is that, once you have provided the correct parameters, you can
use the command in a batch file to automate the execution process. This allows
you to process a dimension build from outside DecisionStream.
You can modify the behavior of the DIMBUILD command by adding options. For example, adding -C to the DIMBUILD command executes the dimension build in Check Only mode.
For information about the syntax of this command, see Chapter 23,
"Commands" of the User Guide.
Execute a JobStream
When you execute a JobStream from within the catalog or from the command
line, the DecisionStream Engine runs the RUNDSJOB command. As with
DATABUILD and DIMBUILD, the advantage of using RUNDSJOB directly to
execute a JobStream is that after you provide the correct parameters, you can use
the command in a batch file to automate the execution process. This makes it
possible for you to process a JobStream from outside DecisionStream.
You can modify the behavior of the RUNDSJOB command by adding options. For example, adding -L to the RUNDSJOB command logs JobStream progress to the specified file.
For information about the syntax of this command, see Chapter 23,
"Commands" of the User Guide.
Demo 22-1
Purpose:
Management wants to simplify the extraction, transformation,
and loading of transactional data so that the process contains
fewer steps and can be done outside DecisionStream.
Therefore, we will use the DecisionStream language to create a
batch file to execute the Sales fact build that exists in the DS
GO_Catalog. We will then execute this batch file and view the
results.
For more information about variables, see Chapter 15 of the User Guide.
Summary
Appendix A
Step-by-Step Solutions
3. In the Name box, type sales_comp, and then in the Type box, click
UNIQUE.
The result appears as shown below.
4. Click OK to close the Index Properties dialog box, and then click the
Module Properties tab.
5. In the Refresh Type box, click TRUNCATE, and then click OK to
close the Table Delivery Properties window.
6. Under the Sales fact build, right-click DataStream, and then click Properties.
The DataStream Properties window opens.
7. Click the Input tab, and then in the Maximum input rows to process
box, type 1000.
8. Click OK to close the DataStream Properties window, and then save the
catalog.
Task 9. Execute the build and view the results.
1. Right-click Sales, and then click Execute.
The Execute Build dialog box appears.
2. Click the Override build settings check box to select it, and then click
the SQL and ExecutedSQL check boxes to select them.
3. Ensure that the Progress check box is selected, and then click OK.
A command window opens and a log file is created that tracks the progress of the build execution. Notice that 1000 records have been inserted into the F_Sales table.
4. Press Enter to close the DOS window.
5. Open SQLTerm, and then in the Database for SQL Operations box,
click TargetConnect.
6. Expand TargetConnect, right-click F_Sales, and then click Select
rows.
The SQL statement appears in the SQL Query pane.
7. Execute the query.
The result appears as shown below.
8. Close SQLTerm.
Appendix B
Entity-Relationship Diagram of
the GO_Demo Database
The DS GO_Source Data Source Name (DSN) refers to an Access database called GO_Demo.mdb. This database consists of 32 tables, which are related as indicated by the following entity/relationship diagram.

Legend: 1 = one; ∞ = many
Index

A
aggregation, 4-22, 15-6
    additional rows created by, 15-11
    considerations, 15-12, 15-13
    definition, 15-5
    enabling, 15-10
    exceptions, 15-9
    functions, 15-7–15-8
    impact on derivations, 6-12
    using the AVG function, 15-17
alert nodes, 13-9
architecture
    data warehouses, 1-6
    DecisionStream, 1-14
arguments
    in user-defined functions, 12-7, 12-14
attributes, 4-12
    adding to auto-level hierarchies, 17-11
    adding to fact builds, 8-10, 8-16
    adding to hierarchy levels, 7-6, 7-7
    in hierarchies, 7-5
    mapping, 7-17–7-24
    mapping literals to, 7-26
    mapping to DataStream items, 17-12
    naming conventions for, 7-8
audit tables
    inspecting, 19-24
auto-level hierarchies, 3-8
    adding attributes to, 17-11
    creating, 17-10
    determining number of levels in, 17-10–17-13
    using to resolve ragged hierarchies, 17-8

B
backing up
    catalogs, 2-19, 2-21, 2-32, 18-10
balanced hierarchies, 17-4
batch files
    creating in Notepad, 22-10
    executing, 22-11
    saving, 22-11
bitmap indexes, 21-23
build elements, 8-7
build nodes, 13-9, 13-10, 13-24
build schemas. See schemas
builds. See dimension builds and fact builds
Builds folder, 1-16
business keys, 9-26

C
caching
    dimension data, 20-15, 20-16
calculations
    creating for user-defined functions, 12-14
    functions used in, 6-8
    in derivations, 6-6, 6-12, 6-13, 6-15
catalogs
    adding data sources to, 9-29, 10-21
    adding dimensions to, 7-16
    adding JobStreams to, 13-7
    adding user-defined functions to, 12-6
    backing up, 2-19, 2-21, 2-32, 18-10
    closing, 1-28
    creating, 2-7, 2-16
    creating database schema for, 4-42
    database tables for, 2-5
    definition, 2-4
    documenting, 4-41, 4-44
    exploring, 1-27
    opening, 1-26
    restoring, 2-19, 2-21
    saving, 2-21, 2-32
    searching with Navigator, 18-13, 18-14, 18-16–18-17
    shared library items in, 2-6
    storing, 2-7
    tools for developing, 1-23
    viewing documentation for, 4-44
circular references, 17-30
CLI. See command line interface
CLI commands
    CATBACKUP, 22-5
    CATEXP, 22-5
    CATIMP, 22-5
    CATLIST, 22-5
    CATRESTORE, 22-5
    CATUPGRADE, 22-5
    DATABUILD, 22-5, 22-6
    DIMBUILD, 22-5, 22-7
    RUNDSJOB, 22-5, 22-8
    SHOWREF, 22-5
    SQLTerm, 22-5
command line interface. See also CLI commands
    executing dimension builds from, 22-7
    executing fact builds from, 22-6
    executing JobStreams from, 22-8
    programs. See CLI commands
commands. See CLI commands
composite indexes, 21-29, 21-30, 21-39
condition nodes, 13-9, 13-17, 13-25
dimensions, 1-22, 3-6, 4-12. See also dimension elements and SCDs
    adding to catalogs, 7-16
    creating, 3-36, 19-31
    custom, 11-15
    definition, 3-4
    example, 1-11
    generating surrogate keys for, 9-10
    hierarchies in, 3-8
    lookups in, 3-8
    shared across data marts, 3-5
    shared in a data mart, 1-9, 1-12
    standardized. See conformed dimensions
    templates in, 3-9
documenting
    catalogs, 4-41, 4-44
drilling down
    on dimension data, 4-39
duplicate fact rows, 11-8, 11-9
dynamic members, 7-27

E
effective date attributes
    options, 10-17
    specifying the source, 10-15, 10-16, 10-26
    using to preserve dimensional history, 10-10
email nodes, 13-9, 13-14
executing
    dimension builds, 9-16, 9-32, 21-45
    fact builds, 12-26, 15-15, 15-16, 16-23, 19-24, 21-16, 21-22, 21-41, 21-43
    JobStreams, 13-21, 13-29
execution modes
    Check Only, 20-5
    for fact builds, 4-34
    Normal, 20-5
    Object Creation, 20-5
exploring. See also viewing
    dimension tables, 17-15, 17-28
    hierarchies, 3-32, 3-37
    hierarchy properties, 3-24
    levels, 3-25
    log files, 19-16
    user interface, 1-26–1-28
exporting
    components to packages, 18-6, 18-10
    DDL statements, 20-14
external user-defined functions
    creating, 12-19
    implementing, 12-20

F
Fact Build wizard, 1-23, 4-13, 14-12, 16-10, 19-29
    types of fact builds, 4-14
    using, 4-26–4-28
fact builds, 1-15, 4-4, 8-4
    adding attributes to, 8-10, 8-16
    adding data sources to, 4-17, 4-18, 4-26, 8-12
    adding derivations to, 6-11, 6-15, 12-25, 19-23
    adding dimension elements to, 8-7, 8-14–8-15, 19-34
    adding measures to, 8-9, 8-16
    analyzing the results of executing, 19-37, 21-43
    configuring data delivery for, 4-24
    controlling feedback, 4-35
    creating manually, 8-6, 8-12
    creating to deliver pivot tables, 16-10
    creating using Fact Build wizard, 4-13, 4-26–4-28, 14-12
    data integrity lookup in, 5-20
    delivery modules in, 4-10
    dimension data delivery in, 4-5, 21-10–21-11
    duplicating, 11-17, 16-20
    elements, 4-10, 4-12
    executing, 4-34, 4-37, 11-19, 12-26, 14-14, 15-15, 15-16, 16-23, 19-24, 19-37, 21-16, 21-22, 21-41, 21-43
    executing from a batch file, 22-11
    executing from the CLI, 22-6
    executing in Check Only mode, 20-19
    how data is processed using, 11-4, 11-5
    implementing late arriving facts in, 14-8
    mapping data source columns to elements in, 8-16
    metadata delivery in, 4-11, 21-12–21-13
    modifying properties of, 21-40
    optional lookups in, 19-27
    renaming, 16-20
    setting properties for, 4-26, 11-18
    setting properties to reject data, 11-21
    translation lookups in, 7-52
    types of, 4-14
    using dimension data in, 5-17
    using variables in, 12-17
    viewing log files for, 4-37
    viewing results of execution, 16-23
    visualizations for, 1-17, 4-29
fact data, 4-11
    delivery of, 4-33
    merging duplicate, 11-10–11-11
    processing, 11-4, 11-5, 15-4
    rejecting duplicate, 11-12–11-13
    setting properties for delivery of, 4-27, 13-29, 21-8–21-9
fact deliveries
    visualizations for, 4-32
fact delivery modules
    relational table, 21-8
    text file, 21-9
    types of, 21-7
I
Import Wizard, 2-27–2-28
importing
    components from packages, 18-7, 18-8, 18-11
    table data into definition files, 2-30
indexes
    compared to keys, 21-26
    composite, 21-29, 21-30
    creating composite, 21-39
    creating in dimension tables, 21-25, 21-43
    creating in fact tables, 21-24, 21-29
    creating single-column, 21-25, 21-38
    types of, 21-23
    using to update fact tables, 21-30
interface. See user interface
internal user-defined functions, 12-14

J
JobStream nodes, 13-9, 13-18
JobStreams, 1-22
    action on node failure, 13-16
    adding nodes to, 13-23–13-26
    adding to catalogs, 13-7
    adding variables to, 13-23
    characteristics, 13-6
    creating, 13-23
    definition, 13-5
    executing, 13-21, 13-29
    executing from the CLI, 22-8
    executing nodes, 13-20
    linking nodes in, 13-19, 13-28
    nesting, 13-18
    nodes in, 13-9
    user-defined functions in, 12-23
    variables in, 13-8, 13-15
    visualizations for, 1-20

K
keys
    compared to indexes, 21-26
    creating on fact tables, 4-16, 21-28, 21-37
Kimball, Ralph, 5-7

L
late arriving dimension details
    processing, 10-13
late arriving facts
    definition, 14-5
    implementing, 14-8, 14-13
    necessary conditions for processing, 14-7
    specifying date ranges for, 14-9
    when they can occur, 14-6
leaf nodes, 17-6
level filters, 21-32
    considerations, 21-33
    examples, 21-34
levels, 3-9
    adding attributes to, 7-6, 7-7, 10-21
    adding data sources for, 7-17–7-24
    adding static members to, 7-32, 21-44
    adding to hierarchies, 3-22–3-23, 7-16–7-22, 10-21
    adding to hierarchies manually, 7-31
    changing data sources for, 9-29
    creating dimension tables to populate, 17-25–17-27
    exploring, 3-25
    mapping attributes of, 3-29
    members, 7-27–7-29
    populating, 7-27
    specifying output, 15-16
    viewing attributes of, 3-27
Library. See catalogs or Library folder
Library folder, 1-16
literals
    definition, 7-25
    mapping to attributes, 7-26
log files
    exploring, 19-16
    for troubleshooting, 20-4
    hash table references in, 20-8
    inspecting, 19-24
    using, 4-35
    viewing for JobStream, 13-29
lookups, 1-22, 3-8, 7-48
    creating based on a template, 19-31
    creating data sources for, 19-32
    creating for data integrity, 5-11, 5-18–5-19
    definition, 7-4
    design requirements, 7-49
    in fact builds, 7-52, 19-27
    optional, 7-48, 19-25, 19-26
    to process late arriving facts, 14-11
    translation, 7-48, 7-50, 7-51
M
mapping
    attributes, 7-17–7-24
    data in hierarchies, 3-10
    data source columns to fact build elements, 4-19, 8-16
    data source columns to hierarchy attributes, 10-24
measures, 4-12
    adding to fact builds, 8-9, 8-16
members. See also dynamic members, foster members, and static members
    definition, 3-9
memory
    allocating to members, 20-6
    reducing usage, 20-11. See also dimension breaking
    setting limits, 20-9–20-10
    specifying options, 20-8
menus
    exploring, 1-28
merging
    changing behavior of, 11-18
    duplicate data, 11-10–11-11, 11-17–11-19
metadata, 4-11
    delivery modules, 21-12–21-13
    delivery of, 4-33
    setting properties for, 4-27, 21-12
metrics. See measures
multiple parents, 19-10
    accepting, 19-11
    ignoring, 19-12
multiple pivots, 16-16, 16-18, 16-21
    mapping, 16-17

N
naming conventions
    for hierarchy attributes, 7-8
natural keys, 9-7
Navigator, 18-12
    using to search for components, 18-13, 18-14, 18-17
nodes
    action on failure, 13-16
    alert, 13-9, 13-13
    build, 13-9, 13-10, 13-24
    condition, 13-9, 13-17, 13-25
    email, 13-9, 13-14
    executing in JobStreams, 13-20
    in JobStreams, 13-9
    JobStream, 13-9, 13-18
    linking, 13-19, 13-28
    procedure, 13-9, 13-12, 13-25, 13-26
    SQL, 13-9, 13-11, 13-24, 13-26
non-unique Ids, 19-14
NULL values
    providing a value for, 19-37

O
ODBC Administrator
    using to add data sources, 10-20
OLTP, 1-6
    using versus data marts, 1-10
operational source systems. See OLTP
operators
    for derivations, 6-7
optimal schemas
    snowflake, 5-16
    star, 5-15
optional lookups, 7-48, 19-25
    designing, 19-26
    in fact builds, 19-27
output filters, 21-35

P
packaging, 18-4
    components in packages, 18-5
    exporting components to packages, 18-6, 18-10
    importing components from packages, 18-7, 18-11
    importing identical components, 18-8
page pool, 20-6
page table, 20-6
partitioning
    horizontal, 21-18, 21-31–21-35
    vertical, 21-18, 21-19
pivoting, 16-4
    applying multi-pivot technique, 16-20–16-23
    considerations, 16-5
    creating pivot values, 16-13
    implementing, 16-6, 16-17, 16-18
    mapping to data sources, 16-14
    modifying pivot values, 16-21
    multiple pivots, 16-16, 16-21
    single pivots, 16-7, 16-12, 16-13
PowerCubes
    creating, 4-39
    saving, 4-40
procedure nodes, 13-9, 13-12, 13-25, 13-26
product
    user interface, 1-16–1-22
properties
    providing a value for NULL values, 19-37
    setting for dimension builds, 4-8
    setting for dimension data delivery, 4-27, 21-11
    setting for dimension elements, 4-22
    setting for fact builds, 4-26, 11-18, 21-40
    setting for fact data delivery, 4-27, 13-29, 21-8–21-9
    setting for fact tables, 16-23
    setting for hierarchies, 19-14
    setting for input rows, 19-37
    setting for metadata delivery, 4-27, 21-12
    setting to reject data, 11-21
R
ragged hierarchies
    circular references, 17-30
    creating, 17-19
    creating dimension builds to reference, 17-25
    exploring in PowerPlay, 17-29
    leaf nodes, 17-5, 17-6
    resolving, 17-7, 17-8, 17-17
recursive relationships, 3-16–3-18, 17-5
Reference Explorer, 1-23
    using to explore hierarchies, 3-31, 7-35, 7-47, 17-12
reference structures
    visualizations for, 1-18
reject files, 11-12, 19-6
    analyzing, 11-21
rejected data
    analyzing, 19-6
    saving to a fact delivery, 19-17
relational table delivery module, 21-8
relationships
    between multiple tables, 3-19–3-20
    between rows of one table, 3-16–3-18
    recursive, 17-5
repeating B-tree indexes, 21-23
reporting data. See data warehouses
restoring
    catalogs, 2-19, 2-21
Result variable
    in JobStreams, 13-15
RUNDSJOB command, 22-8
rundsjob.exe, 20-4

S
saving
    catalogs, 2-21, 2-32
    PowerCubes, 4-40
SCDs, 9-18–9-19
    applying to dimension builds, 9-28–9-34
    definition, 9-4
    issues with, 9-20
    managing, 9-25–9-26
    methods of handling, 9-21–9-23
    Type 2, 14-4
    viewing changes to, 9-32
schemas, 5-14
    optimal snowflake, 5-16
    optimal star, 5-15
    parent-child, 17-5
    snowflake, 5-16
    star, 5-15
server engine, 1-14
shared dimensions. See conformed dimensions and dimensions
single pivots, 16-7, 16-13
    mapping, 16-6
slowly changing dimensions. See SCDs
snowflake schema, 4-16, 5-16
SQL columns
    mapping to fact build elements, 4-19, 8-16
SQL Helper
    user interface, 2-13
    using, 7-35
SQL nodes, 13-9, 13-11, 13-24, 13-26
SQL statements
    creating, 2-11
    creating for fact builds, 4-18, 4-26
    modifying, 7-22, 19-35
    running different types of, 2-10
SQLTerm, 1-23
    debugging data with, 19-18
    displaying, 2-10
    exploring, 1-28
    user interface, 2-11
    using to explore dimension tables, 17-15, 17-28
    using to verify SQLTXT specifications, 2-32
    using to view data, 2-17, 9-16, 9-28
SQLTXT
    verifying specifications, 2-32
SQLTXT database
    adding tables to, 2-25, 2-27–2-28
SQLTXT Designer, 1-23, 2-23
    defining columns in, 2-26
    Import Wizard, 2-27–2-28
    user interface, 2-24
standardizing
    incoming data, 5-22
star schema, 4-16, 5-15
static members, 3-9, 7-27, 7-29
    adding to levels, 7-32, 21-44
surrogate keys, 9-5
    adding to dimension tables, 9-15
    adding to dimensions, 9-10
    assigning to fact tables, 9-9, 9-11
    example, 9-8
    for data integrity lookups, 5-18–5-19
    in data marts, 9-6
    in operational systems, 9-6
    substituting in fact tables, 5-18
    value to SCDs, 9-6
    versus natural keys, 9-5
syntax
    of user-defined functions, 12-8