
Introduction and Course Overview

Week 1

Spring 2023 - CST2205 - Data Modelling


Agenda

• Introductions

• Course Overview

• Purpose of Data Modelling

• Understanding Stakeholders

2
Course Overview
• Model various operational and dimensional representations of data for
addressing business requirements

• Learning objectives include:

• Understand IT problems related to BI and infrastructure

• Analyze, build and design in a way that maintains data integrity of databases and data warehouses

• Identify client requirements and build reliable systems to address them

3
Course Objectives
At the end of this course, you should be able to:

• Create a BI Dimensional Model

• Configure and use tools

• Understand and implement design changes to address implementation challenges in BI

• Create metadata models that generate predictable reporting and analysis results

4
Learning Resources
• Sample chapters / readings from various textbooks

• PowerPoint slides for lectures

• Various articles

• Data extracts and files for hands-on learning

5
Course Evaluation
Component Weighting

Participation 10%

Presentation(s) 10%

Quizzes / Tests 15%

Practical Skills Assessment 25%

Final Exam 40%


6
Expectations
• Show up
• Attend class on time
• Actively participate
• Speak up
• Ask questions in class
• Email me with questions or concerns
• Book time for office hours if needed
• Support others
• Be a good teammate on group assignments
• Respect contributions of others in class
7
Purpose of Data Modeling

8
Why Data?
• Data-driven decision-making (DDDM): the process of using data to inform your decision-making and validate a course of action (source: HBR)

• Benefits of DDDM include:

1. More confident and informed decisions → higher certainty of outcomes

2. More proactive organization → create new competitive advantages

3. Reduction in costs and waste → improve operational efficiency


9
10
Importance of Good Data Management
• Source of truth

• Accelerate decision-making

• Improve understanding of the business

11
What Happens with Bad Data Management
• Multiple versions of the same metric

• Stakeholders lose trust in data-driven methods

• Product managers prioritize the wrong things

• Overtooling → Inflated operating costs for software

12
Data Modeling
• Creates a common foundation / single source of truth

• Improves understanding of the business

• Evolves as the business changes

How do we know it’s a good model?

• People use it

• Speed of onboarding

• Scalable and easy to maintain


13
Understanding Stakeholders

14
Data Generators - Creates raw data

Data Managers - Ensure understanding and best practices

Data Consumers - Use to make decisions

Data Modeller - Responsible for designing and building staging, intermediate and final data models to meet objectives of all stakeholders.

15
Data Generators (Developers, Customers, APIs)
• Creates raw data from websites, apps or other data collection
endpoints

• 3Vs of data
• Volume

• Variety

• Veracity

• Often a disconnect between transactions vs. analysis


• What is the difference between a database vs. data warehouse?
16
Data Managers (BI developers, Data Engineers,
Manager)
• Responsible for the extract-transform-load (ETL) or extract-load-
transform (ELT) process
• What’s the difference?

• Modelling preference for scalability and simplicity

• Technical knowledge (SQL, Python, Cloud tools)

• Constant tradeoff between immediate vs. long-term

17
18 Source: Indicative Blog - Ecosystem of Modern Data Infrastructure
Data Consumers (Analysts and
Executives)
• Motivated to solve their business questions

• Strongly avoid any JOIN operations*

• Will often use the analysis tools they are most comfortable with


• Excel / Google Sheets

• Curated dashboards (Power BI, Tableau)

• Cloud analytics (Google Analytics, Amplitude)

• Short-term >> long-term


19
Data Modeling – Why is it Difficult?
• Data Generators may create data that is difficult to transform

• Data Managers may be restricted to BI tools or databases that


encourage a specific approach
• Tableau prefers one big table, whereas Power BI encourages more relational models

• Data Consumers will have varying needs that will change over time

20
One recipe for success
Build two data models

1. Relational / Dimensional modelling for Data Managers

2. One-big-table data marts for Data Consumers, built off #1 (see the SQL sketch below)
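
As a rough illustration of step 2, here is a minimal SQL sketch (all table and column names are hypothetical, and CREATE TABLE AS syntax varies by engine) that flattens a star schema into a wide, denormalized data mart:

-- Hypothetical star schema: fact_sales plus dim_date, dim_product, dim_store.
-- The data mart repeats dimension attributes on every row so Data Consumers avoid JOINs.
CREATE TABLE mart_sales_obt AS
SELECT
    d.calendar_date,
    d.fiscal_quarter,
    p.product_name,
    p.category,
    s.store_name,
    s.region,
    f.quantity_sold,
    f.sales_amount
FROM fact_sales  AS f
JOIN dim_date    AS d ON f.date_key    = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
JOIN dim_store   AS s ON f.store_key   = s.store_key;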

21
Summary
• Always keep in mind the purpose of a data team is to improve
decision-making
• Tooling and advanced analytics are a means but not the end itself

• Data Modellers create value through:

• Creating a single common source of truth

• Accelerating analysis and decision-making through pre-transformation

• Helping improve the understanding of the business

22
Summary
• There are 3 different stakeholders of data
• Data Generators - create raw data

• Data Managers - manage data for analytical purposes

• Data Consumers - use data to make decisions

• Goal of this course


• Be able to design and execute a good data model, taking the stakeholders above into account

23
Data Warehouse Concepts: Approaches
Week 1 – Day 2

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• Characteristics of a Data Warehouse

• Functions of a Data Warehouse

• Normalization vs. Denormalization Approach

• Data Warehouse vs. Database

• The Two Data Warehouse Concepts: Kimball vs. Inmon

• An Automated Data Warehousing Tool


2

2
Overview
• In data warehouse (DWH) design, two approaches stand out:
• The Inmon and the Kimball methodology

• Which data warehouse approach is better and more effective?

• No definite answer as both methods have their benefits and drawbacks.

• GOAL:
• Cover the basics of a data warehouse and its characteristics

• Compare Kimball vs. Inmon

3 Source: https://www.astera.com/type/blog/data-warehouse-concepts/


When it comes to data warehouse (DWH) designing, two of the most widely discussed and
explained data warehouse approaches are the Inmon and the Kimball methodology. For years,
people have debated over which data warehouse approach is better and more effective for
businesses. However, there’s still no definite answer as both methods have their benefits and
drawbacks.

Here, we look at the basics of a data warehouse, its characteristics, and compare the two popular
data warehouse approaches – Kimball vs. Inmon.

The key data warehouse concept allows users to access a unified version of truth for timely
business decision-making, reporting, and forecasting. DWH functions like an information system
with all the past and commutative data stored from one or more sources.

3
Characteristics of a Data Warehouse
The following are the four characteristics of a Data Warehouse:

• Subject-Oriented

• Integrated

• Time-variant

• Non-volatile

4
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Characteristics of a Data Warehouse

The following are the four characteristics of a Data Warehouse:

Subject-Oriented:
A data warehouse uses a theme, and delivers information about a specific subject instead of a
company’s current operations. In other words, the data warehousing process is more equipped to
handle a specific theme. Examples of themes or subjects include sales, distributions, marketing,
etc.

Integrated:
Integration is defined as establishing a connection between large amount of data from multiple
databases or sources. However, it is also essential for the data to be stored in the data warehouse
in a unified manner. The process of data warehousing integrates data from multiple sources, such
as a mainframe, relational databases, flat files, etc. Furthermore, it helps maintain consistent
codes, attribute measures, naming conventions, and formats.

Time-variant:
Time-variance in a DW is more extensive than in operational systems. Data stored in a
data warehouse is recalled with a specific time period and provides information from a historical
perspective.

Non-volatile:

4
In the non-volatile data warehouse, data is permanent i.e. when new data is inserted, previous
data is not replaced, omitted, or deleted. In this data warehouse, data is read-only and only
refreshes at certain intervals. The two data operations performed in the data warehouse are data
access and data loading.

4
Functions of a Data Warehouse
• Data warehouse functions as a repository.

• The prominent functions of the data warehouse are:

• Data Cleaning
• Data Integration
• Data Mapping
• Data Extraction
• Data Transformation
• Data Loading
• Refreshing

5
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Functions of a Data Warehouse

Data warehouse functions as a repository. It helps organizations avoid the cost of storage systems
and backup data at an enterprise level. The prominent functions of the data warehouse are:

• Data Cleaning
• Data Integration
• Data Mapping
• Data Extraction
• Data Transformation
• Data Loading
• Refreshing

5
Normalization vs. Denormalization
• Normalization
• A way of data re-organization.

• Addresses two requirements for an enterprise data warehouse:


• Eliminating data redundancy and
• Protecting data dependency

• Denormalization
• Increases the functionality of the database system’s
infrastructure.

6
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Normalization vs. Denormalization Approach


Normalization is defined as a way of data re-organization. This helps meet two main requirements
in an enterprise data warehouse i.e. eliminating data redundancy and protecting data
dependency. On the other hand, denormalization increases the functionality of the database
system’s infrastructure.

6
Data Warehouse vs. Database
Data Warehouse | Database
Serves as an information system - contains historical and commutative data from one or several sources. | An amalgamation of related data.
Used for analyzing data. | Used for recording data.
Subject-oriented collection of data. | An application-oriented collection of data.
Uses Online Analytical Processing (OLAP). | Uses Online Transactional Processing (OLTP).
Tables and joins are denormalized, hence simpler. | Tables and joins are normalized, therefore more complicated.
Data modeling techniques are used for designing. | ER modeling techniques are used for designing.
7
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


The main differences between data warehouse and database are summarized in the table below:

Database
• A database is an amalgamation of related data.
• A database is used for recording data
• A database is an application-oriented collection of data.
• A database uses Online Transactional Processing (OLTP).
• Database tables and joins are normalized, therefore, more complicated.
• ER modeling techniques are used for designing.

Data Warehouse
• Data warehouse serves as an information system that contains historical and commutative
data from one or several sources.
• A data warehouse is used for analyzing data.
• Data warehouse is the subject-oriented collection of data.
• Data warehouse uses Online Analytical Processing (OLAP).
• Data warehouse tables and joins are denormalized, hence simpler.
• Data modeling techniques are used for designing.
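
To make the contrast concrete, here is a small hedged sketch (hypothetical tables): the database records individual transactions, while the data warehouse answers aggregate, historical questions.

-- OLTP: record one business event in the operational database.
INSERT INTO orders (order_id, customer_id, order_date, order_total)
VALUES (10001, 42, '2023-05-01', 59.99);

-- OLAP: analyze many events at once in the data warehouse.
SELECT d.year_number, d.month_name, SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_date   AS d ON f.date_key = d.date_key
GROUP BY d.year_number, d.month_name;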

7
The Two Data Warehouse Concepts

Image Source: https://miro.medium.com/max/1200/0*ZF5PhOye9Mkk7UFe

8
The Kimball Methodology

9
The Kimball Methodology

10
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


The Kimball Methodology

Initiated by Ralph Kimball, the Kimball data model follows a bottom-up approach to data
warehouse architecture design in which data marts are first formed based on the business
requirements.

The primary data sources are then evaluated, and an Extract, Transform and Load (ETL) tool is
used to fetch data from several sources and load it into a staging area of the relational database
server. Once data is uploaded in the data warehouse staging area, the next phase includes
loading data into a dimensional data warehouse model that’s denormalized by nature. This model
partitions data into fact tables, which hold numeric transactional data, and dimension tables, which
hold the reference information that supports the facts.

Star schema is the fundamental element of the dimensional data warehouse model. The
combination of a fact table with several dimensional tables is often called the star schema.
Kimball dimensional modeling allows users to construct several star schemas to fulfill various
reporting needs. The advantage of star schema is that small dimensional-table queries run
instantaneously.

To integrate data, Kimball approach to Data Warehouse lifecycle suggests the idea of conformed
data dimensions. It exists as a basic dimension table shared across different fact tables (such as
customer and product) within a data warehouse or as the same dimension tables in various
Kimball data marts. This guarantees that a single data item is used in a similar manner across all
the facts.

An important design tool in Ralph Kimball’s data warehouse methodology is the enterprise bus
matrix or Kimball bus architecture that vertically records the facts and horizontally records the
conformed dimensions. The Kimball matrix, which is a part of bus architecture, displays how star
schemas are constructed. It is used by business management teams as an input to prioritize which
row of the Kimball matrix should be implemented first.

The Kimball approach to data warehouse lifecycle is also based on conformed facts, i.e. data
marts that are separately implemented together with a robust architecture.

10
Advantages - Kimball Methodology
• Fast to construct - no normalization is involved

• Star schema easily understood because of its denormalized structure

• Data warehouse system footprint is trivial because it focuses on individual business areas
and processes

• Enables fast data retrieval as data is segregated into fact tables and dimensions.

• A smaller team of designers and planners is sufficient

• Conformed dimensional structure for data quality framework allows BI tools to generate
reliable insights.

11
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Advantages of the Kimball Methodology

Some of the main benefits of the Kimball Data Warehousing Concept include:

• Kimball dimensional modeling is fast to construct as no normalization is involved, which means


swift execution of the initial phase of the data warehousing design process.
• An advantage of star schema is that most data operators can easily comprehend it because of
its denormalized structure, which simplifies querying and analysis.
• Data warehouse system footprint is trivial because it focuses on individual business areas and
processes rather than the whole enterprise. So, it takes less space in the database, simplifying
system management.
• It enables fast data retrieval from the data warehouse, as data is segregated into fact tables
and dimensions. For example, the fact and dimension table for the insurance industry would
include policy transactions and claims transactions.
• A smaller team of designers and planners is sufficient for data warehouse management
because data source systems are stable, and the data warehouse is process-oriented. Also,
query optimization is straightforward, predictable, and controllable.
• Conformed dimensional structure for data quality framework. The Kimball approach to data
warehouse lifecycle is also referred to as the business dimensional lifecycle approach because
it allows business intelligence tools to drill deeper across several star schemas and generate
reliable insights.

11
Kimball Approach to Data Warehouse Lifecycle
(Source: Kimball Group)

12
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Advantages of the Kimball Methodology

The Kimball approach to data warehouse lifecycle is also referred to as the business dimensional
lifecycle approach because it allows business intelligence tools to drill deeper across several star
schemas and generate reliable insights.

12
Disadvantages - Kimball Methodology
• Data isn’t entirely integrated before reporting → the idea of a ‘single source of truth’ is lost.

• Irregularities can occur when data is updated -> redundant data is added to database
tables.

• Performance issues may occur due to the addition of columns in the fact table (in-depth
tables)
• dimensional data warehouse model becomes difficult to alter with any change in the business
needs.

• Model is business process-oriented (not focused on the enterprise as a whole) → it cannot handle all the BI reporting requirements.

• Incorporating large amounts of legacy data into the data warehouse is complex.
13
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Disadvantages of the Kimball Methodology

Some of the drawbacks of the Kimball Data Warehousing design concept include:

• Data isn’t entirely integrated before reporting; the idea of a ‘single source of truth’ is lost.
• Irregularities can occur when data is updated in Kimball DW architecture. This is because in
denormalization technique, redundant data is added to database tables.
• In the Kimball DW architecture, performance issues may occur due to the addition of columns
in the fact table, as these tables are quite in-depth. The addition of new columns can expand
the fact table dimensions, affecting its performance. Also, the dimensional data warehouse
model becomes difficult to alter with any change in the business needs.
• As the Kimball model is business process-oriented, instead of focusing on the enterprise as a
whole, it cannot handle all the BI reporting requirements.
• The process of incorporating large amounts of legacy data into the data warehouse is complex.

13
The Inmon Method

14

14
The Inmon Method

15
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


The Inmon Method

Bill Inmon → the father of data warehousing.

He came up with the concept of developing a data warehouse that identifies the main subject areas
and entities the enterprise works with, such as customers, product, vendor, and so on. Bill
Inmon’s definition of a data warehouse is that it is a “subject-oriented, non-volatile, integrated,
time-variant collection of data in support of management’s decisions.”

The model then creates a thorough, logical model for every primary entity. For instance, a logical
model is constructed for products with all the attributes associated with that entity. This logical
model could include ten diverse entities under product, including all the details, such as business
drivers, aspects, relationships, dependencies, and affiliations.

The Bill Inmon design approach uses the normalized form for building entity structure, avoiding
data redundancy as much as possible. This results in clearly identifying business requirements and
preventing any data update irregularities. Moreover, the advantage of this top-down approach in
database design is that it is robust to business changes and contains a dimensional perspective of
data across data mart.

Next, the physical model is constructed, which follows the normalized structure. This Bill Inmon
model creates a single source of truth for the whole business. Data loading becomes less complex
due to the normalized structure of the model. However, using this arrangement for querying is
challenging as it includes numerous tables and links.

This Inmon data warehouse methodology proposes constructing data marts separately for each
division, such as finance, marketing, sales, etc. All the data entering the data warehouse is
integrated. The data warehouse acts as a single data source for various data marts to ensure
integrity and consistency across the enterprise.

15
Advantages - Inmon Method
• Data warehouse acts as a unified source of truth for the entire business, where all data is
integrated.

• Very low data redundancy -> less of a possibility of data update irregularities
• ETL-concept based data warehouse process more straightforward and less susceptible to
failure.

• Simplifies business processes -> logical model represents detailed business objects.

• Greater flexibility -> easier to update the data warehouse due to change in the business
requirements or source data.

• It can handle diverse enterprise-wide reporting requirements.

16
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Advantages of the Inmon Method

The Bill Inmon design approach offers the following benefits :

• Data warehouse acts as a unified source of truth for the entire business, where all data is
integrated.
• This approach has very low data redundancy. So, there’s less possibility of data update
irregularities, making the ETL-concept based data warehouse process more straightforward
and less susceptible to failure.
• It simplifies business processes, as the logical model represents detailed business objects.
• This approach offers greater flexibility, as it’s easier to update the data warehouse in case
there’s any change in the business requirements or source data.
• It can handle diverse enterprise-wide reporting requirements.

16
Disadvantages - Inmon Method
• Complexity increases as multiple tables are added to the data model with time.

• Resources skilled in data warehouse data modeling are required → expensive and challenging to find.

• The preliminary setup and delivery are time-consuming.

• Additional ETL process operation is required since data marts are created.

17
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Disadvantages of the Inmon Method

The possible drawbacks of this approach are as follows:

• Complexity increases as multiple tables are added to the data model with time.
• Resources skilled in data warehouse data modeling are required, which can be expensive and
challenging to find.
• The preliminary setup and delivery are time-consuming.
• Additional ETL process operation is required since data marts are created.

17
Which Data Warehouse Approach to Choose?
A few aspects that can help to decide between the two approaches.

• Reporting Needs

• Project Deadline

• Prospective Recruitment Plan

• Frequent Changes

• Organizational Principles

18
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Which Data Warehouse Approach to Choose?

Now that we’ve evaluated the Kimball vs. Inmon approach and seen the advantages and
drawbacks of both these methods, the question arises: Which one of these data warehouse
concepts would best serve your business?

Both these approaches consider data warehouse as a central repository that supports business
reporting. Also, both types of approaches use ETL concepts for data loading. However, the main
difference lies in modeling data and loading it in the data warehouse.

The approach used for data warehouse construction influences the preliminary delivery time of
the warehousing project and the capacity to put up with prospective variations in the ETL design.

Still not sure about the conclusion to Kimball vs. Inmon dilemma? We can help you decide which
one of these data warehouse approaches would help improve your data quality framework in the
best way.

We’ve narrowed down a few aspects that can help you decide between the two approaches.

• Reporting Needs: If you need organization-wide and integrated reporting, then the Bill
Inmon approach is more suitable. But if you require reporting focused on the business process
or team, then opt for the Kimball method.
• Project Deadline: Designing a normalized data model is comparatively more complex than
designing a denormalized model. This makes the Inmon approach a time-intensive process.
Therefore, if you have less time for delivery, then opt for the Kimball method.
• Prospective Recruitment Plan: The higher complexity of data model creation in the Inmon data
warehouse approach requires a larger team of professionals for data warehouse management.
Therefore, choose accordingly.
• Frequent Changes: If your reporting needs are likely to change more quickly and you are
dealing with volatile source systems, then opt for the Inmon method as it offers more
flexibility. However, if reporting needs and source systems are comparatively stable, it’s better
to use the Kimball method.
• Organizational Principles: If your organization’s stakeholders and corporate directors recognize
the need for data warehousing and are ready to bear the expenses, then the Bill Inmon data
warehouse method would be a safer bet. On the other hand, if the decision-makers aren’t
concerned about the nitty-gritty of the approach, and are only looking for a solution to
improve reporting, then it’s sufficient to opt for the Kimball data warehouse method.

18
Bottom-Line
• Both the Kimball and Inmon data warehouse concepts can be used to design data warehouse
models successfully. In fact, several enterprises use a blend of both these approaches
(called hybrid data model).

• In the hybrid data model, the Inmon method creates a dimensional data warehouse model
of a data warehouse. In contrast, the Kimball method is followed to develop data marts
using the star schema.

• It’s impossible to claim which approach is better as both methods have their benefits and
drawbacks, working well in different situations. A data warehouse designer has to choose a
method, depending on the various factors discussed in this article.

• Lastly, for any method to be effective, it has to be well-thought-out, explored in-depth, and
developed to gratify your company’s business intelligence reporting requirements.
19
Source: https://www.astera.com/type/blog/data-warehouse-concepts/


Bottom-line
• Both the Kimball and Inmon data warehouse concepts can be used to design data warehouse
models successfully. In fact, several enterprises use a blend of both these approaches (called
hybrid data model).
• In the hybrid data model, the Inmon method creates a dimensional data warehouse model of
a data warehouse. In contrast, the Kimball method is followed to develop data marts using the
star schema.
• It’s impossible to claim which approach is better as both methods have their benefits and
drawbacks, working well in different situations. A data warehouse designer has to choose a
method, depending on the various factors discussed in this article.
• Lastly, for any method to be effective, it has to be well-thought-out, explored in-depth, and
developed to gratify your company’s business intelligence reporting requirements.

19
Introduction to Dimensional Modelling
(Kimball)
Week 2 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda

• Goals of Data Warehouse (DW) and Business Intelligence


(BI)

• Introduction to Kimball / Dimensional Models

• Key Concepts for Dimensional Models

• Practice - Sales Process

2
High-Level Class Schedule
• Focus on Dimensional Modelling by studying Kimball
methodology + applying to business processes

• Set up dimensional models in Power BI

• Data preparation in PowerQuery

• Time intelligence functions and modelling

• Advanced topics

• Emerging technologies and techniques

3

3
Recap of Week 1
• Technologies used in this course
• Power BI

• IBM Cognos Framework Manager (later half)

• Goal of data analytics - improved decision-making

• Overview of 3 stakeholders
• Data Generator
• Data Manager
• Data Consumer
4

4
Data Modeling
• Creates a common foundation / single source of truth

• Improves understanding of the business

• Evolves as the business changes

• How do we know it’s a good model?


• People use it

• Speed of onboarding

• Scalable and easy to maintain

5

5
Data Generators - Creates raw data

Data Managers - Ensure understanding and best practices

Data Consumers - Use to make decisions

Data Modeller - Responsible for designing and building staging, intermediate and final data models to meet objectives of all stakeholders.

6
Principles / Goals of Data Warehouse

7
Have you ever heard the following?

We collect tons of data, but we can’t access it!

We need to slice and dice the data every way!

Just show me what’s important!

We spend entire meetings arguing about who has the right numbers instead of making decisions!
8 Source: Chapter 1 of The Data Warehouse Toolkit (3rd edition)

8
Principles for Success
• Make information easily accessible → simple and fast

• Present consistent information → credible and consistent

• Must adapt to changes → adaptable but not disruptive

• Must be timely → deliver data fast to support decisions

• Must be secure → protect confidential information

• Authoritative and Trustworthy → single source of truth

• Must be accepted by the business → is it useful?


9

9
Publisher Metaphor
• Kimball talks about the metaphor of a magazine publisher in
describing the role of a data modeller
• Understand the reader → Understand business users

• Ensure magazine appeals to readers → Deliver high-quality, relevant and accessible information to users

• Sustain the publication → Sustain the DW/BI environment

10

10
Introduction to Dimensional Models

11

11
What is a dimensional model for a DW?
• Represents a business process

• Star schema with the center representing a fact surrounded by dimensions

• Keeps it simple by minimizing normalization of surrounding dimensions

12

12
What is a dimensional model for a DW?

13 Source: Figure 1.1 of Chapter 1 of The Data Warehouse Toolkit (3rd edition)

13
Star Schema vs. OLAP Cubes

Star Schema:
• Implemented in an RDBMS
• Resembles a star-like structure with the center being a fact table
• Often the foundation that OLAP cubes build on
• Leverages SQL to access information
• Models a business process and considers the grain (level of detail) in the fact table

OLAP Cubes:
• Implemented in multidimensional database environments
• Often, performance aggregations or precalculated summary tables are created and managed by the cube engine
• Superior query performance
• Ability to drill up/down without issuing new queries
• Analytically robust, exceeding SQL
• Slow load times, especially with large data sets

14

14
Growing industry inclination towards
star schema > OLAP
• Star schema in an RDBMS serves as the foundation for many OLAP
deployments
• Rise of new technology and cloud data warehouse reducing
performance gaps for analysis
• Moore’s Law
• Massive parallel processing (MPP)
• Columnar data warehouse
• OLAP cube data structures vary widely across vendors making it
more difficult to switch
Bottom-line: Simplicity, cost and time to value favour star schema
15

Recommended reading: https://www.holistics.io/blog/the-rise-and-fall-of-the-olap-cube/

15
We interrupt this presentation with …
DBMS Design approaches: Normalization
• Process to organize data efficiently and reduce redundancy.

• It involves breaking down a database into multiple tables and


establishing relationships between them.

Goal:

Minimize data duplication / ensure data integrity and consistency

16

Database normalization is a process used in database design to organize data efficiently and
reduce redundancy. It involves breaking down a database into multiple tables and establishing
relationships between them. The goal is to minimize data duplication and ensure data integrity
and consistency.

16
Normalization – Why?
• Helps maintain data accuracy, improves query performance, and
simplifies data maintenance.
• Basic principles (also referred to as normal forms):

First Normal Form (1NF):
• Each column in a table contains only atomic values (cannot be further divided).
• Eliminates duplicates.
• Ensures every piece of data has its own dedicated place in the table.

Second Normal Form (2NF)

Third Normal Form (3NF)
17

Normalization is important because it helps maintain data accuracy, improves query


performance, and simplifies data maintenance. Here are the basic principles of database
normalization, often referred to as normal forms:

1. First Normal Form (1NF): This form requires that each column in a table contains only atomic
values, meaning it cannot be further divided. It eliminates duplicate rows and ensures every
piece of data has its own dedicated place in the table.
2. Second Normal Form (2NF): In addition to meeting 1NF requirements, 2NF states that each
non-key column in a table must be fully dependent on the table's primary key. It involves
removing partial dependencies, where non-key columns depend on only part of the primary
key.
3. Third Normal Form (3NF): This form builds upon 2NF by eliminating transitive dependencies.
It means that no non-key column should depend on other non-key columns. Each non-key
column should be directly dependent only on the primary key.

There are also higher normal forms, such as the Boyce-Codd Normal Form (BCNF) and the Fourth
and Fifth Normal Forms (4NF and 5NF), which deal with more complex scenarios and
dependencies. These forms aim to further reduce redundancy and ensure data integrity.

Normalization can be achieved by dividing tables and creating relationships using primary keys
and foreign keys. Primary keys uniquely identify each row in a table, while foreign keys establish
relationships between tables by referencing the primary key of another table.

By normalizing a database, you ensure that data is organized in a logical and efficient manner,
leading to easier data management, reduced data duplication, and improved query performance.
It also allows for easier modification and expansion of the database structure as needs change
over time.

17
Normalization – Why?
• Helps maintain data accuracy, improves query performance, and
simplifies data maintenance.
• Basic principles (also referred to as normal forms):

First Normal Form (1NF)

Second Normal Form (2NF), in addition to meeting 1NF requirements:
• Each non-key column in a table must be fully dependent on the table's primary key.
• Involves removing partial dependencies, where non-key columns depend on only part of the primary key.

Third Normal Form (3NF)
18


18
Normalization – Why?
• Helps maintain data accuracy, improves query performance, and
simplifies data maintenance.
• Basic principles (also referred to as normal forms):

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF), builds upon 2NF:
• Eliminates transitive dependencies - no non-key column should depend on other non-key columns.
• Each non-key column should be directly dependent only on the primary key.

19


19
Normalization – OK! A few more!
• Other higher normal forms deal with more complex scenarios and
dependencies :
• Boyce-Codd Normal Form (BCNF)

• Fourth and Fifth Normal Forms (4NF and 5NF)

• These forms aim to further reduce redundancy and ensure data


integrity.

20

There are also higher normal forms, such as the Boyce-Codd Normal Form (BCNF) and the Fourth
and Fifth Normal Forms (4NF and 5NF), which deal with more complex scenarios and
dependencies. These forms aim to further reduce redundancy and ensure data integrity.

20
Normalization – Final Words
• It can be achieved by dividing tables and creating relationships
using primary keys and foreign keys.
• Primary keys uniquely identify each row in a table
• Foreign keys establish relationships between tables by referencing the primary key of another table.
• By normalizing a database, you ensure that:
• data is organized in a logical and efficient manner,
• leading to easier data management,
• reduced data duplication, and
• improved query performance.

21

Normalization can be achieved by dividing tables and creating relationships using primary keys
and foreign keys. Primary keys uniquely identify each row in a table, while foreign keys establish
relationships between tables by referencing the primary key of another table.

By normalizing a database, you ensure that data is organized in a logical and efficient manner,
leading to easier data management, reduced data duplication, and improved query performance.
It also allows for easier modification and expansion of the database structure as needs change
over time.
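
As a small SQL sketch of these ideas (hypothetical tables), a flat order table that repeats customer details can be split so customer attributes live in one place, linked back by a foreign key:

-- Before normalization: customer name and email repeated on every order row (redundant, update-prone).
-- After: customer attributes are stored once and orders reference them by key.
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email         VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers (customer_id),
    order_date  DATE,
    order_total DECIMAL(12,2)
);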

21
That Being Stated…
OLTP and 3NF Models
• For processing transactions and running a business, OLTP and 3NF
models are immensely useful
• Large degree of normalization

• UPDATE / INSERT occurs in one place

22

22
Fundamental Concepts
of Dimensional Models

23

23
Business Process
• Represent the operational activities of an organization

• Understanding this drives the design of a data model

• General rule → 1 business process = 1 data model

• Examples inside a Restaurant

• Sourcing ingredients for dinner service

• Taking an order from a table

• Preparing the order in the kitchen

• Advertising your restaurant for new customers

24
• Paying employees

24
Fact Table
• Large volume of measurable events (high # of rows)

• Grain of data - smallest / atomic detail possible

• Multiple foreign keys to relate to surrounding dimensions

• 3 types: transaction, periodic snapshot, accumulating snapshot

• SQL: SELECT and Aggregations

25
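
A minimal DDL sketch of a star schema (hypothetical names; surrogate keys assumed) showing a fact table whose foreign keys point to the surrounding dimensions:

-- Grain of fact_sales: one row per product, per store, per day.
CREATE TABLE dim_date    (date_key INT PRIMARY KEY, calendar_date DATE, month_name VARCHAR(10), year_number INT);
CREATE TABLE dim_product (product_key INT PRIMARY KEY, product_name VARCHAR(100), category VARCHAR(50));
CREATE TABLE dim_store   (store_key INT PRIMARY KEY, store_name VARCHAR(100), region VARCHAR(50));

CREATE TABLE fact_sales (
    date_key      INT NOT NULL REFERENCES dim_date (date_key),
    product_key   INT NOT NULL REFERENCES dim_product (product_key),
    store_key     INT NOT NULL REFERENCES dim_store (store_key),
    quantity_sold INT,            -- additive measure
    sales_amount  DECIMAL(12,2)   -- additive measure
);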

25
Dimension
• Describe who, what, where, when, how and why?

• Wide with large number of columns

• SQL: GROUP BY, WHERE, HAVING

Note: For Kimball-style dimensional modelling, avoid over normalizing


dimensions to create a snowflake-schema

26
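
Continuing the hypothetical star schema sketched earlier, dimensions supply the filters and groupings while the fact table supplies the measures:

SELECT
    p.category,
    s.region,
    SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_key = p.product_key
JOIN dim_store   AS s ON f.store_key   = s.store_key
JOIN dim_date    AS d ON f.date_key    = d.date_key
WHERE d.year_number = 2023                -- dimension attribute in WHERE
GROUP BY p.category, s.region             -- dimension attributes in GROUP BY
HAVING SUM(f.sales_amount) > 10000;       -- filter on the aggregated fact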

26
Grain
• Represents the level of atomic detail in a fact table

• Determines the level of roll-up or drill-down to a query


• Always possible to roll-up but not drill-down

• Example: Sales Order vs. Sales Order Line

Each row in this table represents (blank)

27
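
A short sketch of the roll-up direction (hypothetical line-grain fact table): from sales-order-line grain you can always aggregate up to order grain, but an order-grain table can never be drilled back down to individual lines.

-- fact_sales_line holds one row per order line (the atomic grain).
SELECT
    order_number,
    COUNT(*)         AS line_count,
    SUM(line_amount) AS order_total   -- roll lines up to one row per order
FROM fact_sales_line
GROUP BY order_number;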

27
4-Step Design Process
1. Select the business process

2. Declare the grain

3. Identify the dimensions

4. Identify the facts

28

28
Case Study

29

29
Practice: Grocery Receipt
• What is the business process?

• What is the grain?

• What are the facts?

• What are the dimensions?

30 Source: https://www.flexengage.com/industry-grocery/

30
Practice: Starbucks Receipt
• What is the business process?

• What is the grain?

• What are the facts?

• What are the dimensions?

31 Source: https://www.flexengage.com/industry-grocery/

31
Practice: Movie Theatre
• What is the business process?

• What is the grain?

• What are the facts?

• What are the dimensions?

What else is missing?

32 Source: Reddit

32
Summary
• Simplicity and usefulness are key drivers in Kimball-style dimensional
modelling

• New technology and the simplicity of star schema makes it more attractive
than OLAP cubes

• Normalization is good for OLTP but violates the simplicity principle in


dimensional modelling

• Dimensional models represent a business process breaking it down into facts


and dimensions

• 4-step process to design the data model


33

33
Different Modeling Approaches & ETL
Week 2 – Day 2

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• Notes on adapting changes

• Alternative modelling methods

• ETL process comparison


• Power BI (PowerQuery)

• MS SQL Server (Stored Procedures + Scheduled Jobs)

• Dbt (data build tool)

• Start of Inventory Process


2

2
Recap from Tuesday
• Dimensional Models must be simple and useful

• Star Schema
• Facts

• Dimensions

• Granularity is important
• Each row in this table represents ‘X’

• Recap - Grocery and Starbucks

3

3
Practice: Grocery Receipt
• What is the business process?

• What is the grain?

• What are the facts?

• What are the dimensions?

4 Source: https://www.flexengage.com/industry-grocery/

4
Sample Solution - Grocery Sales Process

5
Practice: Starbucks Receipt
• What is the business process?

• What is the grain?

• What are the facts?

• What are the dimensions?

6 Source: https://www.flexengage.com/industry-grocery/

6
Sample Solution - Starbucks In-Store Sales

7
Practice: Movie Theatre
• What is the business process?

• What is the grain?

• What are the facts?

• What are the dimensions?

What else is missing?

8 Source: Reddit

8
Sample Solution - Cineplex Sales Process

9
Example of Retail Sales Process

10
Source: Figure 3.12 of Chapter 3 of The Data Warehouse Toolkit (3rd edition)

10
Notes on Adapting Changes

11

11
What happens when...
• New dimension attributes:
• Add a new sub-category of grocery items

• New dimensions:
• New customer loyalty program

• New measured facts:


• AIR MILES reward points

• Retail fashion child SKU vs. parent SKU


12

12
Surrogate Keys on Dimensions
• Kimball encourages creation of surrogate keys for dimensions rather
than using the natural PK/FK in the native table

• Buffer between DW and DB changes (i.e. recycle product codes)

• Integration of multiple source tables

• Improvement on performance (i.e. join an integer vs. string)

• Better handling of NULL or unknown conditions

• Support dimension attribute change tracking
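
A minimal sketch (hypothetical names) of a dimension keyed by a warehouse-assigned surrogate integer, with the source system's natural key kept as an ordinary attribute:

CREATE TABLE dim_product (
    product_key  INT PRIMARY KEY,     -- surrogate key generated during ETL
    product_code VARCHAR(20),         -- natural key from the source system (may be recycled)
    product_name VARCHAR(100),
    category     VARCHAR(50)
);

-- A reserved row handles NULL or unknown products in fact loads.
INSERT INTO dim_product VALUES (0, 'N/A', 'Unknown product', 'Unknown');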


13

13
Other Processes - Promotions and Inventory
• Figure 3.12 in prior slide showed dim_promotions and
dim_products
• What were the products on promotion but did not sell?

• What were the products on hand?

• Separate business process for creation and administration of


marketing promotions + management of inventory
• What is the grain for each process?
14

14
Modelling Architecture Approaches

15

15
Kimball's DW/BI Architecture

16 Source: Figure 1.7 of Chapter 1 of The Data Warehouse Toolkit (3rd edition)

To build on our understanding of DW/BI systems and dimensional modeling fundamentals by


investigating the components of a DW/BI environment based on the Kimball architecture. You
need to learn the strategic significance of each component to avoid confusing their role and
function.

As illustrated in Figure 1.7, there are four separate and distinct components to consider in the
DW/BI environment: operational source systems, ETL system, data presentation area, and
business intelligence applications.

16
Independent Data Mart

17 Source: Figure 1.8 of Chapter 1 of The Data Warehouse Toolkit (3rd edition)

With this approach, analytic data is deployed on a departmental basis without concern to sharing
and integrating information across the enterprise, as illustrated in Figure 1.8. Typically, a single
department identifies requirements for data from an operational source system. The department
works with IT staff or outside consultants to construct a database that satisfies their
departmental needs, reflecting their business rules and preferred labeling. Working in isolation,
this departmental data mart addresses the department's analytic requirements.

Meanwhile, another department is interested in the same source data. It's extremely common for
multiple departments to be interested in the same performance metrics resulting from an
organization's core business process events. But because this department doesn't have access to
the data mart initially constructed by the other department, it proceeds down a similar path on
its own, obtaining resources and building a departmental solution that contains similar, but
slightly different data. When business users from these two departments discuss organizational
performance based on reports from their respective repositories, not surprisingly, none of the
numbers match because of the differences in business rules and labeling.

17
Independent Data Mart (cont)
• Deployed by department without concern to integration across
entire enterprise

• Often occurs with large companies because of a need for


speed over long-term strategy

• However, this approach leverages a lot of dimensional


modelling

18

18
Hub-and-Spoke Corporate Information Factory (Inmon)

19 Source: Figure 1.9 of Chapter 1 of The Data Warehouse Toolkit (3rd edition)

The hub-and-spoke Corporate Information Factory (CIF) approach is advocated by Bill Inmon and
others in the industry. Figure 1.9 illustrates a simplified version of the CIF, focusing on the core
elements and concepts that warrant discussion.

With the CIF, data is extracted from the operational source systems and processed through an
ETL system sometimes referred to as data acquisition. The atomic data that results from this
processing lands in a 3NF database; this normalized, atomic repository is referred to as the
Enterprise Data Warehouse (EDW) within the CIF architecture. Although the Kimball architecture
enables optional normalization to support ETL processing, the normalized EDW is a mandatory
construct in the CIF. Like the Kimball approach, the CIF advocates enterprise data coordination
and integration. The CIF says the normalized EDW fills this role, whereas the Kimball architecture
stresses the importance of an enterprise bus with conformed dimensions.

19
Hub-and-Spoke Corporate Information
Factory (Inmon)
• Extract data from various source systems (data acquisition)

• Enterprise Data Warehouse (EDW) makes normalization


mandatory

• Like Kimball, still advocates coordination and integration

• Queries run in the EDW but also department-centric data


marts are used to populate BI tools

20

20
Hybrid Kimball + Inmon

21 Source: Figure 1.10 of Chapter 1 of The Data Warehouse Toolkit (3rd edition

21
Hybrid Kimball + Inmon
• CIF-centric EDW used to organize data supporting Kimball-
style dimensional models

• Often driven by pre-existing investment in a 3NF EDW

• More expensive than single approach if you are starting from


scratch (time, complexity and coordination)

22

22
Comparison of 3 Methods
Kimball Data Marts Inmon CIF
• Focus on business • Single departments • Also encourages
process • Faster in short-term coordination
• Coordination + • May result in • Favours BI
integration duplication and professionals who like
• Simplicity (resist confusion from lack of normalization
normalization) coordination • Queries between EDW
and Data Mart layers
may be inconsistent

23

23
ETL Process Comparison

24

24
Considerations for ETL / ELT management
• Business requirements

• Compliance

• Data Quality

• Security

• Integration

• Latency

• Archiving and Lineage


25

25
Power Query
• Data preparation tool in Microsoft Excel, Power BI, Analysis
Services

• Utilizes a language called M

• GUI enables point-and-click transformation of data

• Audit log records each step in the process alongside source code

26

26
Power Query

27
Source: https://learn.microsoft.com/en-us/power-query/power-query-what-is-power-query

27
SQL Server / DB or DW tools
Stored Procedure

• Write a transformation using SQL to CREATE, INSERT or UPDATE a table or


its records

• Utilize interface like SSMS to manage files

• Also recommend setting up a Git repository for version control

Scheduled Jobs

• Need a way to run stored procedures on a schedule

• Manage dependencies (i.e. run update_fact_inventory before update_dim_product)

28
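
As a rough T-SQL sketch of this approach (hypothetical table and procedure names), a stored procedure rebuilds a fact table from a staging table; a scheduler such as SQL Server Agent would then run it alongside the dimension loads:

CREATE PROCEDURE dbo.update_fact_inventory
AS
BEGIN
    -- Full reload for simplicity; incremental loads are also common.
    TRUNCATE TABLE dbo.fact_inventory;

    INSERT INTO dbo.fact_inventory (date_key, product_key, store_key, quantity_on_hand)
    SELECT d.date_key, p.product_key, s.store_key, stg.quantity_on_hand
    FROM dbo.stg_inventory AS stg
    JOIN dbo.dim_date    AS d ON d.calendar_date = stg.snapshot_date
    JOIN dbo.dim_product AS p ON p.product_code  = stg.product_code
    JOIN dbo.dim_store   AS s ON s.store_code    = stg.store_code;
END;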
Dbt (data build tool)
• Transformation workflow built for CDWs

• Utilize SQL and Python - SELECT statements converted to DML/DDL

• Integration with other tools

• Git

• Airflow / Prefect (orchestration)

• Cloud vs. Core (open source)

29
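
For comparison, a dbt model is just a SELECT statement saved as a .sql file; dbt compiles it into DDL/DML and uses ref() to run upstream models first. A hedged sketch with hypothetical model and column names:

-- models/marts/fact_inventory.sql
{{ config(materialized='table') }}

SELECT
    d.date_key,
    p.product_key,
    stg.quantity_on_hand
FROM {{ ref('stg_inventory') }} AS stg
JOIN {{ ref('dim_date') }}      AS d ON d.calendar_date = stg.snapshot_date
JOIN {{ ref('dim_product') }}   AS p ON p.product_code  = stg.product_code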

29
Example dbt Workflow

30 Source: Medium Article - What does dbt give you?

30
Summary
• Sales process shows us a good example of the fundamentals of the
dimensional model

• Difference BI / DW architectures exist but Kimball remains relevant because


of its principles of simplicity and usability

• Using Power Query for ETL in this course but keep in mind alternatives like
dbt because:
• Importance of documentation and version control

• Integration with other tools

31

31
Next Class
• Inventory and Procurement Data Models

• Snapshotting

• Additive vs. Semi-Additive Facts

• Building based on the Business Architecture

• Preparation - Read Chapter 4 of The Data Warehouse Toolkit (3rd


edition)

32

32
Inventory Value Chain

Source: Figure 4.1 of Chapter 4 of The Data Warehouse Toolkit (3rd edition)
33

33
Modelling Inventory and Procurement
Week 3 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• DW Planning - Key Terms

Inventory Case Study
• Inventory Value Chain
• 3 types of inventory models
• Periodic Snapshot
• Inventory Transactions
• Accumulating Snapshots

Procurement Case Study
• Single vs. Multiple Fact Tables
• Slowly changing dimensions (SCD)

2

2
Resources
• Chapter 4 – Inventory

• Chapter 5 - Procurement

• Chapter 19 - ETL Subsystems


and Techniques

Reference:

Chapter 4 – Inventory
https://learning.oreilly.com/library/view/the-data-
warehouse/9781118530801/9781118530801c04.xhtml

Chapter 5 - Procurement
https://learning.oreilly.com/library/view/the-data-
warehouse/9781118530801/9781118530801c05.xhtml

3
DW Planning - Overview of Key Terms

4
Building The Business - One Process At
A Time
• Often, processes will share common dimensions

• Conformed Dimensions

• Bus architecture refers to a common structure to which everything connects and from which everything derives power (old electrical power industry reference)

• Goal = Plan which models to build first + identify shared dimensions

• Enterprise DW Bus Matrix


• Business Processes x Common Dimensions
5

Building the enterprise's DW/BI system in one comprehensive effort is too daunting - building it
as isolated pieces defeats the overriding goal of consistency.

For long-term DW/BI success, you need to use an architected, incremental approach to build the
enterprise's warehouse. The approach advocated by Kimball is the enterprise data warehouse bus
architecture.

A bus is a common structure to which everything connects and from which everything derives
power. The bus in a computer is a standard interface specification that enables you to plug in a
disk drive, DVD, or any number of other specialized cards or devices. Because of the computer's
bus standard, these peripheral devices work together and usefully coexist, even though they
were manufactured at different times by different vendors.

5
Relationship between Sales & Inventory

6 Source: Figure 4.8 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

We modeled data from several processes of the retailer's value chain.

Although separate fact tables in separate dimensional schemas represent the data from each
process, the models share several common business dimensions: date, product, and store. We've
logically represented this dimension sharing in Figure 4.8. Using shared, common dimensions is
absolutely critical to designing dimensional models that can be integrated.

6
Relationship between Sales & Inventory

7 Source: Figure 4.9 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

You can envision many business processes plugging into the enterprise data warehouse bus, as
illustrated in Figure 4.9.

Ultimately, all the processes of an organization's value chain create a family of dimensional
models that share a comprehensive set of common, conformed dimensions.

The enterprise data warehouse bus architecture provides a rational approach to decomposing the
enterprise DW/BI planning task. The master suite of standardized dimensions and facts has a
uniform interpretation across the enterprise. This establishes the data architecture framework.
You can then tackle the implementation of separate process-centric dimensional models, with
each implementation closely adhering to the architecture. As the separate dimensional models
become available, they fit together like the pieces of a puzzle. At some point, enough dimensional
models exist to make good on the promise of an integrated enterprise DW/BI environment.

The bus architecture enables DW/BI managers to get the best of two worlds - they have an
architectural framework guiding the overall design, but the problem has been divided into bite-
sized business process chunks that can be implemented in realistic time frames. Separate
development teams follow the architecture while working fairly independently and
asynchronously.

The bus architecture is independent of technology and database platforms. All flavors of
relational and OLAP-based dimensional models can be full participants in the enterprise data
warehouse bus if they are designed around conformed dimensions and facts. DW/BI systems
inevitably consist of separate machines with different operating systems and database
management systems. Designed coherently, they share a common architecture of conformed dimensions and facts, allowing them to be fused into an integrated whole.

7
DW Bus Matrix - Dimensions x Business
Processes

8 Source: Figure 4.10 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

Use an enterprise data warehouse bus matrix to document and communicate the bus
architecture (also called the conformance or event matrix).

Working in a tabular fashion, the organization's business processes are represented as matrix
rows. It is important to remember you are identifying business processes, not the organization's
business departments.

The matrix rows translate into dimensional models representing the organization's primary
activities and events, which are often recognizable by their operational source. When it's time to
tackle a DW/BI development project, start with a single business process matrix row because that
minimizes the risk of signing up for an overly ambitious implementation. Most implementation
risk comes from biting off too much ETL system design and development. Focusing on the results
of a single process, often captured by a single underlying source system, reduces the ETL
development risk.

The columns of the bus matrix represent the common dimensions used across the enterprise. It is
often helpful to create a list of core dimensions before filling in the matrix to assess whether a
given dimension should be associated with a business process. The number of bus matrix rows
and columns varies by organization. For many, the matrix is surprisingly square with
approximately 25 to 50 rows and a comparable number of columns. In other industries, like
insurance, there tend to be more columns than rows.

After the core processes and dimensions are identified, you shade or “X” the matrix cells to
indicate which columns are related to each row. Presto! You can immediately see the logical
relationships and interplay between the organization's conformed dimensions and key business processes.

8
Benefits from Conformed Dimensions
• Consistency

• Reusability

• Combine performance measures from different fact tables into a single report (utilize FULL OUTER JOIN)

(Diagram: fact_orders, fact_inventory, and fact_sales all sharing the conformed dim_product)

Conformed dimensions serve as the cornerstone of the bus because they're shared across business process fact
tables.

Conformed dimensions go by many other aliases: common dimensions, master dimensions, reference
dimensions, and shared dimensions. Conformed dimensions should be built once in the ETL system and then
replicated either logically or physically throughout the enterprise DW/BI environment. When built, it's extremely
important that the DW/BI development teams take the pledge to use these dimensions. It's a policy decision that
is critical to making the enterprise DW/BI system function; their usage should be mandated by the organization's
CIO.

9
Inventory Value Chain

10 Source: Figure 4.1 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

Most organizations have an underlying value chain of key business processes.

The value chain identifies the natural, logical flow of an organization's primary activities. For
example, a retailer issues purchase orders to product manufacturers. The products are delivered
to the retailer's warehouse, where they are held in inventory. A delivery is then made to an
individual store, where again the products sit in inventory until a consumer makes a purchase.

The figure illustrates this subset of a retailer's value chain. Obviously, products sourced from
manufacturers that deliver directly to the retail store would bypass the warehousing processes.

10
Inventory Value Chain (continue)
• How many business processes exist in this value chain?

• What occurs first? What occurs last?

• What dimensions do these processes share in common?

11

Operational source systems typically produce transactions or snapshots at each step of the value
chain. The primary objective of most analytic DW/BI systems is to monitor the performance
results of these key processes. Because each process produces unique metrics at unique time
intervals with unique granularity and dimensionality, each process typically spawns one or more
fact tables. To this end, the value chain provides high-level insight into the overall data
architecture for an enterprise DW/BI environment.

11
Inventory Models

12

12
Inventory Periodic Snapshot
• Each row represents the combination of:
• product_key

• store_key

• date_key

• How much quantity of inventory on hand at given location on a given


date?

• Potential for a very dense table - why?

13

Optimized inventory levels in the stores can have a major impact on chain profitability. Making
sure the right product is in the right store at the right time minimizes out-of-stocks (where the
product isn't available on the shelf to be sold) and reduces overall inventory carrying costs. The
retailer wants to analyze daily quantity-on-hand inventory levels by product and store.

The business process targeted for analyzing is the periodic snapshotting of retail store inventory.
The most atomic level of detail provided by the operational inventory system is a daily inventory
for each product in each store. The dimensions immediately fall out of this grain declaration:
date, product, and store. This often happens with periodic snapshot fact tables where you cannot
express the granularity in the context of a transaction, so a list of dimensions is needed instead. In
this case study, there are no additional descriptive dimensions at this granularity. For example,
promotion dimensions are typically associated with product movement, such as when the
product is ordered, received, or sold, but not with inventory.
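A minimal DDL sketch of this grain, assuming hypothetical table and column names (one row per date, product, and store, with quantity on hand as the only fact):

-- Hypothetical daily store inventory snapshot at the declared grain
CREATE TABLE fact_store_inventory_snapshot (
    date_key          INT NOT NULL,   -- FK to the date dimension
    product_key       INT NOT NULL,   -- FK to the product dimension
    store_key         INT NOT NULL,   -- FK to the store dimension
    quantity_on_hand  INT NOT NULL,   -- semi-additive fact
    PRIMARY KEY (date_key, product_key, store_key)
);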

13
Row-Density Sample Calculation
A. 10 stores

B. 10,000 products

C. 365 days of the year

• A x B x C = 36,500,000 rows per year

14

14
Inventory Periodic Snapshot

15 Source: Figure 4.2 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

The simplest view of inventory involves only a single fact: quantity on hand.

The product dimension could be enhanced with columns such as the minimum reorder quantity
or the storage requirement, assuming they are constant and discrete descriptors of each product.

If the minimum reorder quantity varies for a product by store, it couldn't be included as a product
dimension attribute. In the store dimension, you might include attributes to identify the frozen
and refrigerated storage square footages.

15
Working with semi-additive facts
• Inventory levels can be summed across stores and products but
NOT across dates
• Quantity_on_hand is known as a semi-additive fact

• Implications on SQL calculation


• Cannot do AVG() across rows

• Could do AVG() over (partition by product / store)

• OLAP products could also provide aggregation rules to make semi-


additive less problematic
16

In the inventory snapshot schema, the quantity on hand can be summarized across products or
stores and result in a valid total.

Inventory levels, however, are not additive across dates because they represent snapshots of a
level or balance at one point in time. Because inventory levels (and all forms of financial account
balances) are additive across some dimensions but not all, we refer to them as semi-additive
facts.
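A short sketch of the SQL implication, reusing the hypothetical snapshot table sketched earlier: quantity on hand can be summed across stores or products for a single date, but across dates it is typically averaged per product and store rather than summed.

-- Valid: sum the level across stores and products for one date
SELECT date_key, SUM(quantity_on_hand) AS total_on_hand
FROM fact_store_inventory_snapshot
WHERE date_key = 20230131
GROUP BY date_key;

-- Across dates: average the daily level per product and store instead of summing
SELECT product_key, store_key, AVG(quantity_on_hand) AS avg_daily_on_hand
FROM fact_store_inventory_snapshot
WHERE date_key BETWEEN 20230101 AND 20230131
GROUP BY product_key, store_key;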

16
Enhancing Inventory Facts
Quantity_on_hand = Number of units of inventory

Quantity_sold = Number of units sold / shipped

Value_at_cost = Cost per unit (x qty = Cost of Goods Sold)

Value_at_current_sell_price = Sales per unit (x qty = Revenue)

17

For most inventory analysis, quantity on hand isn't enough. Quantity on hand needs to be used in
conjunction with additional facts to measure the velocity of inventory movement and develop
other interesting metrics such as the number of turns and number of days' supply (time series
analysis).

If quantity sold (or equivalently, quantity shipped for a warehouse location) was added to each
fact row, you could calculate the number of turns and days' supply. For daily inventory snapshots,
the number of turns measured each day is calculated as the quantity sold divided by the quantity
on hand. For an extended time span, such as a year, the number of turns is the total quantity sold
divided by the daily average quantity on hand. The number of days' supply is a similar calculation.
Over a time span, the number of days' supply is the final quantity on hand divided by the average
quantity sold.
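A hedged SQL sketch of both ratios over a time span, assuming the enhanced snapshot carries a quantity_sold column (hypothetical table and column names):

-- Turns and days' supply for January, per product and store
SELECT
    product_key,
    store_key,
    SUM(quantity_sold) / NULLIF(AVG(quantity_on_hand), 0) AS inventory_turns,
    MAX(CASE WHEN date_key = 20230131 THEN quantity_on_hand END)
        / NULLIF(AVG(quantity_sold), 0)                    AS days_supply
FROM fact_store_inventory_snapshot
WHERE date_key BETWEEN 20230101 AND 20230131
GROUP BY product_key, store_key;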

17
Enhanced Inventory Snapshot

18 Source: Figure 4.3 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

In addition to the quantity sold, inventory analysts are also interested in the extended value of
the inventory at cost, as well as the value at the latest selling price.

The quantity on hand is semi-additive, but the other measures in the enhanced periodic snapshot
are all fully additive. The quantity sold amount has been rolled up to the snapshot's daily
granularity. The valuation columns are extended, additive amounts. In some periodic snapshot
inventory schemas, it is useful to store the beginning balance, the inventory change or delta,
along with the ending balance. In this scenario, the balances are again semi-additive, whereas the
deltas are fully additive across all the dimensions.

The periodic snapshot is the most common inventory schema. We'll briefly discuss two
alternative perspectives that complement the inventory snapshot just designed. For a change of
pace, rather than describing these models in the context of the retail store inventory, we'll move
up the value chain to discuss the inventory located in the warehouses.

18
Inventory Transactions
• Granularity can also represent each activity in the value chain:
• Receive a product
• Place product into inspection hold
• Release product from inspection
• Return product to vendor due to inspection failure
• Place product in bin
• Pick product from bin
• Package product for shipment
• Ship product to customer
• Receive product from customer
• Return product to inventory following customer return
19
• Remove product from inventory

A second way to model an inventory business process is to record every transaction that affects
inventory.

Each inventory transaction identifies the date, product, warehouse, vendor, transaction type, and
in most cases, a single amount representing the inventory quantity impact caused by the
transaction.

19
Transaction Inventory Fact

20 Source: Figure 4.4 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

The resulting schema is illustrated in Figure 4.4; the granularity of the fact table is one row per
inventory transaction.

It contains detailed information that mirrors individual inventory manipulations. The transaction
fact table is useful for measuring the frequency and timing of specific transaction types to answer
questions that couldn't be answered by the less granular periodic snapshot.

It is impractical to use the transaction fact table as the sole basis for analyzing inventory
performance - it is too cumbersome and impractical for broad analytic questions that span
dates, products, warehouses, or vendors.

20
Inventory Accumulating Snapshot
1. INSERT = product is received at warehouse

2. UPDATE
a. Inspection

b. Bin Placement

c. Initial Shipment

d. Last Shipment

3. Facts measure milestone events (i.e. time from inspect to bin


placement)
21

Another inventory model is the accumulating snapshot.

Accumulating snapshot fact tables are used for processes that have a definite beginning, definite
end, and identifiable milestones in between. In this inventory model, one row is placed in the fact
table when a particular product is received at the warehouse.

The disposition of the product is tracked on this single fact row until it leaves the warehouse. In
this example, the accumulating snapshot model is only possible if you can reliably distinguish
products received in one shipment from those received at a later time; it is also appropriate if you
track product movement by product serial number or lot number.
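A minimal sketch of the insert-then-update pattern, with hypothetical table and column names for a warehouse lot:

-- One row is inserted when the lot is received at the warehouse
INSERT INTO fact_inventory_accumulating
    (lot_number, product_key, warehouse_key, date_received_key, quantity_received)
VALUES ('LOT-1001', 42, 7, 20230105, 500);

-- The same row is then updated as each milestone occurs
UPDATE fact_inventory_accumulating
SET date_inspected_key = 20230107
WHERE lot_number = 'LOT-1001';

UPDATE fact_inventory_accumulating
SET date_bin_placement_key = 20230108,
    days_inspect_to_bin    = 1          -- lag fact measuring the milestone interval
WHERE lot_number = 'LOT-1001';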

21
Accumulating Snapshot Fact Table

22 Source: Figure 4.6 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

The accumulating snapshot fact table provides an updated status of the lot as it moves through
standard milestones represented by multiple date-valued foreign keys. Each accumulating
snapshot fact table row is updated repeatedly until the products received in a lot are completely
depleted from the warehouse.

22
Which one is best?
• Transactions provide granular data but even with compute resources certain
business questions are impractical to answer

• Other challenges could be dimensionality from different source systems


involved in different activities - may point to multiple models

• Periodic snapshot tables are good for trends over time but the potential
explosion from the date granularity must be managed properly

• Accumulating snapshot tables are good for workflow / pipeline analysis to


measure efficiency of inventory movements but rely heavily on UPDATES

• Often, the combination is needed to answer all kinds of business questions


23

23
Fact Table Comparisons

24 Source: Figure 4.7 of Chapter 4 The Data Warehouse Toolkit (3rd edition)

24
Procurement Process

25

This subject area has cross-industry appeal because it is applicable to any organization that
acquires products or services for either use or resale.

Effective procurement of products at the right price for resale is obviously important to retailers
and distributors. Procurement also has strong bottom line implications for any organization that
buys products as raw materials for manufacturing. Significant cost savings opportunities are
associated with reducing the number of suppliers and negotiating agreements with preferred
suppliers.

25
What is procurement?
• Source / purchase materials or products for lowest possible price

• Possible business questions might include:

• Which products or materials are most frequently purchased?

• Can we consolidate orders to take advantage of volume


discounts?

• Are employees buying from preferred vendors?

• Does your price match the negotiated price?

26
• Are vendors delivering on time?

Demand planning drives efficient materials management.

After demand is forecasted, procurement's goal is to source the appropriate materials or


products in the most economical manner. Procurement involves a wide range of activities from
negotiating contracts to issuing purchase requisitions and purchase orders (POs) to tracking
receipts and authorizing payments.

The following list gives a better sense of a procurement organization's common analytic
requirements:
• Which materials or products are most frequently purchased? How many vendors supply these
products? At what prices? Looking at demand across the enterprise (rather than at a single
physical location), are there opportunities to negotiate favorable pricing by consolidating
suppliers, single sourcing, or making guaranteed buys?
• Are your employees purchasing from the preferred vendors or skirting the negotiated vendor
agreements with maverick spending?
• Are you receiving the negotiated pricing from your vendors or is there vendor contract
purchase price variance?
• How are your vendors performing? What is the vendor's fill rate? On-time delivery
performance? Late deliveries outstanding? Percent back ordered? Rejection rate based on
receipt inspection?

26
Procurement Process (simplified)

Requisition
• Internal employee
• Approval process
• Prevent fraud
• Document Generated: Purchase Requisition

Order
• Purchasing
• Set of instructions for fulfilling an order
• Document Generated: Purchase Order

Shipment
• Vendor ships the goods to the buyer
• Terms of sale may apply
• Documents Generated: Shipping Label, Packing Slips

Receipt
• Warehouse receives the goods ordered
• Verifies receipt against order
• Documents Generated: Bill of Lading, Goods Receipt

Payment
• Accounting verifies match between order, shipping and receipt
• Pay invoice
• Documents Generated: Invoice, Payment Receipt

27

27
Single Procurement Fact Table

28 Source: Figure 5.1 of Chapter 5 The Data Warehouse Toolkit (3rd edition)

Using the four-step dimensional design process:


1. Procurement is the business process to be modeled.
2. In studying the process, there are a number of procurement transactions, such as purchase
requisitions, purchase orders, shipping notifications, receipts, and payments.
3. You could initially design a fact table with the grain of one row per procurement
transaction with transaction date, product, vendor, contract terms, and procurement
transaction type as key dimensions.
4. The procurement transaction quantity and dollar amount are the facts.

28
Single vs. Multiple Transaction Fact
Tables
• User requirements

• Are they separate stakeholders with different BI needs?

• Multiple business processes

• Are there separate control numbers and processes?

• Multiple source systems with different grains

• Is there a single ERP or multiple source systems?

• Dimensionality

• Are dimensions applicable to all transactions or some?


29

Do business users describe the various procurement transactions differently? To the
business, purchase orders, shipping notices, warehouse receipts, and vendor payments are all
viewed as separate and unique processes.

You are faced with a design decision. Should you build a blended transaction fact table with a
transaction type dimension to view all procurement transactions together, or do you build
separate fact tables for each transaction type?

As dimensional modelers, you need to make design decisions based on a thorough understanding
of the business requirements weighed against the realities of the underlying source data.

29
Using the Business Matrix to identify
processes

30 Source: Figure 5.2 of Chapter 5 The Data Warehouse Toolkit (3rd edition)

You can include two additional columns identifying the atomic granularity and metrics for each
row.

In this example, there are separate fact tables for purchase requisitions, purchase orders,
shipping notices, warehouse receipts, and vendor payments.

Users view these activities as separate and distinct business processes, the data comes from
different source systems, and there is unique dimensionality for the various transaction types.
Multiple fact tables enable richer, more descriptive dimensions and attributes. The single fact
table approach would have required generalized labeling for some dimensions.

30
Multiple Fact
Table -
Procurement
Example

31 Source: Figure 5.3 of Chapter 5 The Data Warehouse Toolkit (3rd edition)

Multiple fact tables may require more time to manage and administer because there are more
tables to load, index, and aggregate.

Some would argue this approach increases the complexity of the ETL processes, however, it may
simplify the ETL activities. Loading the operational data from separate source systems into
separate fact tables likely requires less complex ETL processing than attempting to integrate data
from the multiple sources into a single fact table.

31
Slowly Changing Dimensions

32

32
Slowly Changing Dimensions
• Dimensions may be updated slowly over time
• For example - customer updates their address or phone number

• Infrequently identified by the business user


• Proactiveness is required on the part of the BI professional

• Strategy for each attribute needed

• May not be a good replacement for events or fast changing


dimensions
• For example - Airbnb booking stages
33

Although dimension table attributes are relatively static, they aren't fixed forever; attribute
values change, albeit rather slowly, over time.

Dimensional designers must proactively work with the business's data governance
representatives to determine the appropriate change-handling strategy. You shouldn't simply
jump to the conclusion that the business doesn't care about dimension changes just because they
weren't mentioned during the requirements gathering.

Although IT may assume accurate change tracking is unnecessary, business users may assume the
DW/BI system will allow them to see the impact of every attribute value change.

33
Types of Slowly Changing Dimensions
Type 0: Retain Original (never changes). Example: Dates or original primary keys

Type 1: Overwrite (replace). Examples: Product Category, Phone Number

Type 2: Add new row + start_date + end_date. Examples: Address, Phone / Device on App

Type 3: Add new attribute (new columns). Example: Current_Device + Last_Used_Device

Type 4: Add Mini-Dimension using bands / ranges. Example: Demographics

34

For each dimension table attribute, you must specify a strategy to handle change - when an
attribute value changes in the operational world, how will you respond to the change in the
dimensional model?

You may need to employ a combination of these techniques within a single dimension table.

34
Best Practices for SCD
• Record SDC as close to source data as possible

• Leverage Slowly Changing Dimension Manager (refer to Chapter 19


ETL Subsystems and Techniques in Kimball)

• Leverage a change data capture system for more efficient updates


(incremental vs. full refresh)

• Add a date (9999-12-31) for expiration_date of Type 2 instead of


using NULL to assist in BETWEEN function
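For example, a point-in-time lookup against a hypothetical Type 2 customer dimension stays a simple BETWEEN because current rows carry 9999-12-31 rather than NULL:

-- Pick the customer row that was in effect on each order date
SELECT f.order_id, f.order_date, c.customer_key, c.address
FROM fact_orders f
JOIN dim_customer c
  ON c.customer_id = f.customer_id
 AND f.order_date BETWEEN c.effective_date AND c.expiration_date;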

35

35
Summary
• Bus Matrix enables us to see conformed dimensions and plan our
build

• Conformed dimensions have many benefits including consistency


and easier report / metric building

• 3 types of inventory models - a combination is often necessary


• Periodic Snapshot

• Transaction

• Accumulating Snapshot
36

36
Summary (cont’d)
• Procurement is a good candidate for a multiple fact table
• Each process may be distinct and warrants its own star schema

• Conformed dimensions will still apply in this situation

• Slowly changing dimensions


• Necessary because data may change

• Distinguish slowly changing vs. events / transactions

• Choice between replace, adding new row or adding new column

37

37
Next Class
Review:

• Chapter 6 - Order
Management

• Chapter 7 - Accounting
processes

38

38
Modelling
Customer Relationship Management (CRM)
and
Human Resources Management (HRM)
Week 4 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda
Customer Relationship Management (CRM)
• Overview
• Modelling CRM in Kimball
• Analytical Outputs
• Exposure to modern technology - Customer Data Platform

Human Resources Management (HRM)
• Overview
• Challenges with HRM data
• Employee-Manager relationships
• Slowly Changing Dimension - revisited

2
Resources
• Chapter 8 - Customer
Relationship Management

• Chapter 9 - Human Resources


Management

• Chapter 19 - ETL Subsystems


and Techniques

Reference:

Chapter 8 - Customer Relationship Management


https://learning.oreilly.com/library/view/the-data-
warehouse/9781118530801/9781118530801c08.xhtml

Chapter 9 - Human Resources Management


https://learning.oreilly.com/library/view/the-data-
warehouse/9781118530801/9781118530801c09.xhtml

3
Customer Relationship Management
(CRM)

4
What is CRM?
“Strategy to manage interactions with existing and potential
customers” (Salesforce)
• Contact Management
• Sales Management
• Agent Productivity
• Tracking the entire customer lifecycle
• Potential
• New Customer
• Repeat Customer
5

5
Benefits of CRM
• Increased customer satisfaction and retention

• Better marketing return on investment (ROI)

• Cross-team collaboration

• Improved sales forecasting

• Enhanced products and services

6
Modelling CRM

7
Analytics Loop for CRM

8 Source: Figure 8.1 of Chapter 8 The Data Warehouse Toolkit (3rd edition)

8
Aggregated Facts as Dimension
Attributes
We can calculate aggregated facts and store as dimensions in
the customer table, why would we do this?
• Date of first purchase

• Repeat customer

• Customer lifetime value (CLV)


Recommend to use higher level attributes vs. specific dollar
values - why?
9

9
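A hedged sketch of deriving such attributes during the customer dimension load, assuming hypothetical fact_sales and dim_customer tables; note the banded lifetime value instead of a precise dollar figure:

-- Aggregated facts stored as customer dimension attributes
SELECT
    c.customer_key,
    MIN(f.date_key) AS date_of_first_purchase,
    CASE WHEN COUNT(f.sales_id) > 1 THEN 'Repeat' ELSE 'New' END AS customer_type,
    CASE WHEN SUM(f.sales_amount) >= 1000 THEN 'High'
         WHEN SUM(f.sales_amount) >= 100  THEN 'Medium'
         ELSE 'Low' END AS lifetime_value_band
FROM dim_customer c
LEFT JOIN fact_sales f ON f.customer_key = c.customer_key
GROUP BY c.customer_key;

Banding the lifetime value is one way to follow the recommendation above: the attribute stays stable and filterable even as exact dollar amounts keep changing.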
What Are Possible Attributes For
Customer Segments?

10

If time allows, show example of Google Ads


https://instapage.com/blog/adwords-audience-insights

10
Warning For Data Science Uses
Data leakage
Using attributes for a prediction to which you would not have
access at the time of event

• Obvious through high correlation with dependent variable


• Using attributes like customer lifetime value, repeat customer,
etc could result in leakage
• Why?
• Example - Predict CLV using first 7 days of user activity

11

https://machinelearningmastery.com/data-leakage-machine-learning/

11
Dealing With Possible Many-to-many
Relationships
• Using a bridge table can help break many-to-many
relationships (replace with one-to-many in both directions)
• Example - Multiple Customer Contacts

12 Source: Figure 8.9 of Chapter 8 The Data Warehouse Toolkit (3rd edition)

12
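A minimal sketch of the pattern, using hypothetical names: the bridge table turns one many-to-many relationship into two one-to-many relationships.

-- Bridge between customers and their (possibly many) contacts
CREATE TABLE bridge_customer_contact (
    customer_key  INT NOT NULL,   -- one customer has many bridge rows
    contact_key   INT NOT NULL,   -- one contact can serve many customers
    PRIMARY KEY (customer_key, contact_key)
);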
How would you break this many-to-many
relationship?

Visits:
Visit_ID  Cust_ID  Date_Session
1         1        20230101
2         1        20230102
3         2        20230103

Sales Orders:
Cust_ID  Sales_Order_ID  Date_Order
1         A1              20230101
2         B1              20230102
2         B2              20230102

13

13
Customer Transaction (Event) Fact Table

14 Source: Figure 8.12 of Chapter 8 The Data Warehouse Toolkit (3rd edition)

14
Analytical Outputs from CRM

15

If time allows, show example of Google Ads


https://instapage.com/blog/adwords-audience-insights

15
Cohort Charts
Analyze / Assess
this cohort chart:
1. Identify the
business process
2. Identify the grain
3. Identify the facts
4. Identify the
dimensions

16 Source: https://mode.com/blog/cohort-analysis-helps-look-ahead/

See more at: https://mode.com/blog/cohort-analysis-helps-look-ahead/

For people who analyze customer behavior, the table above is a familiar one. This Mixpanel chart
measures retention rates across different user cohorts. By moving down the table, you can see
how retention is changing over time.

This report, which shows how sticky your product is over time, has become one of the most
important measures of health for many companies. It's supported by a wide variety of tools, from
out-of-the-box reporting tools like Google Analytics and KISSmetrics to data-collection services
like Segment and Keen.io. Industry experts agree that this is the best tool for cohorting customers
and measuring retention.

Though temporal cohorts like these are great for measuring what's happening, they have a major
drawback: they don't tell you what to do next. If retention is improving, great! But what should
we do next? (Do more? Do less?) More troublingly, if retention is falling, how do we fix it?

16
Lifetime Value (LTV)
Analyze / Assess
this cohort chart:
1. Identify the
business process
2. Identify the grain
3. Identify the facts
4. Identify the
dimensions

17 Source: https://www.brightcove.com/en/resources/blog/calculate-lifetime-value-svod-business/

See more at: https://www.brightcove.com/en/resources/blog/calculate-lifetime-value-svod-


business/

How much revenue do you think you will receive from your next customer? Will they stay for a
few months, then cancel? Or will they join the ranks of your multi-year subscribers?

Each of these questions relies on understanding the lifetime value (LTV) of a customer and using
that value to compare natural segments of your users to find veins of satisfied (and long-
subscribing) customers. How else would you know whether spending $30 or $60 to acquire a new
customer is worthwhile?

17
How would we build these in SQL?
1. Identify a repeat customer vs. a new customer?

2. Calculate cumulative sum of sales amount?

3. Build the analytical table to calculate LTV cohorts

• Refer to next slide, start with conceptual output

18

18
• What would these outputs look like as a chart?

Date spine (Date_Key): 20230101, 20230102, 20230103, 20230104, 20230105

Cust_ID  Sales_ID  Sales_Amount  Date_Key   Row_Number  Running_Total
1        A1        $100          20230101   1           $100
2        B1        $50           20230101   1           $50
2        B2        $50           20230103   2           $100
3        C1        $200          20230104   1           $200

19

19
Identify Repeats using ROW_NUMBER()
WINDOW Function
with sales as (
    select
        Cust_ID
        , Sales_ID
        , Sales_Amount
        , Date_Key
    from sales
)
select
    *
    , ROW_NUMBER() OVER (PARTITION BY Cust_ID
        ORDER BY Date_Key ASC) AS Row_Num  -- 1 = first purchase, > 1 = repeat customer
from sales
20

20
Running Total
with sales as (
select
Cust_ID
, Sales_ID
, Sales_Amount
, Date_Key
from sales
)
SELECT
*
, SUM(Sales_Amount) OVER (PARTITION BY Cust_ID
ORDER BY Date_Key ASC) AS Running_Total_Sales
FROM sales
21

21
Generate table for LTV - multiply sales
with dates
Trick - use cross join

with sales as (
    select * from sales_table   -- placeholder name for the sales fact table
)
, date_spine as (
    select date_key from dates
)
, final as (
    -- explode: one row per sale per spine date on or after the sale date,
    -- so revenue carries forward even when there is no new activity
    select sales.*, date_spine.date_key as spine_date_key
    from sales
    cross join date_spine
    where date_spine.date_key >= sales.date_key
)
select * from final

22

22
Sample Sales Funnel

23 Source: https://revenuehunt.com/build-sales-funnel-shopify-store/

See more at: https://revenuehunt.com/build-sales-funnel-shopify-store/

Over the past few years, eCommerce sales funnels have emerged as one of the fastest ways to
get more sales, increase Average Order Value and sell more products to every visitor.

The visual representation of an eCommerce sales funnel is used by marketers all the time. As
you get closer to a sale, the number of people that go to the next step decreases.

The main advantages of having a sales funnel are:

– Increased sales through better targeting. Sales funnels advise customers on finding the right
products for them and help them make confident purchasing decisions.

– Getting to know your customers. During different steps of the funnel, you’ll have the chance to
collect actionable data about your customers and define your buyer personas, while considering
demographics, customer motivation, behavior patterns, and goals. This information will help you
when retargeting customers with segmented marketing campaigns.

– Building long term relationships with your audience. Only 2% of shoppers typically convert on
their first visit to an eCommerce store. However, with the right incentives, it’s easy to capture
new visitors’ emails. This way you can start building a relationship with them and you can close
more sales further down the road.

23
Event Collection

A customer data platform (CDP) enables tracking


users across web + app by doing:

• Identity calls

• Consistent events

• Feeding events to various SaaS apps

24 Source: https://www.indicative.com/resource/modern-data-infrastructure/ and https://youtu.be/2fErE_8hFEc

See more at: https://www.indicative.com/resource/modern-data-infrastructure/ and


https://youtu.be/2fErE_8hFEc

A simplified look at the current data landscape shows that an architecture where companies own
and control their own data—where the data warehouse is the central hub connecting to all other
tools and a gravity well for all business data—is emerging.

That architecture represents a major shift in how data is ingested, stored, and analyzed by
companies of all sizes—and key players in the data analytics industry aren’t keeping up.

To get a sense of the modern data infrastructure and the solutions leading the charge, the
infographic includes tools that are considered the best in their class and that are believed to play
a big role in the future of data.

24
Reverse ETL

Feed data from DW back into SaaS Apps:

• Calculate attributes for Google Ads

• Trigger alerts on Slack or Zendesk

• Predictive models for conversion into


Salesforce

25 Source: https://www.indicative.com/resource/modern-data-infrastructure/ and https://hightouch.com/blog/reverse-etl

See more at: https://www.indicative.com/resource/modern-data-infrastructure/ and


https://hightouch.com/blog/reverse-etl

What is Reverse ETL?

Reverse ETL is the process of copying data from your central data warehouse to your operational
systems and SaaS tools so your business teams can leverage that data to drive action and
personalize customer experiences.

Data warehouses are only accessible to technical users who know how to write SQL. However,
this is often where your core metrics and customer definitions live. For a B2B business, this might
include metrics like active workspaces, last login date, churn rate, LTV, lead score, etc. For a B2C
business, this might include items in cart, recent purchasers, pages viewed, etc.

Reverse ETL is all about syncing this data to your downstream tools, thus further unlocking the
value of your data warehouse. Instead of reacting to your data as it's persisted into a reporting
tool, Reverse ETL allows you to take a proactive approach and put it in the hands of your
operational teams so they take action in your business applications.

25
Human Resources Management (HRM)

26

26
Business Matrix for HRM
Check your understanding for
fact types:
• Transaction
• Periodic Snapshot
• Accumulating Snapshot

27 Source: Figure 9.4 of Chapter 9 The Data Warehouse Toolkit (3rd edition)

27
Challenges with modelling HRM data
• Combination of snapshots and transactions
• Employee vs. Supervisor / Manager as the JOIN key
• Type 2 SCDs and potential for multiple changes occurring at the
same time

28

28
Two Ways To Model Manager-employee
Relationship

29 Source: Figure 9.5 and 9.6 of Chapter 9 The Data Warehouse Toolkit (3rd edition)

29
Slowly Changing Dimensions - Revisited
What strategy would you use to meet the business use case?
• Address Change?
• Change in Manager?
• Change in Employee Position?

Type 0: Retain Original (never changes)
Type 1: Overwrite (replace)
Type 2: Add new row + start_date + end_date
Type 3: Add new attribute (new columns)
Type 4: Add Mini-Dimension using bands / ranges

30

30
Notes About Survey / Text Data
Text data is often messy and large so we should:
• Confirm value with business stakeholders
• Store outside the fact table
• Why?
• Try to normalize this data
• Categorize (i.e. good, bad, neutral)
• Reduce dimensionality (i.e. NPS score >= 7 or <7)
• Handling NULLs (i.e. “Not Answered”)
31

31
Sample Survey Schema

32 Source: Figure 9.10 of Chapter 9 The Data Warehouse Toolkit (3rd edition)

32
Reality Check - Vendor Software
• Given complexity of HRM, majority of companies utilize vendor
solutions with BI plug-ins
• Caveat, still the job of the data modeller for centralized reporting
and amalgamating different information across processes and
software

33

33
Final Note about CRM and HRM
• Due to specialized knowledge / nature of these disciplines, often
a market for specialized talent
• Salesforce Analyst
• Revenue Operations
• HR Analytics
• Consultant (Salesforce)

34

34
Summary CRM
• Many benefits from CRM including improving retention / LTV
• Importance of using aggregated facts as dimensions
• Bridge tables help to break many-to-many relationship issues
• Cohort analysis and LTV analysis - concept + pseudo-SQL
• Role of new technology (CDP, Reverse ETL) in the CRM
process

35

35
Summary HRM
• Combo of transactions, accumulating facts and snapshots
• Many challenges with HRM → proliferation of vendor tools
• Role-playing / outrigger dimensions to handle employee-
manager
• Importance of attribute-specific strategy for SCDs
• FYI for working with textual attributes

36

36
Next Class
Finalize Kimball knowledge by:

• Reviewing case studies for


multiple businesses /
industries

• Chapter 18 + 19

37

37
Appendices

38

38
Useful SQL Tips and Tricks
How to Perform User Cohort Analysis SQL?: 4 Easy Steps
https://hevodata.com/learn/cohort-analysis-sql-2/

How to Calculate LTV (Lifetime Value) in SQL


https://hightouch.com/blog/how-to-calculate-lifetime-value-in-sql#calculating-ltv-using-sql

Cohort Analysis on Databricks Using Fivetran, dbt and Tableau


Applied marketing analytics on the Modern Data Stack
https://www.databricks.com/blog/2022/08/25/cohort-analysis-databricks-using-fivetran-dbt-and-
tableau.html

39

39
Useful SQL Tips and Tricks
• Date Spine
• Build dimension(s) - first purchase / cohort date or attribute
• Use a cross-join to create an exploded table of facts (same
revenue / sales if no new activity)
• OR range join for aggregated cohorts

40

40
Data Validation
Week 5 – Day 1

Spring 2023 - CST2205 - Data Modelling


Agenda
• What is data validation?

• Importance of data validation

• Concepts

• Techniques
• SQL

• Power Query / Power BI

• Examples in real-life (IRL)

2


What Is Data Validation?
• Ensure data is clean and good quality
• Occurs at multiple stages of ETL
• Data at source / generation
• Transformation layer in ETL
• Loading into BI tool
• Not just missing values
• Conformity to business rules (i.e. total net sales = sales -
discounts)
• Identifying duplicates (PK is unique)

3
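A couple of hedged SQL checks for the two examples above, assuming hypothetical table and column names:

-- Business rule: total net sales = sales - discounts (return the violating rows)
SELECT *
FROM fact_sales
WHERE net_sales_amount <> sales_amount - discount_amount;

-- Duplicates: the primary key should be unique
SELECT order_id, COUNT(*) AS row_count
FROM fact_sales
GROUP BY order_id
HAVING COUNT(*) > 1;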
Importance Of Data Validation
• Impact on metrics / measures
• Duplicates can cause inflated numbers
• 0 is non-null counted in average calculation
• Impact on quality of life
• Data types - facts that should be dimensions and vice versa
• Formatting and precision - decimals vs. integers
• Formatting - Geography or dates

4
Data Validation Concepts
Data Type: Is the attribute the correct type? (phone_number accepting only integers)

Range: Are numbers within a low/high range? (latitude between -90.0 and 90.0; longitude between -180.0 and 180.0)

Format: Is the data in the right format? ("YYYY-MM-DD" vs. "YYYY-DD-MM")

Consistency: Data entry and logic match the business rule and are consistent (a row cannot have a ship date without an order date in an accumulating fact table)

Unique: Are the PK or other attributes unique? (surrogate customer_id is unique)

Presence: Not NULL (email address required for signup)

Length: Character length meets a minimum or maximum threshold (passwords must be a minimum of 10 characters)

Accepted Values: Values are found on a list (7 days of the week or 50 states in the USA)

5
Techniques

SQL
• Using COUNT(PK) vs. COUNT(field)
• IFNULL(test, 1, 0)
• Output of a JOIN for something like product_category vs. the product_category table
• Testing business rules with WHERE
  • WHERE ship_date is not null and order_date is null

Power BI
• Data Profiling in Power Query
• Errors after transformation steps in Power Query
• Writing DAX functions
• Reviewing visual outputs of charts
  • Multi-row index cards
  • Histograms
  • Tables
6
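A short sketch of the SQL techniques above, assuming a hypothetical fact_orders table:

-- COUNT(PK) vs. COUNT(field): the gap is the number of NULLs in the field
SELECT
    COUNT(order_id)                    AS total_rows,
    COUNT(ship_date)                   AS rows_with_ship_date,
    COUNT(order_id) - COUNT(ship_date) AS missing_ship_dates
FROM fact_orders;

-- Business rule: a ship date should never exist without an order date
SELECT *
FROM fact_orders
WHERE ship_date IS NOT NULL
  AND order_date IS NULL;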
Examples in Real Life

7
Power BI

8 Source: Tabular Editor + Excelerator BI Blog


Other Tools
Great Expectations - SQL-based
platform

Dbt - testing + orchestration in


Transformation layer

Amundsen - Open source data catalog


created by Lyft engineers

9 Source: https://www.indicative.com/resource/modern-data-infrastructure/
dbt (data build tool)

10 Source: https://docs.getdbt.com/docs/build/tests
Metadata Models with Framework
Manager
Week 6 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• IBM Cognos High-Level Overview

• Framework Manager (FM) role in suite of tools

• IBM Cognos Analytics Data Modules


• Trend towards cloud-based democratized data modelling

2
IBM Cognos Analytics History
• Cognos was a Canadian company founded in 1969 and
acquired by IBM in 2008 for $4.9 B

• IBM integrated and renamed “Cognos BI and Financial


Performance Management” in 2009

• Redesign version 11 and renamed to Cognos Analytics in 2015

• Shift to cloud-based and data modules

• IBM Cognos BI and Framework Manager will not have new

3
features developed after 2022

3
IBM Cognos Analytics Components

4
IBM Cognos Analytics Components

When a user runs a report, interactively or in the background, the metadata and the data of the
report are accessed through a combination of the package from which the report was authored,
and the data source from which the package was modeled.

The data source includes that connection string to the database and may include a signon that
allows access to the database.

The data source is used to query theh database and retrieve the appropriate data and the result
set is presented back to the user.

There may be multiple connections for a given data source and multiple signons for a given
connection.

5
The Role Of A Metadata Model In Cognos
Analytics

(Diagram: underlying data source types - Relational, Files, Cubes, Other)

An IBM Cognos model provides a business presentation view of an organization’s data sources.
Models let users analyze and report on their data sources.

A metadata model is different from a typical data model in that it can hide the structural
complexity of your underlying data sources. By creating a metadata model, you have more
control over how your data is presented to end users. You can also choose which data to display
to your end users and how it will be organized. The overall goal of modeling the metadata is to
create a model that provides predictable results and an easy-to-use view of the metadata for
authors and analysts.

Your underlying data sources may be very diverse. For example, you may have operational or
reporting data in one or more relational databases. You may also have legacy data in various
other file formats. You may even have online analytical processing (OLAP) sources that include
cubes (such as Cognos PowerCubes), as well as other sources such as SAP BW.

6
Data Modeler – 3 Main Goals
• Accuracy
• Usability
• Performance

What is your first priority?


7

Data modelers have three main goals:

• Accuracy: Reports must contain accurate data.


• Usability: Packages produced from a model must be understandable by report authors and
other users.
• Performance: Report data, should be retrievable in a reasonable amount of time.

As a modeler, accuracy is your first priority. You must determine whether usability or
performance is the next highest priority for your authors and users.

7
Framework Manager Work Flow
(Diagram: Framework Manager workflow - Create Project, Import Metadata, Prepare Metadata, Model Metadata for Reporting, Set Security, Create and Manage Packages, Publish Packages; connected to Data Sources, Content Store, and IBM Cognos Analytics - Reporting)
8

8
Identify Data Model Types And Data
Structures

The example diagram presents an operational database model with the data in normalized form
on the left, and a reporting database model in a star schema on the right.

There are basically two types of relational models: operational and reporting.

Operational databases are used to track day-to-day business operations, hence the name
operational. They are usually normalized or part of an enterprise resource planning (ERP) vendor
package.

Reporting databases are typically a copy of the operational data that are structured to make
reporting faster and easier. They are usually dimensional, taking the form of a star schema design.

In general, it is recommended that you create a logical model that conforms to star schema
concepts. This is a requirement for IBM Cognos Analytics - Reporting and has also proved to be an
effective way to organize data for your users.

9
Understand Merits Of Each Model Type
Operational Databases

• Designed to maximize accuracy and minimize redundancy.

• Optimized for writing and updating data, rather than reading and reporting on data

• Often need multiple joins to produce accurate queries, which


impacts the speed of queries in a negative way.

10

Operational databases are designed to maximize accuracy and minimize redundancy. They are
optimized for writing and updating data, rather than reading and reporting on data. They often
need multiple joins to produce accurate queries, which impacts the speed of queries in a negative
way.

10
Understand Merits Of Each Model Type
Operational Systems

• Designed with one goal - to get data into the database quickly.

• Databases are normalized to reduce redundancy.

• Little to no redundancy ensures that there is data integrity and that


database triggers function properly

• The right data is captured.

11

Operational systems are designed with one goal in mind: to get data into the database quickly.
These databases are normalized to reduce redundancy. Having little to no redundancy ensures
that there is data integrity and that database triggers function properly, so that the right data is
captured.

11
Understand Merits Of Each Model Type
Reporting Databases (Star Schema)

• Structured - transactional data is stored in a fact table.

• Reference data is stored in separate dimension tables.

• Joined together only as necessary to produce the reporting data.

• Star schema database - fewer tables than a fully normalized


database,
• Query performance is much faster

12

Reporting databases, on the other hand, are structured such that transactional data is stored in a
fact table. Reference data is stored in separate dimension tables. These are the two basic
components of what is known as a star schema. They are joined together only as necessary to
produce the reporting data.

Because a star schema database contains fewer tables than a fully normalized database, query
performance is much faster. Typically, a query against a star schema database focuses on one
central fact table and makes integrity checks against the related dimension tables. If the query is
to retrieve information about a specific subject area only, such as all the products that belong to a
particular product line, then the query will be even faster.

In a virtual star schema, you collapse the relationships between tables in order to form
dimensions.

12
Identify Data Model Types And Data
Structures

13

The example diagram presents the same two models as the previous diagram, with the
relationships of the operational model collapsing to form the star schema of the reporting model.
Extract, Transform, and Load (ETL) tools can be used to create a star schema data warehouse, or
you may use a metadata modeling tool to emulate a star schema structure by generating the
appropriate SQL at report design time. The second option will not improve performance, but will
yield predictable results.

Framework Manager cannot create a warehouse, but it can emulate a star schema structure by
collapsing query subjects to simplify the view and generate the appropriate SQL at run time.

13
Example

14

This example presents three normalized tables that represent three hierarchical levels before
they are collapsed into a star schema dimension. Products roll up into product types, and product
types roll up into product lines.

The Product Line table has two rows that indicate two lines of products sold by the company.

The Product Type table contains four rows to indicate the four types of products that fall under
the previous two product lines (two types per product line).

The Product table contains the greatest level of detail. It contains six rows to represent the six
products that fall under the four product types.

Reporting data is de-normalized, as shown in the following example.

14
Example

15

This example shows a de-normalized dimension table created from the three normalized tables
shown previously. The tables have now been collapsed into a star schema dimension.
The Product Line table forms the first two columns of the new dimension table (PL# and PL_Desc),
the Product Type table forms the next two columns (PT# and PT_Desc), and the Product table
forms the last two columns (Prod# and Prod_Desc).

The main characteristic of this table is its redundancy. Note that each product line (Classic Tents
and Moose Boots) is repeated, once for each product that the product line contains. The same
applies for product type. This type of table is unsuitable for a normalized system, but is ideal for a
reporting and querying structure.
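A hedged SQL sketch of how the three normalized tables collapse into the single dimension shown above, assuming hypothetical key and description column names:

-- One denormalized dimension row per product
SELECT
    pl.pl_num,
    pl.pl_desc,
    pt.pt_num,
    pt.pt_desc,
    p.prod_num,
    p.prod_desc
FROM product p
JOIN product_type pt ON pt.pt_num = p.pt_num
JOIN product_line pl ON pl.pl_num = pt.pl_num;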

As stated previously, facts and dimensions are the two basic components of a star schema. Fact
tables are the focal point of the star schema, and typically contain the most rows. There are
typically no descriptive attributes in a fact table. Instead, there are foreign keys that relate to the
dimension tables, which contain descriptive attributes.

Facts in a fact table are also known as metrics, measures, or key performance indicators. There
are cases where you may encounter factless fact tables, in which only foreign keys are found. For
example, in the case of a library, you may have a fact table that only contains a book key, a
customer key, and a day key, which records which books were checked out by which customer
and when.

Dimension tables provide the descriptive information for the star schema. Dimension tables may
be conformed, meaning that they apply to multiple fact tables across the business. Conformed
dimensions prevent islands of information by providing context to multiple potential queries.

15
Examine Relationships And Cardinality

16

The example diagram shows the relationship cardinality indicators for the three types of
relationships: one-to-one, one-to-many, and many-to-many.

One-to-one relationships occur when one unique row in a table relates to exactly one row in
another table. In the cardinality example provided, each employee can only have one security
number. One security number can only be associated with one employee.

A one-to-many relationship might occur, as in the example, when an order is taken and an Order
Header table is populated with data such as date, customer name, and sales staff name. This
table is related to an Order Details table that contains data about individual items sold in that one
order, such as order detail code, product number, and quantity. Therefore a relationship exists
between Order Header and Order Details, whereby each Order Header must contain one or many
Order Details, and each Order Detail must appear on one and only one Order Header.

Many-to-many relationships occur when many unique rows in a table relate to many rows in
another table. The example shows that many suppliers may provide a single part. However, a
single supplier may provide many parts.

In addition to the instances of cardinality, there are two possible types of cardinality: optional and
mandatory. Specify an optional relationship (minimum cardinality of 0) when you want the query
to retain the information on the other side of the relationship in the absence of a match. An
optional relationship generates an outer join, which results in a null value when there is no data
for one table that matches a row of data in another table. Make sure that you only define
optional relationships when required, as generating outer joins can negatively affect
performance.

16
Examine Relationships And Cardinality

Mandatory cardinality

17

This example shows an example of mandatory cardinality and how it is indicated by using
relationship indicators. The 1..n notation on the Product table means that there will be at least
one Product record for every Order record.

In this mandatory relationship, a row of data, a product record, must exist, in order for a row of
data in another table, an order, to exist. In other words, as in the example, you cannot have an
order without a product.

17
Examine Relationships And Cardinality

Optional cardinality

18

This example shows an example of optional cardinality and how it is indicated by using
relationship indicators. In the example, the 0..n on the Sales table indicates that there can be 0 or
more records for each Sales Rep record. So, it is possible to have a Sales Rep that has no sales to
their credit. However, you cannot have a Sale record that does not have a Sales Rep connected to
it.

In this optional relationship in the example, a row of data, a sale, does not have to exist in order
for a row of data in another table, a sales rep, to exist.

18
Identify Different Data Traps

Chasm/ Relationship Trap

19

A chasm trap is a many-to-many relationship where more than one row in a table is related to
more than one row in another table. The structure cannot record and maintain data (it lets the
information fall into a chasm). This structure is not necessarily incorrect when designing at a high
level, it just does not show all the necessary details.

Notice in the many-to-many below. It shows that a single supplier may provide many parts, and
many suppliers may provide a single part. If each supplier can potentially supply every single part,
how do you report on the suppliers that provide specific parts? This is typically resolved with a
bridge table that records the details of the relationship between the two tables.
A transitive relationship trap exists if there is more than one join path between two tables. This
relationship trap resembles a wheel shape.

The example shows the circular logic of a relationship trap.

An Order table is the hierarchical parent of an Order Detail table. Adjoined to both of those
tables is a Customer table. It presents the quandary of what is the best path to link a customer to
an order? Is it by way of the Customer-to-Order relationship, or by way of the Customer-to-Order
Detail relationship?

This kind of trap may make it difficult to write queries that retrieve the appropriate data, because
it may not be clear which table columns must be included in the query. Going through either path
produces some sort of result, but which path is the correct or more efficient one for your specific
query?

Fan traps, also known as parallel relationships, involve multiple one-to-many relationships that fan out from a single table, implying that the two other tables have no connection to each other.

19
Identify Different Data Traps

Fan Trap
20

Fan traps, also known as parallel relationships, involve multiple one-to-many relationships that
fan out from a single table, implying that the two other tables have no connection to each other.

The example shows a fan trap or parallel relationship with a Division table related to a Branch
table such that one division may have many branches. The Division table is also related to an
Employee table such that one Division may have many Employees.

The modeler should examine fan traps to determine whether a crucial relationship is missing. For
example in the diagram, Branch may actually have a direct relationship to Employee.
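A small SQL sketch of why the fan trap matters at query time, with hypothetical tables: joining both one-to-many legs in a single query pairs every branch with every employee of the same division, so a plain count of employees is multiplied by the number of branches.

SELECT
    d.division_name,
    COUNT(e.employee_id)          AS employee_rows,      -- inflated: employees x branches
    COUNT(DISTINCT e.employee_id) AS distinct_employees  -- correct count
FROM division d
JOIN branch   b ON b.division_id = d.division_id
JOIN employee e ON e.division_id = d.division_id
GROUP BY d.division_name;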

20
Identify Different Data Traps

Connection Trap
21

The final data trap you will examine is the connection trap. The connection trap suggests that
there may be an optional path through different entities. For example, in the following graphic,
what is the relationship between Branch and Employee? If an employee does not work for a
branch, do they work for a division?

The example shows a connection trap whereby there is a serial relationship from left to right with
a Division table being related to a Branch table, which is related to an Employee table. However,
there is also a relationship directly between the leftmost Division table and the rightmost
Employee table indicating that the middle entity may be bypassed for some records.

The relationship in the sample infers that there may be a direct relationship between the
Employee and Division tables. An employee may be able to work directly for a division, instead of
working for a branch within a division. For example, the employee may work from a home office.
The problem with this trap is that there must be a reliable path through all truly related entities.

21
Identify Different Data Traps
Four basic data traps:

• Chasm trap (many-to-many relationship)

• Transitive relationship trap (more than one path between two tables)

• Fan trap (multiple one-to-many relationships that fan out from a


single table)

• Connection trap (an optional path through different entities)

22

There are four basic data traps that are defined in the upcoming paragraphs. A data trap does not
always indicate that there is a problem, only that an area or scenario created in the model is
worth inspection, and possible refinement. Be careful of these four data traps.
• Chasm trap (many-to-many relationship)
• Transitive relationship trap (more than one path between two tables)
• Fan trap (multiple one-to-many relationships that fan out from a single table)
• Connection trap (an optional path through different entities)

These are data modeling traps, not metadata modeling traps, so you cannot use Framework
Manager to fix them in the data source. However it is useful to know of them so that you can
make metadata model designs which can handle them and generate the appropriate SQL at run
time to provide predictable results.

A chasm trap is a many-to-many relationship where more than one row in a table is related to
more than one row in another table. The structure cannot record and maintain data (it lets the
information fall into a chasm). This structure is not necessarily incorrect when designing at a high
level, it just does not show all the necessary details.

Notice in the many-to-many below. It shows that a single supplier may provide many parts, and
many suppliers may provide a single part. If each supplier can potentially supply every single part,
how do you report on the suppliers that provide specific parts? This is typically resolved with a
bridge table that records the details of the relationship between the two tables.
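As a rough sketch of this resolution, assuming simple Supplier and Part tables (the table and column names below are illustrative, not from a specific sample database), the bridge table and a query against it might look like this:

-- Hypothetical tables; names are illustrative only.
CREATE TABLE Supplier (
    Supplier_Key  INT PRIMARY KEY,
    Supplier_Name VARCHAR(100)
);

CREATE TABLE Part (
    Part_Key  INT PRIMARY KEY,
    Part_Name VARCHAR(100)
);

-- Bridge table: one row per supplier/part combination that actually occurs,
-- recording the details of the relationship.
CREATE TABLE Supplier_Part (
    Supplier_Key INT REFERENCES Supplier (Supplier_Key),
    Part_Key     INT REFERENCES Part (Part_Key),
    Unit_Cost    DECIMAL(10,2),
    PRIMARY KEY (Supplier_Key, Part_Key)
);

-- Reporting on which suppliers provide a specific part becomes two one-to-many joins.
SELECT s.Supplier_Name, p.Part_Name, sp.Unit_Cost
FROM Supplier_Part sp
JOIN Supplier s ON s.Supplier_Key = sp.Supplier_Key
JOIN Part p ON p.Part_Key = sp.Part_Key
WHERE p.Part_Name = 'Widget';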

A transitive relationship trap exists if there is more than one join path between two tables. This
relationship trap resembles a wheel shape.

22
Identify Data Model Types And Data
Structures

[Diagram: Framework Manager metadata objects, including data source query subjects, model query subjects, query items, relationships, regular dimensions, measure dimensions, and shortcuts.]
23

Most data that you will model will be structured data. This means that the source data is stored in
physical and logical structures, such as tables, views, cubes, and so on. The structure of the data
may be dependent upon the usage of the data as well as the data source itself. In IBM Cognos
Analytics, you can work with both relational and dimensional data sources.

A relational model has a basic metadata structure that resembles tables and columns in a
database. Relational models can have either an operational (normalized) or reporting structure
(star-schema). You can use Framework Manager to transform metadata from an operational data
source into a model that optimizes it for reporting.

Related to relational models are what are known as Dimensionally Modeled Relational (DMR)
models. They are built from relational data sources but are modeled with a dimensional structure
(like OLAP) consisting of dimensions, hierarchies, and measures.
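As a minimal sketch of the reporting (star-schema) structure referred to above, a single fact table is joined to descriptive dimension tables; the table and column names below are assumed for illustration only:

-- Illustrative star-schema query: one fact table joined to its dimensions.
SELECT t.Year,
       p.Product_Line,
       SUM(f.Sale_Total) AS Sale_Total
FROM Sales_Fact f
JOIN Time_Dimension t ON t.Day_Key = f.Day_Key
JOIN Product_Dimension p ON p.Product_Key = f.Product_Key
GROUP BY t.Year, p.Product_Line;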

23
Data Modelling Concepts in Framework
Manager
Week 6 – Day 2

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• Framework Manager Project Structure

• Data Modelling Approach

• Importance of Star Schemas

• Relational Modelling Concepts specific to FM

• Common Data Modelling Traps

2
Quick Review
• IBM FM version 11 will continue to be supported, but IBM has no plans for future enhancements

• Trend towards cloud-based and data democratization

• Data Modules (the new FM) in Cloud Analytics

• Power BI equivalent to IBM Cognos FM is Power Query Editor + Relationship / Model view

3
Framework Manager Interface

4
Framework Manager Work Flow
[Diagram: Framework Manager workflow: create the project, import metadata, prepare the metadata for reporting, set security, create and manage packages, and publish. Metadata is drawn from the Data Sources, and published packages go to the Content Store for use by IBM Cognos Analytics - Reporting.]
5

5
Framework Manager Projects
Start (mandatory):
Foundation Objects View

Middle (optional):
- Dimensional View (OLAP)
- Consolidation View

End (mandatory):
Presentation View

A middle layer is highly recommended to manage complex transformations
6

6
Defining Metadata Elements

Query Subjects:
Data source, model or stored
procedure

Query Items:
Components of subject (i.e.
columns of table)

Regular Dimensions:
Descriptive information, hierarchies
for OLAP-style query

Measure Dimensions:
Facts for OLAP-style query

Shortcut:
Alias or reference visible for report authors

7

7
What Do Report Authors View?

8
Data Modelling Approach

9
Examine Key Modeling Recommendations

10

Following are ten key modeling recommendations that apply to both operational and dimensional
(star schema) data sources. If proper design has been implemented, then dimensional data will
usually require much less modeling in the Framework Manager environment.

NOTE: These recommendations are designed to be a guideline. Modelers must do what is appropriate for their situation. These recommendations cannot account for every modeling need, and therefore, modelers will always need to decide when to use some or all of these recommendations, and when to model outside of the paradigm presented here.

1. Define reporting requirements and data access strategies. This will help you to find the
correct data and define a data access strategy. Based on available data sources, data
volumes, and environmental factors such as network speed, hardware processing power, and
so on, an appropriate data access strategy should be planned and implemented to ensure
acceptable response times to report requests.

2. Import only required reporting objects in manageable chunks in a phased approach and
alter data source query subjects as little as possible. Leaving data source query subjects as
simple all-inclusive select statements reduces future maintenance. For example, when a table
has a new column added to it, simply update the data source query subject that references it
in Framework Manager and the new column will appear as a new query item.

3. Verify that relationships reflect those in the data source and that the query item properties
are set correctly. For relationships, notice the following items.
• cases where a dimension query subject relates to a fact query subject on
different keys

10
• cases where there are multiple valid relationships between query subjects
• dimension query subjects that belong to multiple hierarchies

4. Model in freehand to identify modeling challenges and how query subjects are used (which
query subjects are treated as facts, dimensions, or both). Identifying these issues on paper
can provide a clear modeling plan.

5. Use model query subjects to control query generation and usage and to consolidate
metadata. You should use simplified, abstracted model query subjects to resolve modeling
challenges.
• Create aliases where required to control query paths.
• Modeling as a virtual star schema to control SQL generation (what is a fact?
what is a dimension?).
• Remove descriptive (dimensional) attributes from fact tables.
• Consolidate related information into one model query subject for a cleaner
presentation (for example, placing all product related query items in one
model query subject).

10
Examine Key Modeling Recommendations

11

NOTE: These recommendations are designed to be a guideline. Modelers must do what is appropriate for their situation. These recommendations cannot account for every modeling need, and therefore, modelers will always need to decide when to use some or all of these recommendations, and when to model outside of the paradigm presented here.

6. Customize metadata for runtime.


• Use parameter maps and session parameters to handle dynamic column or
row retrieval.
• Use prompt values and query macros to add mandatory user prompts and
security filters.

7. Specify determinant information where required to enable accurate aggregation in cases where a level of granularity has repeating keys, your data contains BLOBs, or you want to avoid the distinct clause on unique values when grouping or enhance performance for regular dimensions.

8. Resolve any relationship ambiguities, such as multiple joins between two query subjects, by
deleting surplus joins and by creating role-playing dimensions.

9. Create regular and measure dimensions if authors need to perform OLAP-style queries on
relational data.

10. Create the business view as a set of star schema groupings. Use those groupings to build
logical business groupings in the business view and to indicate conformed dimensions based
on naming conventions.

11
Define Reporting Requirements

12

12
Reporting Requirements
• Identify the BI related problems to be solved
• Setting the scope of your project

• Setting its methodology, multilingualism, performance, security, and presentation.

• Interview authors and users to determine needs

• Focus is on information required by key business decision makers (framework of key performance indicators (KPIs))
13

As a modeler, you must identify the business-intelligence related problems to be solved.


Problems to be solved include setting the scope of your project and setting its methodology, as
well as multilingualism, performance, security, and presentation.

Interview authors and users to determine what their needs are and if possible, view existing
reports that IBM Cognos will be replacing.

Ralph Kimball (author on the subject of data warehousing and business intelligence) recommends
an interview-based approach to determine report requirements. The focus is on what information
the key business decision makers require to do their jobs. This will result in a framework of key
performance indicators (KPI) and business contexts.

You should ask questions such as:

• Do you and the IBM Cognos users agree on the model requirements?
• Does the data source contain the data and metadata you need?
• Does the same data exist in more than one source?
• Which data source tables are the fact tables, which are the dimensions, and which are both fact
tables and dimensions? What are the keys and attributes of each dimension?
• Do fact tables contain only facts and foreign keys? Do they also contain dimensional attributes
that should be in dimension tables?
• What are the required relationships and are there multiple paths between tables?

13
Questions to Ask
• Do you and the users agree on the model requirements?

• Does the data source contain the data and metadata you need?

• Does the same data exist in more than one source?

• Which data source tables are the fact tables

• Which are the dimensions?

• Which are both fact tables and dimensions?

• What are the keys and attributes of each dimension?


14

You should ask questions such as:

• Do you and the users agree on the model requirements?


• Does the data source contain the data and metadata you need?
• Does the same data exist in more than one source?
• Which data source tables are the fact tables, which are the dimensions, and which are both
fact tables and dimensions? What are the keys and attributes of each dimension?
• Do fact tables contain only facts and foreign keys? Do they also contain dimensional attributes
that should be in dimension tables?
• What are the required relationships and are there multiple paths between tables?

14
More Questions to Ask
• Do fact tables contain only facts and foreign keys?
• Do they also contain dimensional attributes that should
be in dimension tables?

• What are the required relationships?


• Are there multiple paths between tables?

15


15
Explore Data Sources To Identify Data
Access Strategies
Consider your data access strategy:

• What are your data sources?

• Are there multiple data sources?

• Are your data sources operational or reporting data?

• What type of relationships are involved in your source data?

• What relationships will you need to build in your model?

16

Consider your data access strategy. Questions you should ask include:

• What are your data sources?


• Are there multiple data sources?
• Are your data sources operational or reporting data?
• What type of relationships are involved in your source data and what relationships will you
need to build in your model?

Understand and list your data access strategies as much as possible based on what you know and
can find out about your data sources.

16
Modeling Approach

17

17
Model In Stages
• Model using an iterative approach.

• Develop your model in stages - start with a subset of report requirements and then import additional metadata as needed.

• During modeling activities:


• Discover other requirements

• The data itself may present you with other reporting options
18

A recommended practice is to model in stages using an iterative approach.

A common tendency is to immediately import all metadata to meet all reporting requirements, which creates a complex set of objects to work with as a starting point. The recommended
approach is to develop your model in stages. You should start with a subset of report
requirements and then import additional metadata as needed.

As you progress in your modeling activities, you may discover other requirements or the data
itself may present you with other reporting options that were not thought of by the end users.

18
Modeling in Layers

19

Modeling in layers means that you use a Presentation View for logical groupings. A Presentation
View contains only the star schema groupings. This logically groups objects appropriate for the
business and easily allows you to create separate packages for different reporting needs.

This graphic represents different approaches to layer modeling using various models.

Views are presented as labeled rectangles. From left to right, the models are:

1. Presentation View on top of a blank view, on top of the Foundation Objects View.
2. Presentation View directly on top of the Foundation Objects View.
3. Presentation View on top of a Business Logic View, on top of the Data Source View.
4. Presentation View on top of the Consolidation View, on top of the Foundation Objects View.

19
Modeling in Layers
Start (mandatory):
Foundation Objects View

Middle (optional):
- Dimensional View (OLAP)
- Consolidation View

End (mandatory):
Presentation View

A middle layer is highly recommended to manage complex transformations
20

The decision to have another layer between the Foundation Objects View and the Presentation
View involves several factors. For example, what is the size of the model? Do you need to reduce
development time rather than ensure ease of maintenance later in the modeling cycle?

20
Modeling in Layers
Start (mandatory):
Foundation Objects View

• Contains data source and model query subjects, as well as calculations and filters.

End (mandatory):
Presentation View

• Appropriate for large projects

• Difficult to maintain if data source structure changes frequently
21

You can model in layers using no middle layer. When you do so, the Foundation Objects View
contains data source and model query subjects, as well as calculations and filters. Though it is
appropriate for large projects, it can be difficult to maintain if the data source structure changes
frequently. This method requires the least duplication of query subjects, keeping the physical size
of the project files to a minimum. It is best suited to large implementations or situations where a
data warehouse has already been set up to accommodate the majority of the specialized business
logic for reporting.

While it requires less development time, this method can require more maintenance when in production, since you will need to remodel to reflect any changes to the underlying data source objects, because the published objects are simply shortcuts to the data source query subjects.
Therefore, if you expect ongoing changes to the underlying data structure, this may not be a
suitable option.

21
Modeling in Layers
Start: Data Source View

Middle: Business Logic View


• All model query subjects with relationships
• All calculations and filters

End: Presentation View

• Insulates between data source and reports

• Less maintenance

• Longer development time, project files, models

22

Using a Business Logic View layer lets you set up complex queries and reuse foundation layer
objects in multiple locations. This provides insulation from the underlying data source for the
reports. No work is required in the Data Source View since it is all done in the Business Logic
View.

Creating the model query subjects (and rebuilding all their relationships) in the Business Logic
View takes extra work, but it provides a layered structure for improved model maintenance,
readability and portability. For example, to move the application from one database vendor to
another, all you have to do is re-map the model query subjects in the Business Logic View to the
tables in the new data source. There is no need to rebuild the reports, or remodel the metadata.

22
Modeling in Layers
More flexibility than business logic view

Model query subjects and calculations at Foundation or Consolidation as required

Both Consolidation View + Business Logic View will result in larger physical size because of additional model query subjects

23

Using a Consolidation View layer is a compromise between the previous two methods: you create model query subjects and their relationships in the lowest layer. The middle layer acts as a
consolidation layer with some business logic (where required). For example, this view is where
snowflake dimensions are consolidated. It also acts as an insulation layer between reports and
the data source.

Calculations may appear in either the lowest or the middle layer, depending on where the related
model query subject is created. A model query subject that is created to resolve a reporting issue
will be created in the lowest layer. Therefore, any calculations required for that object will be
defined in the lowest layer.

23
Relational Modelling Concepts

24

24
Key Concepts
1. Cardinality - the relationships between subjects (tables)

2. Determinants - reflect granularity by creating subsets or groups of data within a subject
• Closely related to keys and indexes

3. Multiple-fact, multiple-grain queries

• Occur where conformed dimensions relate to multiple fact tables and are required to compute different metrics
25

25
Importance of Cardinality
• IBM Cognos uses cardinality in the following ways:
• To identify query subjects as facts or dimensions

• To avoid double-counting fact data

• To support loop joins

• To optimize access to the underlying data source system

26

26
Determinants

27

27
Introducing Determinants

28

28
Example: Time Dimension

29

29
Specifying Determinants
Grain = Day identified as unique

Group by:
Year
Month

Month Key has year embedded

Settings will determine output of IBM Cognos SQL

30

30
Specifying Determinants
Grain = Day identified as unique

Group by:
Year
Month

Month Key does NOT have year; therefore, we need to nest it underneath the Year Key

Settings will determine output of IBM Cognos SQL

31

31
Specifying Determinants
• Uniquely Identified OR Group By, but NOT both

• Evaluated in order of specification

1. Year
2. Quarter
3. Month
4. Day

32

32
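To make these settings concrete, here is a rough sketch, with assumed Time_Dimension column names, of the kind of SQL a group-by determinant on Month is intended to produce when a query only needs the month grain:

-- Without a Month determinant, joining a day-grain time dimension to a
-- month-grain fact can repeat each month's fact row once per day.
-- With Month defined as a group-by determinant, the dimension is first
-- rolled up to the month level, for example:
SELECT t.Month_Key,
       MIN(t.Month_Name) AS Month_Name  -- attribute reduced to one value per Month_Key
FROM Time_Dimension t
GROUP BY t.Month_Key;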
Multiple-Fact, Multiple-Grain

33

33
Example Of Multiple Fact Multiple Grain Query
• Multiple Grains
• Year > Month > Day (Time Dim)
• Product Line > Type > Name (Product Dim)

34

34
Correct Output
By default, aggregation from each fact table at lowest common level of
granularity

Example:
Quantity from Sales, Expected volume from Product forecast, Month
from Time, and Product name from Product

Output:
Month x Product

35

35
What Happens Without Correct Determinants?
Incorrect aggregations could result in duplicated data

April 2007 = 30 days; February = 29 days


30 x 1,690 = 50,700
30 x 125 = 3,750
29 x 245 = 7,134
36

36
Summary
• Best practice to create middle layers to manage complexity in
a project

• Importance of a virtual star schema whether you do relational modelling vs. OLAP

• Importance of cardinality + determinants for the generated SQL and output of models

• Avoiding common data traps by sticking to Kimball best practices
37

37
Data Modelling Concepts in Framework
Manager
Week 7 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• Calculations and Filters

• Presentation View

• Evaluating SQL in Cognos

2
Calculations and Filters

3
Create Calculations
• Add business logic to provide report authors with values used
regularly
REVENUE = QUANTITY * UNIT SALE PRICE

• Calculations can use query items, parameters, and functions in their expressions.

• Two types of calculations:


• embedded

• stand-alone
4

Create calculations
You can add business logic to your model by creating calculations. Creating calculations is a way
to provide report authors with values that they regularly use. An example of this might be where
report authors regularly report on revenue. The calculation might read REVENUE = QUANTITY *
UNIT SALE PRICE. Calculations can use query items, parameters, and functions in their
expressions. There are two types of calculations: embedded and stand-alone.

If you want to create a calculation specifically for one query subject or dimension, you can embed
the calculation directly in that object. For query subjects, this calculation can be done for either
data source query subjects or model query subjects. However, it is recommended that you apply
calculations in model query subjects wherever possible. This allows for better maintenance and
change management.

Create a stand-alone calculation when you want to apply the calculation to more than one query
subject or dimension. Stand-alone calculations are also valuable if you need to do aggregation
before performing the calculation. You can also create stand-alone calculations as an alternate
way to present information rather than in a query subject or dimension.
If you start with an embedded calculation, you can later convert it into a stand-alone calculation
that you can apply to other query subjects.

4
Example: Revenue, Gross Profit and
Margin in GOSALES

5
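A minimal sketch of such revenue, gross profit and margin calculations expressed in SQL, assuming GO Sales-style column names (Quantity, Unit_Sale_Price and Unit_Cost are illustrative):

-- Illustrative calculation expressions; column names are assumed.
SELECT Quantity * Unit_Sale_Price AS Revenue,
       Quantity * (Unit_Sale_Price - Unit_Cost) AS Gross_Profit,
       (Unit_Sale_Price - Unit_Cost) / Unit_Sale_Price AS Margin
FROM Sales_Fact;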
Filter Data

Two options for filters:

1. Embed directly in query subject
2. Stand-alone, applied to more than one query subject or dimension

3 Usage Settings:
1. Always
2. Optional
3. Design Mode Only

Data Modelers can also write a WHERE clause in the SQL definition

Filter Data
There are two types of filters you can create: embedded and stand-alone. The filter
appears in different locations in the Query Subject Definition, depending on which of the
two types it is.

The example shows the Filters tab of a Query Subject Definition dialog box. Displayed
under the expanded Filters folder are six stand-alone filters. There are two filters
displayed in the Filters pane along with their Usage settings. One is an embedded filter,
and the other is one of the six stand-alone filters.

Embedded filters are appropriate when the filter is intended for just one query subject or
dimension. Stand-alone filters are appropriate when required in multiple query subjects or
dimensions, or to make commonly used filters readily available for authors.

Filters have a Usage setting with the following options:

• Always - the filter will always be applied, regardless of whether the filtered query item
is in the query or not
• Optional - users may choose to enter a filter value or leave it blank (only applies to
filters that use a prompt value or macro)
• Design Mode Only - limits the amount of data that is retrieved when testing in
Framework Manager or when authoring reports

You can also restrict the data that a query retrieves by adding a WHERE clause to the
SQL definition, or by setting governors.
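As a small sketch, an embedded filter such as Order_Year = 2023 (a hypothetical query item) with Usage set to Always simply contributes an extra predicate to the generated SQL:

-- Effect of an always-applied embedded filter; names are illustrative.
SELECT Order_Number,
       Sale_Total
FROM Sales_Fact
WHERE Order_Year = 2023;  -- contributed by the embedded filter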

6
Customizing Metadata for Runtime
• Modify query subjects to dynamically control the data returned
using:
• Session parameters
• Parameter maps
• Macros

• Equivalents also exist in other BI tools:


• PBI Power Query Parameters
• SQL Server Parameters
• dbt (data-build-tool) macros

Parameters and macros


Parameters and macros can be used to dynamically return data from specific columns or rows,
and even from specific data sources. You can modify query subjects to dynamically control the
data returned using session parameters, parameter maps, and macros.

An example of why you would do this is to dynamically implement security by using a calculation
to retrieve a user's account, group, or role information, and then implement row-level security
based on values stored in the data source.

You could also retrieve location specific data for a particular user. For example, if a user works in
Mexico, you can use a calculation to return sales values for Mexico by querying the
Revenue_Mexico column rather than other columns in the table such as Revenue_Canada or
Revenue_Germany.

A session parameter returns session information at run time (for example, runLocale or
account.UserName).

A parameter map is a two-column table that maps a set of keys (source) to a set of substitution
values. The keys must be unique, and the table is used to substitute one value for another.

A macro is a fragment of code that you can insert within filters, calculations, properties, and so
on, that are to be evaluated at run time. Macros are enclosed by the # character.

PBI Power Query Parameters - https://learn.microsoft.com/en-us/power-query/power-query-query-parameters

7
SQL Server Parameters - https://learn.microsoft.com/en-us/sql/relational-databases/stored-
procedures/parameters?view=sql-server-ver16
dbt (data-build-tool) macros - https://docs.getdbt.com/docs/build/jinja-macros

7
Customizing Metadata for Runtime

Parameters store a value and can be called in a query subject.

Parameter Maps are key-value pairs that allow users to call the value based on a more human-understood key.

8

8
Customizing Metadata for Runtime

Parameters and macros


The example shows a filter expression filtering on a query item where it equals the result of a
macro called SecurityLookup. The macro takes as a parameter a session parameter called
account.defaultName. There is a parameter map table below the filter expression and a box
showing the generated SQL based on all three components working together.

The diagram shows the practical use of all three metadata customizing tools: a session parameter,
a parameter map, and a macro. In the diagram, the filter expression is filtering the value of a
query item called Sales_Staff_Code where it equals the value returned by a macro
(SecurityLookup). The macro takes as its argument a session parameter called
account.defaultName.

Here is how it works. A user named Bart Scott is signed into IBM Cognos. The session parameter,
account.defaultName is evaluated and returns the value Bart Scott as an argument to the macro
statement. The macro, SecurityLookup, is a function that consults the parameter map for the key,
Bart Scott. It is evaluated and the value 60 is returned by the macro to the filter expression,
where the generated SQL becomes:

WHERE Sales_Target_Fact.Sales_Staff_Code = 60

These tools comprise a very dynamic and elegant way of customizing metadata at runtime.
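Sketching the same flow in SQL terms (the macro syntax shown in the comments follows common Cognos parameter-map usage and is an assumption, as is the Sales_Target column name):

-- Filter expression defined in the model (assumed macro syntax):
--   Sales_Staff_Code = #$SecurityLookup{$account.defaultName}#
-- At run time the session parameter resolves to 'Bart Scott', the parameter
-- map returns 60, and the generated SQL ends up containing:
SELECT Sales_Target_Fact.Sales_Target
FROM Sales_Target_Fact
WHERE Sales_Target_Fact.Sales_Staff_Code = 60;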

9
Dynamically Retrieve Language Column

10

10
Presentation View

11

11
What is a Presentation View?
The only visible definitions seen by end users when a package is exported to the Cognos Analytics portal

Recommended to use star schema

1. More reusable
2. Easier to understand
3. Supports dependencies on things

12

What is a presentation view?


A presentation view provides a logical and simplified presentation of metadata for report authors.
It groups related model objects together and provides authors with commonly used tools, such as
filters and calculations.

Use a presentation view


Generally, a consolidation view and a foundation objects view are hidden from report authors.
Presentation views often consist of shortcuts to consolidation view model query subjects,
arranged in star schema groupings (a fact query subject and all its related dimensions). You can
create and publish several packages that are based on the presentation view, each one providing
a different view of metadata for different reporting needs.

You do not have to model and present as a star schema. For example, if your model is designed to
satisfy only a certain set of pre-built reports from which authors cannot stray, then you can model
your metadata to that specific end. However, if you are modeling to a broader and largely ad hoc
audience, then modeling as a star schema is an excellent choice for achieving predictable results.

12
Populate a Presentation View
Build each query group using Create Star Schema Grouping wizard

Remember importance of cardinality for identifying:

Facts = * end
Dimensions = 0..1 end

13

Populate a presentation view


The Create Star Schema Grouping wizard creates logical groupings of central fact tables and their
related dimensions. These groupings consist of shortcuts to the underlying objects and are placed
in a namespace so that the same dimension names can occur in other star schema groupings. This
allows authors to identify conformed dimensions.

As you model, you should document your logical groupings with a dimension map. You can then
use the dimension map to quickly create your star schema groupings.

The consolidation view normally contains objects which are based on objects in the foundation
objects view. The model query subjects are related to each other in the foundation objects view.
These objects are then grouped for presentation in the presentation view.

13
Identify Conformed Dimensions
Importance of Consistent Names for Conformed Dimensions

• Same dimension names across queries enable stitch queries
• Stitch queries aggregate on the conformed dimension
• Enables multiple KPIs in a single report
14

Identify conformed dimensions


This diagram illustrates how you can query two facts, Sales Fact and Sales Target Fact, by using
one or all of the conformed dimensions in a presentation view. The dimension shortcuts, in each
namespace, have the same name.

You must use at least one conformed dimension to report across facts to allow for stitch queries
and to ensure that each fact is correctly aggregated. The dimension shortcuts in each namespace
of the presentation view should have the same name to indicate they are conformed and point
back to the same original query subject.

Modelers and authors can quickly identify conformed dimensions in the presentation view based
on naming conventions. If designed correctly, dimensions with the exact same name in different
namespaces are shared between the facts.

Dimensions that are not shared between facts (non-conformed) can still be used in multi-fact
queries providing at least one conformed dimension is used.

14
Data Source Queries

15

15
Overview Of Data Source Query Subjects

16

Overview of data source query subjects


In this diagram, the SQL on the left side for the imported RETURNED_ITEM data source query
subject is a simple, all-inclusive select statement. If you do not alter this SQL and new columns are
added to the table, they will automatically be included when you update the query subject or test
it. If you modify the SQL as seen on the right side of the diagram, new columns will need to be
added manually in the SQL statement.

Sometimes customized SQL is required for a specific application. You can modify the SQL as
required, to generate SQL that meets specific needs. You can also implement parameter driven
dynamic SQL. However, you should alter the simple select statements as little as possible to
generate the most efficient SQL and simplify model maintenance.

You should try to have only one instance of a SQL statement per table to reduce future
maintenance. This is not always possible, but should be implemented as much as possible.

16
Set the SQL Type
• You can set the SQL type for data source query subjects

• Available SQL Type settings are:


• Cognos
• Native
• Pass-through

17

Set the SQL Type


At run time, IBM Cognos Analytics generates native SQL that is designed to use the optimizers of
the database. It is optimized for database vendor and version, and it leverages the features of
databases wherever possible.

The SQL Type setting is local to the query subject and impacts how a table-based query is defined
and used in query generation. By default, Framework Manager uses Cognos SQL to create and
edit query subjects.

Cognos SQL adheres to SQL standards and is portable. It can contain metadata from multiple data
sources and it has fewer database restrictions. Cognos SQL works with all relational and tabular
data sources. If you need to port your model from one vendor to another, you should use Cognos
SQL since it works with all relational and tabular data sources. It also allows IBM Cognos to
generate the most optimized SQL possible, for example, by removing unused elements at query
time. If a database does not support a particular function, using Cognos SQL will allow the
function to be performed locally if Limited Local processing is allowed.

Native SQL allows SQL that is specific to your database. It may not be portable and it cannot
contain metadata from multiple data sources. When you edit a query subject, you can specify
Native SQL. Native SQL is the SQL the data source uses, such as Oracle SQL. Native SQL lets you
use keywords that are not available in Cognos SQL. You can copy and paste SQL from another
application into Framework Manager for quick replication of application specific requirements
and leverage work already done.

When viewing generated Cognos SQL at run-time for a query subject that is set to Native SQL, the

17
native SQL appears as a sub-query contained between {}. IBM Cognos may add statements to the
SQL you enter in order to optimize the performance of the query.

Pass-through SQL lets you use native SQL without any of the restrictions the data source imposes
on sub-queries. There are some databases that do not extend support for all constructs to sub-
queries. In these cases, as well as cases where you require constructs that are not supported by
the query layer, use Pass-through SQL.

You should use the Pass-through SQL setting with caution as it may have a negative performance
impact. With Cognos SQL and Native SQL, when SQL is generated, IBM Cognos may create
wrappers for sub-query constructs, and pass the entire construct (wrapper and sub-query) to the
database. Some vendors may not support this. Pass-through SQL will tell IBM Cognos to send only
the sub-query to the database and then process the remaining SQL construct (wrapper) locally.

When viewing Cognos SQL for a query subject that is set to Pass-through SQL, the native SQL that
you typed will appear as a sub-query contained between {{}}.

17
IBM Cognos Query Generation Architecture
• IBM Cognos User interfaces submit the SQL
• The selected SQL type determines how IBM Cognos generates the SQL

18

IBM Cognos Query Generation Architecture


The diagram shows how the IBM Cognos user interfaces submit SQL, how the selected SQL type
determines how IBM Cognos generates the SQL, and how Pass-through SQL bypasses the IBM
Cognos Query Generation to run directly against the data source.

Cognos SQL is generated by one layer in the query engine and then passed to another for
conversion to native SQL and optimization. The query is then passed to the appropriate database.
If you have chosen the Native SQL option, IBM Cognos will send the SQL directly to the
optimization layer mentioned above and then on to the appropriate database.

Pass-through SQL will simply send the sub-queries of unsupported sub-query constructs directly
to the database.

18
SQL Type – Cognos SQL
• Adheres to SQL standards
• It can contain metadata from multiple data sources
• fewer database restrictions
• Works with all relational and tabular data sources.
• To port model from one vendor to another use Cognos SQL
• Allows IBM Cognos to generate the most optimized SQL possible -
removes unused elements at query time.
• If a particular function is unsupported, using Cognos SQL will allow the function to be performed locally if Limited Local processing is allowed.

19

SQL Types - Cognos SQL


Adheres to SQL standards and is portable. It can contain metadata from multiple data sources and
it has fewer database restrictions. Cognos SQL works with all relational and tabular data sources.
If you need to port your model from one vendor to another, you should use Cognos SQL since it
works with all relational and tabular data sources. It also allows IBM Cognos to generate the most
optimized SQL possible, for example, by removing unused elements at query time. If a database
does not support a particular function, using Cognos SQL will allow the function to be performed
locally if Limited Local processing is allowed.

19
SQL Type – Native SQL
• Allows SQL that is specific to your database.
• It may not be portable
• It cannot contain metadata from multiple data sources
• Lets you use keywords that are not available in Cognos SQL.
• You can copy and paste SQL from another application into Framework
Manager for quick replication of application specific requirements and
leverage work already done.
• Native SQL appears as a sub-query contained between {}
• IBM Cognos may add statements to the SQL you enter in order to optimize the performance of the query

20

SQL Types - Native SQL


Allows SQL that is specific to your database. It may not be portable and it cannot contain
metadata from multiple data sources. When you edit a query subject, you can specify Native SQL.
Native SQL is the SQL the data source uses, such as Oracle SQL. Native SQL lets you use keywords
that are not available in Cognos SQL. You can copy and paste SQL from another application into
Framework Manager for quick replication of application specific requirements and leverage work
already done.

When viewing generated Cognos SQL at run-time for a query subject that is set to Native SQL, the
native SQL appears as a sub-query contained between {}. IBM Cognos may add statements to the
SQL you enter in order to optimize the performance of the query.

20
SQL Type – Pass-through SQL
• Use native SQL without any of the restrictions the data source imposes on
sub-queries.
• Some databases do not extend support for all constructs to sub-queries.
CAUTION: use Pass-through SQL setting with caution – possible negative
performance impact.
• Some vendors may not support IBM Cognos created wrappers for sub-
query constructs
• Send only the sub-query to the database - processes the remaining wrapper locally ~ sub-query contained between {{}}.

21

SQL Types – Pass-through SQL


Lets you use native SQL without any of the restrictions the data source imposes on sub-queries.
There are some databases that do not extend support for all constructs to sub-queries. In these
cases, as well as cases where you require constructs that are not supported by the query layer,
use Pass-through SQL.

You should use the Pass-through SQL setting with caution as it may have a negative performance
impact. With Cognos SQL and Native SQL, when SQL is generated, IBM Cognos may create
wrappers for sub-query constructs, and pass the entire construct (wrapper and sub-query) to the
database. Some vendors may not support this. Pass-through SQL will tell IBM Cognos to send only
the sub-query to the database and then process the remaining SQL construct (wrapper) locally.

When viewing Cognos SQL for a query subject that is set to Pass-through SQL, the native SQL that
you typed will appear as a sub-query contained between {{}}.

21
Overview Of Stored Procedure Query Subjects
• Two types:
• Data Query - returns a single result set based on a simple or
complex query
• Data Modification - leverages a stored procedure in the data source
to modify the data source
• If a stored procedure returns multiple result sets, IBM Cognos Analytics
only supports the first result set
• Import a stored procedure into FM - create a query subject, or use Metadata Import Wizard

Stored Procedures: https://www.sqlshack.com/sql-server-stored-procedures-for-beginners/

22

Overview of stored procedure query subjects


There are two types of stored procedure query subjects, data query and data modification. Both
types can accept arguments. A data query stored procedure returns a single result set based on a
simple or complex query. A data modification stored procedure leverages a stored procedure in
the data source to modify the data source.

If a stored procedure returns multiple result sets, IBM Cognos Analytics only supports the first
result set. Framework Manager defines the metadata according to the result set returned by the
stored procedure when it is first created. If an existing stored procedure returns a different result
set than when it was created, it will cause an error.

You can import a stored procedure into Framework Manager by either creating a query subject,
or using the Metadata Import Wizard. If you use the Metadata Import Wizard, the query subject
will appear to be broken until you verify its projection list.

Some data source systems allow for multiple stored procedures with the same name; however
each accepts a different number and/or type of argument that determines which stored
procedure is used. This is known as an overloaded signature. To work with overloaded signatures,
create multiple stored procedures with unique names, and then create a separate query subject
for each result set.

22
Using Prompt Values
• Use prompt values when user input is required for variables beyond the report author’s control
• The syntax for using a prompt as a value is: ?PromptName?

Also available for:
• Parameter Maps
• Session Parameters
• Expressions (filters, calculations and relationships)

23

Use prompt values


Use prompt values when user input is required for variables beyond the report author’s control.
The syntax for using a prompt as a value is ?Prompt Name?.

In general, it is better to define prompts in the reporting application to make use of the additional
prompt features. However, there are some variables that report authors cannot modify such as
parameters in a stored procedure. For these, you can use Framework Manager to define prompts.

Prompt values can also be used in:


• parameter maps
• session parameters
• expressions including filters, calculations, and relationships

If a stored procedure with an order number parameter returns rows for a specified order, instead
of using a hard-coded order number as the argument for the stored procedure query subject, you
can use a prompt, such as ?Order Number?. This will allow the end-user to specify which order
they want to retrieve information for.
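As a simple sketch in SQL Server syntax (the procedure, table, and column names are hypothetical), a data query stored procedure with an order number parameter might look like this, with the prompt ?Order Number? supplying the argument at run time instead of a hard-coded value:

-- Hypothetical data query stored procedure (SQL Server syntax).
CREATE PROCEDURE GetOrderDetails
    @OrderNumber INT
AS
BEGIN
    SELECT Order_Number,
           Product_Name,
           Quantity,
           Sale_Total
    FROM Order_Details
    WHERE Order_Number = @OrderNumber;
END;
-- In Framework Manager, the stored procedure query subject's argument could be
-- set to ?Order Number? so the end user is prompted for the value.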

23
Data Modification Stored Procedures
• Update data sources by adding, updating, or deleting records
• If available in a package, report authors can define conditions that
trigger stored procedures to execute

• Limitations to the usage:


• Many organizations do not allow external applications to update source data
• Database administrators may limit which tables and columns can be modified

24

Data modification stored procedures


Data modification stored procedures update data sources by adding, updating, or deleting
records. If the stored procedures are available in a package, report authors can define conditions
that trigger stored procedures to execute.

There are some potential limitations to the usage of data modification stored procedures. Many
organizations do not allow external applications to update source data. Additionally, database
administrators may limit which tables and columns can be modified.

IBM Cognos Analytics data sources have two properties that control how report authors can use
stored procedure query subjects:
• Transaction Access Mode: controls the level of access for each transaction. This value can be set
to either Read-Only, or Read-Write.
• Transaction Statement Mode: specifies the action that will occur when the transaction ends.
This value can be set to either: Rollback, Commit, or Autocommit.

24
Example: Stored Procedures

25

25
Generated SQL

26

26
SQL by a BI Tool vs. Analyst
BI Tool:
• Subqueries
• Robotic Aliases
• Formatting / spacing is off

Analyst:
• Common-Table Expressions (CTEs)
• Aliases interpreted by humans
• Formatting + spacing
• Use of comments / annotations

27

27
Governors That Affect SQL Generation
Project governors can limit queries and affect the SQL generated at run time:
• Outer Joins (Allow, Deny)
• Cross-Product Joins (Allow, Deny)
• Shortcut Processing (Automatic, Explicit)
• SQL Join Syntax (Explicit, Implicit)
• Grouping of Measure Attributes (Enable, Disable)
• SQL Generation for Level Attributes (Group by, Minimum)
• SQL Generation for Determinant Attributes (Group by, Minimum)
• SQL Parameter Syntax (Marker, Literal)
• Use WITH clause when generating SQL (Yes, No)

28

Governors that affect SQL generation


Several project governors can limit queries and affect the SQL generated at run time.

These include:
• Outer Joins (Allow, Deny)
• Cross-Product Joins (Allow, Deny)
• Shortcut Processing (Automatic, Explicit)
• SQL Join Syntax (Explicit, Implicit)
• Grouping of Measure Attributes (Enable, Disable)
• SQL Generation for Level Attributes (Group by, Minimum)
• SQL Generation for Determinant Attributes (Group by, Minimum)
• SQL Parameter Syntax (Marker, Literal)
• Use WITH clause when generating SQL (Yes, No)

This course examines only some of the available governors in detail, since some are used in rare
cases. Refer to the product documentation for details on each governor setting.
The SQL Join Syntax governor controls how SQL is generated for inner joins. Selecting Explicit will
generate INNER JOIN syntax, and selecting Implicit will use WHERE syntax. The setting you choose depends on your own personal preference.

The Use WITH clause when generating SQL governor lets you choose to use the WITH clause with
IBM Cognos SQL if your data source supports it.

The With clause governor toggles Common Table Expression syntax. IBM Cognos Framework
Manager currently only supports the Non-Recursive form of common table expressions. The With

28
clause is used to avoid scanning the same table several times if the query against it is required
more than once in a larger query.

28
Explore SQL Generation

29

Explore SQL generation

Frequently asked questions are:


• What is the coalesce function?
• Why are the same columns being selected in two different derived tables?
• Why is there a full outer join?
• What does the XSUM function do?

The SQL in the example is IBM Cognos SQL generated, based on the selection of various query
items, and with auto aggregation enabled. SQL with auto aggregation values is the default setting
in IBM Cognos Analytics - Reporting.

29
Using Derived Tables – Example 1
Derived tables:
• Are aliased sub-selects
• Enable developers to return values without observing complex SQL
• Are generated in both IBM Cognos SQL and native SQL

30

Using derived tables

Derived tables:
• Are aliased sub-selects
• Enable developers to return values without observing complex SQL
• Are generated in both IBM Cognos SQL and native SQL

A derived table retrieves a record set that fulfills the requirements of the parent query. Although
the use of derived tables can create queries that are very long and verbose, the advantage is that
they articulate the work being done by the query in blocks that can be linked back to the
database.

In the example, the second SELECT statement is a derived table, and D2 is the derived table alias.

Not only are derived tables essential for complex queries that require layers of calculations and
filters, but they also make the queries easier to debug. Each block of native SQL in a derived table
query can be executed independently in the native interface of a database vendor, and are
therefore more easily diagnosed when the behavior is not as expected.

The outer blocks of SQL are derived from the inner blocks of SQL.

In Oracle environments, derived tables are known as in-line views.
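A stripped-down sketch of this pattern, with illustrative table and column names, where D2 is the derived table alias:

-- D2 is a derived table: an aliased sub-select that the outer query treats like a table.
SELECT D2.Month_Key,
       D2.Sale_Total
FROM (
    SELECT Time_Dimension.Month_Key AS Month_Key,
           SUM(Sales_Fact.Sale_Total) AS Sale_Total
    FROM Sales_Fact
    JOIN Time_Dimension ON Time_Dimension.Day_Key = Sales_Fact.Day_Key
    GROUP BY Time_Dimension.Month_Key
) D2;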

30
Using Derived Tables – Example 2

31

Using derived tables


Derived tables use alias names that make it easy to identify the query from which the projected
items come. In the following example, there are two derived tables, D2 and D3, which achieve the
final projections list. The derived table alias names are also used in the Join statement.

Derived tables are instrumental in stitch queries, so they will be reviewed first before stitch query
SQL is examined.
Each derived table returns values that are used to achieve the final projection list (the first select
statement) as well as values to achieve the final join statement. Notice the alias names in the first
select statement and in the final join statement.

31
Identifying Stitch Query SQL

32

Identifying stitch query SQL


Stitch queries are used to achieve predictable results for multi-fact queries.

There are three essential components of a stitch query:


• coalesce function
• full outer join
• multiple queries that query some of the same information

In the example, the SQL represents a multi-fact query based on one conformed dimension (Time
Dimension) and two facts (Sales_Fact.Revenue and Sales Target.Sales Target). If the query
requires local processing, IBM Cognos includes a local relational engine that is able to process
local stitch queries efficiently.

Year1 is queried more than once. Repeated information comes from conformed dimensions used
in multi-fact queries. This is done to generate result sets that can be merge-sorted together in the
full outer join.

For each additional fact table that is included in a query, there will be another full outer join. If
you added another fact to the query, there would be another full outer join and another derived
table for the new fact introduced into the query.
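Putting the three components together, a stitch query over these two facts might be sketched as follows; this is a simplification of the SQL the product would actually generate, and the join keys and column names are assumed:

-- Two derived tables, one per fact, merged on the conformed Time dimension.
SELECT COALESCE(D2.Month_Key, D3.Month_Key) AS Month_Key,
       D2.Revenue,
       D3.Sales_Target
FROM (
    SELECT Time_Dimension.Month_Key AS Month_Key,
           SUM(Sales_Fact.Revenue) AS Revenue
    FROM Sales_Fact
    JOIN Time_Dimension ON Time_Dimension.Day_Key = Sales_Fact.Day_Key
    GROUP BY Time_Dimension.Month_Key
) D2
FULL OUTER JOIN (
    SELECT Time_Dimension.Month_Key AS Month_Key,
           SUM(Sales_Target_Fact.Sales_Target) AS Sales_Target
    FROM Sales_Target_Fact
    JOIN Time_Dimension ON Time_Dimension.Month_Key = Sales_Target_Fact.Month_Key
    GROUP BY Time_Dimension.Month_Key
) D3
    ON D3.Month_Key = D2.Month_Key;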

32
What Is A Coalesce Function?
select
coalesce(D2.DAY_DATE,D3.DAY_DATE) as DAY_DATE,
coalesce(D2.ORDER_METHOD,D3.ORDER_METHOD) as ORDER_METHOD

33

What is a coalesce function?


A coalesce function:
• merges query items that exist on multiple sides of the query
• indicates that a query item is part of a conformed dimension

Anything included in the report as a column, filter, or prompt can be treated as a conformed
dimension if it is common to the fact items in the query. In the example below, the coalesce
function is used in a SQL statement to combine data from the Sales Fact and Returned Items Fact
tables into a single report set. Both tables contain the DAY_DATE and ORDER_METHOD columns,
which are treated as conformed dimensions. The Sales Fact table contains the SALE_TOTAL fact
item, while the Returned Items Fact table contains the RETURN_QUANTITY fact item.
The resulting report set includes both of these fact items.

Coalesce functions:
• return the first non-NULL value from a series of expressions. It returns NULL if the series
contains only NULL.
• create the results set, including nulls, but not necessarily by the shared dimension key. For
example, if two different products had the same product name, they would end up as a single
result on this report.

33
Non-conformed Dimensions In Generated SQL

34

Non-conformed dimensions in generated SQL


Non-conformed dimensions will not use a coalesce function, and only show up in the derived
table of the fact to which they are related.

If you are using what you expect to be a conformed dimension in a multi-fact query and no
coalesce function is generated, you should investigate your model to ensure that no query path
has been missed, or that the IBM Cognos query engine is not identifying the dimension as a fact
based on cardinality.

In the example, REASON_DESCRIPTION comes from the Return_Reason_Dimension table, which


is not conformed between the Sales Fact and Returns Fact tables. Because it is not conformed, it
will only be a part of the derived table query that is related to the Returned Items Fact table. By
observing the SQL, you can quickly determine a non-conformed dimension by the absence of the
coalesce function.

34
Why is there an XSUM?
IBM Cognos SQL uses windowed aggregates to improve readability:

IBM Cognos SQL:


XSUM(Sales_Fact.SALE_TOTAL for Time_Dimension.MONTH_KEY) as SALE_TOTAL

Native SQL:
sum("Sales_Fact"."SALE_TOTAL") AS "SALE_TOTAL"

Oracle uses sum and over syntax, as shown below:


select
"T0"."C0" "ORDERDATE", "T0"."C1" "ACTUALREVENUE", sum("T0"."C1")
over (order by "T0"."C0" asc rows unbounded preceding) "ACTUALREVENUE1"
From …

35

Why is there an XSUM?

XSUM in IBM Cognos SQL indicates a windowed aggregate, which indicates what value is being
aggregated and to what level(s):

• IBM Cognos SQL:

XSUM(Sales_Fact.SALE_TOTAL for Time_Dimension.MONTH_KEY) as SALE_TOTAL

• Native SQL:

sum("Sales_Fact"."SALE_TOTAL") AS "SALE_TOTAL"

The X in XSUM stands for extended, which indicates that the overall total for each row of a
particular grouping will be calculated and retrieved.

For database vendors who support SQL-OLAP aggregates such as Oracle and DB2, you can also
quickly identify to what levels facts are aggregated. For example, Oracle uses sum and over
syntax, as shown below:

select
"T0"."C0" "ORDERDATE", "T0"."C1" "ACTUALREVENUE", sum("T0"."C1") over (order by "T0"."C0"
asc rows unbounded preceding) "ACTUALREVENUE1"
From

35
Summary
• Calculations and Filters can be defined as embedded or separate
• Recommend embedded for easier maintainability
• Presentation View
• Importance of building as a star schema
• Conformed Dimensions being named the same across queries
• SQL in Framework Manager
• Best practice is to use Cognos SQL unless function does not
exist
• Generated SQL
• Stored procedures as a means to write reusable SQL
• Report queries using COALESCE, FULL OUTER JOIN and Table
aliases
36

36
Common Query Tasks in Power BI Desktop
Week 10 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda
• Connect to data

• Shape and combine data

• Group rows

• Pivot columns

• Create custom columns

• Query formulas
2

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

In the Power Query Editor window of Power BI Desktop, there are a handful of commonly used
tasks.

The common query tasks are:

• Connect to data
• Shape and combine data
• Group rows
• Pivot columns
• Create custom columns
• Query formulas

You can use a couple of data connections to complete these tasks.

2
Connect to Data
• To connect to data in Power BI Desktop:
• Select Home
• Choose Get data

Power BI Desktop presents a menu with the most common data sources.

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

3
Connect to Data
• Power Query Editor inspects the
data source and then presents
the data in the Navigator dialog
box after you select a table.

• Use Transform Data to edit, adjust, or shape the data before loading

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

The example shows the selection of an Excel Workbook. Power Query Editor inspects the
workbook, then presents the data it found in the Navigator dialog box after you select a table.

Next, use Transform Data to edit, adjust, or shape the data before you load it into Power BI
Desktop. Editing is especially useful when you work with large datasets that you want to pare down
before loading.

4
Connect to Data
• Connecting to different types of
data is a similar process.

• Example shows connecting to a Web page

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

Connecting to different types of data is a similar process. This example shows how to connect to a
Web data source (select Get data > More, and then choose Other > Web > Connect). When the
From Web dialog box appears, you can type in the URL of the webpage.

Power BI Desktop inspects the webpage data and shows preview options in the Navigator dialog
box. When you select a table, it displays a preview of the data.

Other data connections are similar. Power BI Desktop prompts you for the appropriate credentials
if it needs you to authenticate your connection.

5
Shaping and Combining Data

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

You can easily shape and combine data with Power Query Editor. When you shape data, you
transform a data source into the form and format that meets your needs.

In Power Query Editor, you can find many commands in the ribbon, and in context menus. For
example, when you right-click a column, the context menu lets you remove the column. Or select a
column and then choose the Remove Columns button from the Home tab in the ribbon.

You can shape the data in many other ways in this query. You can remove any number of rows from
the top or bottom. Or add columns, split columns, replace values, and do other shaping tasks. With
these features, you can direct Power Query Editor to get the data how you want it.

6
Group Rows
• Group By button in the
Transform tab or the
Home tab of the ribbon

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

In Power Query Editor, you can group the values from many rows into a single value. This feature
can be useful when summarizing the number of products offered, the total sales, or the count of
students.

This example shows how many Agencies each state has. (Agencies can include school districts,
other education agencies such as regional service districts, and more.) Select the State Abbr
column, then select the Group By button in the Transform tab or the Home tab of the ribbon.
(Group By is available in both tabs.)
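Conceptually, this Group By step is the same operation as a SQL aggregation; in SQL terms it would be roughly the following (the Agencies table and State_Abbr column names are assumed for illustration, since Power Query expresses this in M rather than SQL):

-- Conceptual SQL equivalent of the Group By step; names are assumed.
SELECT State_Abbr,
       COUNT(*) AS Agency_Count
FROM Agencies
GROUP BY State_Abbr;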

7
Group By Dialog Box
• Group By button in
the Transform tab or
the Home tab of the
ribbon

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

When Power Query Editor groups rows, it creates a new column into which it places the Group By
results. You can adjust the Group By operation in the following ways:

1. The unlabeled dropdown list specifies the column to be grouped. Power Query Editor defaults
this value to the selected column, but you can change it to be any column in the table.
2. New column name: Power Query Editor suggests a name for the new column, based on the
operation it applies to the grouped column. You can name the new column anything you want,
though.
3. Operation: Choose the operation that Power Query Editor applies, such as Sum, Median, or
Count Distinct Rows. The default value is Count Rows.
4. Add grouping and Add aggregation: These buttons are available only if you select the
Advanced option. In a single operation, you can make grouping operations (Group By actions)
on many columns and create several aggregations by using these buttons. Based on your
selections in this dialog box, Power Query Editor creates a new column that operates on
multiple columns.

Select Add grouping or Add aggregation to add more groupings or aggregations to a Group By
operation. To remove a grouping or aggregation, select the ellipsis icon (...) to the right of the row,
and then Delete.

And with Power Query Editor, you can always remove the last shaping operation. In the Query
Settings pane, under Applied Steps, just select the X next to the step recently completed.

8
Pivot Columns

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

You can pivot columns and create a table that contains aggregated values for each unique value in a
column. For example, to find out how many different products are in each product category, you
can quickly create a table to do that.

To create a new table that shows a count of products for each category (based on the
CategoryName column), select the column, then select Transform > Pivot Column.

9
Pivot Columns

10

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

The Pivot Column dialog box lets you know which column’s values the operation uses to create new
columns. When you expand Advanced options, you can select which function to apply to the
aggregated values.

When you select OK, Power Query Editor displays the table according to the transform instructions
provided in the Pivot Column dialog box.

10
Create Custom Columns

11

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

In Power Query Editor, you can create custom formulas that operate on multiple columns in your
table. Then you can place the results of such formulas into a new (custom) column. Power Query
Editor makes it easy to create custom columns.

With the Excel workbook data in Power Query Editor, use the Add Column tab on the ribbon, and
then select Custom Column. A dialog box appears. This example creates a custom column called
Percent ELL that calculates the percentage of total students that are English Language Learners
(ELL).

11
Query Formulas

12

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

You can edit the steps that Power Query Editor generates. You can also create custom formulas,
which let you connect to and shape your data more precisely. Whenever Power Query Editor does
an action on data, the formula associated with the action is displayed in the formula bar. To view
the formula bar, go to the View tab of the ribbon, and then select Formula Bar.

Power Query Editor keeps all applied steps for each query as text that you can view or modify. You
can view or modify the text for any query by using the Advanced Editor. Just select View and then
Advanced Editor.

12
Query Formulas – Advanced Editor

13

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-common-query-
tasks

In the screenshot of the Advanced Editor, an example of the query steps associated with the
USA_StudentEnrollment query is displayed. These steps are created in the Power Query Formula
Language, often referred to as M. Power BI Desktop provides an extensive set of formula
categories.

13
References – Common Query Tasks
Data Sources in Power BI
https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-data-sources

Connect to Data in Power BI


https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-connect-to-data

Shape and Combine Data (Tutorial)


https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-shape-and-combine-data

Power Query M Language Specification


https://learn.microsoft.com/en-us/powerquery-m/power-query-m-language-specification

Power Query M Function Reference


https://learn.microsoft.com/en-us/powerquery-m/power-query-m-function-reference
14

14
Modeling Relationships
in Power BI Desktop
Week 11 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Agenda
Rewind:

• Purpose of relationships

• Relationship properties

• DAX functions for modeling

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand


2
Relationships – An example

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

A model relationship propagates filters applied on the column of one model table to a different
model table. Filters will propagate so long as there's a relationship path to follow, which can
involve propagation to multiple tables.

Relationship paths are deterministic, meaning that filters are always propagated in the same way
and without random variation. Relationships can, however, be disabled, or have filter context
modified by model calculations that use particular DAX functions.

Remember - Model relationships don't enforce data integrity.

In this example, the model consists of four tables: Category, Product, Year, and Sales. The
Category table relates to the Product table, and the Product table relates to the Sales table. The
Year table also relates to the Sales table. All relationships are one-to-many (the details of which
are described later in this article).

3
Relationships – An Example

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

A query, possibly generated by a Power BI card visual, requests the total sales quantity for sales
orders made for a single category, Cat-A, and for a single year, CY2018. That's why you can see filters
applied on the Category and Year tables.
• The filter on the Category table propagates to the Product table to isolate two products that
are assigned to the category Cat-A.
• Then the Product table filters propagate to the Sales table to isolate just two sales rows for
these products.

These two sales rows represent the sales of products assigned to category Cat-A. Their combined
quantity is 14 units.

At the same time, the Year table filter propagates to further filter the Sales table, resulting in just
the one sales row that is for products assigned to category Cat-A and that was ordered in year
CY2018. The quantity value returned by the query is 11 units. Note that when multiple filters are
applied to a table (like the Sales table in this example), it's always an AND operation, requiring
that all conditions must be true.
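
To make the propagation concrete, the measure behind such a card visual can be a plain sum with no filter logic of its own; a minimal sketch follows, assuming (hypothetically) that the Sales table has a Quantity column:

Total Quantity = SUM ( Sales[Quantity] )
-- Evaluated under the visual's filter context (Cat-A and CY2018), this returns 11 units;
-- with only the Cat-A filter in place, it returns 14. The relationships do the filtering,
-- not the measure.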

4
Star Schema Design Principles

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

It is recommended to apply star schema design principles to produce a model comprising
dimension and fact tables. It's common to set up Power BI to enforce rules that filter dimension
tables, allowing model relationships to efficiently propagate those filters to fact tables.

The image shows the model diagram of a sales analysis data model. It shows a star schema design
comprising a single fact table named Sales. The other four tables are dimension tables that
support the analysis of sales measures by date, state, region, and product.

To note, the model relationships connect all tables. These relationships propagate filters (directly
or indirectly) to the Sales table.

5
Disconnected Tables
• A table that isn't related to another model table (still a valid model)
• Not intended to propagate filters to other model tables
• Accepts "user input" allowing model calculations to use the input value in a meaningful way
• The Power BI Desktop what-if parameter feature creates a disconnected table
6

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

It's unusual that a model table isn't related to another model table.

Such a table in a valid model design is described as a disconnected table. A disconnected table
isn't intended to propagate filters to other model tables. Instead, it accepts "user input" (perhaps
with a slicer visual), allowing model calculations to use the input value in a meaningful way.

The Power BI Desktop what-if parameter is a feature that creates a disconnected table.

6
Disconnected Tables
• Example:
• Disconnected table is loaded with a range of currency exchange
rate values.

• A filter is applied to filter by a single rate value


• A measure expression can use that value to convert sales values

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

For example, consider a disconnected table that's loaded with a range of currency exchange rate
values. As long as a filter is applied to filter by a single rate value, a measure expression can use
that value to convert sales values.

The Power BI Desktop what-if parameter is a feature that creates a disconnected table.
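
As a sketch of how a measure might read that "user input", assuming a hypothetical disconnected table named 'Exchange Rate' with a Rate column and an existing [Sales Amount] base measure (none of these names come from the slide):

Selected Rate = SELECTEDVALUE ( 'Exchange Rate'[Rate], 1 )

Sales (Converted) = [Sales Amount] * [Selected Rate]
-- SELECTEDVALUE returns the single rate left by the slicer filter, or falls back to 1
-- when no single rate is selected, so the conversion degrades gracefully.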

7
Relationship Properties
• A model relationship relates one column in a table to one
column in a different table.

• It's not possible to relate a column to a different column in the same table.
• Model relationships are not used to generate a model hierarchy
based on, for example, an employee reporting to another employee
• Use parent-child functions instead
8

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

A model relationship relates one column in a table to one column in a different table. (There's
one specialized case where this requirement isn't true, and it applies only to multi-column
relationships in DirectQuery models. For more information, see the COMBINEVALUES DAX
function article.)

Note
It's not possible to relate a column to a different column in the same table. This concept is
sometimes confused with the ability to define a relational database foreign key constraint that's
table self-referencing. You can use this relational database concept to store parent-child
relationships (for example, each employee record is related to a "reports to" employee).

However, you can't use model relationships to generate a model hierarchy based on this type of
relationship. To create a parent-child hierarchy, see Parent and Child functions.

8
Data Types of Columns
• Data type for both the "from" and "to" columns of the
relationship should be the same.
• Working with relationships defined on DateTime columns might
not behave as expected.

• The engine that stores Power BI data only uses DateTime data
types
• Date, Time and Date/Time/Timezone data types are Power BI
formatting constructs implemented on top.
9

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

The data type for both the "from" and "to" columns of the relationship should be the same.

Working with relationships defined on DateTime columns might not behave as expected. The
engine that stores Power BI data only uses DateTime data types; Date, Time and
Date/Time/Timezone data types are Power BI formatting constructs implemented on top. Any
model-dependent objects will still appear as DateTime in the engine (such as relationships,
groups, and so on).

As such, if a user selects Date from the Modeling tab for such columns, they still don't register as
being the same date, because the time portion of the data is still being considered by the engine.
Read more about how Date/time types are handled. To correct the behavior, the column data
types should be updated in the Power Query Editor to remove the Time portion from the
imported data, so when the engine is handling the data, the values will appear the same.

9
Cardinality
• Each model relationship is defined by a cardinality type - the "one" side
  contains unique values; the "many" side can contain duplicate values
  • One-to-many (1:*)
  • Many-to-one (*:1)
  • One-to-one (1:1)
  • Many-to-many (*:*)

Note: If a data refresh operation attempts to load duplicate values into a
"one" side column, the entire data refresh will fail.
10

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

Each model relationship is defined by a cardinality type. There are four cardinality type options,
representing the data characteristics of the "from" and "to" related columns. The "one" side
means the column contains unique values; the "many" side means the column can contain
duplicate values.

Note
If a data refresh operation attempts to load duplicate values into a "one" side column, the
entire data refresh will fail.

The four options, together with their shorthand notations, are described in the following bulleted
list:

• One-to-many (1:*)
• Many-to-one (*:1)
• One-to-one (1:1)
• Many-to-many (*:*)

10
Cardinality
• In Power BI Desktop, the designer automatically detects and
sets the cardinality type.
• The model is queried to know which columns contain unique
values.

• Power BI Desktop can get it wrong (sometimes):
  • Tables are yet to be loaded with data, or
  • Columns that are expected to contain duplicate values currently contain unique values.

11

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

When you create a relationship in Power BI Desktop, the designer automatically detects and sets
the cardinality type. Power BI Desktop queries the model to know which columns contain unique
values.

For import models, it uses internal storage statistics; for DirectQuery models it sends profiling
queries to the data source. Sometimes, however, Power BI Desktop can get it wrong. It can get it
wrong when tables are yet to be loaded with data, or because columns that you expect to contain
duplicate values currently contain unique values. In either case, you can update the cardinality
type as long as any "one" side columns contain unique values (or the table is yet to be loaded
with rows of data).

11
Cardinality

12

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

In Power BI Desktop model view, you can interpret a relationship's cardinality type by looking at
the indicators (1 or *) on either side of the relationship line. To determine which columns are
related, you'll need to select, or hover the cursor over, the relationship line to highlight the
columns.

The Many-to-many cardinality type isn't currently supported for models developed for Power BI
Report Server.

12
Cross Filter Direction
• Interpret a relationship's cross
filter direction by noticing the
arrowhead(s) along the
relationship line.
• Single arrowhead represents a
single-direction filter in the
direction of the arrowhead
• Double arrowhead represents a
bi-directional relationship.
13

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

In Power BI Desktop model view, you can interpret a relationship's cross filter direction by
noticing the arrowhead(s) along the relationship line. A single arrowhead represents a single-
direction filter in the direction of the arrowhead; a double arrowhead represents a bi-directional
relationship.

13
Cross Filter Direction
• Each model relationship is defined with a cross filter direction
• Your setting determines the direction(s) that filters will
propagate.
• Possible cross filter options are dependent on the cardinality type.
Cardinality Type               Cross Filter Options
One-to-many (or Many-to-one)   Single; Both
One-to-one                     Both
Many-to-many                   Single (Table1 to Table2); Single (Table2 to Table1); Both

14

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

Each model relationship is defined with a cross filter direction. Your setting determines the
direction(s) that filters will propagate. The possible cross filter options are dependent on the
cardinality type.

Single cross filter direction means "single direction", and Both means "both directions". A
relationship that filters in both directions is commonly described as bi-directional.

For one-to-many relationships, the cross filter direction is always from the "one" side, and
optionally from the "many" side (bi-directional). For one-to-one relationships, the cross filter
direction is always from both tables. Lastly, for many-to-many relationships, cross filter direction
can be from either one of the tables, or from both tables. Notice that when the cardinality type
includes a "one" side, that filters will always propagate from that side.

When the cross filter direction is set to Both, another property becomes available. It can apply bi-
directional filtering when Power BI enforces row-level security (RLS) rules. For more information
about RLS, see Row-level security (RLS) with Power BI Desktop.

You can modify the relationship cross filter direction, including the disabling of filter propagation,
by using a model calculation. It's achieved by using the CROSSFILTER DAX function.

Bear in mind that bi-directional relationships can impact negatively on performance. Further,
attempting to configure a bi-directional relationship could result in ambiguous filter propagation

14
paths. In this case, Power BI Desktop may fail to commit the relationship change and will alert you
with an error message. Sometimes, however, Power BI Desktop may allow you to define
ambiguous relationship paths between tables. Resolving relationship path ambiguity is described
later in this article.

We recommend using bi-directional filtering only as needed. For more information, see Bi-
directional relationship guidance.

14
Cross Filter Direction
• Single cross filter direction means "single direction", and
• Both means "both directions".
• A relationship that filters in both directions is commonly
described as bi-directional.
• You can modify the relationship cross filter direction, including the
disabling of filter propagation, by using a model calculation.

Recommendation: Use bi-directional filtering only as needed.


15

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

Single cross filter direction means "single direction", and Both means "both directions". A
relationship that filters in both directions is commonly described as bi-directional.

For one-to-many relationships, the cross filter direction is always from the "one" side, and
optionally from the "many" side (bi-directional). For one-to-one relationships, the cross filter
direction is always from both tables. Lastly, for many-to-many relationships, cross filter direction
can be from either one of the tables, or from both tables. Notice that when the cardinality type
includes a "one" side, that filters will always propagate from that side.

When the cross filter direction is set to Both, another property becomes available. It can apply bi-
directional filtering when Power BI enforces row-level security (RLS) rules. For more information
about RLS, see Row-level security (RLS) with Power BI Desktop.

You can modify the relationship cross filter direction, including the disabling of filter propagation,
by using a model calculation. It's achieved by using the CROSSFILTER DAX function.

Bear in mind that bi-directional relationships can impact negatively on performance. Further,
attempting to configure a bi-directional relationship could result in ambiguous filter propagation
paths. In this case, Power BI Desktop may fail to commit the relationship change and will alert you
with an error message. Sometimes, however, Power BI Desktop may allow you to define
ambiguous relationship paths between tables. Resolving relationship path ambiguity is described
later in this article.

15
We recommend using bi-directional filtering only as needed. For more information, see Bi-
directional relationship guidance.

15
Relevant DAX Functions
DAX functions relevant to model relationships include:
• RELATED
• RELATEDTABLE
• USERELATIONSHIP
• CROSSFILTER
• COMBINEVALUES
• TREATAS
• Parent and Child functions
17

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand

There are several DAX functions that are relevant to model relationships. Each function is
described briefly in the following bulleted list:

RELATED: Retrieves the value from "one" side of a relationship. It's useful when involving
calculations from different tables that are evaluated in row context.

RELATEDTABLE: Retrieves a table of rows from the "many" side of a relationship.

USERELATIONSHIP: Allows a calculation to use an inactive relationship. (Technically, this function
modifies the weight of a specific inactive model relationship, helping to influence its use.) It's
useful when your model includes a role-playing dimension table, and you choose to create
inactive relationships from this table. You can also use this function to resolve ambiguity in filter
paths.

CROSSFILTER: Modifies the relationship cross filter direction (to one or both), or it disables filter
propagation (none). It's useful when you need to change or ignore model relationships during the
evaluation of a specific calculation.

COMBINEVALUES: Joins two or more text strings into one text string. The purpose of this function
is to support multi-column relationships in DirectQuery models when tables belong to the same
source group.

17
TREATAS: Applies the result of a table expression as filters to columns from an unrelated table.
It's helpful in advanced scenarios when you want to create a virtual relationship during the
evaluation of a specific calculation.

Parent and Child functions: A family of related functions that you can use to generate calculated
columns to naturalize a parent-child hierarchy. You can then use these columns to create a fixed-
level hierarchy.
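
A brief sketch of how two of these functions are typically used inside measures, assuming hypothetical names: a [Total Sales] base measure, an inactive relationship between Sales[ShipDateKey] and 'Date'[DateKey], and an active relationship on Sales[ProductKey]:

-- Activate an inactive (role-playing) relationship for one calculation only
Sales by Ship Date =
CALCULATE (
    [Total Sales],
    USERELATIONSHIP ( Sales[ShipDateKey], 'Date'[DateKey] )
)

-- Disable filter propagation from the Product table for one calculation only
Sales All Products =
CALCULATE (
    [Total Sales],
    CROSSFILTER ( Sales[ProductKey], 'Product'[ProductKey], NONE )
)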

17
References – Modeling Relationships
Star Schema Design Principles
https://learn.microsoft.com/en-us/power-bi/guidance/star-schema

Create and use parameters to visualize variables in Power BI Desktop


https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-what-if

Parent and Child functions


https://learn.microsoft.com/en-us/dax/parent-and-child-functions-dax

Data types in Power BI Desktop


https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-data-types#datetime-types

Bi-directional Relationship Guidance


https://learn.microsoft.com/en-us/power-bi/guidance/relationships-bidirectional-filtering

18

18
Filter Context, Time Intelligence,
Calculation Groups in Power BI Desktop
Week 12 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Filter Context

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

2
What is Filter Context?
• Filters applied during the evaluation of a measure or
measure expression

• Can be applied
  • Directly to columns
    • e.g. on Fiscal Year in the Date table for the value FY2020
  • Indirectly - model relationships propagate filters
    • e.g. the Sales table receives a filter via its relationship with the Date
      table, filtering the Sales table rows by the date column

3

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

Filter context describes the filters that are applied during the evaluation of a measure or measure
expression. Filters can be applied directly to columns, like a filter on the Fiscal Year column in the
Date table for the value FY2020. Additionally, filters can be applied indirectly, which happens
when model relationships propagate filters to other tables. For example, the Sales table receives
a filter through its relationship with the Date table, filtering the Sales table rows to those with an
OrderDateKey column value in FY2020.

3
Filter Context – NOTE!
• Calculated tables and calculated columns aren't
evaluated within filter context.

• Calculated columns are evaluated in row context
  • The formula can transition the row context to filter context,
    if it needs to summarize model data.

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

Calculated tables and calculated columns aren't evaluated within filter context. Calculated
columns are evaluated in row context, though the formula can transition the row context to filter
context, if it needs to summarize model data.

4
Filter Context – Slicer/Implied Filter

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

At report design time, filters are applied in the Filters pane or to report visuals. The slicer visual is
an example of a visual whose only purpose is to filter the report page (and other pages when it's
configured as a synced slicer).

Report visuals, which perform grouping, also apply filters. They are implied filters; the difference
is that the filter result is visible in the visual. For example, a stacked column chart visual can filter
by fiscal year FY2020, group by month, and summarize sales amount. The fiscal year filter isn't
visible in the visual result, yet the grouping, which results in a column for each month, behaves as
a filter.

5
Filter Context – Slicer/Implied Filter
• Filters can be added when a report user interacts with the report.
• They can modify filter settings in the Filters pane, and
• They can cross-filter or cross-highlight visuals by selecting visual
elements like columns, bars, or pie chart segments.
• These interactions apply additional filters to report page visuals
(unless interactions have been disabled).
• By understanding filter context, defining the correct formula for your
calculations will achieve the desired results
6

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

Not all filters are applied at report design time. Filters can be added when a report user interacts
with the report. They can modify filter settings in the Filters pane, and they can cross-filter or
cross-highlight visuals by selecting visual elements like columns, bars, or pie chart segments.
These interactions apply additional filters to report page visuals (unless interactions have been
disabled).

It's important to understand how filter context works. It guides you in defining the correct
formula for your calculations. As you write more complex formulas, you'll identify times when you
need to add, modify, or remove filters to achieve the desired result.

6
Filter Context – Example

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

Here is an example that requires your formula to modify the filter context. Your objective is to
produce a report visual that shows each sales region together with its revenue and revenue as a
percentage of total revenue.

The Revenue % Total Region result is achieved by defining a measure expression that's the ratio
of revenue divided by revenue for all regions. Therefore, for Australia, the ratio is 10,655,335.96
dollars divided by 109,809,274.20 dollars, which is 9.7 percent.

The numerator expression doesn't need to modify filter context; it should use the current filter
context (a visual that groups by region applies a filter for that region). The denominator
expression, however, needs to remove any region filters to achieve the result for all regions.
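
A minimal sketch of such a measure, assuming a hypothetical [Revenue] base measure and a Region column on a 'Sales Territory' dimension table (names are illustrative, not taken from the slide):

Revenue % Total Region =
DIVIDE (
    [Revenue],
    CALCULATE ( [Revenue], REMOVEFILTERS ( 'Sales Territory'[Region] ) )
)
-- The numerator uses the current filter context (the region on the visual's row);
-- the denominator removes the region filter so it returns revenue for all regions.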

7

Summary – Filter Context

The key to writing complex measures is mastering these concepts:


1. Understanding how filter context works.
2. Understanding when and how to modify or remove filters to
achieve a required result.
3. Composing a formula to accurately and efficiently modify filter
context.

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-modify-filter/1-
introduction

The key to writing complex measures is mastering these concepts:

1. Understanding how filter context works.


2. Understanding when and how to modify or remove filters to achieve a required result.
3. Composing a formula to accurately and efficiently modify filter context.

Mastering these concepts takes practice and time. Rarely will students understand the concepts
from the beginning of training. Therefore, be patient and persevere with the theory and activities.
We recommend that you repeat this module at a later time to help reinforce key lessons.

8
Time Intelligence

9
Introducing Time Intelligence
• It is simple to work with time intelligence (in Power BI)
• It allows you to do a wide range of complex time-based
reporting.

Requirements:

• You need a date table

10

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

Power BI makes it simple to work with time intelligence. Once you understand the concepts, you
will be able to do a wide range of complex time-based reporting.

To make time intelligence work in Power BI, you need a date table. It definitely will not work
without it.

10
Steps to Use Time Intelligence
Steps to use time intelligence functions

1. Build a date table

2. Build a generic measure

3. Add a time intelligence function to the generic measure

11

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

11
Step 1. TI – Build a Date Table

DATE_TABLE = CALENDAR(DATE(2019,01,01),DATE(2022,01,01))

• All dates from Jan 1st 2019 to Jan 1st 2022 are generated
  • Time intelligence functions will not work with missing dates.

13

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

13
Step 1. TI – Build a Date Table

DATE_TABLE =
CALENDAR(DATE(2019,01,01), DATE(2022,01,01))

… and optionally add a Month_Year column:

Month_Year =
FORMAT(DATE_TABLE[Date], "MMM-YYYY")

14

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

You can experiment with the FORMAT function as well. Power BI provides a lot of different
formatting styles for the date columns.

14
Step 1. TI – Build a Date Table

15

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

The image shows how your model can look with the DATE_TABLE added. The date column of the
DATE_TABLE is joined against the FACT_TABLE using dates. This is a 1:Many relationship.

15
Step 2. TI – Build a Generic Measure
SALES_MEASURE = SUM(FACT_TABLE[TOTALS])

16

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

The measure you build is important. Always remember what you are trying to answer. The
example is a generic sum of sales without any filters.

16
Step 3. TI – Add Time Intelligence
Function to the Measure
SAMEPERIODLASTYEAR_Measure =
CALCULATE([Sales_Measure],
SAMEPERIODLASTYEAR('Date_Table'[Date]))

This shows how much the sales were in the same period of last year.

17

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

The sales for April 2020 were 509 and for April 2021 they were 4444. The SAMEPERIODLASTYEAR
measure pulls up the sales from last year so you can make a simple side-by-side comparison
without much code.

17
Step 3. TI – Add Time Intelligence
Function to the Measure – More…

18

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

You can replace the previous time intelligence function with a new one to create a measure that
answers a different question. For example, to get cumulative totals by month and by year, swap the
time intelligence function for a Month-to-Date or Year-to-Date one. Here you can see the cumulative
sum by month and by year; a sketch of the two measures follows below.
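
A sketch of the two swapped-in measures, reusing the [SALES_MEASURE] and DATE_TABLE defined in the earlier steps:

Sales_MTD = CALCULATE ( [SALES_MEASURE], DATESMTD ( DATE_TABLE[Date] ) )
Sales_YTD = CALCULATE ( [SALES_MEASURE], DATESYTD ( DATE_TABLE[Date] ) )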

18
Step 3. TI – Add Time Intelligence
Function to the Measure – More…
Total_Sales_YTD_LY =
CALCULATE([Total_Sales_YTD], SAMEPERIODLASTYEAR(DATE_TABLE[Date]))

19

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

What if you want the total year-to-date sales of this year but also want to compare it to the same
period last year - that is, you want to compare the yearly cumulative totals of 2020 with 2019?

Since there is already a year-to-date measure, all that is needed is a new measure that wraps it,
and the visual displays last year's running total.

19

Step 3. TI – Add Time Intelligence


Function to the Measure – More…
Total_Sales_YTD_LY__DATEADD =
CALCULATE([Total_Sales_YTD],DATEADD(DATE_TABLE[Date],-1, YEAR))

20

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

One very underrated function is DATEADD. It allows you to go in any direction in time as long as
you know what interval you want to travel.

DATEADD lets you move by years, quarters, months, or days; the only difference is that you must
specify the interval yourself rather than calling a purpose-built function like PREVIOUSYEAR or
SAMEPERIODLASTYEAR. Just be careful not to enter the wrong interval.
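
As a sketch of how the interval argument changes the shift, reusing [SALES_MEASURE] and DATE_TABLE from the earlier steps (only the number and interval change between measures):

Sales_Prev_Month   = CALCULATE ( [SALES_MEASURE], DATEADD ( DATE_TABLE[Date], -1, MONTH ) )
Sales_Prev_Quarter = CALCULATE ( [SALES_MEASURE], DATEADD ( DATE_TABLE[Date], -1, QUARTER ) )
Sales_Next_Year    = CALCULATE ( [SALES_MEASURE], DATEADD ( DATE_TABLE[Date], 1, YEAR ) )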

20

Summary – Adding Time Intelligence

Steps to use time intelligence functions:


1. Build a date table
2. Build a generic measure
3. Add a time intelligence function to the generic measure

21

Source: https://towardsdatascience.com/power-bi-working-with-time-intelligence-
3496d288bb61

The steps to use the time intelligence functions in Power BI:

1. Build a date table and use it as a dimension/filtering table; it's important. None of the time
   intelligence functions will work if there isn't a date table.
2. Build a generic measure. Give it some thought and check that the result of this measure answers
   the business question you have.
3. Add a time intelligence function to the generic measure. For example, swapping DATESYTD for
   DATESQTD will give you different results, of course, but the pattern is the same.

21
Time Intelligence - Functions
CLOSINGBALANCEMONTH FIRSTDATE PREVIOUSDAY
CLOSINGBALANCEQUARTER FIRSTNONBLANK PREVIOUSMONTH
CLOSINGBALANCEYEAR LASTDATE PREVIOUSQUARTER
DATEADD LASTNONBLANK PREVIOUSYEAR
DATESBETWEEN NEXTDAY SAMEPERIODLASTYEAR
DATESINPERIOD NEXTMONTH STARTOFMONTH
DATESMTD NEXTQUARTER STARTOFQUARTER
DATESQTD NEXTYEAR STARTOFYEAR
DATESYTD OPENINGBALANCEMONTH TOTALMTD
ENDOFMONTH OPENINGBALANCEQUARTER TOTALQTD
ENDOFQUARTER OPENINGBALANCEYEAR TOTALYTD
ENDOFYEAR PARALLELPERIOD

22 Source: https://learn.microsoft.com/en-us/dax/time-intelligence-functions-dax

Source: https://learn.microsoft.com/en-us/dax/time-intelligence-functions-dax

Data Analysis Expressions (DAX) includes time-intelligence functions that enable you to
manipulate data using time periods, including days, months, quarters, and years, and then build
and compare calculations over those periods. Before using any time-intelligence functions, make
sure to mark one of the tables containing a date column as the Date Table.

22
Calculation Groups

23

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-calculation-groups/

Occasionally, you might need to add many similar measures to your model. For example, consider
that your model includes measures for sales, cost, and profit. You then want to create a report
that shows year-to-date (YTD) sales, YTD cost, and YTD profit, in addition to prior year (PY) sales,
PY cost, and PY profit. Adding numerous measures can be time-consuming and can clutter the
Fields pane with an overwhelming number of fields. Instead of creating each YTD and PY
measure, you can quickly add these measures to your model by creating a Data Analysis
Expressions (DAX) calculation group.

23
What are Calculation Groups?
• Special type of calculated table

• Allows a developer to quickly create many similar measures

• Helps declutter the Fields pane.

• Helpful when creating time intelligence calculations, such as: Prior year (PY),
  Year-over-year (YoY), Year-over-year percentage (YoY%), Year-to-date (YTD)
24

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-calculation-
groups/introduction

A calculation group is a special type of calculated table. The rows that you add to the table are
known as calculation items. Calculation items are reusable calculations that Microsoft Power BI
can apply to any measure.

Calculation groups allow you to quickly create many similar measures, and they can help declutter
the Fields pane.

You'll likely find that calculation groups are helpful when you need to create time intelligence
calculations. Time intelligence means modifying the filter context for date filters to achieve
calculations, such as:

• Prior year (PY)


• Year-over-year (YoY)
• Year-over-year percentage (YoY%)
• Year-to-date (YTD)

For example, you want to create these four time intelligence calculations for each of the six sales
measures: gross sales, net sales, cost, gross profit, net profit, and quantity. Without using
calculation groups, that process would involve creating 24 (6 x 4) separate measures.
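
As a sketch, the expressions for two calculation items in such a group could look like the following. Calculation items are authored in the model (for example with Tabular Editor or the Power BI Desktop model view), and the 'Date'[Date] column here stands in for whatever marked date table the model uses:

-- "YTD" calculation item: year-to-date of whichever measure is in context
CALCULATE ( SELECTEDMEASURE (), DATESYTD ( 'Date'[Date] ) )

-- "PY" calculation item: same period of the prior year
CALCULATE ( SELECTEDMEASURE (), SAMEPERIODLASTYEAR ( 'Date'[Date] ) )

Applied to the six sales measures, these two items alone stand in for 12 of the 24 separate measures mentioned above.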

24
Using Calculation Groups

25

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-calculation-
groups/introduction

From a report authoring perspective, a calculation group is available in the Fields pane, and it
looks like a regular table. When you add the calculation group field to a visual, the visual will
group by the calculation items.

For example, the example matrix visual shows monthly sales for fiscal year 2022. A calculation
group provides three different perspectives, including PY (prior year), YoY (year-over-year), and
YoY percent.

25
Working with Calculation Group
Functions
Four special DAX functions are available to use when
defining calculation items:
• SELECTEDMEASURE

• SELECTEDMEASURENAME

• SELECTEDMEASUREFORMATSTRING

• ISSELECTEDMEASURE

26

Source: https://learn.microsoft.com/en-us/training/modules/dax-power-bi-calculation-
groups/introduction

Four special DAX functions are available for you to use when defining calculation items:

• SELECTEDMEASURE - Returns a reference to the measure that's currently in context when the
calculation item is evaluated. This function doesn't take any parameters.
• SELECTEDMEASURENAME - Returns a string value of the name of the measure that's currently
in context when the calculation item is evaluated. This function doesn't take any parameters.
• SELECTEDMEASUREFORMATSTRING - Returns a string value of the format string of the
measure that's currently in context when the calculation item is evaluated. This function
doesn't take any parameters.
• ISSELECTEDMEASURE - Returns a Boolean value that indicates whether the measure that's
  currently in context is one of those specified in the list of parameters. You can pass in one or
  more measures (see the sketch after this list).
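
Below is a sketch of a calculation item that uses ISSELECTEDMEASURE to apply time intelligence only to certain measures; [Gross Sales] and [Net Sales] are borrowed from the earlier example and the 'Date'[Date] column is assumed to come from a marked date table:

-- Apply YTD only to the sales measures; pass every other measure through unchanged
IF (
    ISSELECTEDMEASURE ( [Gross Sales], [Net Sales] ),
    CALCULATE ( SELECTEDMEASURE (), DATESYTD ( 'Date'[Date] ) ),
    SELECTEDMEASURE ()
)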

26
References Links
Time intelligence functions
https://learn.microsoft.com/en-us/dax/time-intelligence-functions-dax

Dates in Power BI - Working with Dates — can be somewhat fun


https://towardsdatascience.com/dates-in-power-bi-ada30f85e4b3

Power BI modelling - Some tips. Wish I had known them before


https://towardsdatascience.com/power-bi-modelling-bcd4431f49f9

27

27
Performance Optimization in Power BI
Desktop
Week 12 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
Performance optimization, also known as
performance tuning, involves making changes to
the current state of the data model so that it runs
more efficiently

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

2
Introducing Performance Optimization
• Reports run well in test and development environments

• Production deployments => performance issues arise

• Report user perspective, poor performance is characterized by:

• Report pages that take longer to load

• Visuals taking more time to update

• Negative user experience

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

Reports may run well in test and development environments, but when deployed to production
for broader consumption, performance issues arise.

From a report user's perspective, poor performance is characterized by report pages that take
longer to load and visuals taking more time to update. This poor performance results in a
negative user experience.

3
Data Analysts Focus
• Approximately 90 percent of time is spent working with data

• 9 out of 10 times, poor performance is a direct result of:
  • A bad data model
  • Bad Data Analysis Expressions (DAX), or
  • The mix of the two

• Process of designing for performance can be tedious, and it is often underestimated
4

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

As a data analyst, you spend approximately 90 percent of your time working with your data, and
nine times out of ten, poor performance is a direct result of a bad data model, bad Data Analysis
Expressions (DAX), or the mix of the two.

The process of designing a data model for performance can be tedious, and it is often
underestimated. However, if you address performance issues during development, you will have
a robust Power BI data model that will return better reporting performance and a more positive
user experience.

4
Focus on Performance at Development
• Address performance issues during development:
• A robust Power BI data model

• Better reporting performance, and

• More positive user experience

• Maintain optimized performance as the organization grows, the size of its data
  grows, and its data model becomes more complex
• Optimize early to mitigate the negative impacts


5

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

However, if you address performance issues during development, you will have a robust Power BI
data model that will return better reporting performance and a more positive user experience.

Ultimately, you will also be able to maintain optimized performance. As your organization grows,
the size of its data grows, and its data model becomes more complex. By optimizing your data
model early, you can mitigate the negative impact that this growth might have on the
performance of your data model.

5
Data Model Size

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

A smaller sized data model uses less resources (memory) and achieves faster data refresh,
calculations, and rendering of visuals in reports. Therefore, the performance optimization process
involves minimizing the size of the data model and making the most efficient use of the data in
the model, which includes:

• Ensuring that the correct data types are used.


• Deleting unnecessary columns and rows.
• Avoiding repeated values.
• Replacing numeric columns with measures (see the sketch after this list).
• Reducing cardinalities.
• Analyzing model metadata.
• Summarizing data where possible.
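
For the "replacing numeric columns with measures" item above, a minimal sketch of the idea, assuming hypothetical Sales[Quantity] and Sales[Unit Price] columns:

-- A stored calculated column such as
--   Line Total = Sales[Quantity] * Sales[Unit Price]
-- adds size to the model; a measure computes the same result only at query time:
Total Line Amount = SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )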

7
References Links
Performance Analyzer - Examine Report Element Performance in Power BI Desktop
https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-performance-analyzer

Data reduction techniques for Import modeling


https://learn.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction

Apply auto date/time in Power BI Desktop


https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-auto-date-time

8
References Links - More
DirectQuery in Power BI
https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-directquery-
about#implications-of-using-directquery

DirectQuery model guidance in Power BI Desktop


https://learn.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance

Automatic aggregations
https://learn.microsoft.com/en-us/power-bi/enterprise/aggregations-auto

9
Dataflows in PowerBI
Week 13 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
As data volume continues to grow, so does the
challenge of wrangling that data into well-formed,
actionable information.

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
introduction-self-service

2
Dataflows

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
introduction-self-service

As data volume continues to grow, so does the challenge of wrangling that data into well-formed,
actionable information. We want data that’s ready for analytics, to populate visuals, reports, and
dashboards, so we can quickly turn our volumes of data into actionable insights. With self-service
data prep for big data in Power BI, you can go from data to Power BI insights with just a few
actions.

3
When to Use Dataflows
• Dataflows support the following scenarios:
• Create reusable transformation logic that is shared by many
datasets and reports

• Create a single source of truth

• Encourage uptake by removing analysts' access to underlying data sources.

• Strengthen security around underlying data sources by exposing data through dataflows.

• Work with large data volumes and perform ETL at scale

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

Dataflows are designed to support the following scenarios:

• Create reusable transformation logic that can be shared by many datasets and reports inside
Power BI. Dataflows promote reusability of underlying data elements, preventing the need to
create separate connections with your cloud or on-premises data sources.

• Create a single source of truth, curated from raw data using industry standard definitions,
which can work with other services and products in the Power Platform. Encourage uptake by
removing analysts' access to underlying data sources.

• Strengthen security around underlying data sources by exposing data to report creators in
dataflows. This approach allows you to limit access to underlying data sources, reducing the
load on source systems, and gives administrators finer control over data refresh operations.

• If you want to work with large data volumes and perform ETL at scale, dataflows with Power BI
Premium scales more efficiently and gives you more flexibility. Dataflows supports a wide
range of cloud and on-premises sources.

You can use Power BI Desktop and the Power BI service with dataflows to create datasets,
reports, dashboards, and apps that use the Common Data Model. From these resources, you can
gain deep insights into your business activities. Dataflow refresh scheduling is managed directly
from the workspace in which your dataflow was created, just like your datasets.

4
When to Use Dataflows
• Use with Power BI Desktop/ Power BI service
• Create datasets, reports, dashboards, and apps that use
the Common Data Model.

• Dataflow refresh scheduling is managed directly from the


workspace in which your dataflow was created

Source: https://learn.microsoft.com/en-us/training/modules/optimize-model-power-bi/1-
introduction

You can use Power BI Desktop and the Power BI service with dataflows to create datasets,
reports, dashboards, and apps that use the Common Data Model. From these resources, you can
gain deep insights into your business activities. Dataflow refresh scheduling is managed directly
from the workspace in which your dataflow was created, just like your datasets.

5
Creating a Data Flow

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

6
Creating a Dataflow
Dataflow
A collection of tables that are created and managed in
workspaces in the Power BI service.

Table
A set of columns that are used to store data, much like a
table within a database.

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

A dataflow is a collection of tables that are created and managed in workspaces in the Power BI
service. A table is a set of columns that are used to store data, much like a table within a
database.

You can add and edit tables in your dataflow, and manage data refresh schedules, directly from
the workspace in which your dataflow was created.

7
Creating a Dataflow

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

To create a dataflow, launch the Power BI service in a browser then select a workspace (dataflows
aren't available in my-workspace in the Power BI service) from the navigation pane. You can also
create a new workspace in which to create your new dataflow.

8
Creating a Dataflow
• There are multiple ways to create or build on top of a
new dataflow:
  • Defining new tables
  • Using linked tables
  • Using a computed table
  • Using a CDM folder
  • Using import/export
9

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

There are multiple ways to create or build on top of a new dataflow:

• Create a dataflow by using define new tables


• Create a dataflow by using linked tables
• Create a dataflow by using a computed table
• Create a dataflow by using a CDM folder
• Create a dataflow by using import/export

9
Creating Dataflow - Define New Tables

10

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

Using the Define new tables option lets you define a new table and connect to a new data source.
When you select a data source, you're prompted to provide the connection settings, including the account to use
when connecting to the data source.

Once connected, you can select which data to use for your table. When you choose data and a
source, Power BI reconnects to the data source. The reconnection keeps the data in your
dataflow refreshed at the frequency that you select later in the setup process.

After you select the data for use in the table, you can use dataflow editor to shape or transform
that data into the format necessary for use in your dataflow.

10
Creating Dataflow – Using Linked Tables
• Enables referencing an existing table, defined in another
dataflow, in a read-only fashion.
• To reuse a table across multiple dataflows (date table, a
static lookup table)

• To avoid creating multiple refreshes to a data source;
  • allows others to use that table, reducing the load to the underlying data source.

• To perform a merge between two tables.


11
Linked tables are available only with Power BI Premium.

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

Creating a dataflow by using linked tables enables you to reference an existing table, defined in
another dataflow, in a read-only fashion. The following list describes some of the reasons you
might choose this approach:

• If you want to reuse a table across multiple dataflows, such as a date table or a static lookup
table, you should create a table once and then reference it across the other dataflows.

• If you want to avoid creating multiple refreshes to a data source, it's better to use linked
tables to store the data and act as a cache. Doing so allows every subsequent consumer to use
that table, reducing the load to the underlying data source.

• If you need to perform a merge between two tables.

11
Creating Dataflow – Using Computed Tables
• Allows referencing a linked table to perform
operations on top of it in a write-only fashion.

• Results in a new table that is part of the dataflow.

• Two ways to convert:
  • Create a new query from a merge operation
  • To edit or transform the table, create a reference or duplicate of the table
12
Linked tables are available only with Power BI Premium.

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

Creating a dataflow by using a computed table allows you to reference a linked table and perform
operations on top of it in a write-only fashion. The result is a new table, which is part of the
dataflow. There are two ways to convert a linked table into a computed table. You can create a
new query from a merge operation. Or if you want to edit or transform the table, you can create
a reference or duplicate of the table.

12
Creating Dataflow – Create Computed Tables
1. Go to Edit tables.
2. Select the table you want to use as the basis for your
computed table and on which you want to perform
calculations.
3. In the context menu, choose Reference.
4. For the table to be eligible as a computed table, the
Enable load selection must be checked,
5. Right-click on the table to display this context menu.


13
Linked tables are available only with Power BI Premium.

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

To create computed tables, after you have a dataflow with a list of tables, you can perform
calculations on those tables.

By selecting Enable load, you create a new table for which its source is the referenced table. The
icon changes, and shows the computed icon, as shown in the following image.

Any transformation you perform on this newly created table is run on the data that already
resides in Power BI dataflow storage. That means that the query won't run against the external
data source from which the data was imported, like the data pulled from the SQL database.
Instead the query is performed on the data that resides in the dataflow storage.

13
Creating Dataflow – Using a CDM Folder
• Allows you to reference a table that has been written by
another application in the Common Data Model (CDM)
format.

• You're prompted to provide the complete path to the


CDM format file stored in Azure Data Lake Storage
(ADLS) Gen 2

14

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

Creating a dataflow from a CDM folder allows you to reference a table that has been written by
another application in the Common Data Model (CDM) format. You're prompted to provide the
complete path to the CDM format file stored in ADLS Gen 2.

There are a few requirements for creating dataflows from CDM folders, as the following list
describes:
• The ADLS Gen 2 account must have the appropriate permissions set up in order for PBI to
access the file.
• The ADLS Gen 2 account must be accessible by the user trying to create the dataflow.
• The URL must be a direct file path to the JSON file and use the ADLS Gen 2 endpoint; blob.core
isn't supported (example:
https://myaccount.dfs.core.windows.net/filesystem/path/model.json)

14
Creating Dataflow – Using Import/Export
• Lets you import a dataflow from a file

• Useful if you want to:


• Save a dataflow copy offline, or

• Move a dataflow from one workspace to another.

• When importing the targeted file, Power BI creates the


dataflow for you

• Optionally perform other transformations.


15

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-
create

Creating a dataflow by using import/export lets you import a dataflow from a file. This tool is
useful if you want to save a dataflow copy offline, or move a dataflow from one workspace to
another.

To import a dataflow, select the import box and upload the file. Power BI creates the dataflow for
you, and allows you to save the dataflow as is, or to perform other transformations.

15
Composite Models

16

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

16
Composite Models
• Reports can include data connections from more than
one DirectQuery or import data connection, in any
combination

• Capability consists of three related features:


• Composite models

• Many-to-many relationships

• Storage mode
17

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

Previously in Power BI Desktop, when you used a DirectQuery in a report, no other data
connections, whether DirectQuery or import, were allowed for that report. With composite
models, that restriction is removed. A report can seamlessly include data connections from more
than one DirectQuery or import data connection, in any combination you choose.

The composite models capability in Power BI Desktop consists of three related features:

• Composite models: Allows a report to have two or more data connections from different
source groups. These source groups can be one or more DirectQuery connections and an
import connection, two or more DirectQuery connections, or any combination thereof. This
article describes composite models in detail.

• Many-to-many relationships: With composite models, you can establish many-to-many


relationships between tables. This approach removes requirements for unique values in
tables. It also removes previous workarounds, such as introducing new tables only to establish
relationships. For more information, see Apply many-many relationships in Power BI Desktop.

• Storage mode: You can now specify which visuals query back-end data sources. This feature
helps improve performance and reduce back-end load. Previously, even simple visuals, such as
slicers, initiated queries to back-end sources. For more information, see Manage storage mode
in Power BI Desktop.

17
Using Composite Models
• Connect to different kinds of data by:
• Importing data to Power BI (most common way)

• Connecting directly to data in its original source


repository using DirectQuery.

18

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

With composite models, you can connect to different kinds of data sources when you use Power
BI Desktop or the Power BI service. You can make those data connections in a couple of ways:

• By importing data to Power BI, which is the most common way to get data.
• By connecting directly to data in its original source repository by using DirectQuery. To learn
more about DirectQuery, see DirectQuery in Power BI.

18
Using Composite Models
• With DirectQuery, composite models make it possible to create a
Power BI model, such as a single .pbix file, that does either or both of:
• Combines data from one or more DirectQuery sources

• Combines data from DirectQuery sources and import data

• Composite models allow combining various types of data: sales data
from an enterprise data warehouse, sales-target data from a SQL Server
database, or data imported from a spreadsheet
19

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

When you use DirectQuery, composite models make it possible to create a Power BI model, such
as a single .pbix Power BI Desktop file that does either or both of the following actions:

• Combines data from one or more DirectQuery sources.


• Combines data from DirectQuery sources and import data.

For example, by using composite models, you can build a model that combines the following
types of data:

• Sales data from an enterprise data warehouse.


• Sales-target data from a departmental SQL Server database.
• Data that's imported from a spreadsheet.

19
Using Composite Models
• Create relationships between tables as you always have, including
between tables from different sources

• Cross-source relationships are created with a cardinality of many-to-many

• Modify to the desired cardinality (1:n, n:1, 1:1)

• DAX functions cannot retrieve values on the one side from the many
side

• There may be a performance impact versus many-to-many relationships
within the same source
20

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

You can create relationships between tables as you always have, even when those tables come
from different sources.

• Any relationships that are cross-source are created with a cardinality of many-to-many,
regardless of their actual cardinality.
• You can change them to one-to-many, many-to-one, or one-to-one. Whichever cardinality you
set, cross-source relationships have different behavior.
• You can't use Data Analysis Expressions (DAX) functions to retrieve values on the one side
from the many side.
• There might also be a performance impact versus many-to-many relationships within the
same source.

20
Set the Storage Mode
• Indicates whether the table
is based on DirectQuery or
import (Property pane)

• The status bar displays a
storage mode of Mixed for a
.pbix file with some tables from
DirectQuery and some
import tables

21

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

Each table in a composite model has a storage mode that indicates whether the table is based on
DirectQuery or import. The storage mode can be viewed and modified in the Property pane. To
display the storage mode, right-click a table in the Fields list, and then select Properties.

The storage mode can also be viewed on the tooltip for each table.

For any Power BI Desktop file (a .pbix file) that contains some tables from DirectQuery and some
import tables, the status bar displays a storage mode called Mixed. You can select that term in
the status bar and easily switch all tables to import.

21
Performance Implications
• With DirectQuery, always consider performance
• Ensure back-end source has sufficient resources
• visuals refresh in five seconds or less

• With composite models, a single visual can result in


sending queries to multiple sources
• often passes the results from one query across to a second
source
22

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

When you use DirectQuery, you should always consider performance, primarily to ensure that the
back-end source has sufficient resources to provide a good experience for users. A good
experience means that the visuals refresh in five seconds or less.

Using composite models adds other performance considerations. A single visual can result in
sending queries to multiple sources, which often pass the results from one query across to a
second source.
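
To make this concrete, the following is a minimal SQL sketch of the kind of query pattern this can
produce; the table and column names (dim_customer, fact_sales, customer_key, segment,
sales_amount) are hypothetical and only illustrate how results from one source can be passed
across as a filter on the query sent to the other source.

-- Query sent to the first source: return the keys that satisfy the slicer/filter
SELECT DISTINCT customer_key
FROM dim_customer
WHERE segment = 'Enterprise';

-- Query sent to the second source: the keys returned above are passed across
-- as a literal filter instead of being joined inside a single source
SELECT customer_key, SUM(sales_amount) AS total_sales
FROM fact_sales
WHERE customer_key IN (101, 204, 315)  -- values returned by the first query
GROUP BY customer_key;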

22
Performance Implications
• Each case has its own implications on performance

• As the cardinality of the join columns grows, pay more attention to the


impact on the resulting performance.

• Use of many-to-many relationships means separate queries must


be sent to the underlying source for each total or subtotal level

• A simple table visual with totals would send two source queries,
rather than one.

23

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

Each of these cases has its own implications on performance, and the exact details vary for each
data source. As long as the cardinality of the columns used in the relationship that joins the two
sources remains low (a few thousand), performance shouldn't be affected. As this cardinality
grows, you should pay more attention to the impact on the resulting performance.

Additionally, the use of many-to-many relationships means that separate queries must be sent to
the underlying source for each total or subtotal level, rather than aggregating the detailed values
locally. A simple table visual with totals would send two source queries, rather than one.
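
As a hedged illustration of that extra round trip, assume a hypothetical fact_sales table with
region and sales_amount columns; a table visual showing rows plus a grand total could translate
into a query pair like this rather than a single query:

-- Detail rows for the visual, one row per region
SELECT region, SUM(sales_amount) AS total_sales
FROM fact_sales
GROUP BY region;

-- The grand total is requested as a separate source query because the detailed
-- values can't simply be aggregated locally across a many-to-many,
-- cross-source relationship
SELECT SUM(sales_amount) AS grand_total
FROM fact_sales;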

23
Source Groups

24

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

A source group is a collection of items, such as tables and relationships, from a DirectQuery
source or all import sources involved in a data model. A composite model is made of one or more
source groups.

Examples:

• A composite model that connects to a Power BI Dataset called Sales and enriches the dataset
by adding a Sales YTD measure, which isn't available in the original dataset. This model
consists of one source group.

• A composite model that combines data by importing a table from an Excel sheet called Targets
and a CSV file called Regions, and making a DirectQuery connection to a Power BI Dataset
called Sales. In this case, there are two source groups as shown in the following image:
• The first source group contains the tables from the Targets Excel sheet, and the
Regions CSV file.
• The second source group contains the items from the Sales Power BI Dataset.

24
Source Groups and Relationships
• Two types of relationships in a composite model:

• Intra source group relationships

• Relate items within a source group together

• Cross source group relationships

• Relationships start in one source group and end in a


different source group

25

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

There are two types of relationships in a composite model:

• Intra source group relationships. These relationships relate items within a source group
together. These relationships are always regular relationships unless they're many-to-many, in
which case they're limited.

• Cross source group relationships. These relationships start in one source group and end in a
different source group. These relationships are always limited relationships.

25
Source Groups and Relationships

26

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-
models

Image: Three cross source group relationships added, relating tables across the various source
groups together:

26
References Links
Common Data Model
https://learn.microsoft.com/en-us/common-data-model/

ADLS Gen 2 with Power BI


https://learn.microsoft.com/en-us/answers/questions/862478/adls-gen-2-with-power-bi

Use composite models in Power BI Desktop


https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-composite-models

DirectQuery in Power BI (performance advice)


https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-directquery-about

Relationship Evaluation
https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-relationships-
understand#relationship-evaluation

27

27
Agile Methodology, Deployments,
Emerging Technologies
Week 14 – Day 1

Spring 2023 - CST2205 - Data Modelling

1
“One more kick at the can!”
• Agile Methodology

• Deployments

• Alternative Deployments

• Emerging Technologies

2
Agile Methodology in
BI Development

3
Before Agile... Waterfall
• Phases of a project in
sequential order
• One phase ends before another
begins
• Worst-case scenarios
• Build something nobody
needed
• Requirements changed
drastically during
development
• No longer useful
4

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Waterfall. Agile. Kanban. Scrum. What do these words have to do with project management, what
are the differences, and how can you pick the methodology that’s right for your team?

The waterfall method got its name from the way it looks when you draw the process out. Similarly
to a natural waterfall, projects look like they’re cascading from one project phase to
the next.

Implementing this project management methodology requires a lot of up-front planning and
preparation. A big part of waterfall project management is creating an airtight project plan so your
team clearly understands the project requirements and restraints before they get started on the
work. That’s because there isn't a lot of room for variation, adaptability, or error once a waterfall
project is set in motion.

With careful planning, you can successfully achieve your end product with clear, predictable
workflows. This project methodology is great for time management and progress tracking, though
it’s less flexible than other models, such as Agile.

4
Waterfall Methodology

5 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

What is the waterfall methodology?


The waterfall model divides each project into different phases and moves through the phases in
sequential order. No phase can begin until the phase before it is completed. Typically, each phase
ends in a project milestone that indicates the next phase can begin.

The specific phases of the waterfall process depend on exactly what your team is creating, but
typically they look similar to this:

1. Requirements phase, sometimes split into an additional analysis phase


2. System design phase
3. Implementation phase, also known as the development phase or coding phase—depending on
the type of project
4. Testing phase
5. Deployment phase, also known as the operations phase
6. Maintenance phase

5
Agile Methodology
• An iterative methodology where work is completed in short sprints
• Prioritizes a flexible approach and continuous delivery, especially for
unexpected project changes
• Can suffer from scope creep as a result
• As software development became more prevalent in the early 2000s,
developers needed an iterative approach to prototyping and project
management
• The Agile Manifesto is the go-to resource for implementing this methodology
6 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

What is Agile?
Agile project management is an iterative methodology where work is completed in short sprints. By
prioritizing a flexible approach and continuous delivery, the Agile method is more flexible when it
comes to unexpected project changes—however, it can suffer from scope creep as a result.

The Agile methodology was developed to counter traditional waterfall-style project management.
As software development became more prevalent in the early 2000s, developers needed an
iterative approach to prototyping and project management—and thus Agile software development
was born.

Since then, the Agile Manifesto has been the go-to resource for Agile values and principles for
anyone who’s looking to implement this methodology. The Agile methodology is no longer exclusive
to software development. Among others, marketing, IT, event planning, and product development
have adapted and modified the methodology to fit their industries.

6
Agile Methodology

8 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

How Agile works


Agile project management includes iterative backlog management, sprints, reflection, iteration, and
more sprints. Each Agile sprint typically lasts two to four weeks.

Each sprint goes through the following phases:

1. First, the product owner organizes the product backlog. The product backlog is a list of every task that
may be worked on during the sprint. This information is usually stored in a project management tool.

2. Before the sprint, the entire project team participates in sprint planning to identify the best tasks to
work on during the two-week period.

3. During the sprint, Agile teams meet frequently to discuss blockers and action items.

4. Once the sprint is over, team members get together to run a sprint retrospective and identify what went
well and what could have been better.

8
Different Flavours of Agile
• Kanban: Flexible framework for organizing tasks on a board with
different stages (e.g., in progress, in review, UAT)

• Scrum: Full framework embracing continuous improvement and
encouraging rituals such as daily stand-ups with the team

9 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

What is Kanban?
Kanban is a subsect of the Agile methodology and functions within the broader Agile mentality. The
Agile philosophy is all about adaptive planning, early delivery, and continuous improvement—all of
which Kanban can support.

When someone speaks of Kanban in project management, they’re most commonly referring to
Kanban boards. A Kanban board represents stages of work with columns that hold the individual
tasks for each stage—but more on that in a little bit.

The Kanban framework is very flexible and can help your team become more dynamic and agile
over time.

What is Scrum?
Scrum is one of the most popular Agile frameworks. Unlike Kanban, which is generally used as a
tool to visualize work, Scrum is a full framework and you can “run teams” on Scrum. The framework
provides a blueprint of values, guidelines, and roles to help your
team focus on continuous improvement and iteration.

It’s much less flexible than Kanban but a great way for Agile teams to collaborate and get high-
impact work done.

9
Kanban Board: Example

10 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

How Kanban works


At its core, the modern Kanban framework is an online, visual method to manage work.

When people say “Kanban,” they are frequently referring to Kanban boards: the visual project
management view that brings the Kanban methodology to life.

In a Kanban board, columns represent the various stages of work. Within each column, visual cards
represent individual tasks and which stage they’re in. Typically these stages are ‘to do,’ ‘in
progress,’ and ‘done.’

Kanban boards are one of the most popular forms of visual project management. They’re most effective for
providing easy, at-a-glance insight into a project.

10
Benefits of Kanban Boards
• At-a-glance information, including but not limited to:
• tasks or deliverables, task assignee, due dates, relevant tags,
like priority or task type, task details, context, relevant files
• Flexible way to visualize work in progress.
• Can also customize Kanban board columns based on task
assignees, add a “swimlane,” or create columns by due dates.
• Key component of most project management tools that allows
you to view work in multiple ways.
11 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Benefits of Kanban boards


When you use a Kanban board for visual project management, you provide your team with a wealth
of at-a-glance information, including but not limited to:

• Tasks or deliverables
• Task assignee
• Due dates
• Relevant tags, like priority or task type
• Task details
• Context
• Relevant files

Kanban boards are a flexible way for your team to visualize work in progress. Traditionally, Kanban
board columns display the stages of work, which is why they’re popular visual project management
tools for teams that run ongoing processes and projects like creative requests or bug tracking
projects.

You can also customize your Kanban board columns based on task assignees, add a “swimlane,” or
create columns by due dates.

Because of how effective they can be for visualizing work, Kanban boards are a key component of
most project management tools that allows you to view work in multiple ways.

11
Agile and Scrum: Example

12 Source; https://www.tuleap.org/agile/agile-scrum-in-10-minutes

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

What is Scrum?
Scrum is one of the most popular Agile frameworks. Unlike Kanban, which is generally used as a
tool to visualize work, Scrum is a full framework and you can “run teams” on Scrum. The framework
provides a blueprint of values, guidelines, and roles to help your team focus on continuous
improvement and iteration.

It’s much less flexible than Kanban but a great way for Agile teams to collaborate and get high-
impact work done.

How Scrum works


To run a Scrum, teams typically assign a Scrum master, who is in charge of running the three
distinct Scrum phases and keeping everyone on track. The Scrum master can be your team lead,
project manager, product owner, or the person most interested in running Scrum.

The Scrum master is responsible for implementing the three traditional Scrum phases:

Phase 1: Sprint planning. A Scrum sprint is usually two weeks long, though teams can run longer or
shorter sprints. During the sprint planning phase, the Scrum master and team take a look at the
team’s product backlog and select work to accomplish during the sprint.

Phase 2: Daily Scrum standups. Over the course of the Scrum (also known as the Scrum “cycle
time”), teams traditionally meet for 15 minutes every day to check in on progress and make sure
the amount of assigned work is appropriate.

12
Phase 3: Sprint retrospective. When the Scrum is over, the Scrum master hosts a sprint retrospective
meeting to evaluate what work was done, route any unfinished work back into the backlog, and
prepare for the next sprint.

The goal of Scrum isn’t to build something in two weeks, ship it, and never see it again. Rather, Scrum
embraces a mindset of “continuous improvement,” where teams take small steps towards bigger
goals. By breaking work into smaller chunks and working on those chunks, Scrum helps teams better
prioritize and ship work more efficiently.

12
Benefits of Scrum
• Clearly established rules, rituals, and responsibilities
• Daily Scrum meetings, combined with sprint planning and sprint
review (or “retrospective” meetings)
• Helps teams continuously check in and improve on current
processes.
• Offers easy, built-in structure to manage and support most important
work
• Pre-set and limited amount of work and time for each sprint.
• Clearly defined responsibilities
13 Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Source: Waterfall, Agile, Kanban, & Scrum: What’s the Difference? [2023] • Asana

Benefits of Scrum
Teams that run Scrum have clearly established rules, rituals, and responsibilities. Additionally, your
daily Scrum meetings, combined with sprint planning and sprint review (or “retrospective”
meetings), help teams continuously check in and improve on current processes.

Because it draws from a backlog of work, and begins with a sprint planning meeting, Scrum offers
an easy, built-in structure for team leads or product owners to manage and support their team’s
most important work. During a Scrum, your team has a pre-set and limited amount of work and
time for each sprint. This level of built-in prioritization is combined with clearly defined
responsibilities ensuring that everyone knows what they’re responsible for at all times.

13
Applying Agile to BI Development
• User Stories → Requirements Gathering
• What is the business process?
• What is the grain?
• What are the KPIs to determine dim/facts?
• Prototype → Early Feedback
• Mock SQL query + spreadsheet (see the SQL sketch below)
• Dashboard design
• Publish + Continuous Feedback → Make it available
• Power BI Service Online
• Comments + User Acceptance Testing phase

14

14
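
As an illustration of the prototype step, here is a minimal mock SQL query of the kind that could
feed an early spreadsheet or dashboard mockup; the star schema names (fct_orders, dim_date,
dim_product) and the monthly sales grain are assumptions for the example, not a prescribed design.

-- Mock prototype: monthly sales by product category (hypothetical star schema)
SELECT
    d.year_month,
    p.product_category,
    SUM(f.order_amount)        AS total_sales,
    COUNT(DISTINCT f.order_id) AS order_count
FROM fct_orders AS f
JOIN dim_date AS d    ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY d.year_month, p.product_category
ORDER BY d.year_month, p.product_category;

Sharing a result like this early lets stakeholders confirm the business process, grain, and KPIs
before any report is built.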
Deployments

15

15
What is deployment?
• Publish your data models / reports to data consumers
• Ensure data is updated and accurate
• Monitor data for any issues
• Enable way to collect feedback / improvements

16

16
General Workflow
1. Gather User Requirements: Identify key stakeholders and ask questions to
gather information about the data models (business process, grain, facts
and dimensions).

2. Build a Prototype: Design a mockup chart or dashboard and confirm with
the stakeholder. Write a SQL query to generate the table for this model.

3. Integration: New business process or improvement to an existing model.
Write the Python, SQL or BI code to generate the model.

4. Testing: Identify issues and test the output of the query to ensure
accuracy and completeness. Get feedback on KPIs. (A small validation
sketch follows below.)

5. Deployment: Publish the data model and report to the BI tool or interface
for the data consumer. Collect feedback for future enhancements.

17

17
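
To illustrate the Testing step above, here is a small validation sketch; the staging and model
table names (stg_orders, fct_orders) and the order_id key are assumptions for the example.

-- Row counts should reconcile between the staging data and the generated model
SELECT COUNT(*) AS staging_rows FROM stg_orders;
SELECT COUNT(*) AS model_rows   FROM fct_orders;

-- Completeness check: the fact table key should never be NULL or duplicated
SELECT order_id, COUNT(*) AS occurrences
FROM fct_orders
GROUP BY order_id
HAVING COUNT(*) > 1 OR order_id IS NULL;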
General Workflow

Operational Database → Enterprise Data Warehouse → Power Query Editor →
Power BI DAX Reports → Power BI Service (end user reports)

18
Power BI Service
• Online workspace for business users / data consumers
• Explore dashboards and reports without ability to edit
• Download / export data in Excel and integration with MS Teams
• Administrator might configure things like scheduled refreshes in
the background or usage reports
• Best practice - test your dashboards and reports in PBI Service

https://learn.microsoft.com/en-us/power-bi/consumer/end-user-experience

19

19
Power BI – End User Experience

Source: https://learn.microsoft.com/en-us/power-bi/consumer/end-user-experience

Open the Power BI service in a browser or on a mobile device. From here, everyone works from the
same trusted dashboards and reports. Power BI updates the data automatically, so you're always
working with the freshest content.

The content isn't static, so you can dig in and look for trends, insights, and other business
intelligence. Slice and dice the content, and even ask it questions in your own words. Or, sit back
and let your data discover interesting insights, send you alerts when data changes, or email reports
to you on a schedule that you set. All your content is available to you anytime, in the cloud or on-
premises, from any device. That's just the beginning of what Power BI can do.

20
Alternative Deployments

21

Source: https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-create

21
Alternative Deployment Workflow

1. Design data models and write SQL code inside dbt to materialize in the
cloud data warehouse. Testing occurs in a development data set (i.e.
zz_eoey). (A model sketch follows below.)

2. Results of the dbt models are materialized as tables or views inside
Google BigQuery in a production data set called “dw”.

3. Data models are leveraged in SELECT statements or in a more sophisticated
query and published in Metabase, a cloud-based BI tool, for data consumers.

22

22
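
For illustration, a minimal sketch of what one dbt model in this kind of workflow might look like;
the model and source names (fct_orders, stg_orders, stg_order_items) are hypothetical, and in this
setup the compiled result would land in the development data set while testing and in the
production “dw” data set once deployed.

-- models/fct_orders.sql: dbt compiles this SELECT and materializes it in BigQuery
{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    SUM(oi.amount) AS order_amount
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_order_items') }} AS oi
  ON o.order_id = oi.order_id
GROUP BY o.order_id, o.customer_id, o.order_date

Metabase would then SELECT from the materialized dw table rather than re-deriving the logic.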
Emerging Technologies

23

23
The Ecosystem (and Future) of the
Modern Data Infrastructure

24 Source: https://www.indicative.com/resource/modern-data-infrastructure/

Source: https://www.indicative.com/resource/modern-data-infrastructure/

The Ecosystem (and Future) of the Modern Data Infrastructure


A simplified look at the current data landscape shows that an architecture where companies own
and control their own data—where the data warehouse is the central hub connecting to all other
tools and a gravity well for all business data—is emerging.

That architecture represents a major shift in how data is ingested, stored, and analyzed by
companies of all sizes—and key players in the data analytics industry aren’t keeping up.

24
What is the Modern Data Stack?
• Majority of small-medium companies / new companies adopting
modern data stack leveraging the following:
• ETL/ELT tool + Event Collection
• Cloud Data Warehouse
• Transformation tool
• BI tool

25 Source: https://www.fivetran.com/blog/what-is-the-modern-data-stack

Source: https://www.fivetran.com/blog/what-is-the-modern-data-stack

25
What is the Modern Data Stack?
ETL/ELT tools are fully-managed SaaS platforms that enable low-code or no-code
extraction and loading of data from APIs or databases into a data warehouse.

Event Collection tools can be customer data platforms (CDP) or SaaS products that
collect web and app event data. CDPs enhance event collection with features like
identity resolution and custom user journeys.

26

Source: https://www.fivetran.com/blog/what-is-the-modern-data-stack

26
Cloud Data Warehouse
• Central to a modern analytics
infrastructure
• Key Features
• Massively parallel processing
(MPP)
• Columnar data warehouse
• Cloud-based

27

27
dbt (data build tool) as transformation
tool
• Transformation workflow built for cloud data warehouses (CDWs)

• Utilizes SQL and Python - SELECT statements are converted to DML/DDL
(see the sketch below)

• Integration with other tools


• Git

• Airflow / Prefect (orchestration)

• Cloud vs. Core (open source)

• Partnership with Databricks

28

28
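
To show what “SELECT statements converted to DML/DDL” means in practice, here is a hedged sketch:
a model written as a plain SELECT, followed by roughly the DDL dbt runs against the warehouse for a
table materialization. The exact generated SQL depends on the adapter and configuration, and the
names used here are hypothetical.

-- What the analyst writes in the model file
SELECT customer_id, MIN(order_date) AS first_order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

-- Roughly what dbt executes in the warehouse for materialized='table'
CREATE OR REPLACE TABLE dw.customer_first_order AS
SELECT customer_id, MIN(order_date) AS first_order_date
FROM dw.stg_orders
GROUP BY customer_id;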
Business
Intelligence
Tools

29

Source: https://aws.amazon.com/blogs/business-intelligence/aws-recognized-as-a-challenger-in-
the-2023-gartner-magic-quadrant-for-analytics-and-business-intelligence-platforms/

29
Final Words

30

30
Business Intelligence Tools
• Modern data stack and power of the cloud

• Importance of accessible AI and ML

• Importance of soft skills


• Communication

• Critical Thinking

• Understanding bias and stakeholders

• Exam PL-300: Microsoft Power BI Data Analyst


https://learn.microsoft.com/en-us/certifications/exams/pl-300/
31

Source: https://keepcalms.com/p/remain-curious-and-keep-learning/

Exam PL-300: Microsoft Power BI Data Analyst - https://learn.microsoft.com/en-us/certifications/exams/pl-300/

31
References Links
Agile Manifesto
Manifesto for Agile Software Development (agilemanifesto.org)

What is Agile methodology? (A beginner’s guide)


https://asana.com/id/resources/agile-methodology

What is the modern data stack?


https://www.fivetran.com/blog/what-is-the-modern-data-stack

32

32
Data Modeling – Course Recap
Week 14 – Day 1

Summer 2023 - CST2205 - Data Modelling

1
“There were 5 exabytes of information created
between the dawn of civilization through 2003, but
that much information is now created every two
days.”
~ Eric Schmidt, Executive Chairman at Google

2
What Did We Learn?
• Kimball Methodology using star schemas

• BI development process

• Identify the business process + grain + fact + dimensions

• Design and build data models in Power BI

• Power Query Editor

• Power BI DAX

• IBM Cognos Framework Manager

3 • Applying star schema and best practices

3
Final Exam

4
Logistics
What: Data Modeling (23S_CST2205_010) Final Exam

Where: Room B457


When: Thursday, August 17th from 6:30pm to 8:30pm (120 minutes)

Logistics:

• Please arrive at least 10 minutes early

• Open-book

• Course Materials (slides, DW textbook)

• No headphones / review of the videos

5
Final Exam Structure
The final exam will consist of the following:

• Mix of multiple-choice / multi-select, match, true or false questions

• Delivery is online on Brightspace

• Students must be present in-person on the exam date

6
Final Exam Study Guide

7
Approximate Coverage
The exam material (all questions) will cover the following

• Principles of data warehousing and why we model data (10%)

• Different modelling approaches (Kimball, Inmon, Mart) (10%)

• Modelling business processes (30%)


• Inventory and Procurement + Order Management + CRM and HRM

• Kimball advanced topics (10%)

• Power BI (20%) - Data Validation + DAX + Time Intelligence

• IBM Cognos FM (10%)

8
Week 1 - Introducing Data Modeling - May 9 & 11
Slide deck - Data Modeling - Introduction: slides 9, 11-17, 19- 23

• Purpose of data modeling, understanding stakeholders (data generators, data


managers, data consumers), data modeling, recipe for success.

Slide Deck - Data Warehousing Concepts - Approaches: slides 1-23

• Characteristics of a data warehouse, functions of a data warehouse, normalization


vs. Denormalization, data warehouse vs. database, the Kimball methodology
(characteristics, advantages/disadvantages, data warehouse lifecycle); the Inmon
method (characteristics, advantages/disadvantages); how to choose Kimball vs
Inmon

9
Week 2 - Introducing Data Modeling - May 16 & 18
Slide deck - Dimensional Modelling - Introduction - slides 7-19, 23-37
• Principles / goals of data warehouse: principles for success; dimensional models
(star schemas, OLAP cubes, normalization (1NF, 2NF, 3NF); fundamental concepts
of dimensional models (business process, fact table, dimensions, grain), 4-step
design process, case studies
• Document - What is Database Normalization? - Pages 1-7

Slide deck: Data Modeling Approaches - slides 1-23


• Notes on adapting changes; surrogate keys on dimensions; modelling
architecture approaches (understand Kimball's DW/BI architecture;
understand independent data marts; understand the comparison of 3 methods - hub-
and-spoke corporate information factory (Inmon); understand hybrid Kimball +
Inmon)

10

10
Week 3 - Modelling Business Processes - May 23 & 25
Slide deck: Modeling Inventory and Procurement - slides 1 - 40
• Review of key terms (bus matrix, conformed dimensions, value chain, row-density
calculation, semi-additive facts, periodic snapshots, accumulating snapshots);
fact table comparisons; single vs. multiple transaction fact tables; business
matrix and processes; slowly changing dimensions and best practices.
• Review models

11

11
Week 4 - Wrapping Up Kimball - May 30/June 1st
Slide deck - CRM and HRM - slides 1-40
• Aggregated facts as dimension attributes; many-to-many relationships; event
collection; SQL examples; reverse ETL; survey / text data

Slide deck - Industry case studies - slides 1-27


• Bus matrix examples - which would you build first; bridge tables; identify the
grain; post design review examples

12

12
Week 5 - Data Validation & Workshops - June 6 & 8
Slide deck - Data Validation - slides 1-8
• What is data validation? Importance of data validation? Data validation concepts;
techniques

13

13
Week 6 - Metadata Modeling with Framework Manager -
June 13 & 15
Slide deck - Metadata Models with Framework Manager - slides 1-24
• Components; role of a metadata model; 3 goals of a data modeler; identify data
model types and data structures; understand merits of each model type
(operational, reporting); identify data model types and data structures;
relationships and cardinality; identify different data traps

Slide deck - Data Modelling Concepts with Framework Manager - slides 1-38
• Framework Manager projects; defining metadata elements (query subjects, items,
regular dimensions, measure dimensions); data modelling approach; key
modeling recommendations; reporting requirements; questions to ask; explore
data sources to identify data access strategies; model in stages; modeling in
layers (presentation view, business logic view, consolidation view, foundation
objects view, data source view); relational modelling concepts (cardinality,
determinants, multi-fact/multi-grain queries); importance of cardinality;
14

14
Week 7 - Metadata Modeling with Framework Manager -
June 20 & 22
Slide deck - Creating Baseline Projects - Extending Models and Framework Manager
- slides 1-27, 29-34
• Calculation and filters; customizing metadata for runtime; presentation view;
identify conformed dimensions; data source query subjects; set the SQL type
(Cognos, native SQL, pass-through SQL); query generation architecture; stored
procedure query subjects; using prompt values; data modification stored
procedures; SQL by a BI tool vs. Analyst; explore SQL generation; derived tables;
identifying stitch query SQL; what is a coalesce function?; Non-conformed
dimensions in generated SQL

15

15
Week 9 - Introducing Metadata Modeling with Power BI -
July 4 & 6
Reading - Query Overview in Power BI Desktop

Tutorial* - Shape and Combine Data in Power BI Desktop

* You will be checked for knowledge gained on tutorials


16

16
Week 10 - Transforming and Shaping Data - Measures,
Calculated Columns and DAX - July 11 & 13
Slide deck - Common Query Tasks in Power BI Desktop - slides 1-14

• Connect to data; shaping and combining data; group rows; pivot columns;
custom columns; query formulas; advanced editor;

Tutorial #1 – Create Your Own Measures in Power BI

Tutorial #2 - Create calculated columns in Power BI Desktop

Tutorial #3: Use DAX in Power BI Desktop

* You will be checked for knowledge gained on tutorials


17

17
Week 11 - Data Modeling in Power BI - July 18 & 20
Slide deck - Modeling Relationships in Power BI Desktop - slides 1-22

• Purpose of relationships, star schema design principles, disconnected tables,


relationship properties, data types of columns, cardinality, cross filter direction,
DAX functions

Reading: Work with Modeling view in Power BI Desktop

Tutorial #1: From Dimensional Model to Dashboard Report in Power BI Desktop

Tutorial #2: Design a Data Model in Power BI

* You will be checked for knowledge gained on tutorials


18

18
Week 12 - More DAX - Filtering Context, Time Intelligence, and
Calculation Groups, Optimize Model Performance - July 25 &
27
Slide deck - Filter Context, Time Intelligence, Calculation Groups in Power BI
Desktop - slides 1-27

• What is filter context?; Slicer/implied filter; time intelligence; steps to use time
intelligence (build date table, generic measure, add time intelligence function);
what are calculation groups?; Using calculation groups; working with calculation
group functions

Slide deck - Performance Optimization in Power BI Desktop - slides 1-9

• What is it; data analysts focus; focus on performance at development; data


model size
* You will be checked for knowledge gained on tutorials
19

19
Week 12 - More DAX - Filtering Context, Time Intelligence, and
Calculation Groups, Optimize Model Performance - July 25 &
27
Tutorial #1: Modify DAX filter context in Power BI Desktop models

Tutorial #2: Use DAX time intelligence functions in Power BI Desktop models

Tutorial #3: Add DAX calculation groups to Power BI Desktop models

Tutorial #4: Optimize a model for performance in Power BI

* You will be checked for knowledge gained on tutorials


20

20
Week 13 - Data Flows and Composite Models - August 1 & 3

Slide deck - Dataflows and Composite Models - slides 1-27

• What are dataflows; when to use dataflows; creating a dataflow (define new
tables, using linked tables, using computed tables, using a CDM folder, using
import/export); what are composite models?; Using composite models; setting the
storage mode; performance implications; what is a source group; source groups
and relationships

21

21
Week 14 - Agile Methodology, Deployments, Emerging
Technologies - August 8 & 10
Slide deck - Agile Methodology, Deployments, Emerging Technologies - slides 1-21

• Agile methodology, waterfall methodology, Kanban board, Scrum, and benefits;


what is deployment; general workflow; Power BI Service; end user experience

22

22
Tips For Studying
• Start by reviewing your quiz + solutions posted

• The type of questions will be very similar to the quizzes

• Review the slides and notes following the suggested key concepts in the next
slides

• Follow the blogs or textbook references on the slides for any concepts that
require more depth

• Identify any questions or concepts for the last class on Friday, April 14th

• You should expect to apply key concepts against common business processes
like sales, inventory or HR

23

23
Bonus Materials (not on quiz)

24

24
Power BI Ecosystem
Power BI Desktop (focus of course)
1. Get Data: Where to connect / extract data
2. Power Query Editor: Where to transform data pre-load
3. Model: Where to relate the data
4. Data: Where to find tabular data (similar to Excel)
5. Report: Where to create visuals and dashboards
6. Visualizations: Where to choose and customize charts
7. Fields: Where tables and columns are stored
Power Platform (larger ecosystem)
1. Power BI Service: End users interact with dashboards
2. Power Apps: Embedded apps connected to data
3. Power Automate: Automate repetitive workflows
4. Power Virtual Agents: Build chatbots
25

25
Power BI Levels of Adoption

26

26
Power BI Service
• Online workspace for business users / data consumers
• Explore dashboards and reports without ability to edit
• Download / export data in Excel and integration with MS Teams
• Administrator might configure things like scheduled refreshes in
the background or usage reports
• Best practice - Test your dashboards and reports in PBI Service

https://learn.microsoft.com/en-us/power-bi/consumer/end-user-experience

27

27
Power BI Ideal Team Structure

28

28
Where does Power BI fit in an
ecosystem?
• Power BI is part of the BI / visualization layer for modern analytics infrastructures
• While it is possible to do ETL in Power BI, it is not scalable for big data
• The best organizations use a modern stack of tools alongside their BI and visualization
tool

29 Source: Deloitte Blog and Indicative Blog

29
Good Luck!

30

30
Resources For Lifelong Learning

31

31
Continue Learning
• YouTube Channels
• Guy in a Cube - Microsoft (ex)-Employees sharing Power BI tips
• Curbal - PBI Consultant sharing DAX, Power Query and other tips
• BI Elite - PBI Consultant sharing hands-on report building tips

• Blogs
• SQL BI - Blog and training site by authors of DAX book, top experts in DAX
• Radacad - Blog and training site by professional trainers with lots of free
resources

• Microsoft Resources
• Power BI Docs - Official documentation for Power BI general
• DAX Docs - Official documentation for Data Analysis Expressions language
• Power Query Docs - Official documentation for Power Query (aka M)
programming language
• Definitive Guide to DAX book - Equivalent to the Bible for learning DAX
32


32
Continue Learning
• Microsoft Learning Paths

• Certification Exam - PL-300


Level of knowledge or certification based on the desired job role (refer
to Team Structure slide at end of ppt)

33


33
