Power BI Data Modelling

https://learn.microsoft.com/en-us/power-bi/guidance/star-schema
What is Data Modelling?
Data modeling in Power BI is the process of creating a structured representation of data and its relationships, enabling efficient analysis and reporting.
What is a Schema in a DBMS?

A schema in a database is the blueprint or logical structure that defines how data is organized and stored. It provides a comprehensive overview of the database's architecture, outlining the various components and their relationships.
Different Types of Schema Design

Flat Model
Hierarchical Model
Network Model
Relational Model
Star Schema
Snowflake Schema
Star Schema
A star schema is a data modeling technique, widely used in Power BI, that organizes data into a central fact table surrounded by dimension tables, resembling a star shape.
Foreign Key vs Primary key

Primary key: A primary key is a column (or a set of columns) in a table that uniquely identifies each row within that table. No two rows can have the same primary key value. A primary key cannot contain null values, meaning every row must have a valid identifier. A table can have only one primary key.

Foreign Key: A foreign key is a column (or set of columns) in one table that references
the primary key of another table, creating a link between them. This enforces referential
integrity by ensuring foreign key values match existing values in the referenced primary
key. Unlike primary keys, foreign keys can contain duplicate values and nulls (unless
restricted). A table can have multiple foreign keys to connect to various other tables.
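To make the two sets of rules concrete, here is a minimal plain-Python sketch (not Power BI or SQL code). It checks the primary-key rules on a customer table and lists foreign-key values in a sales table that have no matching customer. The helper names, and the extra order 1004 pointing at customer C009, are hypothetical and exist only to show a violation.

```python
def is_valid_primary_key(rows, key):
    """Primary-key rules: no null values and no duplicates."""
    values = [row[key] for row in rows]
    return None not in values and len(values) == len(set(values))


def orphan_foreign_keys(fact_rows, fk, dim_rows, pk):
    """Return foreign-key values with no matching primary key.

    Nulls are allowed in a foreign key (unless restricted), so they are skipped.
    """
    known = {row[pk] for row in dim_rows}
    return [row[fk] for row in fact_rows if row[fk] is not None and row[fk] not in known]


customers = [
    {"CustomerID": "C001", "CustomerName": "John Smith"},
    {"CustomerID": "C002", "CustomerName": "Mary Jones"},
]
sales = [
    {"OrderID": 1001, "CustomerID": "C001"},
    {"OrderID": 1002, "CustomerID": "C002"},
    {"OrderID": 1003, "CustomerID": "C001"},  # repeated foreign key: allowed
    {"OrderID": 1004, "CustomerID": "C009"},  # hypothetical: no such customer
]

print(is_valid_primary_key(customers, "CustomerID"))  # True
print(is_valid_primary_key(sales, "CustomerID"))      # False, C001 repeats
print(orphan_foreign_keys(sales, "CustomerID", customers, "CustomerID"))  # ['C009']
```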
Fact Table

A fact table is a central table in a dimensional data model used in data warehousing and Business Intelligence (BI) applications. It stores measurements, metrics, or facts related to a specific business process or event.
Dimension Table

Dimension tables are crucial components of a star schema, providing descriptive attributes that add context and allow for detailed analysis of the numerical data (measures) stored in the central fact table.
Star schema relevance to Power BI semantic models

Dimension tables enable filtering and grouping.
Fact tables enable summarization.

Normalization vs Denormalization
Normalization is the process of structuring data into multiple related tables to reduce redundancy and
improve consistency. Denormalization is the process of combining data from multiple tables into one table,
often to simplify queries or improve performance in certain cases.

Denormalized (one wide table):
OrderID | Date | CustomerID | CustomerName | City | Region | ProductID | ProductName | Category | Price | Quantity | SalesAmount
1001 | 2025-08-01 | C001 | John Smith | Hyderabad | South | P001 | Coffee Mug | Kitchen | 5 | 2 | 10
1002 | 2025-08-02 | C002 | Mary Jones | Mumbai | West | P002 | Notebook | Stationery | 2 | 5 | 10
1003 | 2025-08-02 | C001 | John Smith | Hyderabad | South | P003 | Tea Kettle | Kitchen | 15 | 1 | 15

Normalized (star schema):

Sales (fact)
OrderID | Date | CustomerID | ProductID | Quantity | SalesAmount
1001 | 2025-08-01 | C001 | P001 | 2 | 10
1002 | 2025-08-02 | C002 | P002 | 5 | 10
1003 | 2025-08-02 | C001 | P003 | 1 | 15

Products (dimension)
ProductID | ProductName | Category | Price
P001 | Coffee Mug | Kitchen | 5
P002 | Notebook | Stationery | 2
P003 | Tea Kettle | Kitchen | 15

Customers (dimension)
CustomerID | CustomerName | City | Region
C001 | John Smith | Hyderabad | South
C002 | Mary Jones | Mumbai | West
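The link between the two layouts can be sketched in plain Python (a model, not DAX or Power Query): looking up each fact row's CustomerID and ProductID in the dimension tables rebuilds the wide, denormalized rows. The dictionaries simply restate the sample tables above; the `denormalize` helper is hypothetical.

```python
# Dimension tables: one entry per key (the "one" side)
products = {
    "P001": {"ProductName": "Coffee Mug", "Category": "Kitchen", "Price": 5},
    "P002": {"ProductName": "Notebook", "Category": "Stationery", "Price": 2},
    "P003": {"ProductName": "Tea Kettle", "Category": "Kitchen", "Price": 15},
}
customers = {
    "C001": {"CustomerName": "John Smith", "City": "Hyderabad", "Region": "South"},
    "C002": {"CustomerName": "Mary Jones", "City": "Mumbai", "Region": "West"},
}
# Fact table: keys plus numeric values only
sales = [
    {"OrderID": 1001, "Date": "2025-08-01", "CustomerID": "C001", "ProductID": "P001", "Quantity": 2, "SalesAmount": 10},
    {"OrderID": 1002, "Date": "2025-08-02", "CustomerID": "C002", "ProductID": "P002", "Quantity": 5, "SalesAmount": 10},
    {"OrderID": 1003, "Date": "2025-08-02", "CustomerID": "C001", "ProductID": "P003", "Quantity": 1, "SalesAmount": 15},
]


def denormalize(fact, customers, products):
    """Join every fact row to its customer and product attributes."""
    return [{**row, **customers[row["CustomerID"]], **products[row["ProductID"]]} for row in fact]


wide = denormalize(sales, customers, products)
print(wide[2]["CustomerName"], wide[2]["ProductName"])  # John Smith Tea Kettle
# Redundancy: "Hyderabad" is stored once in customers but twice in the wide table
print(sum(1 for r in wide if r["City"] == "Hyderabad"))  # 2
```

The repeated City value is exactly the redundancy that normalization removes.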
Measures
In star schema design, a measure is a fact table column that stores values to be summarized. In a Power BI
semantic model, a measure has a different—but similar—definition. A model supports both explicit and
implicit measures.

Explicit measures are expressly created and they're based on a formula written in Data Analysis Expressions
(DAX) that achieves summarization. Measure expressions often use DAX aggregation functions like SUM, MIN,
MAX, AVERAGE, and others to produce a scalar value result at query time (values are never stored in the
model).

Implicit measures are columns that can be summarized by a report visual or Q&A. They offer a convenience for
you as a model developer, as in many instances you don't need to create (explicit) measures. For example, the
Adventure Works reseller sales Sales Amount column can be summarized in numerous ways (sum, count,
average, median, min, max, and others), without the need to create a measure for each possible aggregation
type.
Surrogate keys
A surrogate key is a unique identifier that you add to a table to support star schema modeling. By
definition, it's not defined or stored in the source data. Commonly, surrogate keys are added to
relational data warehouse dimension tables to provide a unique identifier for each dimension table
row.
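A minimal sketch, assuming the source supplies product codes such as P001 as natural keys: the model assigns consecutive integers as surrogate keys, and the fact table stores those integers instead of the codes. The helper name and the start-at-1 numbering are illustrative assumptions, not a prescribed scheme.

```python
def assign_surrogate_keys(natural_keys):
    """Map each distinct natural key to a new integer, starting at 1.

    The integers are generated here, not taken from the source data.
    """
    mapping = {}
    for key in natural_keys:
        if key not in mapping:
            mapping[key] = len(mapping) + 1
    return mapping


dim_product_codes = ["P001", "P002", "P003"]
key_map = assign_surrogate_keys(dim_product_codes)

# The fact table swaps each natural key for its surrogate key
fact_product_codes = ["P001", "P002", "P003", "P001"]
fact_product_keys = [key_map[code] for code in fact_product_codes]

print(key_map)            # {'P001': 1, 'P002': 2, 'P003': 3}
print(fact_product_keys)  # [1, 2, 3, 1]
```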
Degenerate dimension

A degenerate dimension is a dimension key stored in a fact table without a separate physical
dimension table.
OrderID Date CustomerID ProductID Quantity SalesAmount
1001 2025-08-01 C001 P001 2 10
1002 2025-08-02 C002 P002 5 10
1003 2025-08-02 C001 P003 1 15

OrderID is not linked to any dimension table. It’s unique to each transaction, and there are no extra attributes for OrderID other than itself. Instead of creating a Dim_Order table with just OrderID in it (which would be pointless), we keep it inside the fact table. That OrderID column is a degenerate dimension.

Why not put it in a separate dimension table?
• It has no descriptive attributes (like "OrderType" or "OrderStatus").
• Creating a separate table would just add unnecessary joins in Power BI.
• It's already unique per fact row.


Cardinality
Cardinality describes the uniqueness of data values in a column: essentially, how many distinct values there are compared to the total number of rows.

High cardinality
A column has many unique values (often close to the total row count). Example: OrderID, TransactionID, Email Address.

Low cardinality
A column has few unique values compared to total rows. Example: Gender (Male/Female), Region (North/South/East/West).
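One simple way to quantify this, sketched in plain Python with hypothetical sample columns: divide the distinct count by the row count. A ratio near 1 signals high cardinality; a small ratio signals low cardinality.

```python
def cardinality(values):
    """Return (distinct count, row count, distinct/row ratio) for one column."""
    distinct = len(set(values))
    return distinct, len(values), distinct / len(values)


# Hypothetical columns of the same six-row table
order_ids = [1001, 1002, 1003, 1004, 1005, 1006]
regions = ["South", "West", "South", "North", "South", "West"]

print(cardinality(order_ids))  # (6, 6, 1.0) -> high cardinality
print(cardinality(regions))    # (3, 6, 0.5) -> low cardinality
```

On a real fact table with millions of rows, the gap between the two ratios is far larger than in this tiny sample.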
Why Cardinality Matters in Power BI

Every Power BI relationship has a cardinality type: One-to-One, One-to-Many (or Many-to-One), or Many-to-Many.

High-cardinality columns (e.g., millions of unique IDs) can increase memory usage and slow down visuals.

Choosing the right granularity for relationships depends on understanding cardinality.
One-to-One (1:1) Relationship

Every value in the key column of Table A appears only once, and every
value in the key column of Table B appears only once.

Table A
EmpID | EmployeeName
E001 | John
E002 | Mary
E003 | Alex

Table B
EmpID | DOJ
E001 | 2024-01-05
E002 | 2024-02-15
E003 | 2024-03-20
One-to-Many (1:*) Relationship (Most Common in Power BI)

One table has unique keys (dimension), and the other table has repeating keys (fact).

Dimension (one side)
CustID | CustomerName
C001 | John
C002 | Mary
C003 | Alex

Fact (many side)
OrderID | CustID | Amount
1001 | C001 | 500
1002 | C002 | 200
1003 | C001 | 300
Many-to-Many (*:*) Relationship in Power BI
Both tables have duplicate values in the relationship column; neither table contains unique keys for that column.
This usually happens when two fact tables share a dimension but at different granularity, or when you try to join two lists that overlap.

Table A
ProductName | SupplierName
Coffee Mug | Supplier A
Notebook | Supplier B
Tea Kettle | Supplier A
Notebook | Supplier C

Table B
SupplierName | ProductName
Supplier A | Coffee Mug
Supplier A | Tea Kettle
Supplier B | Notebook
Supplier C | Notebook
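As a rough plain-Python check (a simplification, not how Power BI Desktop itself detects cardinality), the relationship type follows from whether each side's key column is unique. The calls reuse the key columns from the three example pairs of tables above.

```python
def relationship_cardinality(left_keys, right_keys):
    """Classify a relationship from the uniqueness of each side's key column."""
    left_unique = len(left_keys) == len(set(left_keys))
    right_unique = len(right_keys) == len(set(right_keys))
    if left_unique and right_unique:
        return "One-to-One (1:1)"
    if left_unique:
        return "One-to-Many (1:*)"
    if right_unique:
        return "Many-to-One (*:1)"
    return "Many-to-Many (*:*)"


# EmpID in Table A vs Table B
print(relationship_cardinality(["E001", "E002", "E003"], ["E001", "E002", "E003"]))
# CustID in the dimension vs the fact table
print(relationship_cardinality(["C001", "C002", "C003"], ["C001", "C002", "C001"]))
# ProductName in Table A vs Table B of the many-to-many example
print(relationship_cardinality(
    ["Coffee Mug", "Notebook", "Tea Kettle", "Notebook"],
    ["Coffee Mug", "Tea Kettle", "Notebook", "Notebook"],
))
```

The three calls print One-to-One (1:1), One-to-Many (1:*) and Many-to-Many (*:*) respectively.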
Cross Filter Direction

Determines how filters flow between tables in your model.

Single → Filter flows from one table to another in one direction only.
(dimension → fact table).
Example: Selecting a Product filters Sales, but selecting a Sale does not filter
Products.

Both (Bidirectional) → Filters flow both ways.


Example: Selecting a Product filters Sales and selecting a Sale filters Products.
Single-direction (one-way) filtering

Dimension filters Fact. Fact does not filter Dimension.

Customers
CustomerID | CustomerName
C001 | John
C002 | Mary
C003 | Alex

Sales
OrderID | CustomerID | Amount
1001 | C001 | 500
1002 | C002 | 200
1003 | C001 | 300

Relationship: Customers[CustomerID] → Sales[CustomerID]
Cross filter direction: Single (from Customers ➜ Sales)
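A plain-Python model of single-direction filtering, using the Customers and Sales tables above: a selection on the dimension narrows the fact rows, but nothing flows back from Sales to Customers. The function name is hypothetical; this only imitates the behavior and is not how the engine is implemented.

```python
customers = [
    {"CustomerID": "C001", "CustomerName": "John"},
    {"CustomerID": "C002", "CustomerName": "Mary"},
    {"CustomerID": "C003", "CustomerName": "Alex"},
]
sales = [
    {"OrderID": 1001, "CustomerID": "C001", "Amount": 500},
    {"OrderID": 1002, "CustomerID": "C002", "Amount": 200},
    {"OrderID": 1003, "CustomerID": "C001", "Amount": 300},
]


def filter_sales_by_customers(selected_customers, sales):
    """Dimension -> fact: keep only sales whose key is among the selected customers."""
    keys = {c["CustomerID"] for c in selected_customers}
    return [s for s in sales if s["CustomerID"] in keys]


# Slicer on CustomerName = "John"
john = [c for c in customers if c["CustomerName"] == "John"]
total = sum(s["Amount"] for s in filter_sales_by_customers(john, sales))
print(total)  # 800

# With Single direction there is no Sales -> Customers step, so a customer
# slicer still lists Alex (C003) even though Alex has no sales.
print([c["CustomerName"] for c in customers])  # ['John', 'Mary', 'Alex']
```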
Bidirectional (both) filtering
Use when two dimensions need to filter each other through a fact (e.g., making slicers interact), or with bridge tables. Use carefully.

Products
ProductID | ProductName | Category
P001 | Coffee Mug | Kitchen
P002 | Notebook | Stationery
P003 | Tea Kettle | Kitchen

Brands
BrandID | BrandName
B01 | HomeWare
B02 | PaperCo

Sales
OrderID | ProductID | BrandID | Amount
2001 | P001 | B01 | 150
2002 | P002 | B02 | 80
2003 | P001 | B01 | 90
2004 | P003 | B01 | 120

Relationships:
Products[ProductID] ⇄ Sales[ProductID]
Brands[BrandID] ⇄ Sales[BrandID]
Cross filter direction: Both (Products ⇄ Sales, Brands ⇄ Sales)

What happens in reports:
• A slicer on Category = Kitchen (from Products) filters Sales to P001 & P003, and, because filters now flow both ways, Brands visuals also reflect only brands that sold Kitchen items (here, HomeWare).
• Likewise, selecting BrandName = PaperCo filters Sales to B02, and Products visuals now show only items PaperCo sold (here, Notebook).
Active vs. Inactive Relationships
Active Relationship → Automatically used by Power BI when filtering or aggregating data.
Inactive Relationship → Exists in the model but is ignored by default. You can activate it in a DAX measure with:

CALCULATE ( SUM ( Sales[Amount] ), USERELATIONSHIP ( Sales[Date], Calendar[Date] ) )

The two columns passed to USERELATIONSHIP must be the ones joined by that inactive relationship.

• The active relationship is shown as a solid line in the model diagram.
• The inactive relationship is shown as a dotted line: it exists in the model but isn’t used by default in filtering or calculations.


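Relationship Type: Regular vs. Limited

• Regular Relationship → Based on unique matching keys on the "one" side, with both tables in the same source group; works normally.

• Limited Relationship → Occurs when there is no guaranteed "one" side: typically a Many-to-Many relationship, or a relationship between tables in different source groups (e.g., in a composite model). Filter behavior may be restricted.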
Role-Playing Dimensions

Role-Playing Dimension → The same table is used multiple times in a model with different roles (e.g., a Date table used for Order Date and Ship Date).
New Measure
A measure is a DAX formula used to perform calculations on the fly based on the
current filter context (e.g., slicers, filters, rows in a visual). Measures are not stored in
your data model — they are calculated dynamically when you view a report.
Key points:
• Stored as logic, not as data in the table.
• Calculated at query time depending on the report filters.
• Typically used for aggregations like SUM, AVERAGE, COUNT, % of total, YTD, etc.
• Results change based on slicers, filters, or visual context.
Total Sales = SUM(Sales[SalesAmount])

If you select 2024 in a slicer, Total Sales will show only 2024 sales automatically.
When to use:
When you want dynamic calculations that respond to filtering in visuals.
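A small plain-Python model of what "calculated at query time" means: the measure is stored as logic (a function), and each filter context (no slicer, a 2024 slicer, or one row of a visual) triggers a fresh evaluation over the rows that remain visible. The Year/SalesAmount rows are hypothetical.

```python
# Hypothetical Sales rows: (Year, SalesAmount)
sales = [(2023, 120), (2024, 200), (2024, 50), (2025, 80)]


def total_sales(filter_context):
    """Like Total Sales = SUM(Sales[SalesAmount]): no result is stored;
    the sum is recomputed for whatever rows the current filters keep."""
    return sum(amount for year, amount in sales if filter_context(year))


print(total_sales(lambda year: True))          # 450: no slicer selection
print(total_sales(lambda year: year == 2024))  # 250: slicer set to 2024

# A visual with Year on rows evaluates the same measure once per row
for y in sorted({year for year, _ in sales}):
    print(y, total_sales(lambda year, y=y: year == y))
```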
New Column
A calculated column is a new column in your table created using DAX.
It is stored in the model after calculation and does not change with slicers or report
filters (unless you refresh/recalculate the model).
Key points:
• Stored in the table like regular data.
• Takes up space in the data model.
• Calculated row by row at data refresh time.
• Good for classification, flags, or joining tables.

Profit = Sales[SalesAmount] - Sales[CostAmount]

This creates a new column Profit for each row of the Sales table.
When to use:
When you need a permanent field for filtering, grouping, or joining that doesn’t change with slicers.
New Table
A calculated table is a table created from an expression using DAX.
It exists physically in your data model, just like imported tables.
Key points:
• Created from existing tables or data using DAX.
• Recalculated on model refresh.
• Can be used for relationship building, lookup tables, or summarization.

TopProducts =
TOPN (
    10,
    SUMMARIZE ( Sales, Products[ProductName], "Total Sales", SUM ( Sales[SalesAmount] ) ),
    [Total Sales], DESC
)

When to use:
When you need a static table generated from existing data, such as for mapping, filtering, or specific analysis.
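The two steps inside TopProducts can be sketched in plain Python with hypothetical rows: SUMMARIZE groups Sales by product and sums the amounts, then TOPN keeps the largest totals. TOPN returns every row tied at the cutoff, so the sketch does the same rather than truncating to exactly n rows.

```python
from collections import defaultdict

# Hypothetical rows: (ProductName, SalesAmount)
sales = [("Coffee Mug", 10), ("Notebook", 10), ("Tea Kettle", 15),
         ("Coffee Mug", 25), ("Notebook", 4)]


def top_products(rows, n):
    # SUMMARIZE step: one row per product with its total
    totals = defaultdict(int)
    for name, amount in rows:
        totals[name] += amount
    # TOPN step: sort descending by total
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    # Keep ties at the n-th position
    return [item for item in ranked if item[1] >= cutoff]


print(top_products(sales, 2))  # [('Coffee Mug', 35), ('Tea Kettle', 15)]
```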
Calculation Group
A calculation group is a time-saving feature in Power BI that lets you create reusable calculation logic applied to multiple measures without rewriting DAX. Calculation groups were traditionally created in external tools such as Tabular Editor; recent versions of Power BI Desktop can also create them in Model view.
Key points:
• Contains calculation items — each item applies a transformation to a base
measure.
• Commonly used for time intelligence (YTD, MTD, QTD) or formatting logic.
• Reduces the number of measures you must create and maintain.
Comparison

Feature | Stored in Model? | Changes with Filters? | Row-by-row or Aggregated? | Common Uses
Measure | ❌ No | ✅ Yes | Aggregated at query time | KPIs, totals, % change
Column | ✅ Yes | ❌ No | Row-by-row at refresh | Categories, joins, flags
Table | ✅ Yes | ❌ No | Depends on DAX query | Summary tables, lookups
Calculation Group | ❌ Logic only | ✅ Yes | Applies to measures | Time intelligence, reusable logic
DAX
DAX is a collection of functions, operators, and constants that can be used in a
formula, or expression, to calculate and return one or more values. DAX helps
you create new information from data already in your model.

• Transforms data into insights – Enables advanced analysis beyond basic visuals.
• Supports complex calculations – Growth %, YoY trends, moving averages, and more.
• Enables flexible time intelligence – Compare performance across date ranges.
• Customizes metrics for your business – Tailor KPIs to unique business needs.
• Enhances decision-making – Delivers precise, relevant, and actionable insights.
• Drives real business impact – Better insights lead to better strategies and outcomes.
• Maximizes Power BI’s potential – Unlocks advanced features and deeper analytics.
DAX Syntax
Syntax includes the various elements that make up a formula, or more simply, how the formula is written. For example, here's a simple DAX formula for a measure:

Total Sales = SUM(Sales[SalesAmount])

Syntax Explanation

A. The measure name, Total Sales.


B. The equals sign operator (=), which indicates the beginning of the formula. When
calculated, it will return a result.
C. The DAX function SUM, which adds up all of the numbers in
the Sales[SalesAmount] column. You’ll learn more about functions later.
D. Parenthesis (), which surround an expression that contains one or more arguments.
Most functions require at least one argument. An argument passes a value to a function.
E. The referenced table, Sales.
F. The referenced column, [SalesAmount], in the Sales table. With this argument, the SUM
function knows on which column to aggregate a SUM.
Aggregation Functions

Sometimes you want to mathematically combine values in your data. The mathematical
operation could be sum, average, maximum, count, and so on. When you combine values in your
data, it's called aggregating. The result of that mathematical operation is an aggregate.
Average

Returns the average (arithmetic mean) of all the numbers in a column.

AVG = AVERAGE(Orders[Sales])
AVERAGEA

Returns the average (arithmetic mean) of the values in a column.

AVG = AVERAGEA(Orders[Sales])
The AVERAGEA function in DAX (Data Analysis Expressions) calculates the average (arithmetic
mean) of the values in a column, with specific handling for non-numeric data types.
Key characteristics of AVERAGEA:
Syntax:
AVERAGEA(<column>) where <column> is the name of the column you want to average.
Handling of Non-Numeric Values:
Values that evaluate to TRUE are treated as 1.
Values that evaluate to FALSE are treated as 0.
Empty text ("") is treated as 0.
Any other non-numeric text values are also treated as 0.
AVERAGE vs AVERAGEA

Unlike the AVERAGE function, which only considers numeric data, AVERAGEA can include non-numeric values in its calculation by converting them to numeric equivalents as described above.
AVERAGEX

• AVERAGE → takes a column and gives you its mean.
• AVERAGEX → takes a table and an expression, calculates the expression row by row, then returns the mean of those values.

Product | Quantity | Price
A | 2 | 50
B | 5 | 40
C | 3 | 60

AVERAGEX(Sales, Sales[Quantity] * Sales[Price])
Power BI goes row by row:
A → 2 × 50 = 100
B → 5 × 40 = 200
C → 3 × 60 = 180
Expression results = {100, 200, 180}
Then it averages: (100 + 200 + 180) / 3 = 160
For comparison, AVERAGE(Sales[Quantity]) = (2 + 5 + 3) / 3 ≈ 3.33.

Limitations in Power BI
• Not supported in DirectQuery mode when used in:
  • Calculated columns
  • Row-level security (RLS) rules
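The difference between the two functions on the sample table above, written out in plain Python as a model of the arithmetic (not DAX): AVERAGE works on one column, AVERAGEX first evaluates the expression per row and then averages those results.

```python
# (Product, Quantity, Price) from the sample table
rows = [("A", 2, 50), ("B", 5, 40), ("C", 3, 60)]

# AVERAGE(Sales[Quantity]): mean of a single column
quantities = [qty for _, qty, _ in rows]
average = sum(quantities) / len(quantities)

# AVERAGEX(Sales, Sales[Quantity] * Sales[Price]): expression per row, then mean
line_totals = [qty * price for _, qty, price in rows]
averagex = sum(line_totals) / len(line_totals)

print(round(average, 2))  # 3.33
print(line_totals)        # [100, 200, 180]
print(averagex)           # 160.0
```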
COUNT

Counts only numeric values (ignores blanks and text).

Product | Quantity | Price
A | 2 | 50
B | 5 | 40
C | (blank) | 60
D | 0 | (blank)

COUNT(Sales[Quantity])
Values in Quantity = {2, 5, (blank), 0}
Numeric entries = 3 (2, 5, 0) → Blank is ignored
Result = 3
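COUNTA

Counts all non-blank values (numeric, text, logical).

Product | Quantity | Price
A | 2 | 50
B | 5 | 40
C | (blank) | 60
D | 0 | (blank)

COUNTA(Sales[Product])
Values in Product = {A, B, C, D}: all are text and non-blank
Result = 4
By contrast, COUNTA(Sales[Quantity]) returns 3, because the blank in row C is skipped.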
COUNTX

Evaluates an expression row by row, then counts non-blank results.


COUNTX(Sales, Sales[Quantity] * Sales[Price])
Row by row:

A → 2 × 50 = 100 → valid

B → 5 × 40 = 200 → valid

C → (blank) × 60 = blank → ignored

D → 0 × (blank) = blank → ignored

Non-blank results = {100, 200} → 2 rows


Result = 2
COUNTBLANK

• Counts the number of blank cells in a column.


• COUNTBLANK(Sales[Price])
• Values = {50, 40, (blank), (blank)}
• Blanks = 2
• Counts only true blanks, not zeros.
• Example: if Quantity = 0 → not counted as blank.
• Works only on a single column, not on entire rows.
COUNTROWS

Simply counts how many rows exist in a table.

COUNTROWS(Sales)
The Sales table has 4 rows (A, B, C, D).
Result = 4
COUNTROWS with a Filtered Table
You can pass an expression (like a FILTER) to count rows that meet
conditions.
COUNTROWS(FILTER(Sales, Sales[Quantity] > 2))
COUNTX

• It goes row by row in a table, evaluates an expression, and then counts how many results are not blank.

COUNTX(Sales, Sales[Quantity] * Sales[Price])

Product | Quantity | Price
A | 2 | 50
B | (blank) | 40
C | 5 | (blank)
D | (blank) | (blank)

Row by row:
A → 2 × 50 = 100 → valid ✔
B → (blank) × 40 = blank → ignored ✘
C → 5 × (blank) = blank → ignored ✘
D → (blank) × (blank) = blank → ignored ✘

Non-blank results = {100}
Result = 1
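The counting rules in this section can be collected into one plain-Python model, using Python's None as a stand-in for BLANK and the table from the COUNT and COUNTA examples (A 2 50, B 5 40, C blank 60, D 0 blank). Two assumptions are labeled in the comments: a blank in a multiplication yields blank (as in the worked examples), and a blank compares like 0 inside FILTER. These helpers only imitate the DAX functions.

```python
BLANK = None  # stand-in for a DAX BLANK

sales = [
    {"Product": "A", "Quantity": 2, "Price": 50},
    {"Product": "B", "Quantity": 5, "Price": 40},
    {"Product": "C", "Quantity": BLANK, "Price": 60},
    {"Product": "D", "Quantity": 0, "Price": BLANK},
]


def column(name):
    return [row[name] for row in sales]


def count(values):       # COUNT as described above: non-blank numbers (0 counts)
    return sum(1 for v in values if isinstance(v, (int, float)) and not isinstance(v, bool))


def counta(values):      # COUNTA: any non-blank value
    return sum(1 for v in values if v is not BLANK)


def countblank(values):  # COUNTBLANK: blanks only, zeros are not blank
    return sum(1 for v in values if v is BLANK)


def multiply(a, b):      # assumption: blank in either operand gives blank
    return BLANK if a is BLANK or b is BLANK else a * b


def countx(rows, expr):  # COUNTX: evaluate per row, count non-blank results
    return counta([expr(row) for row in rows])


def countrows(rows):     # COUNTROWS: number of rows in a (filtered) table
    return len(rows)


print(count(column("Quantity")))    # 3
print(counta(column("Product")))    # 4
print(countblank(column("Price")))  # 1
print(countx(sales, lambda r: multiply(r["Quantity"], r["Price"])))  # 2
# COUNTROWS(FILTER(Sales, Sales[Quantity] > 2)); assumption: blank compares like 0
print(countrows([r for r in sales if (r["Quantity"] or 0) > 2]))     # 1
```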
MAX Family
Product | Quantity | Price | Comment
A | 2 | 50 | "Good"
B | (blank) | 40 | "Average"
C | 5 | (blank) | (blank)
D | 0 | 60 | "Poor"

MAX: Returns the largest numeric value in a column.

MAXA: Similar to MAX, but also evaluates logical values (TRUE counts as 1, FALSE as 0).
MAXA(Sales[Quantity])
Values = {2, blank, 5, 0} → works the same as MAX here
Result = 5

MAXX: Returns the largest result of an expression evaluated row by row.
MAXX(Sales, Sales[Quantity] * Sales[Price])
Row by row calculation:
A → 2 × 50 = 100
B → blank × 40 = blank
C → 5 × blank = blank
D → 0 × 60 = 0
Valid values = {100, 0}
Result = 100
PRODUCT VS PRODUCTX
Product | Quantity | Price
A | 2 | 50
B | 3 | 40
C | 5 | (blank)
D | 0 | 60

PRODUCT: Multiplies all numeric values in a single column together.
PRODUCT(Sales[Quantity])
Values in Quantity = {2, 3, 5, 0}
Calculation: 2 × 3 × 5 × 0 = 0
Result = 0

PRODUCTX: Multiplies the results of an expression evaluated row by row.
PRODUCTX(Sales, Sales[Quantity] * Sales[Price])
Row by row calculation:
• A → 2 × 50 = 100
• B → 3 × 40 = 120
• C → 5 × blank = blank (ignored)
• D → 0 × 60 = 0
Valid results = {100, 120, 0}
Calculation: 100 × 120 × 0 = 0
Result = 0
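SUM VS SUMX

Function | What it Does | Example | Result
SUM | Adds all numeric values in a column | SUM(Sales[Quantity]) | 10
SUMX | Iterates row by row, evaluates an expression, then sums | SUMX(Sales, Sales[Quantity] * Sales[Price]) | 220

Both results use the PRODUCT vs PRODUCTX sample table: 2 + 3 + 5 + 0 = 10, and 100 + 120 + 0 = 220 (row C's expression is blank and adds nothing).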
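The column functions and their X (iterator) counterparts on the PRODUCT vs PRODUCTX sample table can be modeled in a few lines of plain Python, with None standing in for BLANK and the assumption (matching the worked examples) that a blank operand makes the row's product blank.

```python
from math import prod

BLANK = None  # stand-in for a DAX BLANK
# (Product, Quantity, Price)
rows = [("A", 2, 50), ("B", 3, 40), ("C", 5, BLANK), ("D", 0, 60)]


def line_total(qty, price):
    # Assumption: BLANK in either operand makes the result BLANK
    return BLANK if qty is BLANK or price is BLANK else qty * price


quantities = [q for _, q, _ in rows]
results = [line_total(q, p) for _, q, p in rows]  # [100, 120, None, 0]
non_blank = [r for r in results if r is not BLANK]

print(sum(quantities))   # SUM(Sales[Quantity])     -> 10
print(sum(non_blank))    # SUMX(Sales, Qty * Price) -> 220
print(prod(quantities))  # PRODUCT(Sales[Quantity]) -> 0
print(prod(non_blank))   # PRODUCTX(...)            -> 0
print(max(non_blank))    # MAXX over the same table -> 120
```

Every X function follows the same pattern: build the per-row results first, drop blanks, then aggregate.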
DISTINCTCOUNT vs DISTINCTCOUNTNOBLANK

• DISTINCTCOUNT → includes blank as a distinct value.
• DISTINCTCOUNTNOBLANK → excludes blank completely.
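A tiny plain-Python illustration on a hypothetical CustomerID column, with None as a stand-in for BLANK: blank is counted once as its own distinct value by DISTINCTCOUNT and dropped entirely by DISTINCTCOUNTNOBLANK.

```python
BLANK = None  # stand-in for a DAX BLANK
customer_ids = ["C001", "C002", BLANK, "C001", BLANK]  # hypothetical column

distinct_values = set(customer_ids)                     # {'C001', 'C002', None}
distinctcount = len(distinct_values)                    # blank counted once
distinctcount_noblank = len(distinct_values - {BLANK})  # blank excluded

print(distinctcount, distinctcount_noblank)  # 3 2
```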
Time & Date Functions
CALENDAR

CALENDAR(<start_date>, <end_date>)
It returns a table with a single column named Date that contains a contiguous set of dates from the start date to the end date.

CalendarTable =
CALENDAR ( DATE(2024,12,28), DATE(2025,3,22) )

Result (Date column):
2024-12-28
2024-12-29
2024-12-30
...
2025-03-21
2025-03-22
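CALENDARAUTO

Builds a contiguous Date column by scanning the date/time columns in your model (calculated columns and calculated tables are ignored). You don’t pass start/end dates: the range runs from the start of the year containing the earliest date to the end of the year containing the latest date. An optional argument, CALENDARAUTO(<fiscal_year_end_month>), switches to fiscal years (the default is 12, i.e., calendar years).

CalendarTable = CALENDARAUTO()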
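A plain-Python model of the two date-table functions, assuming the model's date columns have already been gathered into a list of `date` values. `calendar` reproduces the CALENDAR example above (85 days); `calendar_auto` widens the earliest and latest dates to whole (fiscal) years. The helper names are hypothetical.

```python
from datetime import date, timedelta


def calendar(start, end):
    """Every date from start to end, inclusive, like CALENDAR()."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def fiscal_year_start(d, end_month):
    start_month = end_month % 12 + 1  # month after the fiscal year-end
    year = d.year if d.month >= start_month else d.year - 1
    return date(year, start_month, 1)


def fiscal_year_end(d, end_month):
    year = d.year if d.month <= end_month else d.year + 1
    if end_month == 12:
        return date(year, 12, 31)
    return date(year, end_month + 1, 1) - timedelta(days=1)  # last day of end_month


def calendar_auto(model_dates, fiscal_year_end_month=12):
    """Span whole fiscal years around the earliest and latest model date."""
    lo, hi = min(model_dates), max(model_dates)
    return calendar(fiscal_year_start(lo, fiscal_year_end_month),
                    fiscal_year_end(hi, fiscal_year_end_month))


days = calendar(date(2024, 12, 28), date(2025, 3, 22))
print(days[0], days[-1], len(days))  # 2024-12-28 2025-03-22 85

auto = calendar_auto([date(2024, 3, 15), date(2025, 8, 1)])
print(auto[0], auto[-1])  # 2024-01-01 2025-12-31
```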
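DATE

DATE (year, month, day)

Returns a datetime value for the specified year, month, and day. Month and day can be positive or negative integers (DAX adjusts automatically, e.g., month 13 = January of the next year).

Years from 0 to 1899: the value is added to 1900.
= DATE(08,1,2) returns January 2, 1908.

Years from 1900 to 9999: the value is used as the year.
= DATE(2008,14,2) returns February 2, 2009 (month 14 rolls over into the next year).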
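The rollover rules can be modeled in plain Python as a sketch: years 0 to 1899 get 1900 added, out-of-range months are folded into the year, and out-of-range days are counted from the first of the resulting month. The negative-month and day-overflow cases apply the "DAX adjusts automatically" rule above and should be treated as this model's assumption; `dax_date` is a hypothetical name.

```python
from datetime import date, timedelta


def dax_date(year, month, day):
    """Model of DATE(year, month, day) rollover rules."""
    if 0 <= year <= 1899:  # e.g. 08 -> 1908
        year += 1900
    # Fold months such as 13, 14, 0 or -3 into the year
    y, m0 = divmod(year * 12 + (month - 1), 12)
    # Days beyond the month's length roll forward from the 1st
    return date(y, m0 + 1, 1) + timedelta(days=day - 1)


print(dax_date(8, 1, 2))      # 1908-01-02
print(dax_date(2008, 14, 2))  # 2009-02-02
print(dax_date(2024, 13, 1))  # 2025-01-01
```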
DATEDIFF

DATEDIFF ( <start_date>, <end_date>, <interval> )

start_date → earlier date
end_date → later date
interval → the unit to return (SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR)

Suppose you want to calculate delivery duration (days between OrderDate and DeliveryDate):

DeliveryDays = DATEDIFF ( Sales_Transactions[OrderDate], Sales_Transactions[DeliveryDate], DAY )

For OrderDate = 2025-03-01 and DeliveryDate = 2025-03-08, the result is 7 days.
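DATEDIFF counts interval boundaries crossed, not elapsed whole units, so 2025-01-31 to 2025-02-01 is 1 MONTH even though only one day passed. A plain-Python sketch of that rule for date values, covering only DAY, MONTH, QUARTER and YEAR (the time and WEEK intervals are deliberately left out of this model):

```python
from datetime import date


def datediff(start, end, interval):
    """Count interval boundaries crossed between two dates (sketch)."""
    if interval == "DAY":
        return (end - start).days
    if interval == "MONTH":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if interval == "QUARTER":
        return (end.year * 4 + (end.month - 1) // 3) - (start.year * 4 + (start.month - 1) // 3)
    if interval == "YEAR":
        return end.year - start.year
    raise ValueError(f"interval not modeled in this sketch: {interval}")


print(datediff(date(2025, 3, 1), date(2025, 3, 8), "DAY"))     # 7
print(datediff(date(2025, 1, 31), date(2025, 2, 1), "MONTH"))  # 1
print(datediff(date(2024, 12, 31), date(2025, 1, 1), "YEAR"))  # 1
```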


DATEVALUE

• Converts a date in text format to a date in datetime format.

OrderDate_Converted =
DATEVALUE ( Sales_Transactions[OrderDate_Text] )

"2025-03-20" becomes 20-Mar-2025 (datetime).

SampleDate =
DATEVALUE ( "31-Dec-2025" )

Returns 31-Dec-2025 (valid datetime).


DAY

• Returns the day of the month, a number from 1 to 31.

DAY ( DATE (2025, 2, 14) )


Returns 14.
EDATE & EOMONTH

Function | Purpose | What It Returns | Example (OrderDate = 2025-03-20)
EDATE | Moves forward/backward by a number of months while keeping the same day | The same day of the shifted month | EDATE(OrderDate, 3) → 2025-06-20
EOMONTH | Finds the last day of a month, shifted by a number of months | Always the last calendar day of that month | EOMONTH(OrderDate, 0) → 2025-03-31; EOMONTH(OrderDate, 1) → 2025-04-30
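Both functions reduce to "shift the month, then pick a day", which this plain-Python sketch models with the standard `calendar.monthrange`. The clamping in `edate` (for example January 31 plus one month) is an assumption of the sketch, mirroring the familiar Excel behavior; the table's own examples do not exercise it.

```python
from calendar import monthrange
from datetime import date


def shift_month(d, months):
    """Return (year, month) after moving `months` from d's month."""
    y, m0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    return y, m0 + 1


def edate(d, months):
    y, m = shift_month(d, months)
    # Assumption: clamp to the shifted month's last day when the day doesn't exist
    return date(y, m, min(d.day, monthrange(y, m)[1]))


def eomonth(d, months):
    y, m = shift_month(d, months)
    return date(y, m, monthrange(y, m)[1])  # monthrange -> (first weekday, days in month)


order_date = date(2025, 3, 20)
print(edate(order_date, 3))    # 2025-06-20
print(eomonth(order_date, 0))  # 2025-03-31
print(eomonth(order_date, 1))  # 2025-04-30
```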
Logical Functions
IF
The IF function checks whether a condition is TRUE or FALSE, and
then returns one value if the condition is TRUE, and another value if
it is FALSE.
Example (Column):

Sales Category =
IF ( Sales[SalesAmount] > 1000, "High", "Low" )

Example (Measure):

High Sales Check =
IF ( SUM ( Sales[SalesAmount] ) > 100000, "Above Target", "Below Target" )
SWITCH

SWITCH evaluates an expression (or condition) against a list of values and returns the corresponding result.

Example (Column):

Category Group =
SWITCH (
    Products[Category],
    "Bikes", "Two-Wheelers",
    "Accessories", "Add-ons",
    "Clothing", "Apparel",
    "Other"
)

Example (Measure):

Performance Label =
SWITCH (
    TRUE(),
    SUM ( Sales[SalesAmount] ) > 100000, "Excellent",
    SUM ( Sales[SalesAmount] ) > 50000, "Good",
    SUM ( Sales[SalesAmount] ) > 20000, "Average",
    "Poor"
)
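The order of the conditions in Performance Label matters: with SWITCH(TRUE(), ...), the first condition that is TRUE wins, so a total of 150,000 is labeled "Excellent" even though it also exceeds 50,000 and 20,000. A plain-Python model of both SWITCH patterns (function names hypothetical):

```python
def performance_label(total_sales):
    """SWITCH(TRUE(), ...): return the result of the FIRST true condition."""
    conditions = [
        (total_sales > 100000, "Excellent"),
        (total_sales > 50000, "Good"),
        (total_sales > 20000, "Average"),
    ]
    for is_true, label in conditions:
        if is_true:
            return label
    return "Poor"  # the final argument acts as the else value


def category_group(category):
    """SWITCH(Products[Category], value, result, ..., else): equality matching."""
    mapping = {"Bikes": "Two-Wheelers", "Accessories": "Add-ons", "Clothing": "Apparel"}
    return mapping.get(category, "Other")


print(performance_label(150000))  # Excellent
print(performance_label(20000))   # Poor (not greater than 20000)
print(category_group("Bikes"))    # Two-Wheelers
```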
AND

AND is a logical function that checks whether two conditions are TRUE at the same time. If both are TRUE → returns TRUE. If either one is FALSE → returns FALSE.

High Volume High Value (Column) =
IF (
    AND ( Sales[Quantity] > 10, Sales[SalesAmount] > 5000 ),
    "Yes",
    "No"
)

High Volume High Value (Measure) =
IF (
    AND ( SUM ( Sales[Quantity] ) > 100, SUM ( Sales[SalesAmount] ) > 50000 ),
    "Yes",
    "No"
)
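The column and measure versions can disagree because one tests each row and the other tests the totals. In this plain-Python sketch with two hypothetical Sales rows, no single row passes the column test, yet the aggregated totals pass the measure test.

```python
# Hypothetical Sales rows: (Quantity, SalesAmount)
rows = [(5, 60000), (120, 1000)]

# Column version: AND evaluated on each row separately
column_flags = ["Yes" if q > 10 and amt > 5000 else "No" for q, amt in rows]

# Measure version: AND evaluated on the aggregated totals
total_qty = sum(q for q, _ in rows)       # 125
total_amount = sum(a for _, a in rows)    # 61000
measure_flag = "Yes" if total_qty > 100 and total_amount > 50000 else "No"

print(column_flags)  # ['No', 'No']
print(measure_flag)  # Yes
```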
OR

OR is a logical function that checks whether at least one of two conditions is TRUE. If either one is TRUE → returns TRUE. If both are FALSE → returns FALSE.

Special Deal (Column) =
IF (
    OR ( Sales[Quantity] > 20, Sales[SalesAmount] > 10000 ),
    "Eligible",
    "Not Eligible"
)

Special Deal (Measure) =
IF (
    OR ( SUM ( Sales[Quantity] ) > 200, SUM ( Sales[SalesAmount] ) > 100000 ),
    "Eligible",
    "Not Eligible"
)
Filter Context Functions
CALCULATE
Changes the filter context for an expression.

Total Sales USA =


CALCULATE (
SUM ( Sales[SalesAmount] ),
Customers[Country] = "United States"
)
FILTER

Returns a table that represents a subset of another table. Commonly used inside CALCULATE to apply complex filtering.

High Value Sales =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    FILTER ( Sales, Sales[SalesAmount] > 5000 )
)