Power BI Data Modelling
https://learn.microsoft.com/en-us/power-bi/guidance/star-schema
What is Data Modelling?
Data modeling in Power BI is the process of creating a structured
representation of data and its relationships, enabling efficient
analysis and reporting.
What is Schema in DBMS?
Flat Model
Hierarchical Model
Network Model
Relational Model
Star Schema
Snowflake Schema
Star Schema
A star schema in Power BI is a data modeling technique that organizes data
into a central fact table surrounded by dimension tables, so the model diagram resembles a star.
Foreign Key vs Primary key
Primary Key: A primary key is a column (or a set of columns) in a table that uniquely
identifies each row within that table. No two rows can have the same primary key value.
A primary key cannot contain null values, meaning every row must have a valid
identifier. A table can have only one primary key.
Foreign Key: A foreign key is a column (or set of columns) in one table that references
the primary key of another table, creating a link between them. This enforces referential
integrity by ensuring foreign key values match existing values in the referenced primary
key. Unlike primary keys, foreign keys can contain duplicate values and nulls (unless
restricted). A table can have multiple foreign keys to connect to various other tables.
Fact Table
[Figure: example tables with the columns OrderID, Date, CustomerID, CustomerName, City, Region, ProductID, ProductName, Category, Price, Quantity, SalesAmount]
Explicit measures are expressly created and they're based on a formula written in Data Analysis Expressions
(DAX) that achieves summarization. Measure expressions often use DAX aggregation functions like SUM, MIN,
MAX, AVERAGE, and others to produce a scalar value result at query time (values are never stored in the
model).
Implicit measures are columns that can be summarized by a report visual or Q&A. They offer a convenience for
you as a model developer, as in many instances you don't need to create (explicit) measures. For example, the
Adventure Works reseller sales Sales Amount column can be summarized in numerous ways (sum, count,
average, median, min, max, and others), without the need to create a measure for each possible aggregation
type.
Surrogate keys
A surrogate key is a unique identifier that you add to a table to support star schema modeling. By
definition, it's not defined or stored in the source data. Commonly, surrogate keys are added to
relational data warehouse dimension tables to provide a unique identifier for each dimension table
row.
Degenerate dimension
A degenerate dimension is a dimension key stored in a fact table without a separate physical
dimension table.
OrderID  Date        CustomerID  ProductID  Quantity  SalesAmount
1001     2025-08-01  C001        P001       2         10
1002     2025-08-02  C002        P002       5         10
1003     2025-08-02  C001        P003       1         15
OrderID is not linked to any dimension table. It's unique to each transaction. There are no extra
attributes for OrderID other than itself. Instead of creating a Dim_Order table with just OrderID in it
(which would be pointless), we keep it inside the fact table. That OrderID column is a degenerate
dimension.
Why not put it in a separate dimension table?
Because it has no descriptive attributes (like "OrderType" or "OrderStatus").
Creating a separate table would just add unnecessary joins in Power BI.
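Because OrderID stays in the fact table, it can still be used directly in measures. A minimal sketch, assuming the fact table is named Sales (the table name and the measure name are assumptions; the sample only shows the columns):

```dax
-- Counts distinct orders using the degenerate dimension column.
-- Assumes the fact table is named Sales.
Order Count = DISTINCTCOUNT ( Sales[OrderID] )
```

With the three sample rows above, this measure returns 3.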
High cardinality
A column has many unique values (often close to the total row count). Example: OrderID, TransactionID,
Email Address.
Low cardinality
A column has few unique values compared to total rows. Example: Gender (Male/Female), Region
(North/South/East/West).
Why Cardinality Matters in Power BI
Relationship types are described by cardinality (One-to-One, One-to-Many, Many-to-Many).
High-cardinality columns (e.g., millions of unique IDs) can increase memory usage.
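One rough way to gauge a column's cardinality is to compare its distinct values with the table's row count. The measure below is a hypothetical sketch that assumes a fact table named Sales with an OrderID column:

```dax
-- Hypothetical measure; assumes a fact table named Sales with an OrderID column.
-- A result near 1 suggests high cardinality; a result near 0 suggests low cardinality.
OrderID Uniqueness =
DIVIDE ( DISTINCTCOUNT ( Sales[OrderID] ), COUNTROWS ( Sales ) )
```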
One-to-One (1:1) Relationship
Every value in the key column of Table A appears only once, and every
value in the key column of Table B appears only once.
One-to-Many (1:*) Relationship
One table has unique keys (dimension), and the other table has repeating keys (fact).
Single → Filter flows from one table to another in one direction only.
(dimension → fact table).
Example: Selecting a Product filters Sales, but selecting a Sale does not filter
Products.
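When a single calculation needs the filter to flow the other way, the direction can be overridden inside a measure instead of changing the relationship itself. The sketch below assumes a dimension table named 'Product' related to Sales on ProductID; both the table name and the measure name are illustrative.

```dax
-- Assumes 'Product' (one side) is related to Sales (many side) on ProductID
-- with a single-direction filter.
-- CROSSFILTER applies BOTH directions only while this measure is evaluated.
Products Sold =
CALCULATE (
    DISTINCTCOUNT ( 'Product'[ProductID] ),
    CROSSFILTER ( Sales[ProductID], 'Product'[ProductID], BOTH )
)
```

Filters that reach Sales (for example, a customer selection) then also reduce the products counted, while the model relationship stays single-direction for every other measure.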
New Measure
If you select 2024 in a slicer, a Total Sales measure will automatically show only 2024 sales.
When to use:
When you want dynamic calculations that respond to filtering in visuals.
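A minimal sketch of a measure that behaves this way, assuming a Sales table with a SalesAmount column:

```dax
-- Evaluated at query time in the current filter context,
-- so a slicer selection such as Year = 2024 changes the result.
Total Sales = SUM ( Sales[SalesAmount] )
```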
New Column
A calculated column is a new column in your table created using DAX.
It is stored in the model after calculation and does not change with slicers or report
filters (unless you refresh/recalculate the model).
Key points:
• Stored in the table like regular data.
• Takes up space in the data model.
• Calculated row by row at data refresh time.
• Good for classification, flags, or joining tables
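A minimal sketch of a calculated column on the Sales table. Sales[Cost] is a hypothetical column used only for illustration:

```dax
-- Calculated column, evaluated once per row at refresh time.
-- Sales[Cost] is an assumed column; it does not appear in the sample data.
Profit = Sales[SalesAmount] - Sales[Cost]
```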
This creates a new column Profit for each row of the Sales table.
When to use:
When you need a permanent field for filtering, grouping, or joining that doesn’t change with slicers.
New Table
A calculated table is a table created from an expression using DAX.
It exists physically in your data model, just like imported tables.
Key points:
• Created from existing tables or data using DAX.
• Recalculated on model refresh.
• Can be used for relationship building, lookup tables, or summarization.
TopProducts =
When to use:
When you need a static table generated from existing data, such as for mapping, filtering, or specific analysis.
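As an illustration only, a calculated table that keeps the five products with the highest total sales could be sketched as follows. The limit of five and the TotalSales column name are assumptions, not taken from the original definition:

```dax
-- Hypothetical definition: the five products with the highest total SalesAmount.
TopProducts =
TOPN (
    5,
    ADDCOLUMNS (
        VALUES ( Sales[ProductID] ),
        "TotalSales", CALCULATE ( SUM ( Sales[SalesAmount] ) )
    ),
    [TotalSales], DESC
)
```

TOPN can return more than five rows when several products tie for the last position.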
Calculation Group
A calculation group is a time-saving feature in Power BI (created in Tabular
Editor or, in recent versions of Power BI Desktop, in Model view) that lets you
create reusable calculation logic applied to multiple measures without rewriting DAX.
Key points:
• Contains calculation items — each item applies a transformation to a base
measure.
• Commonly used for time intelligence (YTD, MTD, QTD) or formatting logic.
• Reduces the number of measures you must create and maintain.
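A calculation item is just a DAX expression that wraps whichever measure is currently being evaluated. A minimal sketch of a year-to-date item, assuming a date table named 'Date' with a [Date] column:

```dax
-- Expression of a calculation item named "YTD" (the item name is illustrative).
-- SELECTEDMEASURE () stands in for the measure the item is applied to.
-- Assumes a date table 'Date' with a [Date] column.
CALCULATE ( SELECTEDMEASURE (), DATESYTD ( 'Date'[Date] ) )
```

Applied to Total Sales, Total Quantity, or any other measure, the same item returns that measure's year-to-date value.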
Comparison
Transforms data into insights – Enables advanced analysis beyond basic visuals.
Supports complex calculations – Growth %, YoY trends, moving averages, and more.
Enables flexible time intelligence – Compare performance across date ranges.
Customizes metrics for your business – Tailor KPIs to unique business needs.
Enhances decision-making – Delivers precise, relevant, and actionable insights.
Drives real business impact – Better insights lead to better strategies and outcomes.
Maximizes Power BI’s potential – Unlocks advanced features and deeper analytics.
DAX Syntax
Syntax includes the various elements that make up a formula, or more simply, how
the formula is written. For example, here's a simple DAX formula for a measure:
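A minimal sketch of such a formula, assuming a Sales table with a Quantity column (the names are illustrative), with each syntax element annotated:

```dax
Total Quantity = SUM ( Sales[Quantity] )
-- Total Quantity : the measure name
-- =              : marks the start of the formula
-- SUM            : a DAX function that adds up all numbers in a column
-- ( )            : enclose the function's argument
-- Sales          : the referenced table
-- [Quantity]     : the referenced column in that table
```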
Syntax Explanation
Sometimes you want to mathematically combine values in your data. The mathematical
operation could be sum, average, maximum, count, and so on. When you combine values in your
data, it's called aggregating. The result of that mathematical operation is an aggregate.
Average
AVG = AVERAGE(Orders[Sales])
AVERAGEA
AVG = AVERAGEA(Orders[Sales])
The AVERAGEA function in DAX (Data Analysis Expressions) calculates the average (arithmetic
mean) of the values in a column, with specific handling for non-numeric data types.
Key characteristics of AVERAGEA:
Syntax:
AVERAGEA(<column>) where <column> is the name of the column you want to average.
Handling of Non-Numeric Values:
Values that evaluate to TRUE are treated as 1.
Values that evaluate to FALSE are treated as 0.
Empty text ("") is treated as 0.
Any other non-numeric text values are also treated as 0.
AVERAGE vs AVERAGEA
A → 2 × 50 = 100 → valid
B → 5 × 40 = 200 → valid
MAXA: Similar to MAX, but considers logical values and text.
MAXA(Sales[Quantity])
Values = {2, blank, 5, 0} → works same as MAX here
Result = 5
MAXX: largest result from an expression evaluated row by row.
MAXX(Sales, Sales[Quantity] * Sales[Price])
Row by row calculation:
A → 2 × 50 = 100
B → blank × 40 = blank
C → 5 × blank = blank
D → 0 × 60 = 0
Valid values = {100, 0}
Maximum = 100
Result = 100
PRODUCT VS PRODUCTX
Product  Quantity  Price
A        2         50
B        3         40
C        5         (blank)
D        0         60
PRODUCT: Multiplies all numeric values in a single column together.
PRODUCT(Sales[Quantity])
Values in Quantity = {2, 3, 5, 0}
Calculation: 2 × 3 × 5 × 0 = 0
Result = 0
PRODUCTX: Multiplies results of an expression row by row.
Row by row calculation:
• A → 2 × 50 = 100
• B → 3 × 40 = 120
• C → 5 × blank = blank (ignored)
• D → 0 × 60 = 0
Valid results = {100, 120, 0}
Calculation: 100 × 120 × 0 = 0
Result = 0
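Written as measures, the two calculations above could look like this. The measure names are illustrative, and the PRODUCTX expression is inferred from the row-by-row calculation:

```dax
-- Each line is a separate measure definition on the Sales table.
-- PRODUCT works on one column; PRODUCTX evaluates an expression per row first.
Quantity Product = PRODUCT ( Sales[Quantity] )
Amount Product = PRODUCTX ( Sales, Sales[Quantity] * Sales[Price] )
```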
SUM VS SUMX
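SUM adds the values of a single column, while SUMX evaluates an expression for each row of a table and then adds the results. A minimal sketch, assuming the same Sales table with Quantity and Price columns (the measure names are illustrative):

```dax
-- Each line is a separate measure definition.
-- SUM: aggregates one column directly.
Units Sold = SUM ( Sales[Quantity] )
-- SUMX: multiplies Quantity by Price row by row, then sums the row results.
Revenue = SUMX ( Sales, Sales[Quantity] * Sales[Price] )
```

Using the four sample rows from the PRODUCT VS PRODUCTX table, Units Sold is 2 + 3 + 5 + 0 = 10 and Revenue is 100 + 120 + 0 = 220, because the row with a blank Price contributes nothing.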
CALENDAR(<start_date>, <end_date>)
It returns a table with a single column named Date that contains a
contiguous set of dates from the start date to the end date.
CalendarTable =
CALENDAR ( DATE(2024,12,28), DATE(2025,3,22) )
Result (column Date):
2024-12-28
2024-12-29
2024-12-30
...
2025-03-21
2025-03-22
CALENDARAUTO
CalendarTable = CALENDARAUTO()
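CALENDARAUTO takes its date range from the dates already in the model and returns a single Date column covering whole years. A common follow-up is to add descriptive columns. The table name and column names below are illustrative:

```dax
-- Hypothetical date table built on CALENDARAUTO.
-- "Year", "Month Number" and "Month" are assumed column names.
DateTable =
ADDCOLUMNS (
    CALENDARAUTO (),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month", FORMAT ( [Date], "MMM" )
)
```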
DATE
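= DATE(08,1,2)      // returns January 2, 1908 (a year from 0 to 1899 is added to 1900)
= DATE(2008,14,2)   // returns February 2, 2009 (a month above 12 rolls over into the next year)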
DATEDIFF
Interval → the unit in which the difference is counted (SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR)
Suppose you want to calculate delivery duration (days between OrderDate and DeliveryDate):
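A minimal sketch as a calculated column, assuming both date columns live in a table named Sales (the table name and the column name Delivery Days are assumptions):

```dax
-- Calculated column: whole days between order and delivery.
-- Assumes Sales[OrderDate] and Sales[DeliveryDate] are date columns.
Delivery Days = DATEDIFF ( Sales[OrderDate], Sales[DeliveryDate], DAY )
```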
DATEVALUE
OrderDate_Converted =
DATEVALUE ( Sales_Transactions[OrderDate_Text] )
SampleDate =
DATEVALUE ( "31-Dec-2025" )
IF
Sales Category =
IF ( Sales[SalesAmount] > 1000, "High", "Low" )
Example (Measure):
CALCULATE (
SUM ( Sales[SalesAmount] ),