POWER BI - Student
● Power BI is a one-stop solution for reporting, data visualization and business
intelligence.
● It can be used to automate manual Excel-based reporting.
● It can be connected to multiple sources and used to create interactive
dashboards and reports that can be distributed to internal and external
users.
● These reports help business stakeholders make informed decisions.
● It can be used as a forecasting tool to analyse and predict future business trends
and to prepare for different scenarios.
● We can create fully automated ETL (Extract-Transform-Load) procedures to shape,
transform and load data from various sources.
2022 Gartner® Magic Quadrant™ for Analytics and Business Intelligence Platforms
3. Creating the data model: understanding foreign and primary keys in the data and creating
appropriate connections in Power BI Desktop.
4. Adding calculated fields and DAX formulas and functions in Power BI Desktop.
Once the data is connected, the next step is to check that every column has an appropriate data type. If not,
we have to change it. This is done in the Query Editor, where we perform all the transformation
and cleaning of the data.
Once the data is cleaned and transformed, create the model in Power BI Desktop.
Finally, visualize the data as per the business requirements using various functions and DAX formulas in
Power BI Desktop.
Note: Power BI should be given the raw data rather than pre-aggregated data. We can perform
aggregations in Power BI as per the project requirement.
Connecting and Shaping Data with Power Query
1. Data Connectors: Power BI has a large library of data connectors, covering
everything from basic flat files to large databases like SQL Server and
web connectors like SharePoint.
2. Shaping Data in the Query Editor.
• Basic Table Transformations:
We can select the columns that need to be kept or removed.
We can keep or remove rows.
We can remove null values.
We can remove duplicates.
We can combine and split columns as per the requirement.
……………
> Connect to the Adventure-Works Product table.
- Example: get the product key and colour columns from the product
lookup table into the sales 2017 table.
Refreshing Queries:
We can refresh all the queries at once by clicking
Refresh queries in PBI Desktop, or refresh each query
individually.
From the data view, we can see the Data Category. This is generally used for
categorising geographic data such as addresses, countries,
continents, zip codes and so on.
1. Go to the calendar table and format the dates in short date format.
2. Go to the Sales table and format the dates in short date format.
3. Go to the Product lookup table and convert all the currency data to currency
format.
4. Connect to the Territory lookup table and follow all the previous formatting steps
in the query editor. Convert all the geographic data to geographic format.
Defining Hierarchies
Hierarchies are groups of nested columns that reflect multiple levels of
granularity.
Ex.
i. A geographic hierarchy may include country, state, city, zip code, ...
ii. Products can have hierarchies such as: product category, product sub
category, product.
iii. Dates can have hierarchies of: Year, Quarter, Month, Date.
Etc.
1) Create new queries to connect to the AdventureWorks_Product_Categories and AdventureWorks_Product_Subcategories files from the course resources:
Name your queries AW_Product_Category_Lookup and AW_Product_Subcategory_Lookup.
Confirm that headers have been promoted and that detected data types are correct.
Disable the report refresh option for both connections.
Add a calculated column named "SKUType" that extracts all characters before the dash ("-") in the ProductSKU column.
Update the SKUType calculation above to return all characters before the second dash, instead of the first.
Replace zeros in the ProductStyle column with "NA".
Update the DiscountPrice calculation to 15%.
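The SKUType step is built with the Query Editor tools, but its logic can be sketched outside Power BI. The following is only a rough Python illustration of "text before the first (or second) dash"; the function name and the sample SKU value are assumptions made for this example, not part of the course files.

```python
def text_before_delimiter(text: str, delimiter: str = "-", occurrence: int = 1) -> str:
    """Return everything before the n-th occurrence of `delimiter`."""
    parts = text.split(delimiter)
    # If there are fewer delimiters than requested, the whole text is kept.
    return delimiter.join(parts[:occurrence])


sku = "HL-U509-R"  # hypothetical ProductSKU value
print(text_before_delimiter(sku))                # text before the first dash
print(text_before_delimiter(sku, occurrence=2))  # text before the second dash
```

Changing only the occurrence argument mirrors the homework step of updating the existing SKUType calculation rather than creating a new column.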
3) Using the Statistics tools in the Query Editor, confirm the following values:
Add a new calculated column for the year of birth (named "BirthYear"), based on BirthDate
Add a conditional column to categorize customer income (named "IncomeLevel"), based on the following criteria (09:34 mark):
•If AnnualIncome >= $150,000, then IncomeLevel = "Very High"
•If AnnualIncome >= $100,000, then IncomeLevel = "High"
•If AnnualIncome >= $50,000, then IncomeLevel = "Average"
•Otherwise IncomeLevel = "Low"
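The order of the conditions matters: each rule is checked only if the ones above it failed. A minimal Python sketch of the same rule chain (the function name is an assumption for this illustration; in Power BI the column is built with the Conditional Column dialog):

```python
def income_level(annual_income: float) -> str:
    """Classify income using the thresholds listed above, checked top-down."""
    if annual_income >= 150_000:
        return "Very High"
    if annual_income >= 100_000:
        return "High"
    if annual_income >= 50_000:
        return "Average"
    return "Low"


# Hypothetical sample incomes
for income in (170_000, 100_000, 75_000, 20_000):
    print(income, income_level(income))
```

If the ">= $50,000" rule were placed first, every income above that amount would be labelled "Average", which is why the highest threshold is tested first.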
Creating Table Relationships and Data Model
What is a Data Model?
In the relationship view of PBI, the data tables initially appear as a
collection of independent tables. This is not a data
model.
•Check this by creating a table visual with Product name from the Product table, Order quantity from the
Sales table and Return quantity from the Returns table. Every row will show the same total values,
because no relationships are defined between the tables, so they cannot filter each other.
When the tables are connected to each other via a primary key
and foreign key relationship, and are able to filter the data in
each other, the result is called a data model.
•Connect the above tables on Product key and look at the matrix visual. Now that the
relationships are defined, the Product table is able to filter the data in the Sales and Returns tables.
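The difference between unrelated and related tables can be imitated with a small Python sketch. This is only an illustration of the idea, with made-up rows and function names; Power BI performs the filtering internally through the relationship.

```python
# Hypothetical sample rows
products = [
    {"ProductKey": 1, "ProductName": "Helmet"},
    {"ProductKey": 2, "ProductName": "Gloves"},
]
sales = [
    {"ProductKey": 1, "OrderQuantity": 3},
    {"ProductKey": 1, "OrderQuantity": 2},
    {"ProductKey": 2, "OrderQuantity": 4},
]


def quantity_without_relationship(product):
    # No relationship: the product row cannot filter Sales,
    # so every product sees the grand total.
    return sum(row["OrderQuantity"] for row in sales)


def quantity_with_relationship(product):
    # Relationship on ProductKey: only matching sales rows are kept.
    return sum(
        row["OrderQuantity"]
        for row in sales
        if row["ProductKey"] == product["ProductKey"]
    )


for product in products:
    print(
        product["ProductName"],
        quantity_without_relationship(product),
        quantity_with_relationship(product),
    )
```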
Database Normalization:
Normalization is the process of organising the tables and columns in a relational
database to reduce redundancy and preserve data integrity.
Used to:
1. Eliminate redundant data: this reduces table size by decreasing the number of columns in any one
table, which improves the processing speed and efficiency of the report.
2. Minimize errors in the database: these can occur when the data is modified frequently.
3. Simplify queries and give the database a proper structure, which helps in performing
meaningful analysis.
Note:
★ In a completely normalized database, each table serves a
specific purpose and holds distinct information (e.g. product info: product
lookup table, calendar info: calendar lookup table, orders info: Sales
table, etc.).
★ In short, it is always better to have multiple small tables than
one large table with many duplicate values.
Dimension (Lookup) Tables and Fact (Data) Tables:
A model generally has 2 types of tables: dimension tables and fact tables.
Fact (Data) Table: holds numerical values (Quantity Sold, Order Quantity, Return
Quantity, Sales, etc.) at the most granular level. It also contains an ID or key
column (generally a foreign key) used to create a relationship in the model.
(Sales and Returns tables are the fact tables in our model.)
Dimension (Lookup) Table: holds the descriptive attributes (Names, Store, City, Colour,
Cost Price, etc.) of the data. It also contains an ID or key column
(generally a primary key) used to create a relationship in the model.
(Date, Customers, Products, Product Category and Product Sub Category are the
dimension tables.)
Relationships vs Merged Tables:
Merging data from multiple sources creates a lot of redundant and duplicate
information, so it is not advisable. It also uses more processing
power and memory than the relationship model.
Instead, we can keep multiple small tables and tie them together via a relationship
model.
Create table relationships in Power BI Desktop.
Snowflake Schema and Star Schema:
•Data models with a chain of dimension tables are said to have a snowflake
schema.
•Data models with one central data table surrounded by multiple individual
dimension tables are said to have a star schema.
Relationship cardinality is shown by the * and 1 near the tables in the
relationship/modelling view. It indicates that a single (1) instance in the
lookup table is connected to many (*) instances in the fact table.
We can also see the relationship cardinality by clicking on the relationship
line and then clicking Edit.
In the relationship model, observe the direction of the relationship between the various tables,
shown by arrows on the relationship lines.
Generally, filters flow from the one-side of the relationship to the many-side.
This means that we can filter the data table using the dimension table, but not the other way
around.
Note:
Always arrange the dimension tables above the fact tables. This is a visual indicator that the
filters flow 'downstream'.
Example:
i. Make a table by taking Territory_Key from the territory table, Order_Qty from the sales table
and Return_Qty from the Returns table.
ii. The territory key also exists in the sales and returns tables. If we choose the territory key
from one of those tables instead, we won't get proper results.
Two-way Filters:
Now change the filter direction between the territory lookup and the
Sales_Table to a bidirectional filter and check the table visuals.
Now the territory key from the sales table can filter the territory lookup
table, which in turn can filter the values in the
Returns_Table.
This also works in the opposite direction if we have a single-direction
filter between the sales and territory tables and a bidirectional
filter between the territory and returns tables.
i. Focus on building a normalized model from the start. Each table in the
model must serve a specific purpose.
ii. Always use relationships instead of merged tables. Keep
tables long and narrow rather than short and wide.
iii. Keep dimension tables above the fact tables in the data model view.
This helps us remember that filters flow downstream from the
lookup tables to the data tables.
iv. Avoid using two-way filters unless required.
v. Hide key columns in the fact tables from the report view.
HOMEWORK: Creating Table Relationships & Data Models in Power BI
Using your Adventure Works report file, complete the following:
1) Navigate to the RELATIONSHIPS view, and perform the following actions (00:14 mark):
Right-click to delete each relationship between AW_Sales, AW_Customer_Lookup and
AW_Calendar_Lookup (including both date fields)
Use the Manage Relationships tool to delete all remaining relationships between all tables
2) Recreate all table relationships (using any method you prefer), and confirm the following :
•Cardinality is 1-to-Many for all relationships
•Filters are all One-Way (no two-way filters)
•Filter direction correctly flows "downstream" to data tables
•Data tables are not connected directly to one another
•Both data tables are connected to all valid lookup tables
•Product-related tables follow a snowflake schema
3) Return to the REPORT view, and complete the following :
Edit (or insert) the matrix visual to show ReturnQuantity (values) by CategoryName
(rows) from the AW_Product_Category_Lookup table
Which category saw the highest volume of returns? How many?
Replace CategoryName with Year from the AW_Calendar_Lookup table
How many returns do you see in 2015 vs. 2016?
Replace Year with FullName from the AW_Customer_Lookup table
What do you see, and why?
Update the matrix to show both OrderQuantity and ReturnQuantity (values) by
ProductKey (rows) from the AW_Product_Lookup table
What was the total OrderQuantity for Product #338?
4) Unhide the ProductKey field from the AW_Returns table (using either the DATA or
RELATIONSHIPS view):
In the matrix, replace ProductKey from AW_Product_Lookup with ProductKey from the AW_Returns
table
Why do we see the same repeating values for OrderQuantity?
Edit the relationship between AW_Returns and AW_Product_Lookup to change the cross filter direction
from Single to Both
Why does the visual now show OrderQuantity values by product, even though we are using ProductKey from
AW_Returns?
How many orders do we see now for Product #338? What's going on here?
•Using DAX, we can add measures and calculated columns for additional analysis.
CALCULATED COLUMNS
Calculated columns allow the report builder to add new formula-based columns to the model.
Calculated columns give us a value for each row and are visible in the data view of Power BI.
Calculated columns occupy space in the model and thereby increase the size of the file.
They are based on the row context, not on the filter context. They are used for calculating values at row level (e.g. if CP and SP are
given for each product, we can make a column for Profit). Calculated columns are not suitable for aggregate calculations
like SUM, MIN, MAX, etc.
Calculated columns:
● A value is stored for each row of the table in the model, thereby increasing the file size.
● They recalculate on each data source refresh, or when changes are made to the columns used for creating those calculated columns.
● Often used as columns for slicers or filters. Used as dimensions to change the view of the report.
Measures:
● Do not create new data in the table and hence don't increase the file size.
● Measures recalculate each time the filters in the report change.
● Almost always used in the values field of the visual.
● Measures live in the visuals on the report.
Implicit measures:
● Created automatically by PBI when we drag a numerical column into the values pane of the visual and select the aggregation type (Min, Max, Average, Sum, etc.).
● They are accessible only in the visual in which they are created. They can't be referenced anywhere else in the report.
Explicit measures:
● Created by the report developer by actually entering the DAX functions to define the calculated columns or the measures.
● They can be used anywhere in the report, referenced in multiple visuals, and used to create "measure trees".
Example: Create measures:
1. Quantity Sold = SUM(Sales[OrderQuantity])
NOTE: Implicit measures cannot be used in creating measure trees, i.e. they cannot be referenced while creating other measures. On the other
hand, explicit measures can be used in creating other measures as well.
DAX Syntax
Measure_Name = Function_Name(Table_name[Column_name])
Function_Name can be SUM, AVERAGE, MIN, ….
Table_name must be an existing table in the data model.
Column_name must be a column from the table named above. It cannot be from any other
table.
DAX Calculations in Power BI
DAX OPERATORS
Comparison Operators
= : Equal to. Example: [Region] = "USA"
== : Strict equal to. Example: [Region] == "USA"
> : Greater than. Example: [Sales Date] > "Jan 2009"
< : Less than. Example: [Sales Date] < "Jan 1 2009"
>= : Greater than or equal to. Example: [Amount] >= 20000
Logical Operators
&& (double ampersand) : Creates an AND condition between two expressions that each have a Boolean result. If both expressions return TRUE, the combination of the expressions also returns TRUE; otherwise the combination returns FALSE. Example: ([Region] = "France") && ([BikeBuyer] = "yes")
|| (double pipe symbol) : Creates an OR condition between two logical expressions. If either expression returns TRUE, the result is TRUE; only when both expressions are FALSE is the result FALSE. Example: ([Region] = "France") || ([BikeBuyer] = "yes")
Concatenation Operator
& (ampersand) : Connects, or concatenates, two values to produce one continuous text value. Example: [Region] & ", " & [City]
DAY / MONTH / YEAR() : Returns day(1-31), month of the year(1-12), year of the given date
DAY (Date) / MONTH(Date) / YEAR(Date)
DATEDIFF(): Returns the difference between the 2 dates based on the given interval
DATEDIFF(Date1, Date2, Interval)
WEEKDAY / WEEKNUM(): Returns the weekday 1(Sunday) to 7(Saturday) or the week of the year
WEEKDAY(Date, [Return Type])
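To make the date functions concrete, here is a small Python sketch that reproduces the same results with the standard datetime module. It is not DAX; the helper names and the sample date are assumptions chosen for this illustration, and the weekday helper follows the 1 (Sunday) to 7 (Saturday) numbering described above.

```python
from datetime import date


def dax_style_weekday(d: date) -> int:
    """Weekday numbered 1 (Sunday) to 7 (Saturday)."""
    # isoweekday() gives Monday=1 ... Sunday=7, so shift it by one day.
    return d.isoweekday() % 7 + 1


def datediff_days(start: date, end: date) -> int:
    """Number of days between two dates (a DAY interval)."""
    return (end - start).days


order_date = date(2017, 3, 15)  # hypothetical order date (a Wednesday)
print(order_date.day, order_date.month, order_date.year)
print(dax_style_weekday(order_date))
print(datediff_days(date(2017, 1, 1), order_date))
```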
Examples :
Create calculated columns:
IF(): Checks whether the given condition is met and returns one value if the condition is TRUE and another if the condition is
FALSE.
IF(Logical Condition, ResultIfTrue, ResultIfFalse)
IFERROR(): Evaluates an expression and returns a specified value if the expression returns an error, otherwise the value of the expression itself.
IFERROR(Value, ValueIfError)
AND(): Checks whether both arguments are TRUE and returns TRUE if they are, otherwise returns FALSE.
AND(condition1, condition2)
OR(): Checks whether at least one of the arguments is TRUE and returns TRUE if so, otherwise returns FALSE.
OR(condition1, condition2)
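The behaviour of these logical functions can be imitated in a few lines of Python. This is a sketch of the logic only, not DAX; the helper names, the segment labels and the thresholds are assumptions invented for this example.

```python
def if_error(compute, value_if_error):
    """IFERROR-style helper: return compute(), or a fallback if it fails."""
    try:
        return compute()
    except (ValueError, ZeroDivisionError):
        return value_if_error


def customer_segment(annual_income: float, is_home_owner: bool) -> str:
    # AND-style check: both conditions must be true.
    if annual_income >= 100_000 and is_home_owner:
        return "Premium"
    # OR-style check: one true condition is enough.
    if annual_income >= 100_000 or is_home_owner:
        return "Standard"
    return "Basic"


print(if_error(lambda: int("12"), 0))   # valid value is returned as is
print(if_error(lambda: int("n/a"), 0))  # error is replaced by the fallback
print(customer_segment(120_000, False))
```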
Examples :
Create calculated columns:
LEFT / MID / RIGHT(): Return the given number of characters from the start, middle or end of a string.
LEFT / RIGHT(Text, [NumOfCharacters])
MID(Text, StartPosition, NumOfCharacters)
UPPER / LOWER(): Convert a string to upper / lower case. (DAX has no PROPER function; proper case can be applied in the Query Editor instead.)
UPPER / LOWER(Text)
SUBSTITUTE(): Replaces existing text with new text in a string.
SUBSTITUTE(Text, OldText, NewText)
SEARCH(): Returns the position at which a specified character or text string is first found, reading left to right.
SEARCH(FindText, WithinText)
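The text functions above count positions from 1, which differs from most programming languages. The Python helpers below are a rough imitation written for this illustration (the names and sample strings are assumptions); the not-found value of -1 is a simplification chosen here, not the behaviour of the DAX function.

```python
def left(text: str, num_chars: int = 1) -> str:
    return text[:num_chars]


def right(text: str, num_chars: int = 1) -> str:
    return text[-num_chars:] if num_chars > 0 else ""


def mid(text: str, start_position: int, num_chars: int) -> str:
    # start_position is 1-based, as in the syntax above.
    start = start_position - 1
    return text[start:start + num_chars]


def substitute(text: str, old_text: str, new_text: str) -> str:
    return text.replace(old_text, new_text)


def search(find_text: str, within_text: str) -> int:
    # 1-based, case-insensitive position; -1 here when nothing is found.
    position = within_text.lower().find(find_text.lower())
    return position + 1 if position >= 0 else -1


name = "Adventure-Works"  # hypothetical sample text
print(left(name, 3), mid(name, 3, 4), right(name, 5))
print(substitute(name, "-", "_"), search("works", name))
```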
Examples :
Create calculated columns:
3. In Customers table:
SUBSTITUTE eg = SUBSTITUTE(d_Customers[Domain Name],"-","_")
Avoid using the RELATED function to create redundant calculated columns unless absolutely needed, since they increase
the file size. Instead, use RELATED inside the iterator functions.
Examples:
1. Pull the retail price from the product table into the sales table using the RELATED function.
In the sales table, add a calculated column:
RetailPrice = RELATED(products[ProductPrice])
2. Revenue = Sales[ProductPrice] * Sales[Quantity]
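Conceptually, RELATED works like a lookup from the many-side row to the single matching row on the one-side. A minimal Python sketch of that idea, using made-up keys and prices (the names are assumptions for this illustration, not the course data):

```python
# Hypothetical lookup: ProductKey -> ProductPrice (the "one" side)
product_price = {101: 35.0, 102: 8.5}

# Hypothetical sales rows (the "many" side)
sales = [
    {"ProductKey": 101, "OrderQuantity": 2},
    {"ProductKey": 102, "OrderQuantity": 4},
]


def add_retail_price(sales_rows, price_by_key):
    """Return copies of the sales rows with a looked-up RetailPrice column."""
    return [
        {**row, "RetailPrice": price_by_key[row["ProductKey"]]}
        for row in sales_rows
    ]


sales_with_price = add_retail_price(sales, product_price)
for row in sales_with_price:
    print(row)
```

The new column stores one extra value per sales row, which is the file-size cost mentioned above.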
Basic Math and Stats Functions:
DIVIDE(): Performs division and returns the alternate result (or blank) when dividing by zero.
=DIVIDE(Numerator, Denominator, [AlternateResult])
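A return rate is a typical use: return quantity divided by order quantity, where the order quantity may be zero for some products. The Python sketch below imitates that safe division; None stands in for a blank, and the function name and numbers are assumptions for this example.

```python
def divide(numerator, denominator, alternate_result=None):
    """Safe division: return the alternate result instead of failing on zero."""
    if denominator == 0:
        return alternate_result  # None plays the role of a blank here
    return numerator / denominator


print(divide(10, 4))     # normal division
print(divide(10, 0))     # blank (None) instead of an error
print(divide(10, 0, 0))  # explicit alternate result
```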
Examples :
Create Measures:
Create a matrix with product categories, total order quantity, total return quantity, return rate and the average retail price.
Then add the product sub categories and the product name to explore the drill-down buttons in Power BI.
Basic Math and Stats Functions:
COUNTA(): Counts the number of non-empty cells in a column (numerical and non-numerical)
=COUNTA(ColumnName)
Examples:
Number of returns = COUNTROWS(Returns)
Number of Orders = DISTINCTCOUNT(Sales[Order Number])
The Number of Orders measure differs from the total order quantity measure because multiple line items can share the same order number. Explore
this further in the sales table by sorting the order number in descending order.
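The three counts can be compared on a tiny made-up sales table. This Python sketch only illustrates the difference between counting rows, counting distinct order numbers and summing quantities; the rows are hypothetical.

```python
# Hypothetical sales rows: order SO1001 contains two line items
sales = [
    {"OrderNumber": "SO1001", "OrderQuantity": 1},
    {"OrderNumber": "SO1001", "OrderQuantity": 2},
    {"OrderNumber": "SO1002", "OrderQuantity": 1},
]

row_count = len(sales)                                             # like COUNTROWS
number_of_orders = len({row["OrderNumber"] for row in sales})      # like DISTINCTCOUNT
total_order_quantity = sum(row["OrderQuantity"] for row in sales)  # like SUM

print(row_count, number_of_orders, total_order_quantity)
```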
CALCULATE():
Examples:
1. Bulk orders = CALCULATE([Number of Orders],
f_Sales[OrderQuantity] > 1)
CALCULATE evaluates the measure in a filter context modified by the filters provided in the CALCULATE formula.
These filters are added to the filter context of the visual and override any existing filter on the same column.
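One way to picture this is to treat the filter context as a set of per-column conditions that CALCULATE extends or overrides before the measure runs. The Python sketch below is a simplified model built for this illustration only; the rows, the dictionary-of-conditions representation and the function names are all assumptions, and the real engine is considerably more involved.

```python
# Hypothetical sales rows
sales = [
    {"OrderNumber": "SO1", "Color": "Red", "OrderQuantity": 1},
    {"OrderNumber": "SO2", "Color": "Red", "OrderQuantity": 3},
    {"OrderNumber": "SO3", "Color": "Blue", "OrderQuantity": 2},
    {"OrderNumber": "SO4", "Color": "Blue", "OrderQuantity": 1},
]


def number_of_orders(rows):
    return len({row["OrderNumber"] for row in rows})


def calculate(measure, rows, visual_filters, new_filters):
    # New filters are added to the visual's filters and replace
    # any existing filter on the same column.
    combined = {**visual_filters, **new_filters}
    kept = [
        row for row in rows
        if all(test(row[column]) for column, test in combined.items())
    ]
    return measure(kept)


red_only = {"Color": lambda color: color == "Red"}  # filter coming from a visual

# Added filter on another column: red orders with quantity above 1
bulk_orders_red = calculate(
    number_of_orders, sales, red_only,
    {"OrderQuantity": lambda qty: qty > 1},
)
# Filter on the same column: the visual's "Red" filter is overridden
blue_despite_red = calculate(
    number_of_orders, sales, red_only,
    {"Color": lambda color: color == "Blue"},
)
print(bulk_orders_red, blue_despite_red)
```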
CALCULATE() and ALL() :
ALL returns all the rows of a table, or all the values in a column, ignoring any filters applied in the visual. Used inside CALCULATE, it gives the total regardless of those filters.
=ALL(Table or ColumnName, [ColumnName1], [ColumnName2],…)
Examples:
1. total orders_ALL = CALCULATE([total Orders],ALL(f_Sales))
ALL removes any filter context that is applied by the visuals, so the measure returns the total for that table or column.
It is generally used in calculating % of total values in a report.
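The % of total pattern divides a filtered value by the same value computed with the filters removed. A minimal Python sketch of that arithmetic, using hypothetical rows and names chosen for this example:

```python
# Hypothetical sales rows
sales = [
    {"Color": "Red", "OrderQuantity": 1},
    {"Color": "Red", "OrderQuantity": 3},
    {"Color": "Blue", "OrderQuantity": 12},
]


def total_orders(rows):
    return sum(row["OrderQuantity"] for row in rows)


def share_of_total(rows, color):
    # Numerator: rows left after the visual's filter (one colour).
    in_context = [row for row in rows if row["Color"] == color]
    # Denominator: all rows, ignoring the filter (the role ALL plays).
    return total_orders(in_context) / total_orders(rows)


print(share_of_total(sales, "Red"), share_of_total(sales, "Blue"))
```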
CALCULATE() and ALLSELECTED() :
It returns the total of all the values in the column, ignoring any filter applied inside the visual, while still accepting the filters from outside the
visual.
= ALLSELECTED( [<tableName> | <columnName>[, <columnName>[, <columnName>[,…]]]] )
Examples:
1. All Selected prod category revenue =
CALCULATE([Revenue_Sumx],ALLSELECTED(d_Product_Categories))
Here, create a matrix with product color and the total revenue
measure. Put the measure created with the ALLSELECTED function in a
separate card. Put the categories in a slicer visual.
FILTER works as an iterator function: it checks the entire table row by row and creates a subset of the table based on the
condition on the given column.
FILTER is rarely used by itself, since a subset of a table is seldom needed on its own. It is generally used inside the
CALCULATE or SUMX functions.
Examples:
Create a calculated table.
1. red Products = FILTER( d_Products,d_Products[ProductColor] == "Red")
2. High end products = CALCULATE( [total Orders],FILTER(d_Products,
d_Products[ProductPrice] > [Overall Avg Prod Price]))
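The "High end products" pattern can be traced step by step in Python: compute the overall average price, keep the products above it row by row, then total the orders for only those products. The rows and names below are assumptions made up for this sketch.

```python
# Hypothetical product and sales rows
products = [
    {"ProductKey": 1, "ProductPrice": 10.0},
    {"ProductKey": 2, "ProductPrice": 50.0},
    {"ProductKey": 3, "ProductPrice": 120.0},
]
sales = [
    {"ProductKey": 1, "OrderQuantity": 5},
    {"ProductKey": 3, "OrderQuantity": 2},
    {"ProductKey": 3, "OrderQuantity": 1},
]


def filter_table(table, condition):
    """Check every row and keep those that satisfy the condition."""
    return [row for row in table if condition(row)]


overall_avg_price = sum(p["ProductPrice"] for p in products) / len(products)
high_end = filter_table(products, lambda p: p["ProductPrice"] > overall_avg_price)

# Total orders restricted to the filtered products
high_end_keys = {p["ProductKey"] for p in high_end}
high_end_orders = sum(
    s["OrderQuantity"] for s in sales if s["ProductKey"] in high_end_keys
)
print(overall_avg_price, high_end_keys, high_end_orders)
```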
ITERATOR (“X”) FUNCTIONS :
These functions allow the user to loop through a particular calculation row by row and calculate the results.
=SUMX(Table, Expression)
• SUMX
• COUNTX
• AVERAGEX
• RANKX
• MAXX/MINX
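The row-by-row idea behind the iterator functions can be sketched in Python: evaluate an expression for each row, then aggregate the results. The helper names mirror the DAX functions only for readability; they and the sample rows are assumptions for this illustration.

```python
# Hypothetical sales rows with a price on each row
sales = [
    {"OrderQuantity": 2, "RetailPrice": 35.0},
    {"OrderQuantity": 4, "RetailPrice": 8.5},
]


def sumx(table, expression):
    """Evaluate the expression for each row, then add up the results."""
    return sum(expression(row) for row in table)


def averagex(table, expression):
    """Evaluate the expression for each row, then average the results."""
    values = [expression(row) for row in table]
    return sum(values) / len(values)


def line_revenue(row):
    return row["OrderQuantity"] * row["RetailPrice"]


print(sumx(sales, line_revenue), averagex(sales, line_revenue))
```

Because the expression is evaluated inside the iteration, no extra revenue column has to be stored in the table.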
Example:
These functions allow the user to calculate common time comparisons.
DATESYTD
Performance To-Date = CALCULATE(Measure, DATESYTD(Calendar[Date]))
DATEADD
Previous Period = CALCULATE(Measure, DATEADD(Calendar[Date], -1, MONTH))
DATESINPERIOD
Running Total = CALCULATE(Measure, DATESINPERIOD(Calendar[Date], MAX(Calendar[Date]), -10, DAY))
To calculate moving averages, use the running total calculation above and divide by the number of intervals.
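The running total and moving average arithmetic can be checked with a short Python sketch. It assumes a 10-day window that ends on, and includes, the latest date; the daily revenue figures and function names are made up for this example.

```python
from datetime import date, timedelta

# Hypothetical daily revenue: 10, 20, ..., 150 for 1-15 January 2017
daily_revenue = {date(2017, 1, day): day * 10.0 for day in range(1, 16)}


def running_total(values_by_date, end_date, days):
    """Sum the values in the window of `days` days ending on end_date."""
    start_date = end_date - timedelta(days=days - 1)
    return sum(
        value for day, value in values_by_date.items()
        if start_date <= day <= end_date
    )


def moving_average(values_by_date, end_date, days):
    # Running total divided by the number of intervals, as described above.
    return running_total(values_by_date, end_date, days) / days


latest = max(daily_revenue)
print(running_total(daily_revenue, latest, 10))
print(moving_average(daily_revenue, latest, 10))
```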
TIME INTELLIGENCE FUNCTIONS:
Examples
1. YTD_Revenue = CALCULATE([Revenue_Sumx],DATESYTD(d_Calendar[Date]))
These functions allow access to fields within DAX measures or calculated columns through either physical or virtual
relationships between the tables.
Examples:
1. RELATED
2. RELATEDTABLE
3. USERELATIONSHIP
4. CROSSFILTER
CROSSFILTER is used to access data from tables that are not related directly but are related to each other via shared tables (indirectly related tables).
= CROSSFILTER(Left Column, Right Column, Filter Direction)
The Customers and Territories tables are not directly connected to each other;
they are connected via the Sales table.
Suppose we want to know the average annual income of the customers by country.
Create a matrix with country from the territories table and the annual income from the customers
table. We see that the values are the same in every row.
Examples:
Instead of the average, we can also calculate the country-wise total income for the customers in the same way.
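The intended result, income by country for tables linked only through Sales, can be worked out by hand in Python: for each territory, collect the customers who appear in that territory's sales rows, then average or total their incomes. This sketch only illustrates the logic of going through the shared table; it is not how the DAX function is written, and all rows and names are hypothetical.

```python
# Hypothetical lookup tables and sales rows
territories = {1: "United States", 2: "France"}  # TerritoryKey -> Country
customers = {11: 60_000, 12: 90_000, 13: 30_000}   # CustomerKey -> AnnualIncome
sales = [
    {"CustomerKey": 11, "TerritoryKey": 1},
    {"CustomerKey": 12, "TerritoryKey": 1},
    {"CustomerKey": 11, "TerritoryKey": 1},
    {"CustomerKey": 13, "TerritoryKey": 2},
]


def incomes_by_country():
    """Map each country to the incomes of customers with sales there."""
    result = {}
    for territory_key, country in territories.items():
        # Each customer is counted once, even with several sales rows.
        customer_keys = {
            row["CustomerKey"] for row in sales
            if row["TerritoryKey"] == territory_key
        }
        result[country] = [customers[key] for key in sorted(customer_keys)]
    return result


by_country = incomes_by_country()
average_income = {c: sum(v) / len(v) for c, v in by_country.items() if v}
total_income = {c: sum(v) for c, v in by_country.items()}
print(average_income, total_income)
```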