You are on page 1of 18

04 – Data Import

Using The Query Editor


Chris Webb
chris@crossjoin.co.uk
Agenda
• Power BI data import concepts
• Data import objectives
• Removing and renaming columns
• Setting data types
• Filtering rows
• Aggregating data
• Pivoting and unpivoting
• Transforming existing columns and adding new columns
• Handling errors
• Merging and Appending
• Duplicating and Referencing
• Special cases
Importing Data Into Your Model
• When importing data into the model you must:
• Choose your data source type
• Connect to the data source
• You then have a choice:
• Load the data to the model direct
• Clean/filter/transform the data before loading
• 95% of the time you will need to clean/filter/transform the data
• This takes place in the Power BI Query Editor window
Data Import Concepts
• Power BI can create multiple jobs to import data from one or more
sources
• These jobs are called queries
• The output of a query is a table in the Data Model
• A query is made up of one or more steps
• Each step takes data from the data source, or the previous step, and
applies some kind of change to it
• The last step in the query is the output of the query
Data Import Objectives
• Create dimension (lookup) tables and fact tables
• Remove any columns or rows that contain data you don’t need
• Reformat data so each column contains a single kind of value
• Rename columns and tables to make them user friendly
• Set data types for columns
• Ensure that dimension tables can be related to fact tables
• Perform any calculations that can be created at data load time
Dimensional Modelling
• Power BI works when your data is modelled according using the
concepts of dimensional modelling
• This means you should not:
• Put all your data in one big table
• Spread your data out over thousands of small tables
• Think of fact tables as containing the events/transactions you want to
report on
• Think of dimension tables as lookup tables that provide more
information about one aspect of an event/transaction
Removing Columns You Don’t Need
• Removing columns you don’t need is important for two reasons:
• Unnecessary columns make it harder for users to find what they do need
• Unnecessary columns make it harder for you as a developer to find what you
need
• They also take up memory in your model
• Functionality:
• Remove Columns
• Remove Other Columns
Naming Columns And Tables
• Naming columns in tables properly is extremely important for
usability – both for your users and you as a report designer
• You need to make sure you use:
• Human readable names – not the names used in the data source
• Names that match the standard terms used in your business
• Functionality:
• Double-click on a column to rename it
• The name of the table is also the name of the query shown in the Query
Settings pane
Setting Data Types
• A very easy step to miss out!
• In most cases Power BI will work out what type of data (eg number,
text, date) a column contains
• But you should always check after loading to be sure
• Also, remember some transformations could also change the data
type so you should check before you finish too
• Functionality:
• Highlight column and select data in Data Type dropdown on ribbon
• Using Locale option handles problems like US/EU date and number formats
Filtering Rows
• If you don’t need all the data from a data source, don’t load it
• Again, you’ll save memory and data loading time by doing this
• Functionality:
• Select/deselect individual values in column dropdown – remember to press
Load More to see more possible values
• Different data types will give you different options for filtering text, numbers,
dates etc
Aggregating Data
• Aggregating data means reducing the level of detail
• For example taking daily sales and summing up to get monthly totals
• Doing this will, again, save memory
• Think carefully before doing this though:
• Will you lose detailed information you need to report on?
• Will your values (eg distinct counts) still aggregate correctly afterwards?
• Functionality:
• Group By button
Pivoting And UnPivoting
• It is important to have just one column for one kind of value
• If you don’t do this, it makes it very difficult to show aggregated
values and filter your data in your reports
• Common example: each month’s sales are in their own column
• Solution: unpivot the data so columns become rows and you have one
row for each month’s data
• Functionality:
• Unpivot button turns columns into rows
• Pivot button turns rows into columns
Transforming Columns And Adding New
Ones
• Cleaning up data often involves multiple steps, each of which needs
some kind of change or transformation to your data
• Examples include:
• Splitting dates into year, month and date columns
• Making text uppercase, lowercase or proper case
• Doing find-and-replace on values
• Performing row-level calculations
• Functionality:
• Add Column tab adds new columns with new data in
• Transform tab replaces values in existing columns
Handling Errors
• Error values can result from:
• Rogue values in columns after you have set a data type
• Problems with calculations
• You must handle these problems inside your queries!
• Functionality:
• Remove/Keep errors
• Replace Errors
Merging And Appending
• Data from multiple queries can be combined in two ways
• Merging is like a SQL join or a lookup relationship
• Different kinds of joins like left, inner and anti joins, are possible
• Appending is adding data from one query underneath data from
another query
• Functionality:
• Merge Queries button
• Append Queries button
Duplicating And Referencing Queries
• Duplicating a query creates an exact copy of a query
• Referencing a query uses the output of one query as the source for
another query
• This is a good way of sharing common between multiple queries that use the
same data source
• You may want to disable data load for the query that is referenced
• Functionality:
• Right-click on query in query pane and select Duplicate or Reference
Parameters
• Parameters are really just queries that return single values
• But they are given special treatment by the UI, and values returned by
parameters can be used for
• Server and database names for data sources
• Filtering by specific values
• This makes it easy for end users and developers to switch to different
data sources or filter data loaded by different values
• Also, .pbix files containing parameters can be exported as templates
with no data
Special Cases
• For some reports it is very difficult to create the table/graph
structures you need using the tools in the Report view
• Eg Show top 10 countries and for each country show the top 10 products and
a total others row
• In these cases, you can use the data import stage to format the data
exactly the way you need it
• Warning: doing this too often will make maintenance more difficult
and increase data load times

You might also like