Chris Webb chris@crossjoin.co.uk Agenda • Power BI data import concepts • Data import objectives • Removing and renaming columns • Setting data types • Filtering rows • Aggregating data • Pivoting and unpivoting • Transforming existing columns and adding new columns • Handling errors • Merging and Appending • Duplicating and Referencing • Special cases Importing Data Into Your Model • When importing data into the model you must: • Choose your data source type • Connect to the data source • You then have a choice: • Load the data to the model direct • Clean/filter/transform the data before loading • 95% of the time you will need to clean/filter/transform the data • This takes place in the Power BI Query Editor window Data Import Concepts • Power BI can create multiple jobs to import data from one or more sources • These jobs are called queries • The output of a query is a table in the Data Model • A query is made up of one or more steps • Each step takes data from the data source, or the previous step, and applies some kind of change to it • The last step in the query is the output of the query Data Import Objectives • Create dimension (lookup) tables and fact tables • Remove any columns or rows that contain data you don’t need • Reformat data so each column contains a single kind of value • Rename columns and tables to make them user friendly • Set data types for columns • Ensure that dimension tables can be related to fact tables • Perform any calculations that can be created at data load time Dimensional Modelling • Power BI works when your data is modelled according using the concepts of dimensional modelling • This means you should not: • Put all your data in one big table • Spread your data out over thousands of small tables • Think of fact tables as containing the events/transactions you want to report on • Think of dimension tables as lookup tables that provide more information about one aspect of an event/transaction Removing Columns You Don’t Need • Removing columns you don’t need is important for two reasons: • Unnecessary columns make it harder for users to find what they do need • Unnecessary columns make it harder for you as a developer to find what you need • They also take up memory in your model • Functionality: • Remove Columns • Remove Other Columns Naming Columns And Tables • Naming columns in tables properly is extremely important for usability – both for your users and you as a report designer • You need to make sure you use: • Human readable names – not the names used in the data source • Names that match the standard terms used in your business • Functionality: • Double-click on a column to rename it • The name of the table is also the name of the query shown in the Query Settings pane Setting Data Types • A very easy step to miss out! • In most cases Power BI will work out what type of data (eg number, text, date) a column contains • But you should always check after loading to be sure • Also, remember some transformations could also change the data type so you should check before you finish too • Functionality: • Highlight column and select data in Data Type dropdown on ribbon • Using Locale option handles problems like US/EU date and number formats Filtering Rows • If you don’t need all the data from a data source, don’t load it • Again, you’ll save memory and data loading time by doing this • Functionality: • Select/deselect individual values in column dropdown – remember to press Load More to see more possible values • Different data types will give you different options for filtering text, numbers, dates etc Aggregating Data • Aggregating data means reducing the level of detail • For example taking daily sales and summing up to get monthly totals • Doing this will, again, save memory • Think carefully before doing this though: • Will you lose detailed information you need to report on? • Will your values (eg distinct counts) still aggregate correctly afterwards? • Functionality: • Group By button Pivoting And UnPivoting • It is important to have just one column for one kind of value • If you don’t do this, it makes it very difficult to show aggregated values and filter your data in your reports • Common example: each month’s sales are in their own column • Solution: unpivot the data so columns become rows and you have one row for each month’s data • Functionality: • Unpivot button turns columns into rows • Pivot button turns rows into columns Transforming Columns And Adding New Ones • Cleaning up data often involves multiple steps, each of which needs some kind of change or transformation to your data • Examples include: • Splitting dates into year, month and date columns • Making text uppercase, lowercase or proper case • Doing find-and-replace on values • Performing row-level calculations • Functionality: • Add Column tab adds new columns with new data in • Transform tab replaces values in existing columns Handling Errors • Error values can result from: • Rogue values in columns after you have set a data type • Problems with calculations • You must handle these problems inside your queries! • Functionality: • Remove/Keep errors • Replace Errors Merging And Appending • Data from multiple queries can be combined in two ways • Merging is like a SQL join or a lookup relationship • Different kinds of joins like left, inner and anti joins, are possible • Appending is adding data from one query underneath data from another query • Functionality: • Merge Queries button • Append Queries button Duplicating And Referencing Queries • Duplicating a query creates an exact copy of a query • Referencing a query uses the output of one query as the source for another query • This is a good way of sharing common between multiple queries that use the same data source • You may want to disable data load for the query that is referenced • Functionality: • Right-click on query in query pane and select Duplicate or Reference Parameters • Parameters are really just queries that return single values • But they are given special treatment by the UI, and values returned by parameters can be used for • Server and database names for data sources • Filtering by specific values • This makes it easy for end users and developers to switch to different data sources or filter data loaded by different values • Also, .pbix files containing parameters can be exported as templates with no data Special Cases • For some reports it is very difficult to create the table/graph structures you need using the tools in the Report view • Eg Show top 10 countries and for each country show the top 10 products and a total others row • In these cases, you can use the data import stage to format the data exactly the way you need it • Warning: doing this too often will make maintenance more difficult and increase data load times