
1.

Connecting to CSV or text files

- [Instructor] We first look at how to connect to static files. Unlike Excel files, text
files store unlimited rows of data, but do not store formatting or
calculations. The commonly used CSV files use commas to separate the fields
for each data record or row. DSV files use other separators, such as tabs or
colons. Fixed-width text files allocate a fixed number of characters to each
field rather than using a separator character. Let's see how we
can import a CSV file into Power BI. We download the CSV file of weekly flu
mortality numbers from the CDC website we see here. In the FluView, let's go
ahead and download the data. We select the data source as ILINet. Click on
national for the radio button, and then let's also select the past three seasons
for the flu data. Go ahead and click download data. You'll also find a copy of the
CSV in the exercise files. Now that you've downloaded the data, go ahead and
open up Power BI. When we open up Power BI, we click on get data on the
splash screen. We set up the text CSV connector to the file we just
downloaded. Go ahead and click on this CSV file. And rather than loading, we
just go ahead and hit transform data. Now we see here the data table from the
CDC flu mortality numbers. And when you look over here to the applied
steps on the right hand side, notice that we have a source step and a changed
type step. The gear wheel indicates that we can edit this existing step, so we go
ahead and double click on the source step. We see the path. We also see that it
opens as a CSV document and uses a comma as a delimiter. Go ahead and click
okay. Now we rename this query CSV CDC flu data. Now I'm going to go ahead
and save this Power BI file. So I go to file, save as. So now in this message, go
ahead and apply later. And save this file as CDC flu data. Great, we've just
imported our first CSV.
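
For reference, the Source and Changed Type steps that Power BI generates for a
connection like this are just M code behind the scenes. Here is a minimal
sketch of what the query editor writes, assuming a placeholder file path and
standard comma-delimited settings (your generated path, encoding, and column
types will differ):

    let
        Source = Csv.Document(
            File.Contents("C:\Data\ILINet.csv"),    // placeholder path
            [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]
        ),
        // the query editor adds these steps for us automatically
        PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
        ChangedType = Table.TransformColumnTypes(PromotedHeaders,
            {{"YEAR", Int64.Type}})                 // placeholder column
    in
        ChangedType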

Manually entering data

- [Instructor] We also have the option to manually enter values by typing or
pasting them directly into a new data table. We may find this method
helpful when we are getting data from a source that is difficult to extract
data from directly, or from an unstructured source. In Power BI, the easiest
way to manually enter data is to first exit out of this splash screen, and then
we can edit the queries by
hitting the edit queries button at the top of the home screen. This takes us back
into the query editor where we can go ahead and enter data right up here at the
top. In this dialog box, we are going to enter the data to create this table
manually. In column one, we're going to call this time period. I'm going to enter
in months one through 10. Then we add the amounts for each of these
months. Now we can rename this query down here in the name box at
the bottom. I'm going to go ahead and call it sample data, hit
okay. Now we save this file as manual sample. Select to apply later, manual
sample, and hit save. And there we have it.
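
Under the hood, a manually entered table is just a table literal in M. Power
BI's Enter Data dialog actually generates a compressed Table.FromRows
expression, but the simpler #table form below shows the idea; the values are
placeholders standing in for the months and amounts typed in the video:

    let
        Source = #table(
            {"Time Period", "Amount"},          // column names
            {{1, 100}, {2, 120}, {3, 95}}       // one list per row
        )
    in
        Source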

Connecting to an Excel file

- [Instructor] When setting up an Excel connection, we first need to select the
file location, and second we need to select the tables or tabs to use within this
Excel file. We got an Excel file that maps U.S. geo IDs to areas such as counties
from the website we see here on the screen. Download the all geo codes
file; you can find this file in the 01_03 folder of the exercise files if
you don't feel like downloading it from the website. We open up the file to
inspect the formatting and the tabs within it. Notice that we only see one single
tab of data with several rows above the headers. Notice that for now we do not
see the geo ID field, but we will create it later. When we open up Power BI, we
click on get data on the splash screen. Now we select the Excel connector, and
point to our Excel file. We hit open, we can preview tables in this file and we see
two options in our navigator window. If we click on the first option, we see that
the formatting looks a lot like the formatting of the Excel file that we looked at
earlier. Notice the headers and the rows on top before we get to the data
table. Next click on the option below. We see this looks very much like the
previous selection, however, we do not see any of the additional rows on
top. Power BI automatically detects the label rows at the top of the table and
removes them so we do not have to do so ourselves. Let's load this option by
clicking on the button next
to it, so we see the check mark show up, we click on transform data, now we can
go ahead and edit this query. We first rename this geo ID to county
mapping. Because we are going to revisit another connection option later, we're
going to also call this Excel manual file and hit enter to rename it. Renaming the
query to a name like this makes it easier to read so that other users can pick it
up and work with it later as well. Notice now in the applied steps on the right
hand side of the screen, we see a source step and a navigator step which is
where we select the tables and we also see two other steps for promoting
headers and changing data types. Now we save this file as U.S. census, select to
apply the changes later, and hit save. Again, the final version of this file is
in the 01_03 folder of the exercise files. Great, we've just connected to our
first Excel file.
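
The navigator selection also ends up as M code. A minimal sketch of the
generated steps, assuming a placeholder file path and item name (the Kind
field is how M distinguishes sheets from the table objects Power BI detects):

    let
        Source = Excel.Workbook(File.Contents("C:\Data\all-geocodes.xlsx"), null, true),
        // the Navigation step picks the item we checked in the navigator
        GeoCodes = Source{[Item = "GeoCodes", Kind = "Table"]}[Data],
        PromotedHeaders = Table.PromoteHeaders(GeoCodes, [PromoteAllScalars = true])
    in
        PromotedHeaders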

Connecting to a PDF file

- [Instructor] PDF files present reports and datasets in a format that is easy to
read and share, but this very formatting can make it difficult to extract data from
them. Let's obtain weather data for the LinkedIn Learning studios just outside
Santa Barbara, California from the NOAA CDO web portal. Here we select the
Weather Observation Type as a daily summary which will provide
precipitation and temperature information. We select the date range that's
already in the box. Next, we go ahead and make sure that we search for
Stations and now we're going to go ahead and just type in Santa Barbara. When
we hit Search, we go to another screen. Let's just go ahead and select this first
airport option which we add to the cart. We then go to the cart and view the
items. We select the Format Output as PDF. We already see the date range and
the airport in Santa Barbara that we want. Select Continue. Then we enter
our email address to receive the report. You would then receive an email
notification to access your own PDF file link. You can access this file in the 01_04
exercise folders. Microsoft recently developed a direct Power BI PDF
connector. On the splash screen, select Get Data. We select to connect to the
PDF. We choose the NOAA Santa Barbara weather.pdf that you can also access
in your own exercise files. And we hit Open. Here we see a list of the tables in
the PDF file. We do not have to connect to all these tables and we'll talk more
later about how we can connect to them using another technique. For now, we
are just going to connect to the Table002 Page 1. We see some of the dataset
here on the screen. We click on Transform Data. I know that I highly recommend
renaming queries, but this particular PDF file is something
we're using to test out how the PDF connector works. We're going to explore
another method using a folder of PDFs to better access the files. So that being
said, I'm just going to leave the query with the default name we found it under. We
save this file as NOAA PDF. And this gives us an initial look into how the PDF
connector works.
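
For reference, the PDF connector works the same way in M. A minimal sketch,
assuming a placeholder path and the Table002 id we selected (Pdf.Tables
returns one row per detected table or page):

    let
        Source = Pdf.Tables(File.Contents("C:\Data\NOAA Santa Barbara weather.pdf")),
        // pick one detected table by its Id from the navigator list
        Table002 = Source{[Id = "Table002"]}[Data],
        PromotedHeaders = Table.PromoteHeaders(Table002, [PromoteAllScalars = true])
    in
        PromotedHeaders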

Connecting to folders

- [Instructor] Receiving the same file every month in the same file type? We
could spend our time importing them separately or we could expedite the
process by putting them in the same folder. When we add a new file to the
folder, we just have to refresh the query. Using a folder as a data connection
means that we also import all the subfolders within the folder. This method
works best when the files to combine into the dataset have the same columns
and formatting. We use the same initial process to set up the
folder connection for all file types, but the transformation process to create a
combined dataset differs by file type. Now let's go back to the same NOAA
weather portal we used to download the PDF file except this time, we will
choose to receive it in the CSV format. Again, we select daily summaries for
2019 and we search for Santa Barbara again. We select the airport and add that
to the cart. Let's select to receive the file as a CSV format. We choose the date
range for the current year, view the items in the cart. Now in the station detail
and data flag options, choose to select all the options. Similarly, in the select
data types, choose to select all the weather types: precipitation, air temperature,
wind, and weather type. Select to continue, and here's where we enter our email
address to go ahead and submit the order. After we download the weather for
Santa Barbara and for Los Angeles for both 2018 and 2019, we go and save
them in a NOAA CSV folder. We also convert them into Excel files and save
them for 2018 and 2019 weather. In both these Excel files, we can see the Los
Angeles and the Santa Barbara weather patterns. You can access these files in
your own 01_05 exercise files. We also put the PDFs in their own standalone
folder; later we will cover how to transform them all in a single query. We
start at the Power BI desktop splash screen. We select get data, we select the
folder connection, and we then copy the path of the folder that we just saw the
files in. Let's start by connecting to the NOAA CSV folder. Select the folder and
click okay. And we click okay again. On the next screen, we see a list of the
NOAA CSV files, as well as some metadata about their dates and their folder
paths and sources. We do not initially select to combine them, but rather
transform the data. We now have a table that shows a row for each file
name. The folder path appears last in the columns, and the second column
shows the name of the CSV file. The first column is called content. In the content
column, we can access the files through the binary hyperlink. Later, we will walk
through how to transform these four separate rows into a single dataset, but
for now, we're just going to save this as a NOAA CSV Power BI file. We will apply
these queries later, and we hit save. Now let's set up a connection to the PDF
folder and the Excel folder using Power BI again. On the splash screen, again,
select get data. Select the folder option again. We browse for the particular
folder for the PDF and the Excel files. First, we're going to connect to the Excel
files and hit okay. Select to transform the data. Again, we see a connection to
the folder with the same table layout that we saw with the CSV files. However,
we are going to combine these separately and in a different manner as we will
learn later in the transformation process. We are also going to add the NOAA
folder PDF connection to this same file. Then select folder, and select the NOAA
PDF folder path. Again, hit okay, and select to transform the data. We'll come
back to this later. Here, we see a query for the NOAA Excel folder and for the
NOAA PDF folder. For now, we are going to save this as its own separate file
folder. Apply this later, and we are going to call this NOAA Excel and PDF
folders. Hit save. We will work through the file transformation process for both
these connections later.
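
Each of these folder queries boils down to a single Folder.Files call in M. A
minimal sketch with a placeholder path; the result is the file-per-row table
described above, and filtering on Extension is a common guard against stray
files sneaking into the combine step later:

    let
        Source = Folder.Files("C:\Data\NOAA CSV"),    // placeholder folder path
        // one row per file: Content (binary), Name, Extension, Folder Path, dates
        CsvOnly = Table.SelectRows(Source, each [Extension] = ".csv")
    in
        CsvOnly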

Question 1 of 3
When connecting to a PDF file where the data table you want to extract from is
on multiple pages, how will this appear in the query editor?


The query editor cannot recognize any of the tables within a PDF file.

a single dataset that the query editor automatically joins together

 separate tables or table objects with unique names that you can set up
individual queries for

Correct

Any physical breaks in the data tables will show up as separate tables
when you initially connect to a PDF file.


The query editor will separate the tables into separate tables that you
have no way of distinguishing between.

Question 2 of 3

When copying data over from the value source (such as the web) into the new
table in the query editor, what is NOT a potential pain point to watch out for?

 different font sizes between data sources

Correct

The query editor does not differentiate between font sizes, and you can
also see how the number looks after you paste it into the editor or else
you can manually enter it.


units such as dollar sizes connected to a numeric value

leading or trailing spaces around numeric values

commas between large numbers

Question 3 of 3
The ETL process when connecting to data in a folder is the same for every
connection type.


TRUE

 FALSE

Correct

The initial connection setup for connecting to the folder is the same (the
extraction process), but the transformation steps (and thus the ETL
process) will differ based on the file types within the folder.

2.

Connecting to databases

- [Instructor] Relational databases consist of tables in a data warehouse that
typically use SQL for querying and maintaining these databases. We will find it
easiest to work with these databases if we at least understand the database
configuration to connect to the optimal and correct tables. Relational database
connections in Power BI include SQL Server, Oracle, and HANA, among many
other options. Multidimensional cubes, OLAP cubes, or SSAS models work as
predefined queries referencing databases where the user does not write a query
to access the data, but rather connects to a model with predefined dimensions
and calculations. For example, SQL Server Analysis Services uses MDX queries to
create a cube database model instead of getting data from the relational
database using SQL commands. If we cannot find the database connector in
Power BI, we can check to see if a platform like Microsoft Azure supports
it. Azure functions as a cloud computing service supporting software, platforms,
and infrastructure as services for both Microsoft products and third-party
software and systems. It supports connections for both relational databases and
cubes. We can also access databases through an ODBC connection if we can
install the vendor's ODBC driver on our own computer. ODBC stands for Open
Database Connectivity. It functions as an interface for Microsoft, enabling access
to data management systems using SQL. In order to connect to the
database, we need our own set of database access credentials, including the
server name, the database name if applicable, and any access
credentials. Microsoft systems often use shared Windows credentials across
multiple platforms. Note that the database credentials in Power BI Desktop may
not transfer between users when you share Power BI files, so make sure other
users also have database access credentials if you share it with them. When they
open up the file on their computer, they will have to enter their own
credentials to refresh the data from the databases. We can also customize
configurable connection settings, such as credentials,
encryption, privacy levels, and native database queries. In some organizations,
the IT group may control the privacy level and we cannot change it
ourselves. Understanding privacy levels becomes imperative when sharing Power
BI files because privacy levels will likely vary across a wide
user base. I'm in the CDC flu data Power BI file we created earlier by connecting
to a CSV connection. I'm going to show how to connect this to a SQL Server and
SQL Server Analysis Services model. However, you will likely not be able to
access the database yourself, so just follow along and see how database
connections work in Power BI. We select from New Source, first we're going to
select SQL Server. I'm going to enter my Server name, and enter my database
name. Notice the radio button to select between Import and DirectQuery. Let's
set up our query using Import. If we want to get the data using a custom SQL
query, we would enter it by expanding the box below. I would recommend
testing SQL queries out beforehand to make sure they work in an application
like Visual Studio. In this example, we will not use a SQL query, so go ahead and
close out this Advanced Options menu, and proceed to the next screen by
clicking OK. From here, we select the ILINet table we want to use from the
database, confirm. Now we see we have our connection to the database
table set up in SQL Server. Now I'm going to show how you would
access a SQL Server Analysis Services connection. Again, go to New Source. Let's click
More to see more of the options that are available. In Database, Azure, and
Other, we see the ODBC, the Azure options, and here in the Database tab, we
see the databases. Connect to the SQL Server Analysis Services database. Notice
that similarly to connecting to SQL Server, we also see the Server and Database
options available on this screen. We also see two radio button options like we
saw in the SQL Server. However, this time it's between Import and Connect
Live. If we select the Import option, we will see an area to enter the MDX or DAX
query below. We can choose to enter a query here, or if we do not enter a
query, we can access tables for the model in screens after that. If we select
Connect Live, it will disable the MDX or DAX query, and Power BI will become a
front end connection only. Let's exit out of here, and we see how we would set
up these database access connections.
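
For reference, here is a minimal M sketch of the relational connection we just
set up; the server, database, and schema names are placeholders, and a cube
connection is built the same way with AnalysisServices.Database:

    let
        Source = Sql.Database("MyServer", "FluDatabase"),   // placeholder names
        // the navigation step selects one table from the database
        ILINet = Source{[Schema = "dbo", Item = "ILINet"]}[Data]
    in
        ILINet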

Comparing data connection modes

- [Instructor] Earlier we saw the following Data Connection Modes for
databases. Import, DirectQuery and Live. In the cloud services, we can also
leverage the Push mode. In the future, Microsoft will continue to develop and
improve the connection options available for all these Data Connection
Modes. A recent update to Power BI Desktop allows us to connect to both
DirectQuery and Import in combination for different queries in the same file. We
use Import as the typical default data connection type. Microsoft recommends
this method because it takes advantage of the high performance query engine
and allows for us to leverage the full range of options for working with
data. DirectQuery allows us to connect to data in its original source
repository. This option works best when the underlying data provides interactive
queries returning data in less than five seconds, or if this data changes
frequently and we need the latest updates. Power BI supports DirectQuery for
several database options including some of these we see here on the
screen. However, check the Microsoft and the Power BI documentation for the
latest applicable databases. A Live Connection allows us to connect many
reports or dashboards to one Power BI connection, such as a large-scale BI
project like SQL Server Analysis Services. It provides advantages such as
creating a centralized approach where developers can secure and control
data model access, partitioning the model to process independently and more
efficiently, minimizing discrepancies through consolidated calculations and
analysis, scaling subscriptions across many users, avoiding memory or size
constraints by pushing the intensive work to the database, and streamlining
development through lower implementation times. However, it also creates
many disadvantages, such as disabling many of the key features and
capabilities in Power BI. It also makes Power BI a front-end process where
we cannot add query steps or custom M code, although we can add
measures after loading the model. Push Mode is a common approach for real-
time data seen in Power BI cloud services. It is not a formalized data source per
se, but rather data pushed through an external streaming service such as Azure
Stream Analytics.

Query folding and native queries

- [Instructor] Databases, by design, process data efficiently. Query folding
optimizes query performance by pushing as much work as possible back to the
database connection behind the scenes. In order for this functionality to
work, the database we connect to must have a server with query folding
capabilities. If the database does not support query folding, we can still extract
and transform the data, but it will perform less efficiently. We cannot leverage
query folding with an Excel connection because Excel is not a database, and
therefore has no engine to fold queries. On a high level, SQL logic works by
selecting the data table, selecting the columns from this table, and then filtering
the columns with conditions. Query folding translates our transformation
commands into SQL and sends them back to the database without actually
writing any SQL code ourselves. Accessing the native query allows the query to
perform more efficiently, even for large datasets. To avoid breaking query
folding capabilities, do not use a custom SQL query (unless you are a SQL
expert), custom M code, or parameters as dynamic filters. While accessing the
native query optimizes performance, we may also want to break the query
folding capabilities to use the query editor functionality that, as we will see
later, enables a great deal of flexibility. Let's go into Power BI. Let's look
at our initial relational database connection to SQL server for the CDC flu
mortality numbers, where we see the Source and Navigation query editor steps
on the right. If we right-click on the Source step, we see that Native Query is
grayed out, indicating it is not an option. Then if we right-click on the
Navigation step, we see the View Native Query option available, meaning that
query folding is available at this step. We select View Native Query. Here we see
all the columns in the database table. It looks very much like SQL code for those
of you familiar with SQL. Let's set up this database connection again and see
how it works if we use the custom SQL statement. What we're going to do here
is we're just going to set up the connection again, and this time we're going to
enter SQL code into the Advanced Options box. We open up the Advanced
Options section at the bottom, and we write a statement to select all the columns
from the database table, and hit Okay. We see the query loading. Now we see the same
table that we saw earlier, and we selected a database table using the SQL server
connection. When we right-click on the Source step, we see the View Native
Query option is not available, meaning that the SQL code that we used did
indeed break the query folding capabilities.
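
A minimal sketch of the difference in M, with placeholder server and database
names: the first form navigates to a table and leaves folding intact, while
passing a hand-written statement in the Query option sends a native query and,
as we just saw, breaks folding for the steps that follow:

    let
        // folds: later filters and column removals translate back to SQL
        Folding = Sql.Database("MyServer", "FluDatabase")
                  {[Schema = "dbo", Item = "ILINet"]}[Data],

        // does not fold past this step: the custom statement is the query
        Custom = Sql.Database("MyServer", "FluDatabase",
                              [Query = "SELECT * FROM dbo.ILINet"])
    in
        Custom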

Question 1 of 2

What is the connection mode that Microsoft recommends in Power BI?


DirectQuery

Push

 Import

Correct

Import connection mode allows you to maximize your capabilities across Power BI,
including creating custom queries and calculations after loading the data.


Live
Question 2 of 2

What is the SQL programming language?


Works on many types of connections in the query editor, including Excel
files, all types of databases, and API connections

Users can only use it to insert data into databases, and you can't query
relational databases in the query editor using the language.

 a declarative language developed several decades ago as a way to query
datasets from relational databases

Correct

SQL is a programming language that has become a mainstay in IT, as it queries
many different relational database types.


a query language only used in Power BI in conjunction with database
connections

3.

Connecting to web tables

- [Tutor] We have access to vast online libraries of free public data. We can
download this data manually, but web connections eliminate the potential for
manual errors and allow us to easily refresh data sources. Tread carefully
though, because we do not have direct control over even established external
data sources, which could move, delete, or change the format of the data
without our knowledge. We have the capability to open web pages as different
file types, including text files, CSV documents, Excel workbooks, PDF files, JSON
files, HTML pages, and XML tables. HTML stands for hypertext markup
language. It serves as the standard markup language for creating web
pages. Tags that label pieces of content such as headings and tables represent
the HTML elements. When we go to a webpage we do not see the HTML
code, but rather the content rendered by this code. Let's see how the web code
works for the NOAA Weather Service Station list. We see that this file has a TXT
extension on the end. If we right-click on the page and hit inspect, it pulls up the
code behind the website. We see the HTML code that renders the
webpage even though the interface file appears in a .TXT format. We can see
where the data element tags go in this webpage when we open up the body. If
we connect to a webpage not set up in a format that Power BI has the native
capabilities to render, we may run into issues. The CDC web portal we looked at
earlier also supports other files as well. Here we see the zip code to GEOID
mapping in a CSV file format where we see commas separate the data fields. We
can also look at the population by GEOcode we downloaded earlier. We will
come back to this later when we set up an Excel workbook connection in Power
BI. Let's go into Power BI and set up a web data connection. We can type in
web, or we can go look under Other, where the web option presents itself. Click
on the web option. We see where we can put the URL of the web link to the
data. We can copy and paste the NOAA station list into our Power BI file. We hit
OK. Click connect. Notice that we see a data preview with the data in
two columns, set up by the fixed-width delimiter set at position 41. We hit transform
data to see what the data table looks like. We still see the commas separating
the data fields. So what we need to do is go back into the source step by
double-clicking on it to open up the gear wheel. Notice that the file opens up as
a CSV document, so we change this to a text file and hit OK. We now see all the data
in a single column. We can drag the arrows to expand the column to see all the
data points. We will come back later to transform the data to separate it into
separate columns. We're going to save this file as NOAA station list, and save
it. There we have it, we've connected the NOAA station list online to a Power BI
file. We can also set up web connections to the US Census population Excel
file we set up earlier by downloading the Excel file. In the US Census Power BI
file, hit edit queries, and we add a new source as a web connector. Let's first
set up this mapping of the GEOcode by population. If we double-click on the all
GEOcodes link it will download the data. However we can also point it to the
path for this file and Power BI will open it up as a web connection. I'm going to
copy this URL link and paste it into the URL, and then add the Excel extension
on the end. And we hit OK, and connect. One second, we need to go back in
and add an S here. So we choose to reconnect again and add an S to the
GEOcodes. Again notice the navigator shows us two views to connect to, and
this time we're again going to select the second option, which will give us the
table without the headers and the unnecessary rows on top. We hit OK. We now
see this data connection loaded in the same way that we loaded an Excel
file except the connection is not to an Excel file on our desktop, but rather one
online. Let's rename this GEOID to county and put a web at the end so we
know that it's a web connection, and we're going to use the web
connection from here on out for this US Census data. We also need to establish
the connection to the zip code to GEOID mapping URL. Copy this web link, and
again go into the Census file and select a web data connection again. Paste the
link in and hit OK. We see that the Power Query editor in Power BI has already
picked up on the commas that separate the fields. We hit OK. And we're going
to rename this query population by zip code. And there we have it, we've set up
two additional web connections in this particular US Census file, which we will
transform from here.
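
The fix we made in the Source step corresponds to swapping Csv.Document for
Lines.FromBinary in M. A minimal sketch with a placeholder URL standing in for
the station list:

    let
        Source = Web.Contents("https://example.gov/ghcnd-stations.txt"),   // placeholder URL
        // open as a text file: one column with one row per line,
        // instead of a CSV document split on commas
        AsLines = Lines.FromBinary(Source, null, null, 65001),
        AsTable = Table.FromColumns({AsLines}, {"Column1"})
    in
        AsTable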

Querying API data

- [Instructor] API stands for application programming interface. Web developers
use it to connect their web applications to other web applications. An API
connection works like a query where the original application sends out a unique
query to the online data source, which returns a subset of the data we can
use. The application we want to connect to must offer an API connection in
order for us to query the data. An API key serves as a unique access key that
allows us to create our own API queries. Safeguard your own API key and do not
share it with others. We are going to set up an API connection to the U.S.
Census Data we saw earlier. When we look through the API documentation on
the U.S. Census website, we see that we need to request an API key to query this
connection. We will receive this API key in an email, which includes not only the
API key but also documentation for setting up the API query parameters and
endpoints. We can set up API connections in many ways, which can sometimes
make them confusing and tricky to set up, but do not let this deter you. We set
up a URL query string for the U.S. Census API connection we saw earlier by
breaking it down into separate components: the API endpoint, the query
parameters, and our own API key. We pick back up with our U.S. Census Power
BI file. To set up the API connection here, we create a new web connection and
paste the API query string into the URL box. We enter the entire URL with the
API endpoint, query parameters, and key. We hit Okay, and we've created the
API connection successfully, but we need to convert these list links into a useful
data table. We will do this using transformation steps. Rename this query U.S.
Census API for County Populations, and hit Enter to keep the rename. We also
need to save this file to make sure that we keep the updates including the new
API, apply the changes later. You can find the end state of this file in the
exercise files, but remember you will need to insert your own API key for this
connection to work.
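
Conceptually, the query string we pasted breaks down like this in M; the
endpoint and parameters below are illustrative placeholders rather than the
exact Census strings from the video, and YOUR_API_KEY stands in for the key
you request yourself:

    let
        Url = "https://api.census.gov/data/2019/pep/population"   // placeholder endpoint
            & "?get=POP,NAME&for=county:*"                        // placeholder parameters
            & "&key=YOUR_API_KEY",                                // your own key
        // the response comes back as JSON, hence the list links we must transform
        Source = Json.Document(Web.Contents(Url))
    in
        Source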

Querying REST API connections

- [Narrator] The more complicated REST API data connection uses command
requests, including the GET request, which asks the server to retrieve a
response, and the POST request, which asks the server to create a new
resource. To get our REST API query to work, we need to determine the resource
endpoint, the query string parameters, and the headers. When we create a rest
API connection, one of the formats that the data returns in is in a JSON
format. JSON stands for JavaScript Object Notation. It transmits data objects as
human-readable text with attribute-value pairs and array data types. The data is
set up in name-value pairs separated by commas, with curly braces holding
objects and square brackets holding arrays. We are going to obtain weather
info from the NOAA website using an API query. We start with its developer API
documentation. We see here how the different query parameters and paths are
configured. And, if you're interested in learning more, you can check out the
NOAA API documentation yourself as well. We first need to request an API
token. You can do so on the NOAA website by entering your email address. We
are going to use the free Swagger Inspector online tool to test and develop our
REST API connection. Based on the NOAA API documentation, we need to set
up the query as a GET request. So, we set the request to GET in the
parameters of the Swagger Inspector, and we then paste the API endpoint next
to it. We then hit the send button to see what results are returned. We get a
status of 400 Bad Request, and it says, "Token parameter is required." So, this is
where we need to pass our token in the GET request. In the authentication
and headers tab, we can see that we have request headers boxes. We've put
token in the header box. And then, in the value box, we've put our own API
token. We then hit send again. We see the query ran successfully because we
see a response returned in the JSON format below, and we also see a status of
200 OK, meaning that the query ran. We can now add another query
parameter to the endpoint, so we just type data to the end of it and send the
request again. We receive an error message that we need to add parameters
such as start date to query. This means we need to add the query
parameters that we saw in our NOAA API documentation. We go back to the
parameters page, and we're going to first add the dataset ID. And, the dataset
ID for the daily summary of weather is GHCND. We then need to add the start
date. And, the start date, note that it is a query string with no spaces in between
it, and, also, all of the letters are in lower case. We go over to the value box, and
we first put the year and then the start of the year, 01 01, for January first. And,
notice that this is set up as a string parameter in a text format, which is
unusual and not the typical date formatting that we would see in other data
science applications. But, this is how the query is set up, and this is how we need
to input the parameters. You can also add the end date, which we're going to
say 2019 12 31. We add another new parameter, this time for the Santa Barbara
station ID. So, we put station ID and enter the GHCND station ID for Santa Barbara. We
also need to add one more parameter, and that is for, we need to add a limit of
1000 because, otherwise, the query defaults to returning 25 records, which is
not going to cover even a month of data. It's only going to return the first 25
records, so we need to expand it to 1000. So, we put a limit of 1000 in here just because that's
the highest and, if we want to add, say, other stations later, we've already
defaulted it to the highest limit. We send this API request again, and, this
time, we see our desired dataset returned in a JSON format. Now, we go into
Power BI again. We are going to add our NOAA API query to the NOAA station
list we set up earlier. We go to new source, web connection, so, here, we have a
few options we can leverage for the web connection. We can choose advanced
options, which will give us a space to enter the header for the token, and we
can also separate the URL string into several different parts. For the sake of the
example being easier to set up, I'm going to just paste the entire string we put
together with the Swagger Inspector. But, if you want to break it apart to test it
out yourself, you can as well. Power BI has the capabilities to leverage API
connections, but if you're doing a lot of API connections, it might be better to
look to other API tools or ways to organize and query the data. So, we
select advanced, and we first put in the query parameters that we set up in the
Swagger Inspector. So, here, we can just paste the entire URL string. We copy,
and then we go over to Power BI and paste it in the URL parts. We then need to
add the HTTP request header parameters down at the bottom. We put token in
the first box. We can then paste the token into the header parameters, and we
say okay. We've already passed our parameters in. We know that this query
works. We don't need to use or choose from any of these other access
options. We hit connect. We now see the connection works, but, instead of a
data table, we see a table with hyperlinks to list and record. These are Power BI
objects, and we will talk more about these later in this course. Let's rename this
query NOAA API connection. We're also going to click on the source step to
make sure that the web connection read the data in a JSON format. So, we
double-click on the source step, and we see that it opened the file as a JSON
document. We can also see that we can no longer see our token field in the web
connection, and the reason for this is privacy concerns. However, we will see
later that we can see the token in the M code. Be careful with your API token
and key because it can be seen in other parts of the query. We confirm that it
works. There we have it. We've established a connection to the NOAA API
query, and we are going to revisit this step later when we learn more about
objects and transformations.
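
Putting the pieces together, here is a minimal M sketch of the same GET
request; the endpoint follows the NOAA CDO documentation pattern we tested in
Swagger Inspector, but the station ID and token are placeholders you would
replace with your own values:

    let
        Source = Json.Document(Web.Contents(
            "https://www.ncdc.noaa.gov/cdo-web/api/v2/data",   // endpoint per the NOAA docs
            [
                Headers = [token = "YOUR_API_TOKEN"],          // placeholder token
                Query = [
                    datasetid = "GHCND",                       // daily summaries
                    stationid = "GHCND:XXXXXXXX",              // placeholder station ID
                    startdate = "2019-01-01",
                    enddate   = "2019-12-31",
                    limit     = "1000"                         // default is 25 records
                ]
            ]
        ))
    in
        Source

Splitting the URL into the base plus Query and Headers records, rather than
pasting one long string, also keeps the parameters readable and easier to
change later.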

Configuring OData feeds

- [Announcer] OData stands for Open Data Protocol. It allows for an easy way
to connect to online data without having to manually build REST API
connections. We can learn more about OData connections and how to create
them from their website we see here on the screen. For this example, we are
going to use an already-existing OData connection for the CDC Weekly Flu
Mortality numbers in the form of a URL link. On the CDC website we see here,
we can see the options for viewing the data, visualizing data, exporting, and
using an API key. If we click on these three dots at the end and choose Access
data via OData, we see a dialog box pop open on the screen. We copy this
OData endpoint. In Power BI, we choose a new source to connect to. We
choose the OData feed. We then paste in the URL we copied from the CDC
website and hit Okay. Click Connect again on the next screen. We see the data
connection established, but I would recommend inspecting it. I know for a fact
that this CDC OData connection sometimes does not contain the most recent
numbers or matching totals between the state, regional, and national levels, but
we can see the connection works. We hit Okay, rename this query by double
clicking on it: CDC Flu Numbers OData. We save the Power BI file to keep the
updates, choose to apply the changes later for now. There we have it, we have
now connected to APIs in three different ways in the past three videos.
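
In M, the whole connection is a single OData.Feed call around the endpoint we
copied; the URL below is a placeholder standing in for the CDC endpoint from
the dialog box:

    let
        Source = OData.Feed("https://data.cdc.gov/OData.svc/xxxx-xxxx")   // placeholder endpoint
    in
        Source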

Installing Python

- We can use Python code as its own data connection in Power BI. The Python
script must produce a result in a data frame output. I would recommend
running the script first in an environment like Spyder or Anaconda to
troubleshoot any potential issues. To download Python, first go to the Python
website and download the latest version of Python we see on the screen. Once
downloaded, run the Python executable. Click on install now. We see now that
the setup was successful. Close out of this window. Now open up the command
prompt. Let's check to see if Python is installed. Type python --version.
Okay, we see we have the most recent version. We need to make sure
the pip command line tool is installed so we can use modules such
as pandas in our Python script. To do this, go ahead and close out of the
current command prompt. Open up the installer again by going to our
downloads folder. Select modify. Click on next. Now, on the advanced options,
check to make sure that Python is added to the environment variables. We click
on that to get the checkbox. Now hit install. We see the modify was
successful. Close out of this window, and go back into the command
prompt. Check to see if pip is installed. We type pip --version.
Great, we see we have pip installed. Now we can use pip to install
pandas by typing pip install pandas. We see that since we already have
pandas installed, we don't see the pandas loading. However, on your machine,
you will see pandas downloaded and loaded to your command prompt. We are
now going to install matplotlib by typing pip install matplotlib. Again, we
already have this on our machine, but your machine will go through the process
of getting it. Great, we've now installed the Python packages we need and we
can start using Python in Power BI.

Running Python scripts

- In Power BI, we connect to a Python script by first selecting Get Data at the
top. We can select More at the bottom which will bring up a new screen. Now
type in Python in the search bar, and select Python script. We see at the bottom,
this is where Python is installed. However, if you need to change the settings on
your computer to get Python to run, let's click out of this menu, and point to a
different folder. On File, select Options and Settings, Options. Choose the
Python scripting tab. Click the drop-down menu, and select Other, where you
can point your Python home directory somewhere else. I'm going to cancel that
because mine is already fine. I'm going to select to get data again. Again search
for the Python script. Select the Python script again. Here I paste in a Python
script that uses pandas and NumPy to produce a data frame output. It doesn't do
much, as you will see, but it shows how a Python script would work in Power
BI. We click OK. We now click on our data frame output, and we see the data
frame that the Python script produces. Click on Transform Data, and we see the
data frame again in the Power Query Editor. Finally, let's just save this file, apply
the changes later. We'll save this Power BI file as Python script example. Hit
Save. There we have it. We see how Python scripts work in Power BI.
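
Behind the scenes, the pasted script is wrapped in a Python.Execute step, and
each data frame the script leaves in scope shows up as a row we can navigate
into. A minimal sketch, with a stand-in two-line script rather than the exact
one from the video (#(lf) is M's escape for a line break inside the string):

    let
        Source = Python.Execute("import pandas as pd#(lf)df = pd.DataFrame({'x': [1, 2, 3]})"),
        // each DataFrame the script creates appears as a Name/Value row
        df = Source{[Name = "df"]}[Value]
    in
        df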

Question 1 of 4

What is an example of a web language NOT supported by the Power BI web
connector?

 JSON

Incorrect

The JSON web language is commonly used to transmit data, and it is one of
the several web languages the query editor can read.

 XML

Incorrect

XML is not necessarily a commonly seen web language, but it is a
language that the query editor can read.


HTML

 Python

Correct

Python is not a web language that the Power BI web connector can read;
Power BI runs Python scripts through its own separate script connector.

Question 2 of 4

In order to run Python scripts as a way to extract data and bring it into Power BI,
you only need to install it in Power BI.


TRUE

 FALSE

Correct

Power BI cannot access the Python run through the Anaconda suite as
that is a separate application that runs on your computer (remember you
use the Anaconda prompt to install Python modules for that application
specifically), so you need to set up Python directly through your
computer's command prompt.

Question 3 of 4

When you choose Access Data via OData by selecting the three dots at the top
right of the screen, a text box pops up. What is the purpose of this pop-up text
box?


It is for you to copy the Socrata OData documentation, and then paste it
into the OData feed text box.

It is for you to select Socrata OData documentation, which will open the
OData feed text box.

 It is for you to copy the OData endpoint, and then paste that URL into
the OData feed text box.

Correct

This, plus a few other steps, will connect you to the API.


It is for you to type in an OData endpoint, and then paste that URL into
the OData feed text box.

Question 4 of 4

The GET request in a REST API connection allows you to do what?


It works the same as the GET request in the SOAP API connections.

 It works the same as the POST request in REST API connections.

Incorrect

GET and POST are two HTTP requests for REST API connections, but they
do not work the same way—the API documentation for the data source
will tell you which request you need to use.

 It is an HTTP request that allows you to query your data through the REST
API architecture.

Correct

If you need to set up your query through the REST API structure, the
documentation will specify the type of HTTP request you need to make,
such as a GET request.


It allows you to pass your parameters directly into the web connection as
a query parameter string.

4.

Leveraging metadata

- [Instructor] We define metadata as information associated with a value, or
simply data about data. Every value has a metadata record. Metadata
provides useful information for web data, such as the updated date. When a
function uses this value to construct a new value, it does not preserve the old
metadata record. In Power BI, let's look at our NOAA data connections. The
NOAA API connection we set up earlier returns the results as a list and
the metadata as a record. We duplicate this query by right clicking and selecting
duplicate and name it NOAA API metadata. If we click on the metadata records
hyperlink, it takes us to see the metadata details where we see the record again
linked to the results set. If we click on this records hyperlink, we see key
information or metadata about our API connection, such as the query returns
853 results with a limit of 1000 records. Also, notice at the top, the formula bar
points to the metadata of the result set. And there we have it. We can
see an example of metadata in Power BI.
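
The M language exposes this concept directly: the meta operator attaches a
metadata record to a value, and Value.Metadata reads it back. A minimal sketch
with made-up values, just to illustrate the mechanics the formula bar is using:

    let
        // attach a metadata record to a value...
        Tagged = "NOAA results" meta [UpdatedDate = #date(2019, 12, 31)],
        // ...and read it back; returns [UpdatedDate = #date(2019, 12, 31)]
        Meta = Value.Metadata(Tagged)
    in
        Meta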

Leveraging data types

- Using the right data type in Power BI determines our success in setting up
calculations. We commonly use the number data type which includes decimal
numbers, fixed decimal numbers, and whole numbers. We have several date
data types such as date/time, date, time, date/time/time zone, and duration. We
could run into issues with date data types if the reference field is not an actual
date, or if the date comes from a different locale such as Europe which uses the
day/month/year format rather than the US month/day/year format. Text data
types include text values, but we can also set up numeric IDs as text data
types. For example, we can make numeric US zip codes a text value because we
would like to retain their leading zeros. Other less frequently used data types
include true/false or Boolean values, and binary which are typically file types. If
we go into the US Census Power BI file we set up earlier, for the Population by
Zip Code Query, we see the Query Editor automatically adds query steps in the
applied steps section to promote the headers and change the data type. Notice
that it changes the zip code to a number instead of a text value. And as a result,
we lose the leading zeros. We can remove the entire Query Editor step to
change the data types. But this removes all the updated data types from the
other data table fields. So instead, we select to change the zip code (the ZCTA
field) to a text value. We want to replace the current step rather than adding a new
one. The rationale: if we added a new step converting the numeric value into a text
value, it would have already lost its leading zeros. We select, "Replace current." And there,
we see that the leading zeros now apply to the zip codes. We can also change
the Geo ID value to a text value because these also need to preserve the leading
zeros. We can see from this menu at the bottom the Use Locale selection will
allow us to change the date, for example, into a European date format. On the
Geo ID, let's replace the numeric value with this text value. Click, "Replace
current," and we've updated this field as well. There we have it. We can see how
we can easily change the data types in Power BI to enable ourselves to do
calculations and other transformations later.
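
In M, both changes land in a single Table.TransformColumnTypes step. A minimal
sketch, assuming Source is the previous step and using the field names from
this demo (the optional third argument is what the Use Locale selection fills in):

    let
        Typed = Table.TransformColumnTypes(Source, {
            {"ZCTA", type text},        // keep the leading zeros
            {"GEOID", type text}
        })
        // Use Locale adds a culture argument instead, e.g. for European dates:
        // Table.TransformColumnTypes(Source, {{"Date", type date}}, "de-DE")
    in
        Typed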

Making initial field transformations

- [Instructor] We have already extracted data, and by looking into the
metadata as well as changing the data types, we have started the transformation
step of the ETL process. To make a useful data table, we can undertake helpful
initial transformation steps for columns such as promoting the first row as
headers for the column names, renaming columns, removing unneeded
columns, moving columns around or duplicating them. In the US Census Power
BI file we set up connections to earlier, we see in the query for the Population by
Zip Code where we selected the source, then promoted the headers and
changed the data types. If we click on the Source step, we see where the
headers are in row one instead of the header row. We then see that the headers
are promoted in the second step. We can easily move the columns around in
the Power BI data table. Let's say we want to put the CPOP which represents the
population for the entire county right next to the population for the zip
code. To do so, we can grab onto the column header and move it over so that
it's right next to the Population by Zip Code. We can insert the step
here because it's not going to cause the query to break. Select Insert. In this
dataset, we only really need the columns for the zip code, the GEOID, the
population for the zip code and the population for the county. To highlight all
four of these columns, click on the column headers and hold down the Control
key to highlight multiple columns. We then right click on these column
headers and select Remove Other Columns. You now have four columns
remaining in the dataset. Let's rename these columns. Let's call the first one Zip
Code. We'll keep the GEOID in the second column the same. We'll call this
Population by Zip Code. And we'll call this last column Population by County. If
we go into the Change Type step, we see that we run into a problem, and the
reason is that we are changing the data type of a column that we have
renamed. Because we only have four columns remaining in the data table, let's
just go ahead and remove the final step where we changed the data
type and instead change the data types here ourselves. Let's change the
Population by Zip Code to a whole number and the Population by County to a
whole number as well. We see how the initial field transformations pare down
the dataset to get rid of unneeded columns and to also make it a bit cleaner
and easier to read. Let's save this file to keep our updates for the US
Census. We'll apply these later. Let's see what impact these initial column
transformations have on viewing the native query. Let's go back into our Power
BI file for the CDC flu data and select the ILINet query, which is going to show our
SQL query. Let's see what impact moving columns around has on viewing the
native query. Let's just move the region to after the year. Now let's see if we can
still view the native query. We can. We go into it. We see that the column order
may have changed in Power BI but we still see it in the same position in the
query. This is because the order that we pull the columns in a SQL query does
not really matter. If we rename a column, let's see what type of impact it
has. Let's rename this Region. And right click on the Renamed Columns step and view the
Native Query. We see that the Native Query now has a new piece of code
that renames the Region column with proper capitalization. And we're
going to save our changes and we're going to revisit this query folding as we
examine other transformation processes.
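
Each of these column operations writes one M step. A minimal sketch of the
sequence for the zip code query, assuming Source is the promoted-headers step
and treating the original column names (other than CPOP, which we saw) as
placeholders:

    let
        // Remove Other Columns and the reorder in one step: SelectColumns
        // keeps only the listed columns, in the listed order
        Kept    = Table.SelectColumns(Source, {"ZCTA", "GEOID", "ZPOP", "CPOP"}),
        Renamed = Table.RenameColumns(Kept, {
            {"ZCTA", "Zip Code"},
            {"ZPOP", "Population by Zip Code"},
            {"CPOP", "Population by County"}
        }),
        // re-typing after the rename avoids the broken Changed Type step
        Typed   = Table.TransformColumnTypes(Renamed, {
            {"Population by Zip Code", Int64.Type},
            {"Population by County", Int64.Type}
        })
    in
        Typed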

Splitting fields

- [Instructor] We can split a field into two or more fields based on a selected
delimiter, number of characters or position. Here, we see that we separate the
full US zip code that includes both the five-digit zip code and the four-digit
delivery code within the zip code into two separate fields. We use the dash as a
delimiter, or separator, to split the two fields on. In the Power BI NOAA station
list file, go to the query for the stations. Let's rename this query NOAA
Stations. We first need to remove the Changed Type step, because we don't
have a column to reference. We can see that we have a step-level error
message. It says, "'Column2' of the table was not found." Remember, the Power
BI initially separated the query into two columns. We changed the query so that
Column2 does not exist anymore. We can click on the little X to remove the
Changed Type step. We now have all our data in a single column. To split
Column1 into separate columns, select it, go to Split Column on the Home
tab and select By Number of Characters. We choose 12 characters because we
can see that the station IDs have around 12 characters in them. We split once, as
far left as possible, to split off the station ID. If we use the Repeatedly option,
would end up with it being split into columns of 12 characters repeatedly. And
we can see that, especially when we look at the station name, 12 characters is
not going to contain all the information we want. Click OK. We now have two
columns where the column names use a 1 or 2 after them to reflect their column
origins. We can repeat this process to split up the remaining columns by their
character positions, or we can perform this process much more efficiently by
splitting all the columns at once. We remove the Changed Type step and also
remove the Split Column by Position. Now, we go into Split Column again and
choose By Positions. The query editor automatically estimates splitting
positions, and when we confirm, we see that it does a pretty good job. From
here, we can rename the first column Station ID, and we can also call Column15
the Station Name. The other fields we see here give latitude and longitude and
other geographic information. And there we have it, we can see how we can
easily use the Column Splitting functionality to make a single column into an
actual useful data table.
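
In M, both approaches go through Table.SplitColumn with a splitter function. A
minimal sketch of the split-all-at-once version; the position offsets and
column names here are illustrative guesses in the spirit of the editor's
estimates, not the exact values it produced:

    let
        Split = Table.SplitColumn(
            Source, "Column1",
            // character offsets where each new column starts
            Splitter.SplitTextByPositions({0, 12, 21, 31, 38, 41}),
            {"Station ID", "Latitude", "Longitude", "Elevation", "State", "Station Name"}
        )
    in
        Split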

Merging fields

- Merging columns works the opposite way of splitting them. It takes two or
more columns and combines them into a single field in a functionality similar to
concatenation. We see here, we take a five-digit zip code and combine it with
the four-digit delivery code for the zip code into a final zip code. It uses a dash
to separate the fields. In order for the merge functionality to work, the merge
fields must have the same data type. In our US Census Power BI file, we go to
the GEOID to county mapping, the web connection, to show how the merging
functionality works. We want to combine the state code and the county code
into a single field called GEOID. First we need to make sure that both these data
types are set to the same for both the fields. They're currently both
numbers. We want to actually use text data types instead to preserve the
leading zeroes. We go ahead and select "Text" and replace the current step so
we've updated the state code to include the leading zeroes. We do the same for
the county code. Again, select "Text" and replace the current step. Now, we can
combine these two fields into a merged field. Select both of them, now right
click and select "Merge Columns" from the drop down list. We are going to call
this new field "GEOID" and we're not going to use the separator between
them, meaning that we will see it all as a five-digit text field. Click OK and we
see that not only have we created a GEOID field, but we also no longer see
our separate state and county fields.
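
The merge is a single Table.CombineColumns step in M. A minimal sketch,
assuming Source is the step where both fields were converted to text, with the
original column names as placeholders:

    let
        Merged = Table.CombineColumns(
            Source,
            {"State Code", "County Code"},                        // both must be text
            Combiner.CombineTextByDelimiter("", QuoteStyle.None), // no separator
            "GEOID"                                               // replaces the two source fields
        )
    in
        Merged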

Cleaning text fields

- [Instructor] In the NOAA station list Power BI file we set up earlier, we
already split the data into separate columns and renamed some of them. Now
we can go into the columns and make changes to make them cleaner and also
use in later transformations. In Station ID, when we click into a value in the
column, we see that there are spaces after the Station ID. This is because when
we split the positions by a fixed number of characters, it leaves extra spaces
between the fields rather than trimming them to the exact length. Selecting the
Station ID column, we right-click, go down to the Transform function where we
can see that we can Trim, Clean, Capitalize Each Word, we have a number of
options to transform this column. For the text field, when we select Clean, it
removes any non-printing characters. Let's go ahead and do that to make sure
we don't have any in the Station ID field. Now let's remove the trailing spaces by
again right-clicking on the Station ID column name, selecting Transform
again and choosing to Trim. We see the applied step for trimming the text. Now
let's go back into the same value and we see that we have eliminated the
trailing spaces. Now let's go into the Station Name and we're going to remove
the trailing or leading spaces for this column as well. And we're also going to
capitalize each letter. Right now we see everything in capital letters and
capitalizing each word makes it a bit easier to read. Again, we right click on the
column name, Transform and select to Capitalize Each Word. There we have
it. Some simple text transformation steps that can work wonders for our
data and help make the table easier to read and work better in transformation
and other steps later.
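
A compact M sketch of these three text transformations, using hypothetical sample values:

    let
        Source = #table(
            {"Station ID", "Station Name"},
            {{"US1FLAL0001   ", "ALACHUA 1.2 NW"}}
        ),
        Cleaned = Table.TransformColumns(
            Source,
            {
                // Remove non-printing characters, then trim leading/trailing spaces
                {"Station ID", each Text.Trim(Text.Clean(_)), type text},
                // Trim, then capitalize each word for readability
                {"Station Name", each Text.Proper(Text.Trim(_)), type text}
            }
        )
    in
        Cleaned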

Transforming numerical fields

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] In our CDC flu data we have the total weekly flu mortality numbers, where the year and the week indicate the timeframe. First of all, when we look at the CSV file, we need to remove the top row: select Remove Rows, then Remove Top Rows, and we've removed one row at the top. Now we can promote the headers so that we have column names. On the Home tab, go to Use First Row as Headers. I think it would be really
helpful to add an index column. We see that the order of the year and the
week show that it goes in chronological order beginning with 2016 week 40. To
do this, we can add a column and select to add an index column, which we see
defaults to start at zero. We can also start it at one but let's start at zero because
that's the first week. From here we can use numerical transformations to change
the index column. To see what the options are, right click on index and show the
transform options. Remember that with the text value we saw trim, we saw
clean, but now, because it's a number value we see numeric transformations. We
see rounds, absolute value, factorial, logs, powers, fractional powers, and then
we also see the text transformation function if we wanted to use it. We can also
leverage calculations in the add column menu. Let's go to standard and select
multiply. I'm going to multiply by seven: we take the weeks out, which is given by the index, and multiply by seven to give us the days out. Select seven
and hit okay. Now we see we created a new column, which gives us the number
of days from the start. I'm going to rename this days into measurement and
there we have it. We can see the numeric transformation options available in
Power BI.
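
Here is a small M sketch of the index-plus-multiply pattern, with hypothetical sample rows; the Add Column, Standard, Multiply step in the UI produces the same kind of new column.

    let
        Source = #table({"Year", "Week", "Total Deaths"}, {{2016, 40, 120}, {2016, 41, 133}}),
        // Index each week starting at zero, matching the first week of data
        Indexed = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
        // Seven days per week turns weeks-out into days-out
        DaysOut = Table.AddColumn(Indexed, "Days into Measurement", each [Index] * 7, Int64.Type)
    in
        DaysOut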

Filtering and removing duplicates

Selecting transcript lines in this section will navigate to timestamp in the video
- In the U.S. Census Power BI file, we look at the GEOID to county web
query, where we want to focus on the State and County details pertaining to
each GEOID. With filtering, we can remove or keep rows based on whether they meet our set criteria or conditions. Filtering works similarly to removing
rows, but we may find it easier when we're working with ranges, or several
conditions. We see the summary level column on the very far left hand side of
the table. 10 pertains to the United States, so it's a country level; 40 pertains to Alabama, a state; and 50 pertains to the counties within the state. If we click on the summary level drop down and click Load More, we can see more of the filtering options. We only want to keep 40 and 50. So we want to remove 10, 61, 162, and 170. We can do that by unselecting Select All, and selecting 40 and
50. Now we see we have a reduced size data table that contains only the states
and counties for the GEOID to county mapping. Next, we'll see how to remove
duplicates. Let's duplicate the GEOID to web query. We right click and select
duplicate. I'm going to rename this query GEOID to county testing. Notice that
the summary level shows multiple instances of 40 and 50. If we click to remove duplicates on the summary level, will it remove duplicate rows across the whole table, or will it simply act on this single field? We right click on summary level, and select remove duplicates, and we see that it only focuses on that one particular column. So
now we're only left with two values, 40 and 50. But what if we include
GEOID? Does that change the functionality of removing duplicates? Let's
remove the last step, Removed Duplicates, by clicking on the X. Now, let's select both summary level and GEOID, right click and select to remove duplicates. We
see now that the remove duplicates feature works on both columns, rather than
one. So if we want to remove duplicates from an entire table, we would have to
select the entire table for it to work.
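
In M, these two operations look roughly like this sketch (the sample rows are hypothetical); note that Table.Distinct only considers the columns you name.

    let
        Source = #table(
            {"Summary Level", "GEOID"},
            {{10, "00"}, {40, "01"}, {50, "01001"}, {50, "01001"}}
        ),
        // Keep only state (40) and county (50) rows
        Filtered = Table.SelectRows(Source, each [#"Summary Level"] = 40 or [#"Summary Level"] = 50),
        // Remove duplicates based on the named columns only
        Distinct = Table.Distinct(Filtered, {"Summary Level", "GEOID"})
    in
        Distinct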

Accessing native query in cleaning

Selecting transcript lines in this section will navigate to timestamp in the video
- [Narrator] We earlier set up a connection to a SQL server for the CDC flu
data to test out how database connections work in Power BI. We discussed how
Power BI gives us the capability to use Query Folding to optimize
performance for database connections. Let's rename this query SQL Server
Connection. We Right Click and Select Rename and hit Enter to save it. Next,
let's see if we move a column, what the impact on the Query Folding will be. We
already know, from Right Clicking on the Navigation Step that we can view the
Native Query, and we see that it looks like a SQL query. Now, let's make more
adjustments to this data table, and see what the impact on Query Folding
is. Click on the Total Patients column, Right Click, and select Move To Beginning. Whether or not we want to leave it here is not particularly
relevant, but let's see if we can still view the Native Query after performing this
operation. We Right Click on Reorder and we see that, yes, we still can view the
Native Query. I'm going to remove this step, because I don't particularly think
it's going to be important later. Next, let's remove all the columns except for Year, Week, ILI Total, and Total Patients, which we select by holding down CTRL and clicking each column. Right Click, and we select Remove Other
Columns. Now, if we Right Click this step in the Applied Steps, we see that, yes,
we can still view the Native Query. Now, let's rename these columns to make it
easier to read. Rename Year, so it uses lower cases, and same with Week. ILI
Total we're going to rename as Total Cases, and we're going to use lower
case for Total Patients as well. Now, if we Right Click on this step, we see we can
still view the Native Query. Next, let's merge two columns together, and see if
we can still use Query Folding. Click on Year + CTRL and then Click on
Week Right Click and select Merge Columns. Let's use the dash as a
separator, and we'll rename this Date. We Right Click on the Merged Columns
step, we see now that the View Native Query is grayed-out, meaning that
merging columns does break the Query Folding capabilities, and I can also tell
you that splitting the columns breaks the Native Query Folding as well. Let's exit
out so we get rid of this step, and next thing, what we're going to do is we're
going to replace the Week 41 by Right Clicking on it and Replacing
Values. We're going to Replace 41 with a Value of 60. You hit OK and then Right
Click on the step, we see that replacing a value also breaks the Query Folding
capabilities. So, we just remove this step. Now, let's filter, so we just have
2019 Data. Select 2019 and hit OK. We have a much shorter list because we are
only looking at 2019 data. Now, let's remove duplicates for the week just to see
how it works. Right Click and select Remove Duplicates. Right Click on this step, and we see that we can no longer view the Native Query. So, we remove this step, and let's see what the Native Query of the remaining steps looks like now. We select View Native
Query and what we see here is that at the beginning, we saw all the columns
selected and we didn't see any filters. Now what we see is we see filters, we see
the renaming of the columns, and it looks very much like SQL code. So, this is
how Query Folding takes a database connection and through our
commands, pushes SQL code back to the database. Therefore, we don't have to
perform SQL code or write SQL code in order to leverage the capabilities of the
database.
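
As a hedged M sketch of this query (the server, database, table, and column names are hypothetical), the steps below all typically fold back to SQL, while the merge, replace-value, and remove-duplicates steps we tried would break the folding:

    let
        Source = Sql.Database("localhost", "CDC"),                        // hypothetical connection
        FluData = Source{[Schema = "dbo", Item = "FluData"]}[Data],
        // Removing other columns folds to a SELECT column list
        Kept = Table.SelectColumns(FluData, {"YEAR", "WEEK", "ILITOTAL", "TOTAL PATIENTS"}),
        // Renaming folds to column aliases
        Renamed = Table.RenameColumns(Kept, {{"ILITOTAL", "Total Cases"}}),
        // Filtering folds to a WHERE clause
        Filtered = Table.SelectRows(Renamed, each [YEAR] = 2019)
    in
        Filtered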

Question 1 of 7

You can rename a column in Power BI to include spaces between the terms,
such as "Date Imported."

 TRUE

Correct

Unlike database field names, for example, you don't have to label the fields as single strings where an underscore represents a space. If the fields come from a database in that form, renaming them with spaces can make the data easier to work with later because the column labels are more readable.


FALSE

Question 2 of 7

Filtering a column and removing duplicates from the same column will give you
the same result.

 FALSE

Correct

Filtering and removing duplicates are completely different functions that may produce the same result by coincidence, but do not rely on luck! If you start with a column of values 1,1,2,2,3,3 and remove the value 2, you get a new column with values 1,1,3,3; but if you remove duplicates from the original column of values, you get 1,2,3, which is not the same result.


TRUE

Question 3 of 7
Power BI consumes and interprets data in the same way that Excel does.


TRUE

 FALSE

Correct

Converting data types in Excel is typically done implicitly, which means Excel will usually figure out that you want $100 to be a number data type. You can't assume that Power BI will do the same unless you explicitly set the data type to a number.

Question 4 of 7

How do you define metadata?


URL path for web datasets

the date you access a data file

small set of data in the same format as the larger dataset

 data or information about data

Correct

Metadata is just data or information about a dataset, such as update dates and file size.

Question 5 of 7

You start with a column of values 0,1,2,3,0,2,4,6 and a = average of the column
values after removing 0, b = average of the column values after replacing 0 with
6, and c = average of original column values. How do a and b and c compare to
each other?

 a = 3, b = 3.75, c = 2.25

Correct
How this logic works is important: if you remove values, you impact other calculated numbers later. Here, removing the two zeroes also removes two of the denominator fields, so you divide by 6 instead of by 8: a = (1+2+3+2+4+6)/6 = 18/6 = 3; b = (6+1+2+3+6+2+4+6)/8 = 30/8 = 3.75; c = (0+1+2+3+0+2+4+6)/8 = 18/8 = 2.25.


a = 2.25, b = 3.75, c = 3

a=b=c

a = 2.25, b = 3, c = 3

Question 6 of 7

Merging the text field "Power BI debuted in" and the numeric value 2015 into a
single field gives you what result?


"Power BI debuted in 2015"

"Power BI debuted in "+2015

"Power BI debuted in 2015"

 Error message

Correct

Power Query observes strict data typing, so merging a text field with the numeric value 2015 errors out; you would first need to convert the number to a text data type.

Question 7 of 7

Why would you make initial field transformations?

 to remove columns of data you do not need

Correct

Removing unneeded columns makes the data set more clear and easier
to read.

to change the data from Excel or CSV format to Power BI format

to prepare data for SQL queries

to add data columns missing from the initial data set

5.

Introducing table objects

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] We define a programming object as a particular instance of a class
with a combination of variables, functions, and data structures. In Power BI,
Power Query Editor objects include tables, lists, records, values,
binaries, functions, errors, and parameters. We rarely use binaries, functions, lists, and records by themselves, but instead combine them with other objects, like table objects. In the U.S. Census Power BI file we set up earlier, let's go back to the Excel connection from the web, and let's go to the Query Editor
Source step. When we set up this connection, remember that we selected for
Power BI to drill automatically into this third table here. It drills into the table and uses this as our query. If we click on the Navigation step, we can see that it selected that item as the table object that we wanted to drill into. Let's go back
to the Source step, and we're going to duplicate this query and I'm going to
rename it GEOID to county testing with objects. Now we go back again to the
Source step, let's hover over this last item in the table. What this functions as, essentially, is a navigation table. It allows us to select the table objects from within the file. We see here that the Navigation step provides us the name of the item, and then the type of programming or Power BI object that it is. Now if we hover over this last item, the table object we selected to use in the query, we see that we can preview it by clicking beside the hyperlink, so that we don't enter into it and the hyperlink doesn't actually drill into it. But we can see this allows us to
preview the table object. Conversely, we also can drill down into it by selecting
the actual table. Let's just go ahead and create a new Navigation step, because
we already duplicated this query. Now we only see that selected table that we
drilled into in the data table. The query only shows a single table object. And
this is the query that we selected. Let's try something else out really
quickly. Let's remove the Change Type, Promote Headers, and Navigation which
takes us back to the original Source Step. Notice that, in this data column, we
see three table objects. If we click on the diverging arrows at the top, and let's
uncheck Use original column name as prefix and select Okay to see what
happens. What we see here is an expanded data table built from combining the
table objects, and all three of them that we saw in the Source step. In the Name
field, let's click on Load more, and we see all three of those tables listed in this
data table. This is a pretty neat feature being able to easily combine table
objects. However, I would recommend exercising caution when combining data
tables and these table objects, even from within the same query or multiple files
within a source, because we don't know if the columns line up. We need to
know what is in the tables beforehand. So combining them can do some really
impressive calculations and transformations later, but we need to know our
data before we proceed any farther. We will find using these table
objects immensely useful for many types of queries in Power BI.
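
A rough M sketch of expanding several table objects from a source navigation table into one combined table; the file path and the column names inside each table object are hypothetical, and, as noted above, this only makes sense when the columns line up.

    let
        Source = Excel.Workbook(File.Contents("C:\data\all-geocodes.xlsx"), null, true),
        // Source is a navigation table: Name, Data (table objects), Item, Kind, Hidden
        Expanded = Table.ExpandTableColumn(
            Source,
            "Data",
            {"Column1", "Column2"}    // hypothetical columns shared by every table object
        )
    in
        Expanded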

Introducing list and record objects

Selecting transcript lines in this section will navigate to timestamp in the video
- [Narrator] If we think of Table objects as objects with rows and columns we
can visualize List objects as a single column in a table. Lists do not have a data
type nor do they have an explicit column name except the name of the List. If
we visualize Lists as columns in a table, then Record objects are their row counterpart. Like Table objects, we can drill into and expand List and Record
objects in queries. In the Power BI file NOAA station list let's select the NOAA
API connection. Remember that the NOAA API connection returns results for the API query in a List object. We know that because we see here in this little
table we have a List hyperlink that we can click on. To drill down into it we
simply click on the List object. We can also tell this is a List object because we
see the List icon next to the query name. If we click on one of these
Record hyperlinks let's see where it takes us. We drill into the Record object for
an individual data point for the daily weather summary. While it may be nice to know this information, we want to have the entire data set and not a single record or single row within this list. We click on "X" to remove the last step we
did. We again can expand the List object. This time let's convert this entire list to
a table. We can highlight the list then up top for the List tools we see a new
Transform section, select To Table. We see here that we can now enter a
delimiter and select how to handle errors. We're just going to select not to use a
delimiter for now and select OK. While this Table object looks very similar to our
original List object it is now in fact a Table object as we see with the Table icon
next to the query. And we can expand out all the records into a combined data
table. We select the diverging arrows at the top. Let's unselect Use original
column name as prefix. We hit OK. We see an expanded data table that includes
all the data in the NOAA API query we set up. The API query we set up for the
US Census data returns the request as a List object of list objects. We can
convert this List object into a Table object by selecting To Table from the List
tools transform functions. Again, we're not going to use a delimiter and we'll
figure out errors if we run into them. We see more List objects contained within
the original List object. Let's choose to expand out this column. Let's say expand
to new rows. We now see all the data for this US Census API connection in a
single column. While this may make it easy to view and we know we can see all
the data in a single column it's not particularly useful for data analysis because
we don't want the state region and all the other field headers to be in the same
column. We want to have a data table with multiple columns. I'm going to save
this query by renaming it and saying Single Column. And we'll come back to this
later. And then I'm going to duplicate this and rename this query Multiple
Columns. I'm going to delete the last step where we expanded the entire
table into a single column. So now we're back at the Table object that contains
many List objects. Now let's click on the diverging arrows again and select
Extract Values. This allows us to extract the items from each List object across a
row of the data table rather than a column. We're going to select a delimiter to
use. This will allow us to separate the fields in the list. I want to use three upright
bars as a custom delimiter because I know that that is rarely seen in many data
sources. Select OK. We expand this out and we can see the field names in row
one followed by each of the values for the fields in the rows below. So although
they're combined into a single column we know that this is pretty easy to split
the columns by delimiter because we put the delimiter in ourselves. So we
highlight the column again and select to split by delimiter and we use the three
upright bars again. And select each occurrence of the delimiter. Click OK. Where
we see now we have a data table with multiple columns in the columns we want
them to be. We can then use the first row as headers, and look through to see if we have anything else we want to change. So there we have it, this is how
we can use Record and List objects to transform queries. And it's pretty neat to
see how their functionalities work and how we can use them for really
interesting query transformations.
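
Here is a minimal M sketch of drilling into list and record objects, assuming an API response shaped like the NOAA one, a record whose results field holds a list of records; the field names and values are hypothetical.

    let
        Response = [results = {
            [station = "GHCND:US1", value = 0.2],
            [station = "GHCND:US2", value = 1.4]
        }],
        // Drill into the List object held in the record's results field
        ResultsList = Response[results],
        // Convert the list to a table of Record objects, then expand them
        AsTable = Table.FromList(ResultsList, Splitter.SplitByNothing(), {"Record"}),
        Expanded = Table.ExpandRecordColumn(AsTable, "Record", {"station", "value"})
    in
        Expanded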

Working with binary objects

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] We define binary objects as files combined or read using other
functions. In the NOAA CSV folder Power BI connection we set up earlier, we see
the content column contains hyperlinked binary objects in each row. If we click
on one of the binary hyperlinks, this takes us into a view of a single data
table from one CSV file in the folder. Let's get rid of those last steps. We start
again with the source step. To combine all these binary objects, click on
the button with two downward arrows next to the column header. We click
okay. We now see all the files combined into a single data table. We also see
another folder and several other Power BI objects in the folder as well. We will
revisit and discuss these more later. If we go into our NOAA Excel folder
connection, we can see that we also have access to these two downward
arrows to connect the two binary objects, which are Excel files. Let's click on it
and see what it does. What we see here are the Los Angeles and Santa
Barbara table objects within the Excel file for 2018. However, we're missing the
2019 data. So using this button is not going to work to combine Excel files. Let's
go into the PDF file. We put the PDF file in a folder, and then we connected to
the folder. This is the single PDF file and the binary object, let's see what that
looks like. So we've drilled into the binary object in the NOAA PDF folder. Now
we see all the tables in the PDF as table objects in the query editor. We click on
one of these, we can preview it, and we can see what it looks like. It's got the
headers and then we actually have our data down here. We can use these two
diverging arrows to combine all the table objects into a single data
table. Uncheck Use original column name as prefix and click okay. Now we see
a table object that we combined through expanding the data tables to
include the table objects within the query. From here, we can promote the
header so that we actually have the proper information in place. I scroll down,
and this is going to take a bit of work. So and then we can look at other
connections. Our curiosity, if we go back into the NOAA Excel file, and we click
on the binary object, this takes us to the same view that we saw earlier when we
tried to combine the files. We see the two table objects within a single Excel
file. We're going to revisit later how to connect to the Excel folder, so that it
includes both files. But for now, it's pretty neat that we can combine the
PDFs using binary objects and table objects.
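
For reference, drilling into a single binary object from a folder connection looks roughly like this M sketch; the folder path is a hypothetical placeholder.

    let
        Source = Folder.Files("C:\data\noaa-pdf"),
        // Each row's Content cell is a binary object; take the first file
        FirstBinary = Source{0}[Content],
        // Read the binary as a PDF, returning the table objects found inside it
        PdfObjects = Pdf.Tables(FirstBinary)
    in
        PdfObjects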

Grouping data

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Grouping allows us to aggregate a field based on another selected
field. Here we see we choose to aggregate the total field based on the group
field. The aggregation we see here is the sum function. Grouping increases
calculation options and improves query performance, but it also eliminates more granular data analysis because it removes much of the row-level detail. In the
US Census Power BI file, let's duplicate the query population by zip code. We
rename it by adding the suffix grouping to the end of the query. Now in this
new query, select the zip code column by highlighting it. In the transform
tab, we select group by all the way on the left hand side. We select group by zip
code. We're going to call this new column max population. And it's going to
return the maximum population for each zip code. We can also access advanced
grouping options by selecting the advanced radio button. This allows us to add
more dimensions and more values to the grouping if we choose to. We're going
to stick with the grouping we set up and hit okay. What the grouping functionality does is, for each of the zip codes, return the maximum population of an affiliated county. We will revisit this more later and do some
interesting transformation steps on it so stay tuned.
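
The same Group By step written out in M, with hypothetical sample rows:

    let
        Source = #table(
            {"Zip Code", "Population"},
            {{"35004", 58805}, {"35004", 22418}, {"35005", 7104}}
        ),
        // Group on zip code and keep the largest affiliated-county population
        Grouped = Table.Group(
            Source,
            {"Zip Code"},
            {{"Max Population", each List.Max([Population]), type number}}
        )
    in
        Grouped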

Pivoting data

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] On the screen here, we see the data on the left, and the grouping
of that data on the top right. In the grouping, the sales across all years for a
particular group are reported. Notice the pivot table on the bottom right is a bit
different. Here we add the years as separate columns at the top of the table,
which adds a new dimension to the way we aggregate the data. The data
reported are all the same, just in a different manner, and it's a lot easier to
read than the original chart. In the NOAA station list Power BI file, we go to the
NOAA API connection. Notice that in the data table, we have five columns. The second column is labeled as datatype, but what it really contains is categories of weather such as precipitation and temperature ranges. We would like to put
these categories as separate columns at the top. To do so, we first select this
categories column called datatype, which, yes, is a bit confusing with our
previous discussions. We select the column, we go to Transform, and we select
the Pivot Column function in that ribbon. We select to use the values as the
values we want to pivot on. We then can choose from the advanced options to
not aggregate the data. The reason I'm choosing not to aggregate the data is
because if we do have duplicates, that's another concern and we want to see the
shape of the data table with the columns on top. We don't necessarily want to
sum up or try and figure out how to aggregate the data. We select OK. Now
each of the categories has its own field. But we notice the precipitation
column shows values 100 times higher than their actual amounts, because the
API query does not include the decimal place. To change the precipitation into
actual inches, rather than one without a decimal place, we're going to divide this
entire column by 100. We saw this earlier, with the numeric column
transformation. We select the precipitation column, go to the transformation
tab, and this time select the Standard, Divide, and we select to divide the entire
precipitation column by 100. Now we have an updated precipitation
number which is in inches. So pretty cool, we can see how a combination of the
pivot functionality and also transforming the data column can make a big
difference to the way that we see a data table. Remember in the US Census
Power BI file, we set up a US Census API connection where we later transformed
it into a single column of data. Although we ultimately found a more effective
method using another functionality, what if we had, instead, the only option of
receiving the data this way? We can use the pivot functionality to change the
shape of the data table. This column is technically a text data type, and we
cannot directly pivot a text column. If we look at how the pattern of data
appears in the table, notice that the date labels appear every eight rows. So we
would need to create a table with eight columns when we pivot the table. We
first add the index column using one as the start value for the first row. We
select Add Column, add Index Column From 1. Next we create a modulo column
that references this index column using mod eight to segment the rows into
columns that repeat one to eight over, down the entire column. To do so, we
select Standard, and Modulo, and we put eight in as the value we want to use,
and hit OK. We see that the Modulo column starts at one and goes to zero at the final value of the data set. However, I'm just going to change the index to start at zero instead, which will make it a little easier to read. With the index starting at zero, we see that the modulo goes from zero to seven for what we
know to be the column labels in the first eight rows of the data table. We need
to make the modulo into a text value, which will allow us to pivot this data
set. The last thing we need to do is take the Index column and use the integer divide functionality, so that the first eight rows show zero, the next eight rows show one, and so on; then we can pivot the data set. So let's go
ahead and do that. We select the column, we go to Standard, calculation, select
Integer Divide, and again we put eight in as the value. We are going to leave
this column as a whole-number data type, which allows us to pivot this
table. I'm going to delete the Index column, because we're not going to use
this. We want to see how the pivot functionality works with these three
columns. We select the Modulo column, we select from the Transform tab to
pivot the column, and we use the initial Column 1 as the values column. And we
select to not aggregate the values. We hit OK, and now what we see is a new
data table where the integer division that we saw before we pivoted the
column is the row number, and the eight columns we see after that are the
columns of the data table. So we can quickly do some transformation steps to
make this a bit more readable. We can remove the integer division
column, because we don't need it anymore, and we can also promote the first
row as headers, so it should have column labels. And there we go, a work-
around for pivoting an otherwise slightly unruly data set.
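
Here is a hedged M sketch of that workaround, shrunk to a period of two instead of eight so the sample stays short; the flattened values are hypothetical stand-ins for the Census data.

    let
        // Two "columns" flattened into one: header, header, value, value, ...
        Source = #table({"Column1"}, {{"state"}, {"pop"}, {"AL"}, {"4903185"}, {"AK"}, {"731545"}}),
        Indexed = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
        // Modulo gives the target column; it must be text before we can pivot on it
        WithMod = Table.AddColumn(Indexed, "Modulo", each Text.From(Number.Mod([Index], 2)), type text),
        // Integer division gives the target row number
        WithRow = Table.AddColumn(WithMod, "Row", each Number.IntegerDivide([Index], 2), Int64.Type),
        NoIndex = Table.RemoveColumns(WithRow, {"Index"}),
        // Pivot the Modulo values into columns without aggregating
        Pivoted = Table.Pivot(NoIndex, List.Distinct(NoIndex[Modulo]), "Modulo", "Column1"),
        Final = Table.PromoteHeaders(Table.RemoveColumns(Pivoted, {"Row"}))
    in
        Final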

Transposing data

Selecting transcript lines in this section will navigate to timestamp in the video
- Think of each cell in a table as a value with pivot coordinates referencing row
and column dimensions. When we transpose a table, we change the data table
shape but not its pivot coordinates as the rows become columns and the
columns become rows. Let's use the NOAA PDF Power BI file connection we
created earlier to connect to a PDF file. We can use this data table, the one
query in it, to test out how the transpose functionality works. The PDF file we
see here in the data table contacts several unneeded columns and we may find
it faster to filter these out by transposing the tables. Notice that we do not see
the first row in the headers right now, so let's just transpose it in its current
configuration. To do so, we make sure we have the query selected, go to the
Transform tab, and we select Transpose. The first row in the fourth column used
to be the fourth row in the first column. Let's delete the old flag column by filtering out its row to remove the flag value, and we can also remove the first two
columns and the last column in this newly transposed data table. Now we
transpose this table back by hitting the Transform Transpose functionality
again. We promote the first row as headers. Now let's transpose the table
again and see what the results look like. Again, go to the Transform tab and
select the Transpose function. Notice that we no longer see the headers in the
table. So if we decide we want to use the headers in the table, make sure that
you don't put them in the header position first. Do that as the last step in your
Transpose transformation process.
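
A small M sketch of this transpose, filter, transpose-back pattern, with a hypothetical three-column sample; note the headers are demoted first so they survive the transpose.

    let
        Source = #table({"a", "b", "flag"}, {{1, 2, "x"}, {3, 4, "y"}}),
        // Headers must become a data row, or Table.Transpose discards them
        Demoted = Table.DemoteHeaders(Source),
        Transposed = Table.Transpose(Demoted),
        // Each former column is now a row; drop the unneeded "flag" row
        Filtered = Table.SelectRows(Transposed, each [Column1] <> "flag"),
        // Transpose back and promote the headers as the last step
        Back = Table.PromoteHeaders(Table.Transpose(Filtered))
    in
        Back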

Unpivoting data

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] When we create a pivot table from the original data table, notice
that we see aggregated sums based on the rows and columns in the
table. Unpivoting moves the column headers into a column of attribute values, but notice it does not technically reverse the pivot back to the original data table, because we cannot reverse aggregation when we unpivot a data table. In the NOAA
station list Power BI file, we go to the NOAA API connection query we pivoted
earlier, and then performed a numerical transformation on the precipitation
field. We can now use the unpivot functionality because we did not select the
aggregation when we originally pivoted the data table. In order to unpivot these
column categories, we can select the category columns, and we hold down the
Shift key to highlight all four of them, and we right-click and select unpivot
columns. Notice if we go back to the previous step and right-click, we see that there are three options for unpivoting columns. We can also select only the date, station, and attributes fields, and then unpivot the other columns. We would unpivot in this manner if the category names changed but the other dimensions, the date, the station, and the attribute, stayed the same. Setting it up this way prevents the query from erroring out, because we reference the stable columns instead of the potentially dynamically changing column names. And there we have it. A few options to explore if we
want to unpivot columns.
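
The robust variant described above corresponds to Unpivot Other Columns; here is a minimal M sketch with hypothetical column names and values.

    let
        Source = #table(
            {"date", "station", "PRCP", "TMAX"},
            {{"2019-01-01", "US1", 0.2, 55}, {"2019-01-02", "US1", 0.0, 57}}
        ),
        // Name only the stable columns; every other column gets unpivoted,
        // even if the weather categories change later
        Unpivoted = Table.UnpivotOtherColumns(Source, {"date", "station"}, "Category", "Value")
    in
        Unpivoted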

Accessing native query in integration

Selecting transcript lines in this section will navigate to timestamp in the video
In the CDC flu data Power BI file, we go to the SQL server connection query. When we performed cleaning transformation steps on this database connection, we saw that filtering a column still allows the query folding to work, because Power BI could translate it to a SQL command to send back to the database. Let's remove the step where we filter for only the year 2019, so we can test out how integration transformation steps affect query folding. We can see in the last step, by right clicking on the renamed columns step, that we can still view the native query. Let's take a look at the initial query we will start with. We see it looks a lot like SQL code. So, we're going to add some of the integration steps we just learned to see how this affects the query folding and the SQL commands we see on the screen. Now, let's pivot the table so that we can see the trends of the total cases across years by the same week. We first remove the total patients column. Then we select the year column, because we want to use these as the individual column headers. We go to the transform tab and select pivot column. For the values column, we choose the total cases that we want to aggregate for each week for all those years. We click on the advanced options and set this to Don't Aggregate. You can hit OK. We now see the pivoting functionality performed on the data table. If we right click on the pivoted column step, we see that the View Native Query is grayed out, indicating that pivoting columns does break the query folding capabilities. Let's get rid of this step by clicking on the X next to the step name, and try this using Group By. If we group the table, we can get an average number of cases by week across the four years. Let's see what impact this has on query folding. Let's select the week and select group by; we're going to call this new column average cases, where we average the total cases across the four years. Hit OK. We now see our new table, where we used the group by functionality. If we right click on the grouped rows step, we see that we can view the Native Query, which means the query folding is working. This makes sense, because GROUP BY is a frequently used SQL command. Lastly, let's see if performing a sorting function on the week column, by sorting from A to Z, impacts the query folding. We click on this arrow for filtering and say sort ascending. We see the sorted rows step now appears in the applied steps. We right click on this and see that we can view the Native Query. This also makes sense, because ORDER BY is another frequently used SQL command. Notice that the week uses a text value, and when we put it in the sorting order, it effectively sorts by alphabetical character instead of by number. So, we change the data type; let's see if this impacts the query folding. We select whole number, and when we right click on the changed type step, we see that it's grayed out, meaning that it actually breaks the query folding. Let's delete this last step of changing the type, and we will right click on the sorted rows to view the Native Query. When we open up the Native Query, we can see the GROUP BY SQL command and the ORDER BY SQL command. So, it's pretty neat how we can leverage query folding to effectively write SQL commands without having to do them ourselves.
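
As a hedged M sketch (the connection details and column names are hypothetical), the grouping and sorting steps that kept folding here look roughly like this, folding to SQL GROUP BY and ORDER BY clauses:

    let
        Source = Sql.Database("localhost", "CDC"),                        // hypothetical
        FluData = Source{[Schema = "dbo", Item = "FluData"]}[Data],
        // Group by week and average the total cases; folds to GROUP BY
        Grouped = Table.Group(
            FluData,
            {"week"},
            {{"average cases", each List.Average([#"total cases"]), type number}}
        ),
        // Sort ascending on week; folds to ORDER BY
        Sorted = Table.Sort(Grouped, {{"week", Order.Ascending}})
    in
        Sorted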

Question 1 of 4

You start with a data table with two columns of data and 20 records or rows. If
you remove one of the columns from the table, what kind of object do you now
have?


20 record objects

List object

Column object

 Table object

Correct

A table object can have as many or as few columns and rows as you'd like
it to. Removing a column does not change the object type. It just
changes the number of columns in the table object.

Question 2 of 4

Grouping is a functionality unique to the query editor in Power BI.

 FALSE

Correct

GROUP BY is a SQL function and SQL came out long before Power BI did
in 2015—there are a lot of other applications that also use grouping
logic.


TRUE

Question 3 of 4

Let's say you have a theoretical data table containing three columns: purchase
date, location, and amount. You want to create a new data table that groups the
total amount together by the purchase date and location, which are still their
own columns. How can you do this using the grouping functionality?


Select the Advanced Options and include the date purchased as an
additional aggregation field in the new table.

Split the date fields into three components (year, month, and day), and
then in the Advanced Options, select the year and month plus the
location as grouping categories and the sales amount as the aggregated
sum.

You cannot group a data table by more than one category.

 Select the Advanced Options and include both columns as categories to


group the aggregated sum of the amounts by.

Correct

You can add more than one category to group a data table by, but you
have to go into the Advanced Control and add the other field there.

Question 4 of 4

It is possible to have record objects contained within list objects.

 TRUE

Correct

You saw in this course video how an API query can return list objects on which you can perform transformation commands, see the record objects contained within these list objects, and then transform them into a table object (which is a usable dataset).


FALSE

6.

Leveraging text formulas

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Power Query editor formulas share similarities with their Excel
counterparts with a few key differences. Excel formulas are not case
sensitive, while Power Query formulas are. Excel counts using base one, while
Power Query counts using base zero, and Excel type conversions are
implicit, while Power Query observes strict data typing. Analogous formulas
between Excel and Power Query include the formulas we see on the
screen. Because Power Query does not implicitly convert data types, we need to
either first convert the fields to text data types or set up a text conversion
function within the formula such as the example formulas we see here. Let's go
back to the NOAA station list Power BI file and the NOAA stations query we set
up. We know that if the station ID starts with U.S., it is in the United States. Let's
set up a custom column to get the first two characters in the station ID using
the text dot start function. We select the add column tab, and select custom
column. We're going to call this new column name country, and we say text dot
start, click on the station ID, or we can alternatively, if we start typing in station
ID, you can see that there is now a smart prompt available in the query
editor that we can just use our arrows to go down and hit enter to select that
particular field or the function. Comma and then let's use two for the first two
characters in the station ID. Close the parentheses to close the expression, and
notice that we add two to the end of this expression, which is the same as what
we'd see in Excel. And let's hit okay and see if we get the same result. Okay, so
we see the first two characters in the station ID column. Now if we were to
change this formula, and we can go back into it by double clicking on the
step, if we were to change this to text dot start where neither the text nor the
start part of the function were capitalized, would the formula still work? Let's hit okay, and we can see that capitalization is significant and an important part of using the Power Query editor in Power BI, because its formulas are case sensitive. So let's correct this error by just going back into the formula and
adjusting it again. Also notice that unlike the splitting functionality that we
could also use to separate the first two characters in the station ID field, creating
a custom column preserves the original field. We don't lose the original
column when we perform this function on it. So let's talk about what we refer
to as the difference between the zero and the one counting. We're going to test
this out by creating another custom column, and calling this the length of the
country ID, which we already know is two. We'll call this length testing, so text
dot length. Again, we know to capitalize this, and scroll down. We know that we
need to use the country. Close the brackets and hit okay. So length returns a
length of two, so this means that the base of the counting is not referring to the
length of the characters, but instead is referring to another way that the
counting works in the Power Query editor. So we'll create another column and
test this out another way. We'll call this position testing and what we'll say is
text dot position, which determines if the U.S. is in the country, it will return in
the position and otherwise, it will return another amount, which we're going to
see. We type in the country field and we type in U.S., and we put the text
expression in quotes so that it knows that it is a text expression being passed
into the formula, and we hit okay. And we had a negative one here because
there is not a match between the U.S. and the country field that we see
here. This is not the U.S. However, if we scroll down, load more, and let's see, if we select zero, ah ha, these are the U.S. stations. What this means is that when we look at the country field, the U.S. starts at position zero rather than position one, as it would in Excel. Now let's revisit our NOAA pdf Power BI file. We only
have one connection in this query but we want to use a text formula on some of
the fields in this table. Click on the applied steps. We're going to start at the
promote header step, so let's remove the steps that are after it. Delete and
delete. So now what we see is we see the year, the month, and the day as
column headers in the file. We want to combine these three fields into a date
field, so to do so, one way we can try this out is by first changing the
year, month, and the day to whole numbers. And then we'll create a new
column to combine them together. Custom column, we're going to call
date, and we're going to use the U.S. formatting for dates. Start with the
month and separate these using a forward slash and an ampersand between the
fields and the text characters and so on. So you're going to add the day, and we
add the year. So we should see all of these fields concatenated into a single
field. However, they all error out. And the reason for that is because we are
combining a text field in the form of a forward slash with a number field in the
form of the year, month, and the day fields. So in order to do this, we can do
two things. We can change the data types for these three columns, but if we
need to use this again in other calculations, we have to worry about what effect
that will have on those other calculations or the transformation steps. So
instead, I'm just going to change the formula so that it converts these numeric
fields into text values. And we do that using the text dot from function, and we
put the parentheses around those numeric expressions. We can actually make
this easier to read by putting these on separate lines because it will not impact
the actual formula. And we'll put this one line, now we can see it all in one
view, and we see that the red squiggly line much like a Microsoft Word
error indicates that we have a spelling issue, or in this case, a syntax issue, so we
use a close parenthesis and hit okay. So there we have it. We can see how we can use a text conversion function to convert numeric values into text values that we can then use in other formulas, which is pretty impactful and allows us to do a lot of different calculations.
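
Pulling these ideas together, here is a hedged M sketch of the formulas from this video; the sample rows are hypothetical, and note that the transcript's "text dot position" corresponds to the Text.PositionOf function.

    let
        Source = #table(
            {"Station ID", "Year", "Month", "Day"},
            {{"US1FLAL0001", 2018, 7, 4}, {"ASN00040000", 2019, 1, 15}}
        ),
        // First two characters of the station ID
        WithCountry = Table.AddColumn(Source, "Country", each Text.Start([#"Station ID"], 2), type text),
        // Text.PositionOf returns 0 for a match at the start and -1 for no match,
        // because Power Query counts from a base of zero
        WithPosition = Table.AddColumn(WithCountry, "Position Testing", each Text.PositionOf([Country], "US"), Int64.Type),
        // Numbers must pass through Text.From before concatenating with "/"
        WithDate = Table.AddColumn(
            WithPosition,
            "Date",
            each Text.From([Month]) & "/" & Text.From([Day]) & "/" & Text.From([Year]),
            type text
        )
    in
        WithDate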

Conditional formulas

Selecting transcript lines in this section will navigate to timestamp in the video
The richer part of the transformation step includes creating formulas, such as
conditional formulas. The high level logic of the conditional formula says that if
condition one is met then we return this result. Otherwise if condition two is
met then we return another result and so on and if none of the above
conditions are met then we return this alternative result. We write out
conditional formulas in Power BI using the syntax we see here on the screen. We
need to keep in mind a few key points when setting up conditional functions
such as, "if", "then", "else if" and "else" all need to be in lower case. And because
we already know power query, formulas in Power BI are case-sensitive, the
function will not work otherwise. We start the first conditional expression with
"if" the other alternative conditions after that use the "else if" syntax and we can
repeat "else if" as many times as we need to. But then expression returns the
result for each line including the last alternative result if none of the conditions
before hold true. In the U.S. Census Power BI file, we go to the query for GOID
to County mapping web connection where we see the GOID details for the state
and county levels. Notice that the state of Alabama has a summary level of
40. While the counties that follow directly after it, have a summary level of
50. Let's create a conditional column that says if the summary level equals 40
then we return the area name because that shows the state otherwise we return
null. We go to add column, we can select a custom column. I'm going to call this
"State" and we will say if summary level equals 40 then we return the area
name else we return null. And we'll see why null is important in another
video where we talk about the transformation steps from there. Now we see
that because we have summary level of 40, we have the state name and the
counties underneath will use Alabama as the state name as well. If we double-
click on the add column step name we see that it actually uses the conditional
column functionality from the add column tab. This makes it easier to write up
formulas but it also limits the flexibility for creating complex conditional
formulas. And there we have it, pretty simple and pretty powerful example of
conditional formula.
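
A minimal M sketch of this conditional column, using the transcript's column names with hypothetical sample rows; remember that if, then, and else must be lowercase.

    let
        Source = #table(
            {"Summary Level", "Area Name"},
            {{40, "Alabama"}, {50, "Autauga County"}, {50, "Baldwin County"}}
        ),
        // Return the area name for state rows (summary level 40), otherwise null
        WithState = Table.AddColumn(
            Source,
            "State",
            each if [#"Summary Level"] = 40 then [#"Area Name"] else null,
            type text
        )
    in
        WithState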

Filling up or down columns

Selecting transcript lines in this section will navigate to timestamp in the video
- [Female Speaker] Returning to the US Census Power BI file, and the GEOID to
county web query, we put the state name in a separate column, by leveraging a
conditional formula. The area name, shows the state name, followed by the
counties within it. We scroll down the list, we see the state name is null, until we
get to the state of Alaska. What we want to see in this table, is the county name
with the state of Alabama in the field next to it. In order to do this, we are going
to leverage the Fill Down functionality. An important thing to note about using
Fill Up, or Fill Down, is that, in order for the fields to be filled, they need to have null values; it won't work with other values, such as "blank". We select the state name, we right click, go to the Fill menu, and select
Down. So now we see: Autauga County, Alabama, scroll down, until we get to
Haines Borough, Alaska, for example. The last thing we need to do, because we don't want to have Alabama, Alabama (Alabama is not a county), is scroll over and remove the 40 summary-level rows from the data table. Lastly, we
can just rename this Area Name, County, to make it easier to read. There we
have it, we see how the Fill Up or Fill Down functionality can help get our data
table looking the way we want it to be.
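
A small M sketch of the Fill Down step, with hypothetical rows; it only fills over null values, never over blanks or other text.

    let
        Source = #table(
            {"County", "State"},
            {{"Alabama", "Alabama"}, {"Autauga County", null}, {"Baldwin County", null}}
        ),
        // Copy the last non-null state value down through the nulls below it
        Filled = Table.FillDown(Source, {"State"}),
        // Drop the state-level row itself so only counties remain
        Counties = Table.SelectRows(Filled, each [County] <> [State])
    in
        Counties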

Leveraging date formulas

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] In the CDC flu data Power BI file, we select the query for the CSV
file, which is the same as the SQL Server connection, which we will visit
later when we examine enrichment steps for native queries. Having dates that correspond to the flu mortality numbers in a continuous, standardized date format will enable time series later in dashboards. In order to create a date
field, we need to look at the year field and the week field. We first convert the
year field into a date. However, instead of seeing that it's the first day of
2016, we instead see the 7th of July, 1905. And the reason for this is
because we're converting a whole number into a date, rather than replacing the
existing step. The way around this would be if we remove the change type two
step and go back into the original change type step one. Then we click on the
whole number data type and change it to date. Select insert and replace the
current step. Now we see that the year is January 1st of each respective year. To
add the dates to this table, let's go to the last step for reading the columns. And
then, we go to add column and choose to add a custom column. We're going to
call this new column name dates. In the formula bar, we type in date.add and we
can see that we have five different options for the time interval we want to add
to the dates field. We're going to select add weeks because we have the weeks
into the year. We see it here next to the year column. And we reference first the
year, which is January 1st of each respective year. Then we add the number of
weeks into the year by typing in week and hit tab to pick up the number. Close
this off. However, one issue we need to make sure that we're thinking about is
whether or not the week starts at one or if it starts at zero. So let's click okay
and go back in. We're going to just look and see what the ranges are. Ah, so it
starts at week one. This means that rather than having January 1st as week
one of the year, we instead have the date a week later. To remediate this, we
remove the filter step that popped up and we subtract one from the week. We
hit okay. And now we see that in this row, which is in the first week of the year, it
starts at January 1st instead of January 8th. Also if we want to take this year
value and move it to the end of the year, which is saying these are the total flu
cases in this entire calendar year, instead of the first of January, we want to use
the 31st of December. We can do this by creating another custom date
column. We're going to call this end of the year date. And then we'll see the
formula pop up again as we type it out. And again we see five different
options for the ending of a time period. This time we're going to choose end of
the year. And we can reference the year field, which would be January 1st. Or we
can reference the date field that we created that's continuous over each week of
the year. So every seven days we have a new date. I'm going to just use the
year and close off the parenthesis. So now we'll take the January 1st date and
move it to the last day of the year. And hit okay. And there we see we now have the last day of the year for each of our respective years. The last
thing we want to do with dates is to look at the day of the week. January 1st
occurs on a different day of the week each year, which has a small impact on the
spacing between dates that we can account for. Let's create a date formula to
determine the day of the week that January 1st is on. To do so, we go to add
column, select a custom column. I'm going to name this column day of the
week. And in the formula bar, type in date.day. We can use day, day of the week, or day of the year; giving the day a number is not particularly helpful to us. So we're going to use the day of the week name. And again refer to the year
field. And hit okay. And we can see that the January 1st of 2016 is on a
Friday, while the next year January 1st is on a Sunday. And there we have
it. With a few pretty straight forward date formulas, we created a continuous
date field and some other helpful date functions that we can later use in Power
BI dashboards.
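
Here is a hedged M sketch of these date formulas with hypothetical rows; note the week subtraction that keeps week 1 on January 1st.

    let
        Source = #table({"Year", "Week"}, {{#date(2016, 1, 1), 40}, {#date(2017, 1, 1), 1}}),
        // Weeks are 1-based, so subtract one to keep week 1 on January 1st
        WithDate = Table.AddColumn(Source, "Date", each Date.AddWeeks([Year], [Week] - 1), type date),
        // Move the January 1st date to December 31st of the same year
        WithEndOfYear = Table.AddColumn(WithDate, "End of Year Date", each Date.EndOfYear([Year]), type date),
        // Name the weekday that January 1st falls on, e.g. Friday for 2016
        WithDayName = Table.AddColumn(WithEndOfYear, "Day of the Week", each Date.DayOfWeekName([Year]), type text)
    in
        WithDayName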

Combining binary files with formulas

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Remember we defined binary objects as files combined or read
using other functions. Excel and PDF files for example typically contain multiple
tabs or tables in them or what Power BI knows as table objects. We previously
drilled into single binary objects contained in a folder connection to see the
table objects in them and combine them. However, if the files contain many
tabs or tables, we first need to read the file using a formula to extract the tables
or the tabs from within them. Then we can combine this into a single data
set. We previously set up a connection to a folder of NOAA CSV files. And when
we combined them, we lost the file and path details. Let's try to combine the
CSV files using another methodology, which includes a formula. We first
duplicate our NOAA CSV query. I'm going to rename this with formula on the
end. We're now going to read the CSV files instead of combining them by
creating a custom column. To start with, select the invoke custom function, right
click, and delete all the steps until the end. Now, we add another column to
read the CSV file. I'm going to call this column CSV Tables. And to read the files, we use Csv.Document and reference the Content column, which contains the binary objects. Hit okay. We see that this column returns a table object for each row in
this new column, which we can then expand into a combined data table. From
here, we can remove any columns we do not need by CTRL + selecting the
columns we want to keep. And I hold down Shift to keep those tables. Keep the
folder path. And the file name. Right click, remove other columns. We can then
promote the header, first row's headers. And one more step we need to take to
at least make this data set look presentable is we remove the duplicated column
headers that are now in the combined data table. So, I'm going to select the TMAX attributes column, because there are not a lot of value options in here, and filter out the duplicated header value. And there we have it. We can combine the CSV files using a
formula instead of selecting to combine them in the interface. In the NOAA
Excel and PDF folder Power BI file, let's revisit the connection to the Excel
folder. Duplicate this query and rename it with formulas in the end. Now, we
need to create a formula in this table that allows us to read the tabs in the Excel
workbook. To do so, we're going to remove the last steps in the query
editor, which we use to view a single Excel file. We then select add column,
custom column. I'm going to call this Excel Tabs, and I'm going to use the
function excel.workbook to read the files in the content column. Select okay. So,
we have a new column consisting of table objects that we can then expand
out. And this allows us to see the tabs within all the Excel files in the folder. We
can then combine all the table objects from these tabs into a single data set by
selecting the diverging arrows in the data column. And from there, we can
promote the headers, remove unneeded rows, and filter out the duplicated
column names as we saw in the CSV folder connection earlier. Now, let's
examine the PDF folder connection again. I added another PDF file to this
folder by duplicating the existing PDF file, so that we would have more than one
file in the folder that we could work with. I'm going to duplicate the NOAA PDF
query. And again rename it with formulas on the end. In the applied steps, I'm going to remove all the steps other than the source step. I also need to update the folder path in the formula bar, and we'll talk more about this formula bar and what I'm doing later. So now we see two PDF files when we refresh the query. To combine these files, or rather to combine the tables in both these PDF files, again I'm going to add a column, and this time I'm going to use a function specific to reading the tables in a PDF file. I'm going to call this column Tables and use the Pdf.Tables function. And, again, this function is going to reference the Content column. I hit okay. So, let's hit the diverging arrows to expand these two table objects out. What we see
now is all the tables within the PDF file. So, to expand the data from all the
tables again, we click on the diverging arrows in the data column. So now there
we have it, a combined data set of all the tables in multiple PDF files. From
there, we can remove the unneeded column, promote the headers, or filter out
any duplicated names for the column headers in the fields. Now we know how to use formulas to read the tables in files and combine them into useful data sets.
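For reference, here is a hedged sketch of this pattern in M for the CSV case; the folder path is hypothetical, and for Excel or PDF folders you would swap in Excel.Workbook or Pdf.Tables as noted in the comments:

    let
        // Hypothetical folder path standing in for the NOAA folder connection
        Source = Folder.Files("C:\NOAA\CSV"),
        // Read each binary in the Content column as a table object; an Excel
        // folder would use Excel.Workbook([Content]) here instead, and a PDF
        // folder would use Pdf.Tables([Content])
        #"CSV Tables" = Table.AddColumn(Source, "CSV Tables",
            each Csv.Document([Content]))
    in
        #"CSV Tables"

From there, expanding the new column combines the table objects into a single data set.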

Accessing native query in enrichment

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] In this CDC flu data Power BI file, we go to the SQL Server connection query to test out the impact that enrichment and transformation steps have on query folding. As a reminder, query folding allows
Power BI to translate our transformation steps into SQL code to send back to
the database, thus improving performance. We are going to take the query we
already have that we've used for previous query folding and remove some of
the later steps. We can view the native query for this renamed step by right-clicking on the step name and selecting View Native Query, which we know is available because it is not grayed out. We see the native query dialog box shows us our transformation steps in the form of SQL commands. Now, let's add a column to this table and see what impact creating formulas has on query folding. I'm
going to create a new column called flu incidence rate. To get the flu incidence
rate, I'm going to assume that we can do this by dividing the total cases by the
total patients. We see the new column added to the data table. We also see that
we can still view the native query. Now, let's add a conditional column. I'm
going to call this season. Just as a way to test this out, if the week equals 40, then I'm going to say it's winter; otherwise, it is summer. And hit okay. If we right-
click on this add conditional column step, we see that the view native query
option is still available. Now, let's add a date function that we previously learned
about earlier in this chapter. I'm going to call this date, and I'm going to use the
date function to add the number of weeks to the year. And add the week, hit
okay. All right, we can see that not only does this error out because we are
creating a date function using a text value for the week and the text value for
the year, but it also causes a strange issue with the flu incidence rate, and we
cannot view the native query for this data table anymore. Let's click on the x to
get rid of that last column. Now, let's right-click again on the conditional
column and view the native query. And what we see looks a lot like SQL
code. We see that for the conditional statement, it uses the SQL case
commands and when we created the new field for the incidence rate, it not only
divides the total cases by the total patients, but it also converts the
numbers which were originally text values in the data table into number values
to calculate that number. So, there we have it. We can see how query folding
works for these enrichment steps and how we can add formulas and still have
access to the query folding.
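As a hedged illustration (the field names below are assumptions based on the narration, not confirmed column names from the database), the two folding-friendly steps look something like this in M, with dummy rows standing in for the SQL Server table:

    let
        Source = #table({"WEEK", "TOTAL CASES", "TOTAL PATIENTS"},
            {{40, "12", "300"}, {41, "8", "250"}}),
        // Division folds into SQL arithmetic plus a type conversion, since
        // the values were originally stored as text
        #"Flu Rate" = Table.AddColumn(Source, "flu incidence rate",
            each Number.From([TOTAL CASES]) / Number.From([TOTAL PATIENTS])),
        // A conditional column folds into a SQL CASE expression
        Season = Table.AddColumn(#"Flu Rate", "season",
            each if [WEEK] = 40 then "winter" else "summer")
    in
        Season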

Question 1 of 3

Someone sent you an entire year of data as 12 months of Excel files with several
tabs in them, but you only want one tab from each Excel file. What is the best
approach to combine them in the query editor?


Open up each of the Excel files and remove all the tabs that you do not
need to isolate the single tab in each file that you do need, then save
each of these files as CSV files in a single folder.

 Create a formula to read the Excel tabs as CSV files, and then combine
these table objects into a single dataset.

Incorrect

Even though you may be able to open CSV files in Excel, your Excel files have a different format than CSV files, and thus you need to read
them using functions in the query editor that work specifically with Excel
data connections.

 First set up a function to read the tabs within each binary object (the
Excel workbooks) as table objects, then combine these table objects into
a single dataset.

Correct

You cannot combine Excel files directly in the query editor, for you first
have to set up a function using Excel.Workbook to read the table objects
in the tabs in the workbook, and once you extract these table objects,
you can combine them together using native query editor functionalities.

 Combine the Excel files and tabs into a single dataset using the functionality in the query editor that allows you to combine them in a single step.

Incorrect

Unlike combining CSV files, there is no native functionality in the query
editor to combine Excel files together as you first need to extract the
table objects from the individual tabs inside the Excel workbook.

Question 2 of 3

If you find that the fill down option does not work as anticipated on the target
column, what are some triage options to get it to work?


Delete the entire column.

Replace these problematic rows with 0s.

Filter the column to remove all the problematic rows.

 Isolate the nulls in either the column to fill down or create a new column
to put the values to fill down and the null values in the other rows.

Correct

The fill up or down functionality works very well on empty cells with null
values—so if you know the labels you want to use, think about how to isolate them in a new column and set the rows you want to populate with these values to null.

Question 3 of 3

What should you keep in mind when you are creating text formulas in Power BI?

 Text formulas are capitalization-sensitive.

Correct

Power BI formulas, unlike formulas in other applications, are capitalization-sensitive.


Text expressions do not require beginning and ending quotes.

Text formulas are not capitalization-sensitive.

Text formula returns start at position one, unlike in Excel where they start
at position zero.

7.

Working with Query Editor steps

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] We record our extraction and transformation steps in Power
BI using an array of function options. We see the types of functions available in
the Power Query M guide available on the Microsoft website we see here, which
can be a helpful reference. We can encounter two types of query editor
errors. Row-level errors occur as individual hyperlink row errors in a data
table. Examples include representing zeroes as dashes, which do not directly
convert to numeric values. Step-level errors occur when the query editor
step cannot return the intended result for the entire function performed or even
the entire query. Examples include dividing a column of text values by a
number. In the US Census Power BI file, we look at the GEOID to county web
query. When we click on a step in the query over here in the applied steps, we
see the function for that step in the formula bar at the top. The query editor
applied steps record the extraction and transformation functions and their order
performed within a query. For adding a column, we see the function we wrote
out to add the column, and we also see the table function Table.AddColumn. This is what we will see later as the M code for that applied step. We can physically
move the order of the applied query editor steps by right-clicking on the
applied step and selecting to move it to another position in the order of applied
query editor steps. In this version of Power BI, we cannot drag them into
another step order in the list. We need to be cognizant of potential issues that
occur when we move the steps around because order matters. For example, if
we move the merged columns step up one by right-clicking and selecting move
up, we see that the change types step below it now errors out. This case is an
example of a step-level error. Think of each of these steps in the applied steps
list as performing an extraction or transformation step on the output of the
previously-applied step in the list. If we do run into these issues, either move the
step back to its original position or determine where the step errors out and
delete that step, and perhaps those thereafter. To correct our query, I'm simply
going to move the merged columns step down one, back to where it came
from. So again, we right-click and select move down. We now see that we don't
run into any step errors anymore. If we open up the advanced editor at the
top of the Home tab, we can see the query editor steps as lines of M code that
record these functions and steps in the query. We can copy this code and paste
it into an entirely new query or file and it will work the same way, as long as we
bring over any other queries referenced in this code.

Breaking down syntax

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Microsoft developed the M Formula language specifically for
working in Power Query, although it shows similarities to the F# language. It is a straightforward functional programming language that treats computation as mathematical function evaluations and avoids changing state and mutable data. Within a mashup query language like M, we can have variables,
expressions, and values. M follows a fairly straightforward yet strict syntax. The
query starts with the let statement in line one and ends with the in statement in
lines five and six. Encapsulated between the let and the in expression, we see
the step name set equal to the variables, expressions, and values constituting
the ETL processes in lines two, three, and four. Because the step order
matters, many of the expressions that transform the query reference a previous
query step but cannot reference a future step. M separates applied steps by
mandatory commas but also optional line breaks that make it easier to
read. The last line of code, or the step, will not have a comma after it as we see
in line four. By referencing the last step after the in statement, this last step is
the output the query returns in line six. M is case-sensitive. We can write out
variables as strings without quotations in lines two and three, but if we want to
name variables with spaces, we need to start off with the pound sign and then
put the variable in quotations as we see in line four. Columnar field names do
not need punctuation around them regardless of whether or not they have
spaces.
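As a hedged illustration of this syntax (with a stand-in source table rather than the query shown on screen), a short query might look like this:

    let
        Source = #table({"Amount"}, {{1}, {2}}),
        ChangedType = Table.TransformColumnTypes(Source, {{"Amount", Int64.Type}}),
        #"Renamed Columns" = Table.RenameColumns(ChangedType, {{"Amount", "Total"}})
    in
        #"Renamed Columns"

Line one opens with let, each step ends with a comma except the last, the step name containing a space requires the pound sign and quotations, and the name after in is the output the query returns.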

Renaming steps in M

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Optimally, in creating custom code in M, we start with existing
Query Editor steps, and then modify the code where needed. This saves time
and avoids potential errors. In our US Census Power BI file, let's add a new folder to the query editor. We right-click on the query editor panel and select new group; we're going to call this Custom M Code Examples. Hit okay. We see that
the other queries or the ones that were previously there now exist in their own
folder. We duplicate the query population by zip code: right-click and select duplicate. We can then move this query into the Custom M Code Examples group by right-clicking on it and selecting move to group, Custom M Code Examples. I'm going to rename this example one. First let's go into the M code of this query by selecting the advanced editor. Let's first add some spaces to this code. We can add comments to programming code to document what a step does. To add comments in M, we use two forward slashes (//), and I'm going to just write a note under this source step that says obtained data source online. When we hit done, we see that adding this additional comment has not affected the code because we still
return the same query. Go back into the advanced editor window. I'm going to move this comment to the end of the source line. Typically the query editor uses the performed function to automatically assign names to the steps, such as changed type or promoted headers, and if there's more than one change type in this code it will add a one to the end of that expression. I'm going to delete that. We can actually change these step names to be whatever we want them to be. Let's change the M code to have a bit of fun with it. Here I'm going to rename the steps so that together they read Lets Learning Power BI Is Fun. Because Power BI is two words, I need to keep the pound sign and the quotations around Power BI. So we hit done, and we see we've run into an error. We actually need
to go back into the advanced editor and update the M code. We need to
update the step references in the function expressions so they properly reference our
new steps. Going back into the advanced editor, each expression in our example query references the previously applied step in the M code. This means that it takes the output returned from one applied step and applies the next extraction or transformation step to it. We know that the source step was the original step in the query, so we need to update the reference to the source step to Lets, and we go down the list and update these step references. Now let's see if it returns the query. We hit done again, and we need to make sure that the last line after the in statement is changed from changed type to Fun, because we're returning the last applied step in the M code. The Fun
step is the output we want to return. Click done again. We see we have kind of a
fun expression in the applied steps now, and this query works the same
way we've just changed the step name.
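A hedged sketch of the renamed query (the step bodies are simplified stand-ins, not the actual census transformations):

    let
        Lets = #table({"Zip Code"}, {{"00601"}}), // obtained data source online
        Learning = Table.SelectColumns(Lets, {"Zip Code"}),
        #"Power BI" = Table.TransformColumnTypes(Learning, {{"Zip Code", type text}}),
        Fun = Table.Distinct(#"Power BI")
    in
        Fun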
Consolidating M steps

Selecting transcript lines in this section will navigate to timestamp in the video
- In the US Census Power BI file, let's duplicate the query population by zip code and rename it with "example two" at the end. We want to move this entire query into the folder for custom code examples: right-click on the query name and select "move to group," Custom M Code Examples. Click on this query to select it. In the advanced editor window, we see the M code. Let's maximize the
size of the screen on the top right. In our query the first term of each applied
step expression references the previous step applied in the query editor. Think
of it as taking the output of the last applied step and adding another
transformation function to it for example. We can set up the steps in a single
line of code if we would like by putting the steps inside of one another. This
means we can consolidate the code by taking the expression from the previous
step and placing it into the current applied step where we see that step name
referenced. So here we take the renamed columns. We're going to copy the
expression for it and put it as the first term in the change type expression. We
can then delete the renamed columns step. Then if we go up to the removed other columns step, we can take its expression and again put it inside the applied step below it. We continue this until we have all the lines of code nested inside one another. We now see all our M code in a single line of code. While this makes the query shorter, it also makes the functionality much more difficult to read. But it's good to know that this is an option. We hit
done. And we now see we just have one applied step and that's the change type
step which was the name of the one step that we now have in the advanced
editor.
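A hedged before-and-after sketch with simplified step bodies (not the actual census query):

    // Before: each step references the previous step by name
    let
        Source = #table({"Zip", "Pop"}, {{"00601", "18000"}}),
        #"Removed Other Columns" = Table.SelectColumns(Source, {"Pop"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",
            {{"Pop", Int64.Type}})
    in
        #"Changed Type"

    // After: each previous expression is nested inside the step that follows it
    let
        #"Changed Type" = Table.TransformColumnTypes(
            Table.SelectColumns(
                #table({"Zip", "Pop"}, {{"00601", "18000"}}), {"Pop"}),
            {{"Pop", Int64.Type}})
    in
        #"Changed Type"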

Adding data types as custom M code

Selecting transcript lines in this section will navigate to timestamp in the video
- [Voiceover] We can add the data type as custom M code to the end of the function when adding a new column to a table, which translates to one fewer step in the query editor. On the screen, we see the functions for adding whole number, decimal number, date, and text data types to the end of a new column function, which differs from converting data types in a separate step. In the US Census Power BI file, we duplicate the query for GEOID to county web. We add example three to the end of the query name, and right-click and move to group Custom M Code Examples. We click on this query to select it. If we go
over to the state column we created using a conditional function we can change
this into a text data type by clicking on the icon at the top and selecting
"text." We've changed the data type but we also added a new applied step to
our query editor. Instead of putting the data type changes in this separate step,
we can add another data type inside another previous function such as adding a
column. When we select the "Add Column" step, we see the function used to
create this conditional column. We then filled down to put this state name in
each empty cell of this column. To change the M code, we go into the advanced
editor window. In the M code, we locate the line for adding a column to the
data table. We add a comma to the end of the expression, then type in
"text.type" which we mentioned earlier is different from the expression
used when converting data types that we would see here in this last step for
changing the data type. We click "done" we go to this added custom step
again. We now see that we have a text data type for this new collum. This also
updates as we look at the applied steps after adding the column. We can then
delete this last change type step. There we have it! How to use custom M code
inside an already existing query editor step.
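A hedged sketch of the change, with assumed column names; the data type rides along as the optional fourth argument of Table.AddColumn, removing the need for a separate changed type step:

    let
        Source = #table({"Name", "County"},
            {{"Alabama", null}, {"Autauga County", "001"}}),
        // Text.Type as the fourth argument sets the new column's type inline
        #"Added Conditional Column" = Table.AddColumn(Source, "State",
            each if [County] = null then [Name] else null, Text.Type)
    in
        #"Added Conditional Column"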

Question 1 of 4

Order matters when creating custom M code.


FALSE

 TRUE

Correct

The entire process of setting up the extraction process for the data, and
then the transformation steps is very dependent on the order of steps—
you can think of each step as a transformation of the data table
produced from the previous step, and the step before that and so on.

Question 2 of 4

The query editor steps translate into code in the M formula language. How
exactly would you define this programming language?


It is a language similar to Python because it enables you to import
modules to run in the query editor.

Like Excel, you do not have to worry much about making sure you use
capitalization within the code because the language implicitly defines
whether there needs to be capitalization.

 It is a functional mashup query language that enables flexibility in creating calculations with a case-sensitive syntax.
Correct

M is a functional language that enables some neat query mashups and combinations, but nothing particularly fancy or complicated—it has a case-sensitive syntax, as you saw with the formula, and that is a key point to remember.


It is a dynamic language like Python that pulls from an expansive library
of functionality and interaction with other applications.

Question 3 of 4

Which is NOT a data type that you can insert at the end of a column expression
in creating custom M code?


Decimal.Type

 Integer.Type

Correct

This is not a function in M and will cause the entire query to err out—use a data type like Int64.Type instead.


Text.Type

Date.Type

Question 4 of 4

Let's set up a theoretical query in M using functions in place of the query editor
expressions to make them easier to read. How would you combine the M code
below into a single line of code?

let Source = getdata(x), Step1 = newcolumn(Source,y), Step2 = removecolumns(Step1,z), Step3 = filterrows(Step2,a) in Step3


let Source = getdata(newcolumn(Source,y),removecolumns(Step1,z),filterrows(Step2,a)) in Source

let Source = getdata(Step1,Step2,Step3) in Source

 let Step3 = filterrows(removecolumns(newcolumn(getdata(x),y),z),a) in Step3

Correct

Remember that each step in the process in the native query editor steps
is dependent on the outcome of the previous one, which means you can
take the previous step in the query transformation process and place it
inside the step name referenced in the current step—you repeat this
process again for the previous step before that until you place all the
steps into a single line.

 let Step3 = getdata(x),newcolumn(Source,y),removecolumns(Step1,z),filterrows(Step2,a)) in Step3

Incorrect

You want to return the last step in the M code as your result, but you
need to place the previous functions inside of each other in the reverse
of the order that it occurs so you see the step for extracting the data
from the source as the innermost function.

8.

Utilizing parameters

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] We can define a parameter as a query object that returns a single self-contained value, minimizes reliance on outside files, and can potentially be used in multiple other queries. In a new Power BI file, we're in the Power Query Editor, where we now select Manage Parameters at the top and select New Parameter. Name this parameter Current Date and choose Date as the Data Type. From the drop-down list of Suggested Values, we see three options. We keep this as Any value and enter the date for the current date into the box for Current Value. I'm going to revert this back to Any, so that we
can go ahead and enter the current date. We then select OK. We see loading the
parameter creates a new query, denoted by the parameter icon. We can change the parameter values by entering a new value into the field on the screen. Click on the Advanced Editor window, where we can see the parameter has its own M code and metadata. Go back to Manage Parameters to add more parameters. Again we select New Parameter. This time we're going to call this parameter Start Date. We select Any as the type so that we can enter the value: we use the current date and wrap the start of the year around this value. So this gives us the value for the start of the year. Then go into the
Manage Parameters and let's add a new parameter for the End Date. For this one, I'm going to take the expression we used to create the start date and use the end of the year instead of the start. We hit OK. Now we see three parameters loaded
into the query list. Note that if you plan on loading to Power BI Pro, we cannot
change the parameters once we publish the report. We can convert the
parameters back to normal queries by right clicking on the parameter name and
selecting Convert To Query. We notice that the parameter changes to another
type of query by the icon next to it. We save this file as M Language
Objects. There we have it. We see how we can create parameters which we can
then use in other Power BI functionalities.
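For reference, a hedged sketch of the M behind these parameters; the metadata record is the shape the editor typically generates, so treat the exact fields as an assumption:

    // Current Date, entered manually in the dialog (your date will differ):
    #date(2024, 6, 15) meta [IsParameterQuery = true, Type = "Any", IsParameterQueryRequired = true]

    // Start Date and End Date, derived from the Current Date parameter:
    Date.StartOfYear(#"Current Date")
    Date.EndOfYear(#"Current Date")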

Creating list objects

Selecting transcript lines in this section will navigate to timestamp in the video
- Remember that we can think of a list object as a single column within a table
or table object. The M language directly supports objects such as tables, lists,
records, functions and parameters. Within the M language we classify lists,
records, and tables as structured values or ordered sequences of values. We do
not typically load lists as data but rather use them as intermediaries in
developing other queries. We can create lists using functions that create original
lists or that translate into list objects. As we can see here with the example
functions on the screen. In our M language objects Power BI file, for the first
method we can create lists manually using formulas between curly
brackets. Let's create an example list with both character and number items. We select new source, select blank query, and we start by typing the formula = {2, 5, "A", "C"}. Notice that the characters need to have quotation marks around them. We hit enter. We now see we've created a list, as indicated by the icon next to the query. Let's rename this query comma separated list. Next, let's create a contiguous numbers list. Again, we select a new source, blank query, and we type in = {1..365}: 1 for the starting value, dot dot indicating it's a contiguous list, and 365 for the ending value. Let's
rename this query, contiguous numbers list. We can see this forms a list that
looks a lot like a table column. Except we see lists at the top and the list
icon next to the query name. For the second method, we can create lists through M list functions that use input parameters. We create another blank query, and this time we type in the function Text.ToList("Hello World") and hit enter. We can see that this takes each of the characters in the text string and puts them as a separate item in the list object. Let's rename this query Hello World List. Lastly, let's create a list object using a dates list function. We create another blank query, and we use the function List.Dates, starting with the current date for today: DateTime.Date(DateTime.LocalNow()). Notice that we put DateTime.Date around the function that returns the local date and time for today. What this does is eliminate the time from the date-time value. We put 10 in because we want to create a list with 10 dates. Then we add 1 and hit enter. We get an error message. This is because we need to update the third term, the interval, to reference a duration. So to do that, we use the duration function to return the day interval: #duration(1, 0, 0, 0) to indicate a single day. Hit enter again, rename this dates list, and there you have it. Now we know how to create lists in M manually, starting with blank queries.
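Collected in one place, the formulas from this video look roughly like this (the dates example assumes today's date as the starting point):

    = {2, 5, "A", "C"}                    // mixed number and character items
    = {1..365}                            // contiguous numbers list
    = Text.ToList("Hello World")          // one list item per character
    = List.Dates(DateTime.Date(DateTime.LocalNow()), 10, #duration(1, 0, 0, 0))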

Referencing a list as a column in a table

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Continuing on from the previous video, there's a third method that
allows us to create lists by referring to a column in a table either as the table
name or the step name. Since this method does not directly reference a list in
the expression, we do not immediately recognize it as a function to generate a
list. In the M Language Objects Power BI file, we're now going to add the
Population by Zip Code query from the U.S. Census Power BI file so we can test
out how to create list objects from an existing table. Copying over the query
allows us to consolidate our test work in M Language into a single Power BI
file. Here's the M code from the Population by Zip Code query. I copy this M
code. And in our current Power BI file, I select New Source, Blank Query. I leave
the formula bar blank and I select the Advanced Editor window. And I paste all
of the M code into the text space. Hit Done. And I rename this query Population
by Zip Code. First, I duplicate this query. I can right-click on the query name or I can also just select the query and press Ctrl+C, then Ctrl+V. I'm going to rename
this query. List from step. I right click on the Zip Code column header, select
Drill Down and we now see a reference to the list of the column of data in the
formula bar. Notice that this step references the Changed Type as the table we
are getting the list object from. The Zip Code is the column of the previous
step. This gives us a shortcut to the M code to extract the column values in a list
without necessarily using the interface commands and this can be very useful
later. Notice the original query has a little table icon next to it indicating that it
is a table object. Now I'm going to create another blank query and type into the formula bar = #"Population by Zip Code"[Zip Code], which references the table name, with the column name, Zip Code, inside the square brackets. Hit Enter. We now see this column becomes a list object with the list
icon. Let's rename this Query List from Table. There we have it. Two ways to
create a list object from an existing table.
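A hedged summary of the two expressions, with the step and table names assumed from the narration:

    // From a step name inside the query (what Drill Down generates):
    = #"Changed Type"[Zip Code]

    // From another query's table name:
    = #"Population by Zip Code"[Zip Code]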

Leveraging record objects

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Like list objects, we can create record objects manually, but with a few key differences. We start by creating a new blank query with a single record. We go to new source, select blank query, and in the formula bar we type = [City = "Los Angeles", State = "California"]. Remember that we enclose list objects in curly braces; records use square brackets instead. Also notice that we have commas separating the fields and we label each field name within the record. Hit enter. Let's rename this query Records in M. If we add curly brackets around the outside of it, we now create a list containing the record object. Convert this to a table. Hit Okay. Expand out the table. So we now have a
single record object. Now that we've converted it into a table, let's add two more
records to make it a multidimensional table. To do this, I'm going to add
another record to the list, this time [City = "Santa Barbara", State = "California"]. Hit enter again. We now know how to create a table of records directly in M.
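Put together, a hedged sketch of the finished formula; Table.FromRecords is a direct shortcut for the convert-and-expand steps described above:

    let
        // A list of record objects, where each record becomes one row
        Source = {[City = "Los Angeles", State = "California"],
                  [City = "Santa Barbara", State = "California"]},
        AsTable = Table.FromRecords(Source)
    in
        AsTable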

Leveraging list functions

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] We can perform calculations on lists, including the count,
minimum, maximum, average, and sum of the items in that list. Earlier, in the US
Census Power BI file, we grouped the population by zip code data to return the
maximum population for each zip code. If we go into the code in M by selecting
the advanced editor, we see how the query sets up the function so that for each
zip code, it creates a list object containing the corresponding populations. It
then uses the List.Max function to return the highest population from this list
object. We go back to the query list and we'll duplicate the population by zip
code with grouping. We'll rename it sum. Now in the grouping row step, we can
go into the advanced editor and see all the M code or we could also go into this
formula bar and change the Max to Sum and see what impact this has on the query. Now, instead of calculating the maximum population for each zip code, it calculates the sum of the populations by zip code. If we go into the advanced editor window, we see that the grouped rows step reflects our changes. See the impact that list objects have on the grouping functionality
now? So we have already been leveraging list objects but Power BI is using
them behind the scenes, and now we understand how.
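A hedged sketch of the grouping step, with assumed column names and dummy rows; swapping List.Max for List.Sum is the only change between the two variants:

    let
        Source = #table({"Zip Code", "Population"},
            {{"00601", 18000}, {"00601", 7000}, {"00602", 40000}}),
        // For each zip code, [Population] here is a list object of that
        // group's populations; List.Max (or List.Sum) aggregates it
        #"Grouped Rows" = Table.Group(Source, {"Zip Code"},
            {{"Max Population", each List.Max([Population]), type number}})
    in
        #"Grouped Rows"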

Creating date tables

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] Date calendar tables are immensely useful in Power BI, and we can
easily create them through custom M code. In the M language objects Power BI
file, let's create a new list directly in the formula bar using a blank query. Let's put a date formula into the formula bar, using DateTime.Date of DateTime.LocalNow, and hit enter. We get an error message when we try to create this list. Therefore, in order for this query to work, we need to use a workaround. In the same formula bar, put = {1..100} and hit enter. This creates a new list of contiguous numbers from 1 to 100. We convert this to a table, choosing to use no delimiters. We then change the data type to date. Let's rename this column "dates" and the query "dates table." Now we need to make changes in the advanced editor window in the M code. Let's add a few extra lines in front of
the source step. We set up the start date with Date.StartOfYear and then again reference the current date. Obviously yours is going to be different than mine. So what this function does: the innermost piece gets us the date and time of the exact moment, the next gets us just the current date, whatever date that might be for you, and then we take the start of the year. That's why we see three functions nested inside one another. We add the end date by copying the start date onto the line below and using the end of year instead. Now, one last thing we need to do: remember that 1 to 100 returns a numerical list from 1 to 100? So in order to put the dates in, we need to put Number.From around the entire expression for the start date and then do the same for the end date. And in the source step, we replace the 1 with the start date and the 100 with the end date. And we hit done. We just created a dynamic date table for the current year by leveraging the power of list objects in M.
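Assembled, a hedged reconstruction of the finished query (the step names are assumed to match what the editor generates):

    let
        StartDate = Number.From(Date.StartOfYear(DateTime.Date(DateTime.LocalNow()))),
        EndDate = Number.From(Date.EndOfYear(DateTime.Date(DateTime.LocalNow()))),
        Source = {StartDate..EndDate},
        #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing()),
        #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",
            {{"Column1", type date}})
    in
        #"Changed Type"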

Looping with lists

Selecting transcript lines in this section will navigate to timestamp in the video
- [Narrator] We want to create a new dates table and in order to do so we are
going to use a few more helpful list functions. List.Generate creates a new list
using loop logic. List.Sort puts a list in numerical order. And List.Combine
creates a new list from two or more lists. In the M Language Objects Power BI
file, we just created a dates table using list objects in the previous video. What if
we want to create a date table that only includes the paycheck dates, which
occur every two weeks on a Friday? We first create a new group called Dates
Testing, which we are going to add these queries to. We first create a reference
date using a blank query. I'm going to turn the text value into a date value using
the Date.From function. I know that August 23rd was a Friday. And I hit
enter. I'm going to rename this query Reference Date. Now we go into create
another new source, and this time I'm going to use the List.Generate function: List.Generate(() => 0, each _ <= 10, each _ + 2). Here, zero is the starting value, we increase each item in the list by increments of two, and the loop stops once we pass 10. The underscore character serves as a variable in conjunction with 'each' for the list function. We hit enter. Now we convert this to a table. Select to use no
delimiters, and change the data type to date. Okay we know we can use the
List.Generate function to create a dates table. Remove the last two steps from
the query. And we go back into the advanced editor. I create a few lines to add some more code. I'm going to create a line called reference date, and we know we need to start the value in the List.Generate function as an actual number, so I'm going to use Number.From of the reference date, which points to the reference date query. Next I add a start date: Date.StartOfYear of the reference date again, and again I need to put Number.From around this expression. And lastly, let's add an end date: Number.From of the Date.EndOfYear of the reference date. We put commas after these two
expressions. And let's just test this out by returning the end date in the
query. So this means that rather than returning the last step, the source, we're
going to return the end date. Hit done. Okay what we see is our end date as our
actual number value. Rather than a date. In the advanced editor, we're going to
start by creating a list that uses the reference date as the start date. We
increment each item by 14, which is two weeks between them. And then we
stop at the end of the year. That's the end date. And this time let's return the
source. And hit done. Now we see we have the list of the whole numbers that
we can later convert to dates to get a dates list. So we still need to make some
more edits. I'm going to call this list one. And I'm going to create a second list
called list two. This time I'm going to use the List.Generate function again, but decreasing each value by 14, so the list runs from the reference date back down to the start date at the beginning of the year. One thing we need to do is remove the equals sign from the comparison so that it does not include the reference date as we create the list; that way we do not duplicate the reference date as a list item in both lists. I'm going to call the new combined list dates, and I'm going to say List.Combine of list one, comma, list two, and return this new dates list. Hit done. What we need to do for List.Combine is put each of these lists within another set of curly braces, as a list of lists. And hit done again. Now we see we have the list
of the whole numbers we can then convert them to dates. So we select to table
again, and convert to the date data type. One issue that we see here is that we start at the reference date and go to the end of the year, and then start at the beginning of the year and go through to the reference date, so we need to make some more edits to this. We first change the order in the List.Combine, so that list two comes before list one. Hit done. And we see that we now have a dates
table with 14 days between each of the dates. Let's rename this column
Dates. And there we have it; we can see how we can create a list of paycheck dates using some functionalities in M.
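A hedged reconstruction of the finished paycheck query; August 23, 2019 is the Friday the narration assumes, and starting the backward list one period before the reference date is my reading of the duplicate-removal edit:

    let
        ReferenceDate = Number.From(#date(2019, 8, 23)),
        StartDate = Number.From(Date.StartOfYear(#date(2019, 8, 23))),
        EndDate = Number.From(Date.EndOfYear(#date(2019, 8, 23))),
        // Forward from the reference date to year end, every 14 days
        List1 = List.Generate(() => ReferenceDate, each _ <= EndDate, each _ + 14),
        // Backward to the start of the year, skipping the reference date itself
        List2 = List.Generate(() => ReferenceDate - 14, each _ >= StartDate, each _ - 14),
        Dates = List.Sort(List.Combine({List2, List1})),
        #"Converted to Table" = Table.FromList(Dates, Splitter.SplitByNothing()),
        #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",
            {{"Column1", type date}})
    in
        #"Changed Type"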

Combining list objects

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] In the NOAA station list Power BI file, let's go back to our original
NOAA API connection which extracted API data originally as a list object. The
NOAA rest API query we set up returned the data in a JSON format as a list
object, which had API query limitations of 1000 records and one year of data. So
now what if we put several API queries into one Power BI query and join
together the list objects? To do this, let's first duplicate the NOAA API
connection. Let's duplicate this again, and I'm going to rename this NOAA API
2018 and the next one, NOAA API 2019. So then I'm going to remove all the
steps after the navigation. So at converted to table, I right-click and select delete until
the end, confirm, and I do the same for the 2019 data. Right click, delete until
the end. We see now that both the NOAA API connections for 2018 and 2019
are list objects. So to create a combined query, let's set up a blank query that will combine the two lists together. Set this up as List.Combine. We'll use the open curly braces and refer to the NOAA API 2018 query, comma, the NOAA API 2019 query. Hit enter. I'm going to rename this combined NOAA API
2018 and 2019. From there, we can just convert this into a table where we can
extract the data and this gives us the dataset. However, it would be much more
helpful if we could consolidate both of these queries into a single query. I'm
going to combine the 2018 and 2019 into a single query and eliminate the list
objects we see above. I duplicate the query and rename it all dates. So now
what I'd do is I'm going to, again, eliminate this so we're left with the list
object and instead of referencing these other list objects, I'm going to put the
list objects inside this query. So I need to go first to 2018 and I'm going to copy
the steps inside this query. Then go to our latest combined version, and I need to rename the steps with a one suffix, Source1 and Results1, to make sure that all these steps refer to the 2018 numbers. I can then hit done. I then do the same for 2019: I copy the steps, go into the combined query again, and put the steps below, this time with a two suffix at the end of the source and results steps. Now the source step is going to combine Results1, which returns a list object, and Results2, which also returns a list
object. So we'll click done and we see we've combined these lists together as a
single list object. From there, I'm going to convert this to a table and expand
out the record objects so we see our results, and just to confirm when I select to
scroll down, I want to make sure that I see the 2018 numbers in here. Looks like I need to actually update the 2018 query. Double-checking, we actually have the 2019 dates in here, so I update the start date and end date and hit done, and we do the same in our combined query, changing the dates to 2018, and hit done again. And we confirm, yep, we see both 2018 and 2019 in a combined query. We can add more list objects to the query than the two we see here, but
this consolidates our queries so that we do not have to have many list
objects and have to worry about combining them too. Pretty cool stuff.
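A hedged, self-contained sketch of the consolidated pattern; the two results steps below are dummy record lists standing in for the JSON navigation steps of the real API calls:

    let
        // In the real query these come from Json.Document(Web.Contents(...)):
        Results1 = {[date = "2018-01-01", value = 10], [date = "2018-01-15", value = 12]},
        Results2 = {[date = "2019-01-01", value = 11]},
        Source = List.Combine({Results1, Results2}),
        #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing()),
        #"Expanded Records" = Table.ExpandRecordColumn(#"Converted to Table",
            "Column1", {"date", "value"})
    in
        #"Expanded Records"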

Question 1 of 2

In order to create a list object in the query editor, you need to see it explicitly
set up as a list.

TRUE

 FALSE

Correct

We can create a list object directly in the query editor by referencing a column in a table object or a column in a step in the query transformation process, neither of which explicitly say they are list objects.

Question 2 of 2

Parameters allow you to dynamically enter values for queries.


FALSE

 TRUE

Correct

With parameters you can directly enter the values in the query editor
interface.

9.

Setting up custom functions

Selecting transcript lines in this section will navigate to timestamp in the video
- [Narrator] Power BI allows us to create custom functions in the query editor. We first need to name the function parameters. We then create an expression to calculate the result. So far in this course, we have set up many different functions, but by leveraging existing M functions rather than creating custom ones. In the
NOAA CSV Folder Power BI file we created earlier, we see we have two
queries in the Other Queries section. Let's delete the query with formula at the
end, so we don't get confused. The Other Queries folder contains the combined
NOAA data set. We're going to duplicate this, so we can make changes to
it. When we combined the CSV files into a single data set, the Query Editor
automatically creates a new, separate folder, called Helper Queries, with queries
for a sample file parameter, a sample file, and a transformation function. Let's
make modifications to the CSV query with two at the end of it. I'm going to
rename this custom. We see the applied steps on the right-hand side. We start
with the source step. Power BI automatically filters out the hidden files if there are any and invokes the custom function, renames the columns, removes the other
columns, expands the data table, and changes the type. All of this is done
automatically. Let's remove all the steps from the renamed column step and
below. We select delete until end and select delete. The invoke custom function step returns a table object from the transform file function. If we double-click on this step, we're given some information about the function. However, I actually find the formula bar at the top to be more helpful: we see that we are passing our content field, which contains the binary objects, into the transform file function. If we look
at the transform file function, we click on the Advanced Editor to take a look at the M code. We see it's a function object, where we pass the file in as Parameter1; it reads the file as a CSV document, then promotes the headers and returns the table object with the headers promoted. So, given the logic that we can consolidate the steps into a single line, let's copy
the source step and place it inside the promoted header step. We delete the
source step and hit done, which updates the transform file function to have the
code in a single line. However, let's go back into the code. What we're really
doing here, is creating a function in the same way that we created a
function earlier in this course. Let's copy this code and put it in our combined
data table. Let's delete this column, and now what we do is add a custom column. I'm going to call this Reading Table Objects. I'm going to paste the code in here. However, Parameter1 is not a field in our data table that we're creating this column for, so instead I'm going to pass in the content field and hit OK. So, we see this function returns a table object, in the same way
that the transform file function returns a table object. Let's check that we're not
referencing any other queries. If we try to delete this group, it won't let
us, because the NOAA CSV query and the transform sample file are referencing
this query group. So, we can go into the other queries and delete this, select
delete and we'll go and expand these table objects. Okay, now let's see if we can
delete the helper queries. I'm going to delete and delete this and, finally, delete
the entire group. To transform this combined data table that comes from
combining the table objects, we can simply do these very familiar applied
steps, such as, removing the columns. We can remove the folder path, for
example. And here, we see a combined data set where the function itself lives in the added custom column. It is easier to think about adding the custom column with these functions nested inside one another to return a table object, but it is also harder to read, and that's why we may want to make it a separate custom function, to make it easier to understand what's going on. When creating a new column that returns a table object, we can think of the function parameter as the binary object, that is, the file passed into the function, and the table object we return as the expression that calculates the result. We can then expand this into a combined data table, which is a much easier way to think of custom functions.
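A hedged sketch of the function object described here, in the shape the query editor typically generates; the delimiter and encoding values are assumptions:

    // Parameter1 is the binary file passed in; the expression returns a table
    (Parameter1) =>
    let
        Source = Csv.Document(Parameter1, [Delimiter = ",", Encoding = 65001]),
        #"Promoted Headers" = Table.PromoteHeaders(Source)
    in
        #"Promoted Headers"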

Converting queries into functions

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] On the NOAA website, we already downloaded the zip files for the
2019 daily weather summaries. Now let's download the zip files for 2018 and
2017, and you can put these zip files, along with the 2019 file, in the same
folder. You can access these files in your 0902 exercise files folder. In a new
Power BI file, we're going to connect to a folder of zip files. We first set up the
connection to the folder and hit okay. We select to transform the data
later. Next, we're going to copy over the connection to unzip a binary file that
we created earlier in this course. The code for how to do that is located in your
0902 exercise files folder. Select to create another query, and just use blank
query and copy over the code. Hit done, and I'm going to update the folder
path to our current folder source. We can see that it unzips the data from a
compressed file. I'm going to rename this query single zipped file. Now what I'm
going to do is duplicate this query and turn it into a function object. Remember
we saw the function object in the previous video where we learned how a folder
of csv files uses a transformation function to convert the file into a table object
with a header promoted that we can then combine. We're going to create our
own custom function to connect to a zip file. To set this up, I'm going to call this function unzipped, and I'm going to use x as the variable that we pass into it. I delete the single file path and put an x in its place, to allow us to pass the file path into the source step. Then, around the steps through the change type, I put let around the expression, followed by in and unzipped as the expression we want returned. I'm also going to indent this to make it a bit easier to read. We see that, essentially, the function object is a single step returned at the bottom, and these are the steps that we perform on the variable passed into the expression. I hit done, and I'm going to rename this function testing. Next, I'm
going to duplicate the connection to the folder of files and rename it function
testing at the end as a suffix, and now, we can think of adding another
column that refers to this function object, which is going to return the result. So we call the single zipped file function and pass in the content column. And hit okay. All right, we get an error for this, and the reason is that we're not looking to pass in the binary file itself; if we click on it, we see it cannot be read. So what we need to do is actually pass in the file path combined with the file name. To do so, I'm going to go back to
the source step and select only the file path and the name and select remove
other columns. Now, I'm going to merge these fields into a single field by
highlighting them both, right clicking, and selecting merge columns, insert the
step, and I'm going to call this file path and hit okay. Now I'm going to add the custom column again, to make it easier to read, and this time we're going to pass the file path in as the x variable into the function: I rename the column data tables, call the single zip file function testing function, and pass in the file path field. And hit okay. We can then
expand out the table objects so that we have all three years, 2017, 2018, and
2019, and a single dataset. We can also consolidate all the data into a single
query. Let's first save this Power BI file so that we can delete the other steps and
check that it works. Select apply later, and I'm going to save the files to our
folder. This NOAA zipped files separated steps. And you can find how to do this
in separated steps in your own exercise files. Now, I'm going to take this query
output, duplicate it, and then I'm going to go into the function object and copy
the code, and I'm going to rename this consolidated queries. I'm going to go
into the advanced editor to include the function object within this query. So I
give myself some space and then I copy the query for the function into this
code. I can remove the let from the beginning, because this is not a standalone function anymore, and I can also remove the in unzipped at the end, and I need to put a comma after the changed type as the last step in the unzip function. I'm going to rename this function one rather than unzipped, and then where we refer to
the file path later, and this is where we call the separate function query, I'm
going to put function one in this path as well. This added custom function sends
the file path, which is the combination of the folder path and the name of the
file, into the function one expression as the x variable, where the function we already set up when we connected to and unzipped a binary file returns the results as a table object, which we can then expand into a combined dataset. As
we saw before with the csv functions, having the steps separated out into three
separate steps makes it much easier to read rather than having everything
consolidated in this added column step. Let's make sure this works. Click done,
and we see that the consolidated queries all are in the same query now, and I'm
going to remove the other queries by highlighting them and deleting
them. Select delete, and I'm going to save this file as NOAA zipped files
consolidated query. So now I can also remove the file path if I'd like, and we
would need to rename the columns, but we can see how we would use the
combination of folder files and functions to combine an otherwise difficult
query into a single query.
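A hedged sketch of the overall pattern; the real unzip logic from the exercise files is stood in for here by a simple CSV reader, and the folder path is hypothetical:

    let
        // Function object: takes a full file path (x) and returns a table
        Function1 = (x as text) =>
            Table.PromoteHeaders(Csv.Document(File.Contents(x))),
        Source = Folder.Files("C:\NOAA\Zips"),
        #"Removed Other Columns" = Table.SelectColumns(Source, {"Folder Path", "Name"}),
        #"Merged Columns" = Table.CombineColumns(#"Removed Other Columns",
            {"Folder Path", "Name"}, Combiner.CombineTextByDelimiter(""), "File Path"),
        // Invoke the function once per row, passing in the merged file path
        #"Data Tables" = Table.AddColumn(#"Merged Columns", "Data Tables",
            each Function1([File Path]))
    in
        #"Data Tables"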

Configuring custom filtering

Selecting transcript lines in this section will navigate to timestamp in the video
- [Instructor] In the U.S. census Power BI file, we earlier set up the grouping to
return the highest county population for each zip code. The county information
is contained in the geo ID, which gives us the corresponding state and county
name for each ID. This type of functionality is helpful if we have addresses with
zip codes, but no county information and we want to figure out what county to
map a zip code to based on the county with the highest population. However,
this grouping only returns the maximum population rather than the county or
the geo ID associated with it. We can use custom M code instead to add
matching geo ID to this grouping. Duplicate the zip code query and rename it with the suffix with M code at the end. We go into the advanced editor to view the M code. Now we need to update the grouping so that it still groups by the zip code column but also returns a new table object as part of this grouping. We're going to call this new table object all data, written as each underscore with the variable passed through it, of type table; and then we're, again, going to return the maximum population because we'll need that for our next step. We hit done. Now we see that the all data column returns a table object for each zip
code. I'm going to duplicate this query so we can take a look into the table
object. I'm going to rename it M with table object view at the end. Now for zip
code 00601, I'm going to click on the table hyperlink, and looking into this table object of the geo IDs and the corresponding populations for zip code 00601, we see we want to return the geo ID 72001 because that has the
highest population of the zip code. Now we'll go back into the query we're
working with and we need to add another step. We're going to add a custom
column and just call this custom with a value of one, which is a dummy variable
or dummy value that we're going to update in the M code. In M, we need to make this new column an inline custom function that iterates through each table object corresponding to a zip code and returns the geo ID with the highest population. We can think of x as the row carrying the table object associated with each zip code, and we're going to pass it into this inline function. In order to return only the geo ID with the highest population, we're going to use Table.SelectRows, which is the function for filtering. We pass the all data column of x into it, and say each population by zip code equals the max population in the table object. Hit done and see what this returns. Okay, we need to make sure we got the functionality correctly set up. Ah, we need to update this so we have all data capitalized consistently; hit okay. So what we have is we've created another table object, but this table
object filters the all data table object to only return the highest population geo
ID. So we can remove the other columns. We can remove all data and maximum
population because we do not need them anymore, and going to expand out
the columns and we're going to not include the zip code or the population by
county because really, we're only interested in returning the geo ID and the
population by zip code. Hit okay, and there we have it. We can see how we can
use inline custom functions to change the way that our data table looks and the
results it returns.
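A hedged, self-contained reconstruction of the pattern, with dummy rows and assumed column names standing in for the census data:

    let
        Source = #table({"Zip Code", "GeoID", "Population"},
            {{"00601", "72001", 18000}, {"00601", "72141", 7000}}),
        #"Grouped Rows" = Table.Group(Source, {"Zip Code"},
            {{"Maximum Population", each List.Max([Population]), type number},
             {"All Data", each _, type table}}),
        // Inline function: x is the grouped row; filter its table object down
        // to the row whose population equals the group's maximum
        #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Top County",
            (x) => Table.SelectRows(x[All Data],
                each [Population] = x[Maximum Population]))
    in
        #"Added Custom"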


Question 1 of 1
A function is an example of a type of Power Query object.


FALSE

 TRUE

Correct

In addition to table objects, list objects, and record objects, there are several other lesser-used Power Query objects, including function objects.

10.

Configuring loading options

Selecting transcript lines in this section will navigate to timestamp in the video
- After performing the extraction and transformation steps on our data
sources, we then load the data tables into Power BI. We can select if we want
all or only the selected queries loaded. For example, if we created a list object as
a query, we may want to leverage it to create other queries, but we do not want
to load it. If we decide to load parameters, they will no longer be dynamic. If
you would like to create DAX measures, I would recommend putting them in a
separate table so they are easy to see and their calculations easy to access. We define caching as a process where we store something away for future use, which in our case is data we are going to hold in memory. Caching can cause latency issues, and large data tables with near-real-time requirements may not benefit from caching. We also do not need to refresh tables we do not cache, and by caching only what we need, we can reduce refresh times. Caching into memory through storage modes improves performance and report interactivity. We can set
the storage modes individually for each table in the model, which lets us control
the caching of data and memory for reports and enables a single data set. At
the time of this filming, Microsoft is further developing this functionality, so stay
tuned. In the CDC flu data Power BI file, let's not load the OData connection, but still load the CSV and SQL Server connections. To not load the OData query, we select the query, then right-click on it and deselect the check mark for enable load. We now see the query name in italics, and if we right-click again, we
see the enable load check box is now no longer there. To load data into Power
BI, click on Close and Apply at the top left. We now see two data tables loaded
into Power BI. We see three tabs to view the data: the Report tab, which is what we see here, where we can create visuals and dashboards; the Data tab, which allows us to view and filter the data we just loaded; and the Model view, where we join the tables, with a look very familiar to that of Microsoft Access. To view
the storage modes, click on a table in the model view and select the advanced
options at the bottom. The storage mode drop-down list allows us to select between Import, DirectQuery, and Dual. This is new functionality being developed by
Microsoft, so stay tuned to see what updates come next. We can go back and
change the queries by hitting the edit queries button at the top, and these
changes are easy to make.
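To illustrate the idea of a query we keep but never load, here is a minimal M sketch under assumed names (SeasonsToKeep, FluData, and the Year column are all hypothetical): a standalone list object feeds a filter in a second query, and only the second query has Enable load checked.

    // Query 1, "SeasonsToKeep": a list object we reference but do not load.
    = {2017, 2018, 2019}

    // Query 2, "FluDataFiltered": references the list; this is the query we load.
    = Table.SelectRows(FluData, each List.Contains(SeasonsToKeep, [Year]))

Referencing the list by its query name keeps the filter logic in one place, so updating the seasons list updates every query built on it without loading the list itself into the model.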

Fixing errors

- [Instructor] Most of the errors I run into when loading data into Power BI come from incorrect data types. Fixing errors after loading can be a huge pain and slightly cumbersome, but a recent Power BI update, and a bit of insight on your end, can make this process much smoother. In the manual sample
Power BI file, let's set the sample data table so it loads with errors so we can
work through how to fix them. Earlier, in the applied steps, we replaced the dash
values with zeros. Let's click on the X next to the step name, so that the Change
Type step now gives us errors. One of the newer developments in Power BI is
the colors below the column name, where we see a red line indicating errors in
the data set. Our sample data set is small, so we can see the errors, but the likelihood that you will miss these row-level errors gets much higher when you're working with a much larger data set. Load this data by
clicking on Close & Apply. The data loaded really quickly because it's a small
data set, but we see an error message indicating there are three errors in a
single query. We click on the View Errors hyperlink, and this takes us back into
the query where we will walk through the process of correcting errors that will
also work with much larger data sets. Notice there is a new folder in the query
list, specifically to address the errors. It also indicates which time Power
BI encountered these errors. Click on this query and we see the Time period and
the Amounts fields and the original data set, but we also see the table filtered
down to only the rows with the errors. Click on the error hyperlink to see what
the message says. It indicates that it could not convert that row to a
number, because of the dash issue that we already knew about. Delete this
step. Note the rows where the data issues occur, and go into the actual sample
data set to make these corrections. Clicking on the step before we change
type, we see that, indeed these are dashes. So we go back to Change Type
step, and this time we're going to fix the issue by replacing the errors with
zeros. We enter zero, and we hit Okay. Notice that we no longer see the red
line under the Amounts column name. And the query no longer shows any rows
erring out. So we can delete this entire error folder group. If we look at the
query specific to the errors, we see we no longer have any rows with errors. We
select the query folder name, right click, and delete the group. We Close &
Apply, and see that this time, the data loads and we receive no error
messages. So, we're good to go.
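For reference, the replace-errors step that the UI generates looks roughly like this in M; the step name #"Changed Type" and the column name Amounts are assumptions based on this example:

    // Replace any cell-level errors in the Amounts column with zero,
    // so rows that failed the numeric conversion load cleanly.
    = Table.ReplaceErrorValues(
        #"Changed Type",
        {{"Amounts", 0}}
    )

Each inner pair names a column and the value to substitute wherever that column holds an error.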

Refreshing data

- [Instructor] While Power BI allows us to easily refresh data and update connection options, we also need to exercise caution to avoid running into issues or errors. The query automatically adds new rows and columns to an existing dataset when we refresh it. However, if we rename, delete, or change the positions of columns in the data source, the refresh process will err out. We refresh our CDC Flu Data Power BI Desktop file manually by hitting
the refresh button at the top. We see it updates both of the
connections. Refreshing the data also re-runs the entire query for the ETL process, including any transformation steps we applied. You can
think of Power BI Desktop as a design application, while Power BI Cloud Services
are more for the enterprise level and working across organizations. Here's an
example of a Power BI services account that we can load our Power BI Desktop
file to. This is just a demo, so please follow along to learn how this process
works. In our Power BI Desktop file, I'm going to setup this CDC Flu Data file so
it only includes the SQL server connection. I select edit queries, and we can also
refresh the data from the query editor by hitting this refresh preview, where we
can select which tables to refresh and the connections. Again, to disable the
load for a query, I right click on the CSV file connection and de-select enable
load. So now we see we're only going to load the SQL server connection for the
CDC flu data. Hit close and apply. Now I'm going to create a matrix for the SQL
server data. Select the matrix visual and add the week to the rows, the year to the columns, so we can compare them year over year, and then the total cases as the values. The one other thing I'm going to do, and this is just an example, is
update the week so that it is a numeric value, which will actually put it in
order. Now before we upload this file into the Cloud Services account, let's first
save it, which will save both the new visual and also our data connections. We
go to the Home ribbon and we select the publish button over here on the
right. We select the workspace that we want it to go into, I only have one
workspace, but if you have several, you'd be able to select them from a list
here. And confirm, we see we've successfully uploaded our Power BI Desktop
file to the cloud account. So we go into our cloud account and here we see the
dataset and let's go into the reports and now we see our small visual table that
we just quickly put together. In order to make sure that we have the latest
updates to our data in SQL, we want to set up a refreshing schedule to get every
week of the flu data as it gets updated. We go to the datasets tab, and on the
actions button, we hit schedule refresh. Now, we first need to download the
gateway connection. Your organization might set this up for you, but in our
case, we're going to use a personal gateway connector. So I select install now,
now that we've downloaded the connection, we go to the downloads folder, we
can double click on the on-premises data gateway file, oh, we see that we
already have the gateway installed on this computer. If you're installing this for
the first time, you will get a message indicating that it has installed. So I'm going
to go into my gateway connector and enter my email address to get it
connected to my Power BI account. Select the account you want to use and then
enter your password here to link it up to the Power BI account. We select sign in
for this here, we get a success message indicating the gateway is online and
ready to be used so we hit close. Now we go back into our Power BI Cloud
Services account and refresh the page, we see here that the status is confirmed
for the SQL Server. Now for the data source credentials, I need to go in and update these: instead of Basic, I'm going to use Windows without impersonation. I typically find that selecting the Windows account solves a lot of
issues I run into so I select the Windows option. And now we see we can confirm
that the data source credentials are properly setup. And lastly, go to the
scheduled refresh tab, turn the scheduled refresh on, I'm going to refresh this
weekly. And I'm going to refresh it on Monday morning at 3:00 a.m. so someone
can look at this report first thing when they get in on Monday morning. And we
pick the time and the timezone when we want this to refresh and we can also
add additional days or update times. And I select apply, there we have it. We've
learned how to refresh not only a Power BI Desktop file, but how to set up a
refreshing schedule so that we automate the data updates and make the best
use of the ETL processes we learned in this course.
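One defensive pattern worth knowing here, offered as a hedged sketch rather than anything shown on screen (the step and column names are hypothetical): M's column-selection functions accept a MissingField option, which can keep a scheduled refresh from erring out when a source column disappears or is renamed.

    // Select only the columns the report needs; if one goes missing in the
    // source, fill it with nulls instead of failing the whole refresh.
    = Table.SelectColumns(
        Source,
        {"Week", "Year", "Total Cases"},
        MissingField.UseNull
    )

The default behavior, MissingField.Error, is what produces the refresh failures described above.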

Joining sets of data

- [Instructor] Those of you who identify as Excel power users use formulas to create Excel tables that join data from other sources via lookup keys into a single, consolidated data table. Power BI, however, performs better when we join related tables in the model view after loading, while they still remain separate tables, as opposed to merging them together in a single query in the ETL process. We can think of the relationships between tables in two ways. The
first is the relationship between data tables and lookup keys, which typically
leverages a many-to-one relationship. The second way to think of the
relationships between tables is with the many-to-many relationship. Think of
this as blending or aggregating two data fields with a shared ID. But unlike the
many-to-one relationship, the second table can have many instances of the
ID in the same way that we see those ID occurrences in the first table. This is a
new update to Power BI, and a pretty exciting one. In the US Census Power BI
desktop file, delete all the queries except GEOID to county web and population
by zip code. We can then remove the web suffix from the end of this query
name. Which makes it easier to read. In the GEOID to county query, let's remove
all the columns except GEOID, county, and state. Highlight those three
fields and select remove other columns. One last thing we'd like to do, using what we know about the M language, is to add a text data type to the state field. Let's go into the advanced editor window and add type text to the step, then hit done (see the M sketch at the end of this section). We now see that this simple step allowed us to update the type of the conditional column we created earlier. The population by
zip code is the data table and the GEOID to county is the lookup key, because we will not see duplicate GEOIDs in the lookup table, but we will see duplicates of the GEOID field in the population by zip code, because it's the data table. Now
we load both queries by hitting close and apply. And we see both data tables
successfully load, so now we're going to connect them together. And we see
that Power BI's used a bit of AI to set up the connection or what it believes to be
the connection. We can double click on this to confirm it's joining the population by zip code, on the GEOID field, to the GEOID field in the GEOID to county lookup table. We use many-to-one, with the cross filter direction set to single right now. For the sake of making this easier to read, I always try to put
the lookup key above the data tables. So if we have many lookup tables, or lookup keys, we would have the data at the bottom and the lookup keys above that. This makes it much easier to read and for others to go into
your Power BI file and understand what's going on. So the data table has
multiple instances of the GEOID, while the GEOID to county has unique values
for each GEOID in the GEOID field. And going back into this connection, the
both option enables bi-directional cross filtering, which means that the filters
flow both ways for the tables, and this gives us more control for applying
filters to related tables. We can think of this as a single data table coming into view, similar to what we talked about earlier with flattening the view in Excel: it allows us to see both sides of the data set, so to speak. We hit okay. And if we want to delete the connection between
them, we right click and select delete. I'm also going to add an exceptions
table. I'm going to put this in manually by selecting enter data. And this is just a
very simple example, and I'm going to call this Exception Table, and I'm going to
put the state, and I'm going to use Alaska and Alabama as the two states that
we want to exclude from the data set. And the rationale behind using these
exception tables is that it makes it much easier to manage and see what we
want to exclude, rather than having to filter them manually. Other people can also see how we're dealing with and managing those issues. So, click
load. And confirm that we have this joined on the state field, and I'm again
going to select the cross filter direction as both. Now, to check how this
works, I'm going to go into the report view and put the zip code as a table. And
then I'm going to put the population by zip code and the population by
county, and this is aggregated as a sum, and then I'm going to use the GEOID to
county, and I'm going to pull in the county name, and the state name as well. So
we see how joining those tables together allows us to create a single
consolidated data set that's easy to understand the relationships between the
tables. And if we add the exceptions table, we have to select and put this in the
filters. So, here we see the filters listed, and we can take the state filter and set it to blank to see what happens. There: we no longer see Alabama and Alaska in this table when we put it in alphabetical order. This means the exception table allows us to easily filter out what we do
not want to include in this table.
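As promised, here is a hedged sketch of what the typed conditional-column step might look like in the advanced editor; the step name, the condition, and the column values are placeholders, since the real conditional column was built earlier in the course:

    // Conditional column with an explicit text type, so the State field
    // needs no separate Changed Type step after this one.
    = Table.AddColumn(
        #"Removed Other Columns",
        "State",
        each if [GEOID] = null then "Unknown" else [State Name],
        type text
    )

The optional fourth argument to Table.AddColumn sets the new column's type at creation, which is the small change we made in the editor.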

Composite models

- Composite models allow for many-to-many relationships between
tables, which removes previous requirements and workarounds, such as
introducing new tables for the sole purpose of establishing relationships. Let's
create two tables with multiple instances of the same dates in each table and
show how the new many-to-many functionality allows us to join or blend these
tables together. Let's select Enter Data, where we will create two small sample
datasets to test this out with. Here is an Excel file called Sample blending data that I will copy and paste into Power BI. You can find these files in your 10-
05 exercise files folder. We first copy table one and put it into Power BI, and
label it as table one. I'm just going to load both of the tables to see how this
works, because we are testing this out. And now I copy table two and paste it
into Power BI, and label it with the name table two. Now we join the tables
together on the dates field. Make sure that the date is selected as the joining
field for both the tables. And we use the many-to-many cardinality and we'll use
Both as the cross filter direction. We receive a yellow warning message about potentially duplicated fields, essentially telling us to proceed with caution. We recognize that we want to blend these tables together, so we hit
Okay. Then we go into the report view. And I'm going to quickly create a table
that shows the date field and the corresponding amounts for both the data
tables. So, there we have it. The composite model blended data together in a
single table and allows us to use Power BI more efficiently.
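If you want to reproduce the test without the exercise files, here is a minimal sketch of one of the two tables entered by hand; the dates and amounts are made up, and table two would follow the same pattern with its own amounts:

    // Table one: the same date appears on multiple rows, so neither side
    // of the relationship has a unique key, a true many-to-many setup.
    = #table(
        type table [Date = date, Amount = number],
        {
            {#date(2019, 1, 1), 100},
            {#date(2019, 1, 1), 150},
            {#date(2019, 1, 2), 200}
        }
    )

With duplicate dates in both tables, the many-to-many cardinality aggregates the amounts on each side up to the shared Date key.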

Question 1 of 2
You don't have to load all the queries you create in the query editor into Power BI.

Answer: TRUE

Correct. You can select or deselect the queries you want to load directly from the query editor before loading into Power BI.

Question 2 of 2
A many-to-many data table relationship must be set up with at least one of the tables containing a unique ID field for joining the data.

Answer: FALSE

Correct. A one-to-many relationship must be set up with at least one of the tables containing a unique ID, but many-to-many relationships work by aggregating the data up to the shared key. The key values do not have to be unique in either data table, which means the relationship effectively blends the data together.
