Lab 2-4 Analysis Questions (LO 2-3)
AQ1. Compare the loan amounts to the validation given by LendingClub
for borrowers from PA:
Funded loans: $123,262.53
Number of approved loans: 8,427
Do the numbers in your analysis match the numbers provided by
LendingClub? What explains the discrepancy, if any?
AQ2. Does the Numerical Count provide a more useful/accurate value for
validating your data? Why or why not?
AQ3. Compare and contrast: Why do Power Query and Tableau
Desktop return different values for their summary statistics?
AQ4. Compare and contrast: What are some of the summary
statistics measures that are unique to Power Query? To Tableau
Desktop?
Lab 2-4 Submit Your Screenshot Lab Document
Verify that you have answered any questions your instructor has assigned, then upload
your screenshot lab document to Connect or to the location indicated by your instructor.
Lab 2-5 Validate and Transform Data—College
Scorecard
Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: Your college admissions department is interested in determining the
likelihood that a new student will complete their 4-year program. They have tasked you
with analyzing data from the U.S. Department of Education to identify some variables that
may be predictive of the completion rate. The data used in this lab are a subset of the
College Scorecard dataset that is provided by the U.S. Department of Education. These
data provide federal financial aid and earnings information, insights into the performance
of schools eligible to receive federal financial aid, and the outcomes of students at those
schools.
Data: Lab 2-5 College Scorecard Dataset.zip - 0.5MB Zip / 1.4MB Txt
Lab 2-5 Example Output
By the end of this lab, you will have validated and transformed the College Scorecard
data. While your results will include different data values, your work should look similar
to this:
Microsoft | Excel + Power Query
Microsoft Excel
LAB 2-5M Example of Cleaned College Scorecard Data in Microsoft Excel
Tableau | Prep + Desktop
Tableau Software, Inc. All rights reserved
LAB 2-5T Example of Cleaned College Scorecard Data in Tableau Prep
Lab 2-5 Load and Clean Data
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-5 [Your name] [Your email address].docx.
Working with raw data can present interesting challenges, especially when it comes to
identifying attributes and data types. In this lab you will learn how to transform the raw
data into data models that are ready for analysis.
Microsoft | Excel + Power Query
1. Open a new blank spreadsheet in Excel.
2. From the Data tab in the ribbon, click Get Data > From File > From Text/CSV.
3. Navigate to your Lab 2-5 College Scorecard Dataset.txt file and click Open.
4. Verify that the data loaded correctly into tables and rows and then click
Transform Data or Edit.
5. Click through each of the 30 columns and from the Transform tab in the ribbon,
click Data Type > Whole Number or Data Type > Decimal Number where
appropriate. If prompted, click Replace Current. Because the original text file
replaced empty values with “NULL”, Power Query erroneously detected many of
the columns as Text. Hint: Hold the Ctrl key and click to select multiple columns.
6. Take a screenshot (label it 2-5MA) of your columns with the proper data
types.
7. From the Home tab, click Close & Load.
8. To ensure that you captured all of the data through the extraction from the txt file,
we need to validate them:
a. In the Queries & Connections pane, verify that there are 7,703 rows loaded.
b. Compare the attribute names (column headers) to the attributes listed in the data
dictionary (found in Appendix K of the textbook). There should be 30 columns
(the last column in Excel should be AD).
c. Click Column H for the SAT_AVG attribute. In the summary statistics at the
bottom of your worksheet, the overall average SAT score should be 1,059.07.
9. Take a screenshot (label it 2-5MB) of your data table in Excel.
10. When you are finished answering the lab questions, you may close Excel. Save
your file as Lab 2-5 College Scorecard Transform.xlsx. Your data are now
ready for the test plan. This lab will continue in Lab 3-3.
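The checks in step 8 can also be scripted. The following is a minimal Python sketch, not part of the lab itself. It assumes a comma-delimited text file with a header row in which missing values are written as the literal string NULL, and it runs on a made-up three-row sample rather than the real dataset. The function name load_rows and all sample values are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical three-row sample standing in for the lab's text file.
SAMPLE = """UNITID,INSTNM,CITY,STABBR,SAT_AVG
1001,Example College A,Springfield,PA,1000
1002,Example College B,Riverton,AR,NULL
1003,Example College C,Lakeside,TX,1100
"""

# The three attributes the lab identifies as non-numeric.
TEXT_COLUMNS = {"INSTNM", "CITY", "STABBR"}


def load_rows(handle):
    """Read delimited text, turning the literal string NULL into None
    and every other non-text value into a float."""
    rows = []
    for record in csv.DictReader(handle):
        clean = {}
        for name, value in record.items():
            if value == "NULL":
                clean[name] = None          # treat NULL as missing, not as text
            elif name in TEXT_COLUMNS:
                clean[name] = value
            else:
                clean[name] = float(value)
        rows.append(clean)
    return rows


rows = load_rows(io.StringIO(SAMPLE))

# The same three checks as step 8: row count, column count, average SAT.
row_count = len(rows)
column_count = len(rows[0])
sat_scores = [r["SAT_AVG"] for r in rows if r["SAT_AVG"] is not None]
sat_average = mean(sat_scores)

print(row_count, column_count, sat_average)
```

Run against the real file, the same three values would be compared with the lab's expected 7,703 rows, 30 columns, and average SAT score of 1,059.07.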
Tableau | Prep + Desktop
1. Open a new flow in Tableau Prep Builder.
2. Click Connect to Data > To a File > Text file.
3. Navigate to your Lab 2-5 College Scorecard Dataset.txt file and click Open.
4. Verify that the data type for each of the 30 attributes is detected as a Number,
with the exception of INSTNM, CITY, and STABBR.
5. Take a screenshot (label it 2-5TA).
6. In the flow, click the + next to your Lab 2-5 College Scorecard Dataset and
choose Add > Clean Step.
7. Review the data and click the lightbulb icon in the CITY and STABBR
attributes to change the data roles to City and State/Province, respectively.
8. Click the + next to your Clean 1 task and choose Output.
9. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your file Lab 2-5 College Scorecard Transform.hyper.
c. Click Accept.
10. Click Run Flow. When it is finished processing, click Done.
11. Close Tableau Prep Builder. Save your file as Lab 2-5 College Scorecard
Transform.tfl.
12. Open Tableau Desktop.
13. Choose Connect > To a File > Mor
14. Locate the Lab 2-5 College Scorecard Transform.hyper and click Open.
15. Click the Sheet 1 tab.
16. From the menu bar, click Analysis > Aggregate Measures to remove the check
mark. To show each unique entry, you have to disable aggregate measures.
17. To show the summary statistics, go to the menu bar and click Worksheet > Show
Summary. A Summary card appears on the right side of the screen with the
Count, Sum, Average, Minimum, Maximum, and Median values.
18. Drag Unitid to the Rows shelf and note the summary statistics.
19. Take a screenshot (label it 2-5TB) of the Unitid stats in your worksheet.
20. Create two new sheets and repeat steps 16-18 for Sat Avg and C150 4, noting
the count, sum, average, minimum, maximum, and median of each.
21. When you are finished answering the lab questions, you may close Tableau
Desktop. Save your file as Lab 2-5 College Scorecard Transform.twb. Your
data are now ready for the test plan. This lab will continue in Lab 3-3.
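To see how the six Summary card measures relate to one another, they can be computed by hand. The following is a minimal Python sketch under stated assumptions: the values are made up, the function name summarize is hypothetical, and missing values (None) are assumed to be left out of every measure.

```python
from statistics import mean, median


def summarize(values):
    """Return count, sum, average, minimum, maximum, and median,
    skipping missing values (None)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "sum": sum(present),
        "average": mean(present),
        "minimum": min(present),
        "maximum": max(present),
        "median": median(present),
    }


# Made-up average SAT scores; None marks a school that reports no score.
sat_avg = [1000, None, 1100, 1250, None, 950]
summary = summarize(sat_avg)

for measure, value in summary.items():
    print(f"{measure}: {value}")
```

Because the count covers only the values that are present, it can be smaller than the number of rows in the data.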
Lab 2-5 Objective Questions (LO 2-3)
OQ1. How many schools report average SAT scores?
OQ2. What is the average completion rate (C150 4) of all the schools?
OQ3. How many schools report data to the U.S. Department of
Education?
Lab 2-5 Analysis Questions (LO 2-3)
AQ1. In the checksums, you validated that the average SAT score for all
of the records is 1,059.07. When we work with the data more
rigorously, several tests will require us to transform NULL or blank
values. If you were to transform the NULL SAT values into 0, what
would happen to the average (would it stay the same, decrease, or
increase)?
AQ2. How would that change to the average affect the way you would
interpret the data?
AQ3. What would happen if we excluded all schools that don’t report an
average SAT score?
Lab 2-5 Submit Your Screenshot Lab Document
Verify that you have answered any questions your instructor has assigned, then upload
your screenshot lab document to Connect or to the location indicated by your instructor.
Lab 2-6 Comprehensive Case: Build Relationships
among Database Tables—Dillard’s
Lab Note: The tools presented in this lab periodically change. Updated instructions, if
applicable, can be found in the eBook and lab walkthrough videos in Connect.
Case Summary: You are a brand-new analyst and you just got assigned to work on the
Dillard’s account. You were provided an ER Diagram (available in Appendix J), but you
still aren’t sure what all of the different tables and fields represent. Before diving into
problem solving or even transforming the data to prepare them for analysis, it is important
to gain an understanding of what data are available to you. One of the steps in doing so is
connecting to the database and analyzing the way the tables relate.
Data: Dillard’s sales data are available only on the University of Arkansas Remote
Desktop (waltonlab.uark.edu). See your instructor for login credentials.
Lab 2-6 Example Output
By the end of this lab, you will explore how to define relationships between tables from
Dillard’s sales data. While your results will include different data values, your work
should look similar to this:
Microsoft | Power BI Desktop
Microsoft Excel
LAB 2-6M Example of Dillard’s Data Model in Microsoft Power BI
Tableau | Desktop
Tableau Software, Inc. All rights reserved
LAB 2-6T Example of Dillard’s Data Model in Tableau Desktop
Lab 2-6 Build Relationships between Tables
Before you begin the lab, you should create a new blank Word document where you will
record your screenshots and save it as Lab 2-6 [Your name] [Your email address].docx.
Before you can analyze the data, you must first define the relationships that show how
the different tables are connected. Most tools will automatically detect
primary key/foreign key relationships, but you should always double-check to make sure your data
model is accurate.
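One way to double-check a detected relationship is to look for foreign key values that have no matching primary key. The following is a minimal Python sketch using an in-memory SQLite database instead of the Dillard’s SQL Server. The table names SKU and TRANSACT mirror the tables used in this lab, but the columns DEPT, TRANSACTION_ID, and AMOUNT and every row are hypothetical.

```python
import sqlite3

# In-memory stand-in for the real database; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SKU (
        SKU  INTEGER PRIMARY KEY,          -- primary key of the SKU table
        DEPT INTEGER
    );
    CREATE TABLE TRANSACT (
        TRANSACTION_ID INTEGER PRIMARY KEY,
        SKU    INTEGER REFERENCES SKU (SKU),  -- foreign key to SKU
        AMOUNT REAL
    );
    INSERT INTO SKU VALUES (1, 10), (2, 20);
    INSERT INTO TRANSACT VALUES (100, 1, 9.99), (101, 2, 24.50), (102, 3, 5.00);
""")

# A LEFT JOIN keeps every transaction; rows with no matching SKU
# come back with NULL on the SKU side and are flagged as orphans.
orphans = conn.execute("""
    SELECT t.TRANSACTION_ID, t.SKU
    FROM TRANSACT AS t
    LEFT JOIN SKU AS s ON s.SKU = t.SKU
    WHERE s.SKU IS NULL
""").fetchall()

print(orphans)
conn.close()
```

An empty result would suggest that every foreign key value has a matching primary key. Here transaction 102 refers to SKU 3, which does not exist in the SKU table, so it is reported.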
Microsoft | Power BI Desktop
1. Open Power BI Desktop.
2. In the Home ribbon, click Get Data > SQL Server.
3. Enter the following and click OK (keep in mind that SQL Server is not just one
database, it is a collection of databases, so it is critical to indicate the server path
and the specific database):
a. Server: essql1.walton.uark.edu
b. Database: WCOB Dillards
c. Data Connectivity: DirectQuery
4. If prompted to enter credentials, you can keep the default of “Use my current
credentials” and click Connect.
5. If prompted with an Encryption Support warning, click OK to move past it.
6. Take a screenshot (label it 2-6MA) of the navigator window.
Learn about Power BI!
There are two ways to connect to data, either Import or DirectQuery. There are
pros and cons for each, and it will always depend on a few factors, including the
size of the dataset and the type of analysis you intend to do.
Import: Will pull in all data at once. This can take a long time, but once they are
imported, your analysis can be more efficient if you know that you plan to use
each piece of data that you import. This is also beneficial for some of the
analyses you will learn about in future chapters, such as clustering.
DirectQuery: Only creates a connection to the data. This is more efficient if you
are exploring all of the tables in a large database and are comfortable working
with only a sample of data. Note: Unless directed otherwise, you should always
use DirectQuery with Dillard's data to prevent the remote desktop from running
out of storage space.
7. Place a check mark next to each of the following tables and click Load:
a. Customer, Department, SKU, SKU_Store, Store, Transact
8. Click the Model button (the icon with three connected boxes) in the toolbar on
the left to view the tables and relationships and note the following:
a. All the tables that you selected should appear in the Modeling tab with table
names, attributes, and relationships.
b. When you hover over any of the relationships, the keys that are common
between the two tables highlight.
1. Something important to consider is that in the raw data, the primary key is
typically the first attribute listed. In this Power BI modeling window, the
attributes have been re-ordered to appear in alphabetical order. For example,
SKU is the primary key of the SKU table, and it exists in the Transact table
as a foreign key.