Lab 2-4 Analysis Questions (LO 2-3)

AQ1. Compare the loan amounts to the validation given by LendingClub for borrowers from PA:
Funded loans: $123,262.53
Number of approved loans: 8,427
Do the numbers in your analysis match the numbers provided by LendingClub? What explains the discrepancy, if any?

AQ2. Does the Numerical Count provide a more useful/accurate value for validating your data? Why or why not?

AQ3. Compare and contrast: Why do Power Query and Tableau Desktop return different values for their summary statistics?

AQ4. Compare and contrast: What are some of the summary statistics measures that are unique to Power Query? To Tableau Desktop?

Lab 2-4 Submit Your Screenshot Lab Document

Verify that you have answered any questions your instructor has assigned, then upload your screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-5 Validate and Transform Data - College Scorecard

Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable, can be found in the eBook and lab walkthrough videos in Connect.

Case Summary: Your college admissions department is interested in determining the likelihood that a new student will complete their 4-year program. They have tasked you with analyzing data from the U.S. Department of Education to identify some variables that may be predictive of the completion rate. The data used in this lab are a subset of the College Scorecard dataset that is provided by the U.S. Department of Education. These data provide federal financial aid and earnings information, insights into the performance of schools eligible to receive federal financial aid, and the outcomes of students at those schools.

Data: Lab 2-5 College Scorecard Dataset.zip - 0.5MB Zip / 1.4MB Txt

Lab 2-5 Example Output

By the end of this lab, you will have validated and transformed the College Scorecard data.
While your results will include different data values, your work should look similar to this:

Microsoft | Excel + Power Query
[Figure LAB 2-5M: Example of Cleaned College Scorecard Data in Microsoft Excel. Source: Microsoft Excel]

Tableau | Prep + Desktop
[Figure LAB 2-5T: Example of Cleaned College Scorecard Data in Tableau Prep. Source: Tableau Software, Inc. All rights reserved.]

Lab 2-5 Load and Clean Data

Before you begin the lab, you should create a new blank Word document where you will record your screenshots and save it as Lab 2-5 [Your name] [Your email address].docx.

Working with raw data can present interesting challenges, especially when it comes to identifying attributes and data types. In this lab you will learn how to transform the raw data into data models that are ready for analysis.

Microsoft | Excel + Power Query

1. Open a new blank spreadsheet in Excel.
2. From the Data tab in the ribbon, click Get Data > From File > From Text/CSV.
3. Navigate to your Lab 2-5 College Scorecard Dataset.txt file and click Open.
4. Verify that the data loaded correctly into tables and rows and then click Transform Data or Edit.
5. Click through each of the 30 columns and from the Transform tab in the ribbon, click Data Type > Whole Number or Data Type > Decimal Number where appropriate. If prompted, click Replace Current. Because the original text file replaced empty values with “NULL”, Power Query erroneously detected many of the columns as Text.
Hint: Hold the Ctrl key and click to select multiple columns.
6. Take a screenshot (label it 2-5MA) of your columns with the proper data types.
7. From the Home tab, click Close & Load.
8. To ensure that you captured all of the data through the extraction from the txt file, we need to validate them:
a. In the Queries & Connections pane, verify that there are 7,703 rows loaded.
b. Compare the attribute names (column headers) to the attributes listed in the data dictionary (found in Appendix K of the textbook).
There should be 30 columns (the last column in Excel should be AD).
c. Click Column H for the SAT_AVG attribute. In the summary statistics at the bottom of your worksheet, the overall average SAT score should be 1,059.07.
9. Take a screenshot (label it 2-5MB) of your data table in Excel.
10. When you are finished answering the lab questions, you may close Excel. Save your file as Lab 2-5 College Scorecard Transform.xlsx. Your data are now ready for the test plan. This lab will continue in Lab 3-3.

Tableau | Prep + Desktop

1. Open a new flow in Tableau Prep Builder.
2. Click Connect to Data > To a File > Text file.
3. Navigate to your Lab 2-5 College Scorecard Dataset.txt file and click Open.
4. Verify that the data type for each of the 30 attributes is detected as a Number, with the exception of INSTNM, CITY, and STABBR.
5. Take a screenshot (label it 2-5TA).
6. In the flow, click the + next to your Lab 2-5 College Scorecard Dataset and choose Add > Clean Step.
7. Review the data and click the lightbulb icon in the CITY and STABBR attributes to change the data roles to City and State/Province, respectively.
8. Click the + next to your Clean 1 task and choose Output.
9. In the Output pane, click Browse:
a. Navigate to your preferred location to save the file.
b. Name your file Lab 2-5 College Scorecard Transform.hyper.
c. Click Accept.
10. Click Run Flow. When it is finished processing, click Done.
11. Close Tableau Prep Builder. Save your file as Lab 2-5 College Scorecard Transform.tfl.
12. Open Tableau Desktop.
13. Choose Connect > To a File > More...
14. Locate the Lab 2-5 College Scorecard Transform.hyper and click Open.
15. Click the Sheet 1 tab.
16. From the menu bar, click Analysis > Aggregate Measures to remove the check mark. To show each unique entry, you have to disable aggregate measures.
17. To show the summary statistics, go to the menu bar and click Worksheet > Show Summary.
A Summary card appears on the right side of the screen with the Count, Sum, Average, Minimum, Maximum, and Median values.
18. Drag Unitid to the Rows shelf and note the summary statistics.
19. Take a screenshot (label it 2-5TB) of the Unitid stats in your worksheet.
20. Create two new sheets and repeat steps 16-18 for Sat Avg and C150 4, noting the count, sum, average, minimum, maximum, and median of each.
21. When you are finished answering the lab questions, you may close Tableau Desktop. Save your file as Lab 2-5 College Scorecard Transform.twb. Your data are now ready for the test plan. This lab will continue in Lab 3-3.

Lab 2-5 Objective Questions (LO 2-3)

OQ1. How many schools report average SAT scores?

OQ2. What is the average completion rate (C150 4) of all the schools?

OQ3. How many schools report data to the U.S. Department of Education?

Lab 2-5 Analysis Questions (LO 2-3)

AQ1. In the checksums, you validated that the average SAT score for all of the records is 1,059.07. When we work with the data more rigorously, several tests will require us to transform NULL or blank values. If you were to transform the NULL SAT values into 0, what would happen to the average (would it stay the same, decrease, or increase)?

AQ2. How would that change to the average affect the way you would interpret the data?

AQ3. What would happen if we excluded all schools that don’t report an average SAT score?

Lab 2-5 Submit Your Screenshot Lab Document

Verify that you have answered any questions your instructor has assigned, then upload your screenshot lab document to Connect or to the location indicated by your instructor.

Lab 2-6 Comprehensive Case: Build Relationships among Database Tables - Dillard’s

Lab Note: The tools presented in this lab periodically change. Updated instructions, if applicable, can be found in the eBook and lab walkthrough videos in Connect.

Case Summary: You are a brand-new analyst and you just got assigned to work on the Dillard’s account.
You were provided an ER Diagram (available in Appendix J), but you still aren’t sure what all of the different tables and fields represent. Before diving into problem solving or even transforming the data to prepare them for analysis, it is important to gain an understanding of what data are available to you. One of the steps in doing so is connecting to the database and analyzing the way the tables relate.

Data: Dillard’s sales data are available only on the University of Arkansas Remote Desktop (waltonlab.uark.edu). See your instructor for login credentials.

Lab 2-6 Example Output

By the end of this lab, you will explore how to define relationships between tables from Dillard’s sales data. While your results will include different data values, your work should look similar to this:

Microsoft | Power BI Desktop
[Figure LAB 2-6M: Example of Dillard’s Data Model in Microsoft Power BI. Source: Microsoft Excel]

Tableau | Desktop
[Figure LAB 2-6T: Example of Dillard’s Data Model in Tableau Desktop. Source: Tableau Software, Inc. All rights reserved.]

Lab 2-6 Build Relationships between Tables

Before you begin the lab, you should create a new blank Word document where you will record your screenshots and save it as Lab 2-6 [Your name] [Your email address].docx.

Before you can analyze the data, you must first define the relationships that show how the different tables are connected. Most tools will automatically detect primary key/foreign key relationships, but you should always double-check to make sure your data model is accurate.

Microsoft | Power BI Desktop

1. Open Power BI Desktop.
2. In the Home ribbon, click Get Data > SQL Server.
3. Enter the following and click OK (keep in mind that SQL Server is not just one database; it is a collection of databases, so it is critical to indicate the server path and the specific database):
a. Server: essql1.walton.uark.edu
b. Database: WCOB Dillards
c. Data Connectivity: DirectQuery
4. If prompted to enter credentials, you can keep the default of “Use my current credentials” and click Connect.
5. If prompted with an Encryption Support warning, click OK to move past it.
6. Take a screenshot (label it 2-6MA) of the navigator window.

Learn about Power BI!
There are two ways to connect to data, either Import or DirectQuery. There are pros and cons for each, and it will always depend on a few factors, including the size of the dataset and the type of analysis you intend to do.
Import: Will pull in all data at once. This can take a long time, but once the data are imported, your analysis can be more efficient if you know that you plan to use each piece of data that you import. This is also beneficial for some of the analyses you will learn about in future chapters, such as clustering.
DirectQuery: Only creates a connection to the data. This is more efficient if you are exploring all of the tables in a large database and are comfortable working with only a sample of data.
Note: Unless directed otherwise, you should always use DirectQuery with Dillard’s data to prevent the remote desktop from running out of storage space.

7. Place a check mark next to each of the following tables and click Load:
a. Customer, Department, SKU, SKU_Store, Store, Transact
8. Click the Model button (the icon with three connected boxes) in the toolbar on the left to view the tables and relationships and note the following:
a. All the tables that you selected should appear in the Modeling tab with table names, attributes, and relationships.
b. When you hover over any of the relationships, the keys that are common between the two tables highlight.
i. Something important to consider is that in the raw data, the primary key is typically the first attribute listed. In this Power BI modeling window, the attributes have been re-ordered to appear in alphabetical order. For example, SKU is the primary key of the SKU table, and it exists in the Transact table as a foreign key,
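
To make the idea of double-checking a primary key/foreign key relationship concrete, here is a minimal Python sketch. It is not part of the lab and does not connect to the Dillard’s database. The rows are made-up sample values, the DEPT and TRANSACTION_ID column names are hypothetical, and the two helper functions are illustrative names only. It checks that SKU is unique in a small SKU table and that every SKU value in a small Transact table has a matching row in the SKU table.

```python
# Hypothetical miniature rows; the real Dillard's tables are far larger,
# and the DEPT and TRANSACTION_ID column names are assumptions for this sketch.
sku_rows = [
    {"SKU": 101, "DEPT": 800},
    {"SKU": 102, "DEPT": 800},
    {"SKU": 103, "DEPT": 905},
]
transact_rows = [
    {"TRANSACTION_ID": 1, "SKU": 101},
    {"TRANSACTION_ID": 2, "SKU": 103},
    {"TRANSACTION_ID": 3, "SKU": 101},
]


def is_primary_key(rows, column):
    """Return True when every value in `column` is present and unique."""
    values = [row[column] for row in rows]
    return None not in values and len(values) == len(set(values))


def orphaned_foreign_keys(child_rows, parent_rows, column):
    """Return child values of `column` that have no match in the parent table."""
    parent_keys = {row[column] for row in parent_rows}
    return sorted({row[column] for row in child_rows} - parent_keys)


sku_is_key = is_primary_key(sku_rows, "SKU")
orphans = orphaned_foreign_keys(transact_rows, sku_rows, "SKU")
print("SKU is a valid primary key of the SKU table:", sku_is_key)
print("Transact SKU values missing from the SKU table:", orphans)
```

In this sample, SKU repeats in the Transact rows, which is expected for a foreign key: one product can appear in many transactions, while it appears only once in the SKU table.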
