CC7183 Week 6 Workshop Tasks

1. In PowerBI, click on ‘Get Data’ and find the train.csv dataset on your computer. DO NOT load
it; click ‘Transform Data’ instead:

2. PowerBI applies some steps automatically. Remove the last step by clicking on the x icon
next to ‘Changed Type’; all the columns should then show the ABC (text) data type:

3. For the ‘Survived’ column, right-click on the value ‘0’, select ‘Replace Values’ and replace it
with ‘FALSE’:
4. Repeat step 3 to replace ‘1’ with ‘TRUE’. You should now see the below:

5. Click on the ABC icon to change the data type of Survived to ‘True/False’:

6. Repeat step 5 for the columns below, using these data types:
Pclass – Whole Number
Age – Decimal Number
SibSp (Sibling/Spouse) – Whole Number
Parch – Whole Number
Fare – Fixed Decimal Number
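
For comparison, the type changes in steps 3 to 6 can be sketched in R. This is a minimal sketch and not part of the workshop: the `titanic` data frame is made-up sample data standing in for train.csv, its column names are assumed to match the ones listed above, and every column starts as text, as in step 2.

```r
# Made-up sample rows standing in for train.csv; all columns start as text,
# mirroring the ABC (text) type seen after removing 'Changed Type'.
titanic <- data.frame(
  Survived = c("0", "1", "1", "0"),
  Pclass   = c("3", "1", "3", "2"),
  Age      = c("22", "38", "", "35.5"),
  SibSp    = c("1", "1", "0", "0"),
  Parch    = c("0", "0", "0", "2"),
  Fare     = c("7.25", "71.2833", "7.925", "8.05"),
  stringsAsFactors = FALSE
)

# Steps 3-5: map '0'/'1' to FALSE/TRUE as a logical column.
titanic$Survived <- titanic$Survived == "1"

# Step 6: whole numbers become integer, decimal numbers become numeric.
titanic$Pclass <- as.integer(titanic$Pclass)
titanic$SibSp  <- as.integer(titanic$SibSp)
titanic$Parch  <- as.integer(titanic$Parch)
titanic$Age    <- as.numeric(titanic$Age)    # the empty string becomes NA
titanic$Fare   <- as.numeric(titanic$Fare)

# Show the resulting column types.
str(titanic)
```

The empty Age value turning into NA is the R counterpart of the blank values you will look for in the filters.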

7. Now you should see the below:

8. Check all the available filters and make a note of any null, blank or similar values; you’ll
cleanse them after loading the data. (Age, Cabin, Embarked)
9. While checking the filters, did you spot any other data entry issues (data verification)?
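
The check in step 8 can also be expressed as a count of missing entries per column. The sketch below is an illustration on made-up rows, assuming that both NA and empty strings count as missing; `is_missing` is a hypothetical helper, not something PowerBI provides.

```r
# Made-up sample rows; in the workshop these columns come from train.csv.
titanic <- data.frame(
  Age      = c(22, NA, 26, 35),
  Cabin    = c("", "C85", "", "C123"),
  Embarked = c("S", "C", "", "S"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat NA and empty (or whitespace-only) strings as missing.
is_missing <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Number of missing entries per column.
missing_counts <- sapply(titanic, function(col) sum(is_missing(col)))
print(missing_counts)
```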
10. Now click on ‘Close & Apply’ to load the data into PowerBI.

11. PowerBI is a Business Intelligence tool, not a statistical analysis tool, so these charts are
not built in. Add ‘Whisker & Box’ and ‘Histogram’ from the more visualisations option:
12. Select the histogram and drag and drop the ‘Age’ attribute from Fields to both ‘X Data’ and ‘Y
Data’ in the visualisations pane. Then click on an empty area of the dashboard and do the
same for the ‘Fare’ attribute:
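
What the histogram visual does can be sketched in base R: it groups the values into bins and counts how many fall into each. The ages and the 10-year bin width below are made-up assumptions for illustration; the bins PowerBI chooses may differ.

```r
# Made-up ages; NA marks a missing value and is ignored by hist().
age <- c(22, 38, 26, 35, NA, 54, 2, 27, 14, 4)

# Bin the ages into 10-year intervals without drawing anything.
h <- hist(age, breaks = seq(0, 60, by = 10), plot = FALSE)

# One row per bin: lower edge, upper edge, number of values in the bin.
bins <- data.frame(from  = head(h$breaks, -1),
                   to    = tail(h$breaks, -1),
                   count = h$counts)
print(bins)

# Draw the chart itself (Fare would follow the same pattern).
hist(age, breaks = seq(0, 60, by = 10), main = "Age", xlab = "Age")
```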

13. Click on an empty area again and select the ‘Whisker & Box’ plot. Drag and drop the ‘Age’
attribute to both ‘Axis’ and ‘Value’ in the visualisations pane; make sure ‘Maximum of Age’ is
selected in the Values bar:
14. Repeat step 13 for the ‘Fare’ attribute. You should now have the below in your dashboard:

15. PowerBI is interactive: move your mouse cursor over the ‘Min of Age by Age’ plot. Can you see
the statistical information for Age? What is the mean?
16. Again on an empty space, click on the ‘Card’ visualisation and select the ‘Age’ attribute. Make
sure ‘Average of Age’ is selected in the fields. Is the value different from the one in the
Whisker & Box plot? Why do you think that is?
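
To explore the question in step 16, the statistics involved can be reproduced in R. This is a sketch on made-up ages, and the comparison with the mean of distinct values is only one possible explanation to test, not a statement about how the Whisker & Box visual works internally.

```r
# Made-up ages with repeated values and one missing entry.
age <- c(22, 22, 22, 30, 30, 40, 80, NA)

# Five-number summary behind a box plot:
# minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(age, na.rm = TRUE)

# Mean over every row, ignoring missing values.
mean_all <- mean(age, na.rm = TRUE)

# Mean over distinct values only, which is what you would get
# if the data were first grouped by Age and then averaged.
mean_distinct <- mean(unique(age), na.rm = TRUE)

print(five)
print(c(all_rows = mean_all, distinct = mean_distinct))
```

If the two means differ on your real data in the same direction as the card and the plot do, grouping by Age is worth considering as the cause.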
17. Select the ‘Fare’ attribute. Its data type is Fixed Decimal Number, but we can still see more
than 2 decimal places. To reduce this to 2 decimal places, click on the ‘comma’ sign in
‘Column tools’:

18. Now we can see the ‘average of fare’ in a card as well.


19. Rearrange all the visuals to free up some space in the dashboard. Click on ‘Clustered Column
Chart’. Drag and drop the ‘Survived’ attribute to both the X-axis and the Y-axis; the Y-axis
should be ‘Count of Survived’:
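
The clustered column chart in step 19 is a frequency count of the two Survived values. A minimal base R sketch of the same idea, using made-up values rather than the real data:

```r
# Made-up Survived values as TRUE/FALSE.
survived <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Count of rows per value, the quantity behind 'Count of Survived'.
counts <- table(survived)
print(counts)

# Share of each value, useful for judging class balance.
print(prop.table(counts))

# Column chart of the counts.
barplot(counts, xlab = "Survived", ylab = "Count of Survived")
```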
20. So far, we have profiled our data with visualisations. Challenge: build the same visuals in
Excel. Do you get the same results?
!!! For the next steps to work, you need an R environment installed on your machine !!!

21. Click on the + sign to add a new page in PowerBI and click on the R icon in the visualisations
pane; remember to enable script visuals when prompted:

22. Select Survived and Sex from Fields, then copy and paste the R code below into the R script editor:

library(ggplot2)
# 'dataset' is the data frame PowerBI builds from the selected fields
ggplot(dataset, aes(x = as.factor(Survived), fill = as.factor(Survived))) +
  geom_density(alpha = 0.5, position = "dodge")

23. Once you run the code, you’ll see the below:
24. The same code in R would give the below plot:

25. Would you say this target (output) variable is suitable for logistic regression?
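
One way to approach step 25 is to check the target directly before modelling. The sketch below uses made-up data and assumes Survived is coded 0/1 with Fare as a single illustrative predictor; it shows the checks and a basic fit, not a recommended model.

```r
# Made-up sample data: Survived as 0/1 and Fare as a single predictor.
toy <- data.frame(
  Survived = c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1),
  Fare     = c(7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07)
)

# A target for binary logistic regression should take exactly two distinct values.
n_levels <- length(unique(toy$Survived))

# Share of each class; a very uneven split would deserve attention.
class_share <- prop.table(table(toy$Survived))
print(class_share)

# Fit Survived ~ Fare with a logit link.
fit <- glm(Survived ~ Fare, data = toy, family = binomial)
print(coef(fit))
```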