1. In PowerBI, click ‘Get Data’ and locate the train.csv dataset on your computer. DO NOT load
it; click ‘Transform Data’ instead:
2. PowerBI applies some steps automatically. Remove the last one by clicking the x icon next
to ‘Changed Type’; every column should then show the ABC (text) data type:
3. In the ‘Survived’ column, right-click a cell containing ‘0’, select ‘Replace Values’ and replace
it with ‘FALSE’:
4. Repeat step 3 to replace ‘1’ with ‘TRUE’; the column should now contain only ‘TRUE’ and ‘FALSE’ values.
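For readers who prefer to see the logic of steps 3 and 4 as code, the replacement can be sketched in base R. This is only an illustration, not part of the PowerBI workflow; the sample values and the variable names (`survived_text`, `survived_flag`) are made up.

```r
# Hypothetical sample of the Survived column as text, as it appears
# once the 'Changed Type' step has been removed (all columns are text).
survived_text <- c("0", "1", "1", "0", "1")

# Replace "0" with "FALSE" and "1" with "TRUE", mirroring steps 3 and 4.
# Any other value is left untouched so that it can be spotted later.
survived_replaced <- ifelse(
  survived_text == "0", "FALSE",
  ifelse(survived_text == "1", "TRUE", survived_text)
)

# The text "TRUE"/"FALSE" converts cleanly to a logical vector.
survived_flag <- as.logical(survived_replaced)
print(survived_flag)
```

Values other than ‘0’ or ‘1’ would become NA at the conversion, which is one way to catch unexpected entries.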
6. Repeat step 5 for the columns below, setting each to the data type shown:
Pclass – Whole Number
Age – Decimal Number
SibSp (Sibling/Spouse) – Whole Number
Parch – Whole Number
Fare – Fixed Decimal Number
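The same type assignments can be sketched in base R, which may help when comparing PowerBI's results with a script. The two-row extract below is hypothetical, and the column names are assumed to match the list above. Rounding to four places mimics the four fixed decimal places that PowerBI's fixed decimal type stores.

```r
# Hypothetical two-row extract with every column read as text.
raw <- data.frame(
  Pclass = c("3", "1"),
  Age    = c("22", ""),        # an empty string stands for a missing age
  SibSp  = c("1", "0"),
  Parch  = c("0", "0"),
  Fare   = c("7.25", "71.2833"),
  stringsAsFactors = FALSE
)

# Convert each column to the type listed in step 6.
typed <- data.frame(
  Pclass = as.integer(raw$Pclass),           # Whole Number
  Age    = as.numeric(raw$Age),              # Decimal Number; "" becomes NA
  SibSp  = as.integer(raw$SibSp),            # Whole Number
  Parch  = as.integer(raw$Parch),            # Whole Number
  Fare   = round(as.numeric(raw$Fare), 4)    # Fixed Decimal Number (4 places)
)

str(typed)
```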
8. Check the filter of every column and note any null, blank or similar values (Age, Cabin,
Embarked); you’ll cleanse them after loading the data.
9. While checking the filters, did you spot any other data entry issues (data verification)?
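As a cross-check on what the filters show in steps 8 and 9, missing entries per column can be counted with a few lines of base R. This is a minimal sketch: the data frame `passengers` and its values are made up, and `count_missing` is a hypothetical helper, not a PowerBI or R built-in.

```r
# Hypothetical extract; the values are invented for illustration.
passengers <- data.frame(
  Age      = c(22, NA, 38, NA),
  Cabin    = c("", "C85", "", ""),
  Embarked = c("S", "C", "", "S"),
  stringsAsFactors = FALSE
)

# Count entries that are NA or an empty/whitespace-only string.
count_missing <- function(column) {
  sum(is.na(column) | trimws(as.character(column)) == "")
}

# One count per column, named after the column.
missing_counts <- sapply(passengers, count_missing)
print(missing_counts)
```

Treating blank strings and NA alike matters here, because a text column such as Cabin may hold blanks rather than true nulls.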
10. Now click ‘Close & Apply’ to load the data into PowerBI.
11. PowerBI is a Business Intelligence tool, not a statistical analysis package. Add the ‘Whisker & Box’ and
‘Histogram’ visuals from the more visualisations option:
12. Select the histogram, then drag and drop the ‘Age’ attribute from Fields onto both ‘X Data’ and ‘Y
Data’ in the visualisations pane. Click an empty area of the dashboard and do the
same for the ‘Fare’ attribute:
13. Click an empty area again and select the ‘Whisker & Box’ plot. Drag and drop the ‘Age’
attribute onto both ‘Axis’ and ‘Value’ in the visualisations pane; make sure ‘Maximum of Age’ is selected in
the Values bar:
14. Repeat step 13 for the ‘Fare’ attribute. You should now have the below in your dashboard:
15. PowerBI is interactive: hover your mouse cursor over the ‘Min of Age by Age’ plot. Can you see
the statistical information for Age? What is the mean?
16. Again on an empty space, click the ‘Card’ visualisation and select the ‘Age’ attribute. Make sure
‘Average of Age’ is selected in the fields. Is the value different from the one shown by the Whisker plot? Why do you
think that is?
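When comparing the Card value with the Whisker plot in steps 15 and 16, it can help to compute the same summaries outside PowerBI. The base R sketch below uses a made-up `ages` vector and shows the five-number summary that a box plot draws next to the mean, both with missing values ignored. It does not claim to reproduce how either PowerBI visual aggregates the data.

```r
# Hypothetical ages, including two missing values.
ages <- c(22, 38, 26, NA, 35, 54, 2, NA)

# Five-number summary (min, lower hinge, median, upper hinge, max).
# fivenum() drops NA values by default.
box_stats <- fivenum(ages)

# Arithmetic mean, ignoring missing values explicitly.
mean_age <- mean(ages, na.rm = TRUE)

print(box_stats)
print(mean_age)
```

Two things worth checking against your dashboard are how each visual treats missing ages and which aggregate (Maximum, Minimum or Average) was selected for its value field.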
17. Select the ‘Fare’ attribute. Its data type is Fixed Decimal Number, but more than
2 decimal places are still displayed. To reduce this to 2 decimal places, click the ‘comma’ sign in ‘Column
tools’.
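Step 17 changes how Fare is displayed, which is separate from the values themselves. A small base R sketch of that distinction follows; the `fare` values are hypothetical.

```r
# Hypothetical fares with up to four decimal places.
fare <- c(7.25, 71.2833, 8.05)

# Format for display with exactly two decimal places.
# The result is text; the numeric vector 'fare' is left unchanged.
fare_display <- sprintf("%.2f", fare)

print(fare_display)
print(fare)
```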
21. Click the + sign to add a new page in PowerBI, then click the R icon in the visualisations pane;
remember to enable script writing:
22. Select Survived and Sex from Fields, then copy and paste the R code below into the R Script Editor:
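# 'dataset' is the data frame PowerBI builds from the selected fields
library(ggplot2)
# Density of Survived, filled by survival status
ggplot(dataset, aes(x = as.factor(Survived), fill = as.factor(Survived))) +
  geom_density(alpha = 0.5, position = "dodge")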
23. Once you run the code, you’ll see the below:
24. The same code in R would give the below plot:
25. Would you say this target (output) variable is suitable for logistic regression?
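Two quick checks that may inform the answer to step 25 are whether the target takes exactly two values and how evenly the two classes are represented. The base R sketch below uses a made-up `survived` vector; it is an illustration under those assumptions, not an analysis of train.csv.

```r
# Hypothetical target values after the TRUE/FALSE replacement.
survived <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Logistic regression expects a binary outcome: exactly two distinct values.
is_binary <- length(unique(na.omit(survived))) == 2

# Share of each class; a heavily lopsided split deserves extra care.
class_share <- prop.table(table(survived))

print(is_binary)
print(class_share)
```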