Assignment 2
Student ID 21554244
Table of Contents
1. Overview
2. Part 1 of assignment
1.1. Question 1
1.2. Question 2
1.3. Question 3
1.4. Question 4
1.5. Question 5
1.6. Question 6
3. Part 2 of assignment
1.1. Question 1
1.2. Question 2
1.3. Question 3
1.4. Question 4
1.5. Question 5
1. Overview
2. Part 1 of assignment
1.1. Question 1
Load the ‘Medical_info.csv’ file into a Spark DataFrame using the following
set of instructions. After executing them, the CSV file is loaded into the
DataFrame, and the show() instruction displays its first 20 records.
To get the structure of the DataFrame, use the following set of
instructions.
1.2. Question 2
Use the instructions below to copy the existing DataFrame sdfData into a
new DataFrame, sdfData_Null, and remove the records containing null values
from sdfData_Null with the dropna() instruction. To display the total
number of records deleted, subtract the count after dropna() in
sdfData_Null from the original sdfData.count().
1.3. Question 3
1.4. Question 4
Display the quartile info of the ‘BMI’ feature in the 2nd DataFrame.
Generate a boxplot for the ‘BMI’ feature and discuss the
distribution of the feature based on the boxplot. (3 marks)
1.5. Question 5
We need to use PySpark SQL functions to run the query and display
the number of records satisfying the criteria in the query.
1.6. Question 6
Use the ‘BP_1’ feature in the 2nd DataFrame as the target label, to
build two classification models based on all other columns as
predictors. Conduct performance evaluation for the two models and
make conclusions. (15 marks)
Create a ‘Label’ column in the data frame and fill it with the integer 1
if BP_1 > 1 = 1, otherwise with 0.
We need to consolidate all of the predictor columns into a single
column. Following the steps in the code gives the consolidated data
in a single column.
3. Part 2 of assignment
1.1. Question 1
Load the data file into a Spark DataFrame (1st DataFrame). Describe
the structure of the created data frame. (3 marks)
Load the ‘Region_info.csv’ file into a Spark DataFrame using the
following set of instructions. After executing them, the CSV file is
loaded into the DataFrame, and the show() instruction displays its
first 20 records.
Use the following instruction to get the structure of the data frame.
1.2. Question 2
1.3. Question 3
1.4. Question 4
Use a Spark SQL query to display the ‘fertility’ and ‘life’ columns in the
2nd DataFrame where ‘fertility’ is greater than 1.0 and ‘life’ is greater
than 70. (3 marks)
We need to use PySpark SQL functions to run the query and display
the number of records satisfying the criteria in the query.
1.5. Question 5
Now create a regression object and train it on the training data using the
‘fit’ function. Then create predictions for the testing data by calling
the ‘transform’ function, and check the predictions. Finally, calculate
the average fertility per life.