Of
Business Analytics
Under supervision of
Dr. Tanveer Kajla
Mittal School of Business, Lovely Professional University,
Phagwara-144401, Punjab, India, 2021
Installed packages-
Two datasets were available, i.e. train and test, which I imported into RStudio to
study the survival of passengers who had boarded the Titanic. These two datasets
were then bound together to create the main dataset, named all (screenshot
attached above):
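The binding step can be sketched in base R as follows. This is an illustrative sketch, not the code from the screenshot. The two data frames are made-up stand-ins for read.csv("train.csv") and read.csv("test.csv") with only a few of the Kaggle columns. The sketch also assumes, as in the Kaggle files, that test has no Survived column.

```r
# Hypothetical stand-ins for read.csv("train.csv") and read.csv("test.csv");
# only a few of the Kaggle columns are included here.
train <- data.frame(
  PassengerId = 1:3,
  Survived    = c(0L, 1L, 1L),
  Name        = c("Smith, Mr. John", "Smith, Mrs. Anna", "Lee, Miss. Mary"),
  SibSp       = c(1L, 1L, 0L),
  Parch       = c(1L, 1L, 0L),
  stringsAsFactors = FALSE
)
test <- data.frame(
  PassengerId = 4:5,
  Name        = c("Smith, Master. Tom", "Gray, Dr. Paul"),
  SibSp       = c(0L, 0L),
  Parch       = c(2L, 0L),
  stringsAsFactors = FALSE
)

# test has no Survived column, so add one filled with NA before binding
test$Survived <- NA
# rbind() on data frames matches columns by name, so column order may differ
all <- rbind(train, test)
str(all)
```

Test rows keep Survived = NA in all, so they need to be excluded from any later analysis of survival.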
Step 2: Data Cleaning
I extracted the title (Mr, Ms, Miss, Mrs, Lady, Master, etc.) from each passenger's
name, then grouped the least frequent titles under a single title called
uncommon_title:
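Here is a minimal sketch of the title step, assuming the Kaggle name format "Surname, Title. Given names". The helper names extract_title() and group_rare_titles(), the cut-off min_count and the toy names are all hypothetical. The report does not say how rare a title had to be to fall into uncommon_title.

```r
# Keep only the text between "Surname, " and the first "." in each name
extract_title <- function(name) gsub("(.*, )|(\\..*)", "", name)

# Replace titles seen fewer than min_count times with "uncommon_title"
# (min_count is an assumed cut-off, not taken from the report)
group_rare_titles <- function(titles, min_count) {
  counts <- table(titles)
  rare <- names(counts)[counts < min_count]
  titles[titles %in% rare] <- "uncommon_title"
  titles
}

# Hypothetical toy names in the Kaggle "Surname, Title. Given" format
toy_names <- c("Smith, Mr. John", "Hall, Mr. Ben", "Cole, Mr. Sam",
               "Lee, Miss. Mary", "Fox, Miss. Ada",
               "Smith, Mrs. Anna", "Hall, Mrs. Eve",
               "Smith, Master. Tom", "Cole, Master. Jim",
               "Gray, Dr. Paul", "Webb, Rev. Carl")

titles <- group_rare_titles(extract_title(toy_names), min_count = 2)
print(table(titles))
```

With the full dataset, the cut-off would be chosen by inspecting table() of the raw titles first.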
Next, I created a variable named FamName to identify each family on the Titanic.
It combines the surname (as shown above) with FamSize (as shown below), which is
calculated from the number of siblings/spouses (SibSp) and the number of
parents/children (Parch) aboard.
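One way to build FamSize and FamName is sketched below under two assumptions. First, FamSize adds 1 for the passenger themself, which is why the smallest value is 1. Second, FamName joins surname and size with an underscore. The toy passengers are invented.

```r
# Hypothetical toy passengers using the Kaggle Name, SibSp and Parch columns
toy <- data.frame(
  Name  = c("Smith, Mr. John", "Smith, Mrs. Anna", "Smith, Master. Tom", "Lee, Miss. Mary"),
  SibSp = c(1L, 1L, 0L, 0L),
  Parch = c(1L, 1L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Surname is everything before the first comma in Name
toy$Surname <- sub(",.*$", "", toy$Name)
# FamSize = siblings/spouses + parents/children + the passenger (assumed +1)
toy$FamSize <- toy$SibSp + toy$Parch + 1
# FamName ties surname and family size together, e.g. "Smith_3"
toy$FamName <- paste(toy$Surname, toy$FamSize, sep = "_")
print(toy[, c("Name", "FamSize", "FamName")])
```

Adding the size to the surname helps separate unrelated passengers who happen to share a common surname, although it cannot separate two families of the same size.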
Bar graph-
Next, I study the relationship between family size and survival by plotting the
variables FamSize and Survived using ggplot2.
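Here is a sketch of such a bar graph. It assumes the plot uses only rows whose Survived value is known, because the test rows in all are NA. The toy data frame is invented. The ggplot2 call is wrapped in requireNamespace(), so the counting part still runs where ggplot2 is not installed. The table printed first holds the bar heights that geom_bar() draws.

```r
# Invented toy rows; the last one mimics a test row with unknown Survived
all <- data.frame(
  FamSize  = c(1, 1, 1, 2, 2, 3, 3, 5, 5, 1),
  Survived = c(0, 0, 1, 1, 1, 1, 0, 0, 0, NA)
)
# Keep only passengers whose outcome is known
labelled <- all[!is.na(all$Survived), ]

# Bar heights: number of passengers per family size and outcome
counts <- table(FamSize = labelled$FamSize, Survived = labelled$Survived)
print(counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Side-by-side bars for Survived = 0 and 1 at each family size
  p <- ggplot(labelled, aes(x = FamSize, fill = factor(Survived))) +
    geom_bar(position = "dodge") +
    scale_x_continuous(breaks = sort(unique(labelled$FamSize))) +
    labs(x = "Family size", y = "Passengers", fill = "Survived")
  print(p)
}
```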
Output:
I can see that the chances of survival are lower for passengers travelling alone
(FamSize = 1) and for those with FamSize greater than 4 than for families of 2 to 4.
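To put a number on this reading, one could bin FamSize into the three groups the sentence implies: 1, 2 to 4, and more than 4. Taking the mean of the 0/1 Survived column in each bin then gives a survival rate per group. The group labels and toy rows below are assumptions chosen for illustration, so the rates they produce say nothing about the real passengers.

```r
# Invented toy rows, not real Titanic data
toy <- data.frame(
  FamSize  = c(1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 6),
  Survived = c(0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1)
)

# Bins follow the report's reading: (0,1] singleton, (1,4] small, (4,Inf] large
toy$FamGroup <- cut(toy$FamSize, breaks = c(0, 1, 4, Inf),
                    labels = c("singleton", "small", "large"))

# Survival rate per group = mean of the 0/1 Survived column
rates <- tapply(toy$Survived, toy$FamGroup, mean)
print(round(rates, 2))
```

On the real training rows, the same two calls would show whether the gap seen in the bar graph holds up once family sizes are grouped.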
Scatter Plot
Histogram:
Output:
Boxplot:
Reference-
• https://www.kaggle.com/