Q1.

a) This is a bar graph, the chart type used for discrete variables. “Survived” is a
factor variable (0 = did not survive, 1 = survived), so a bar graph is appropriate, and
geom_bar() draws it. The graph shows that most passengers died: roughly 550 died
while roughly 340 survived, which agrees with the proportions in (b). The aes() call in
ggplot(titanic, aes(x = Survived)) maps variables to aesthetics; commonly used
aesthetics are the position mappings (x, y), colour, alpha, fill, size and shape.

b) prop.table(table(titanic$Survived)): This code gives the proportions of passengers
who did and did not survive in tabular form: 61.6% died and 38.4% survived.
c) theme_bw(): This gives the plot a white background with light grey grid lines and
a dark panel border.
labs(y = "Passenger Count",
title = "Titanic Survival Rates"): This sets the main title of the plot to
“Titanic Survival Rates” and the y-axis label to “Passenger Count”.
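
The count-to-proportion step in (b) can be sketched in base R on a stand-in vector. The vector below is an assumption, not the original data: it holds 549 zeros and 342 ones only because those counts reproduce the 61.6% / 38.4% split reported in (b).

```r
# Stand-in for titanic$Survived (assumed counts, not read from the data file).
survived <- factor(rep(c(0, 1), times = c(549, 342)))

counts  <- table(survived)         # absolute counts per level
props   <- prop.table(counts)      # counts divided by their total
percent <- round(100 * props, 1)   # proportions expressed as percentages

print(counts)
print(percent)
```

table() counts each level and prop.table() divides those counts by their sum, so the two printed tables always describe the same split.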

Q2.
a) ggplot(titanic, aes(x = Sex, fill = Survived)): This code shows survival by sex.
There is one bar per sex, and “fill = Survived” splits each bar into two colours, one
for passengers who survived and one for those who did not. Among females, the
greater proportion survived; among males, the greater proportion did not. The rest of
the code has the same function as in Q1: it sets the theme, the plot title and the
y-axis label.
theme_bw() +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates by Sex")
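
The comparison of proportions made above can also be checked numerically instead of by eye. A minimal sketch in base R, using a hypothetical toy data frame (the values are made up and are not the Titanic data):

```r
# Hypothetical toy data: 3 females and 4 males.
toy <- data.frame(
  Sex      = c("female", "female", "female", "male", "male", "male", "male"),
  Survived = factor(c(1, 1, 0, 0, 0, 0, 1))
)

counts <- table(toy$Sex, toy$Survived)     # rows = Sex, columns = Survived
props  <- prop.table(counts, margin = 1)   # divide each row by its own total

print(counts)
print(round(props, 2))
```

With margin = 1 each row sums to 1, so the columns give the proportion of each sex that did or did not survive. In ggplot2, geom_bar(position = "fill") should draw the same within-bar proportions.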
Q3.
b) ggplot(titanic, aes(x = Pclass, fill = Survived)): This code shows survival by
passenger class. There is one bar per class, and “fill = Survived” splits each bar into
two colours, one for passengers who survived and one for those who did not. The
greatest proportion of deaths occurred in class 3 and the smallest in class 1. The rest
of the code has the same function as in Q1: it sets the theme, the plot title and the
y-axis label.
theme_bw() +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates by Pclass")

Q4.
a) ggplot(titanic, aes(x = Sex, fill = Survived)) + facet_wrap(~ Pclass): This code
shows survival by class and sex. facet_wrap(~ Pclass) draws one panel per passenger
class, each panel has one bar per sex, and “fill = Survived” splits each bar into two
colours for survived and not survived. In every class the greater proportion of deaths
was among males, and class 3 males died at the highest rate. Females in class 1 had
the greatest proportion of survivors. The rest of the code has the same function as in
Q1: it sets the theme, the plot title and the y-axis label.
theme_bw() +
geom_bar() +
labs(y = "Passenger Count",
title = "Titanic Survival Rates by Pclass and Sex")
Q5.
a) geom_histogram(binwidth = 5): This code draws a histogram, which is used for
continuous data. “binwidth = 5” sets the width of each bar to 5 years of age. The
graph shows that the largest number of passengers were aged about 20 and the
smallest number were older than 70. The rest of the code has the same function as in
Q1: it sets the theme, the plot title and the x-axis and y-axis labels.
ggplot(titanic, aes(x = Age)) +
theme_bw() +
geom_histogram(binwidth = 5) +
labs(y = "Passenger Count",
x = "Age (binwidth = 5)",
title = "Titanic Age Distribution")
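
What “binwidth = 5” does to the data can be sketched with cut() in base R. The ages below are hypothetical. The bin edges (0, 5, 10, ...) are also an assumption: ggplot2 chooses its own edges, so the bars in the real plot may be positioned differently.

```r
# Hypothetical ages, used only to illustrate the binning.
ages <- c(2, 19, 21, 22, 24, 38, 71)

# Bin edges every 5 years from 0 to 80; right = FALSE gives bins [0,5), [5,10), ...
edges <- seq(0, 80, by = 5)
bins  <- cut(ages, breaks = edges, right = FALSE)

counts <- table(bins)        # one count per 5-year bin, i.e. one histogram bar
print(counts[counts > 0])    # show only the non-empty bins
```

Each non-empty bin corresponds to one bar, and the bar height is the number of ages that fall inside that 5-year interval.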

Q6.
a) ggplot(titanic, aes(x = Age, fill = Survived)): This code differs from the previous
code in one respect only: “fill = Survived” splits each bar by survival status. It shows
that most of the passengers who died were aged about 20 and that no passengers aged
80 died. The rest of the code has the same function as in Q5: it sets the theme, the bar
width, the plot title and the x-axis and y-axis labels.
b) ggplot(titanic, aes(x = Survived, y = Age)) +
theme_bw() +
geom_boxplot() +
labs(y = "Age",
x = "Survived",
title = "Titanic Survival Rates by Age")
This code draws a box plot, as line 3 (geom_boxplot()) shows. The aesthetics put
“Survived” on the x-axis and “Age” on the y-axis. The rest of the code has the same
function as in Q5: it sets the theme, the plot title and the x-axis and y-axis labels.
The plot shows that the median age is greater for people who survived. The upper and
lower quartiles of age are higher for people who did not survive. The highest outlier
age belongs to a passenger who survived.
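
The medians and quartiles read off the box plot can be cross-checked with tapply() in base R. This is a sketch on a hypothetical toy data frame, not the Titanic data. If the real Age column contains missing values, na.rm = TRUE would have to be passed as well, and the quartile method ggplot2 uses may differ slightly from quantile()'s default.

```r
# Hypothetical ages and survival flags, for illustration only.
toy <- data.frame(
  Age      = c(20, 30, 40, 10, 25, 50),
  Survived = factor(c(0, 0, 0, 1, 1, 1))
)

# Median age per group: the line inside each box.
medians <- tapply(toy$Age, toy$Survived, median)

# Lower quartile, median and upper quartile per group: box edges and centre line.
quartiles <- tapply(toy$Age, toy$Survived, quantile, probs = c(0.25, 0.5, 0.75))

print(medians)
print(quartiles)
```

Comparing the printed values group by group shows directly which group has the higher median or quartiles, which is harder to judge from the plot alone.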

Q7.
a) ggplot(titanic, aes(x = Age, fill = Survived)) +
theme_bw() +
facet_wrap(Sex ~ Pclass) +
geom_density(alpha = 0.5) +
labs(x = "Age",
title = "Titanic Survival Rates by Age, Pclass and Sex")
This code plots a continuous density function, as line 4 (geom_density()) shows.
“alpha = 0.5” makes the fill colours semi-transparent so that overlapping curves
remain visible. These graphs show survival by age, Pclass and sex. We can see that
the mean age of people who survived is greatest for males in class 3, while the mean
age of people who did not survive is greatest for males in class 1.
Q8.
a) ggplot(titanic, aes(x = Age, fill = Survived)) +
theme_bw() +
facet_wrap(Sex ~ Pclass) +
geom_histogram(binwidth = 5) +
labs(y = "Passenger Count",
x = "Age (binwidth = 5)",
title = "Titanic Survival Rates by Age, Pclass and Sex")

This code plots histograms showing survival by age, Pclass and sex. We can see that
the mean age of people who survived is greatest for males in class 3, while the mean
age of people who did not survive is greatest for males in class 1. Hence, the results
are the same as those shown by the continuous density graphs. “binwidth = 5” sets the
width of each bar, while labs() sets the title of the plot as well as the labels of the x
and y axes.

Q9.
a) ggplot(titanic, aes(x= Age, y=Fare, shape = Pclass, colour = Pclass)) +
geom_point()
This code draws a scatter plot, which is used when both variables are continuous.
Mapping Pclass to both shape and colour marks each class with its own symbol and
colour. The plot shows that fares were highest for class 1 and lowest for class 3.
There was no clear relationship between a passenger’s age and the class of ticket
bought, as the data points are scattered across all ages.
Q10.
a) ggplot(clean_titanic %>% filter(Embarked == 'Q'), aes(x= Sex, fill = Survived)) +
geom_bar() +
facet_grid(Embarked ~ Pclass) +
theme_classic() +
labs(y = 'Passenger Count', title = 'Titanic Survival Rate by Gender and Class')
This code shows survival by class and sex: there is one bar per sex, and
“fill = Survived” splits each bar into two colours for survived and not survived.
filter(Embarked == 'Q') keeps only the passengers who embarked at port Q
(Queenstown), so the counts cover those passengers alone. facet_grid(Embarked ~
Pclass) then draws one panel per class, giving a bar graph for each of the three classes
for passengers who embarked at Q.
Note: no figure is included because the code did not run.
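
Since no figure could be produced, the counts the plot would display can still be tabulated without plotting. The sketch below uses base R only. toy_titanic is a hypothetical stand-in for clean_titanic with made-up values, and subset() is swapped in for the dplyr filter() call so that no extra packages are needed.

```r
# Hypothetical stand-in for clean_titanic; values are made up for illustration.
toy_titanic <- data.frame(
  Sex      = c("male", "female", "male", "female", "male", "female"),
  Pclass   = factor(c(3, 3, 1, 2, 3, 3), levels = 1:3),
  Embarked = c("Q", "Q", "S", "Q", "C", "Q"),
  Survived = factor(c(0, 1, 0, 1, 0, 0), levels = c(0, 1))
)

# Same condition as filter(Embarked == 'Q'), written with base R subset().
from_q <- subset(toy_titanic, Embarked == "Q")

# Counts by Sex, Survived and Pclass: one bar segment per cell.
counts <- table(Sex = from_q$Sex, Survived = from_q$Survived, Pclass = from_q$Pclass)
print(ftable(counts))
```

Each cell of the printed table corresponds to one coloured segment of one bar in one class panel of the intended plot.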
