
Titanic Disaster Analysis Using Machine Learning Algorithms
Tasnim Jahan Chowdhury
(ID 2014-1-60-006)
The sinking of the Titanic is one of the most infamous shipwrecks in history. The tragedy killed
1502 of the 2224 passengers and crew aboard, and left many wondering what could have been done better.
One of the main reasons for the death toll was that there were not enough lifeboats, and although
luck certainly played a part, some groups of people were more likely to
survive than others.
Background
• Lam and Tang
• Shawn Cicoria and John Sherlock
• Kunal Vyas and Lin
Literature Review

The Big Data trend has become quite noticeable lately.
New theories and tools are becoming available for many long-unsolved questions, such as the sinking of the
Titanic in 1912. While many studies have explained the human and
hydraulic causes of the sinking, questions remain regarding the chances of survival of
the passengers (British Trade Administration, 1911).
Problem Statement
• Hypothesis: certain sources claim that the survivors belonged to one of
the following categories: women, children, or the upper class.
• Our problem is to confirm whether this hypothesis is true, using the
given sample of survivor data, and to derive conclusions using
different algorithms.
Objectives:
• Collecting datasets available on the Kaggle
website.
• Applying machine-learning algorithms to the
datasets.
• Comparing the accuracy of the algorithms.
• Analyzing the results.
Proposed Method
The pipeline: the raw data set goes through data pre-processing and feature
engineering, producing processed data that is split into training and testing
sets. Machine learning techniques are then applied: three algorithms (A, B,
and C) are trained on the training data and used to predict on the testing
data. Finally, the accuracies of the predictions are compared, and the best
model is selected for result analysis.
Data Set
• We collected the dataset from the Kaggle website. The dataset has two
subsets: training and testing data. The training set
contains 12 columns, which are the features of the data set, and 891
rows, which are the data points. The testing set
consists of 418 passengers and has all of these columns except
“Survived”, because that is what we will predict with the algorithms.
Attribute Information

Attribute Name  Description
PassengerId     Unique identification of the passenger. It shouldn't be necessary for the machine learning model.
Survived        Survival (0 = No, 1 = Yes). Binary variable that will be our target variable.
Pclass          Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
Name            Name of the passenger. We need to parse it before using it.
Age             Age in years.
SibSp           Number of siblings/spouses aboard the Titanic.

East West University
Department of Computer Science and Engineering
Attribute Information

Attribute Name  Description
Survived        Survived is the target variable.
Parch           Number of parents/children aboard the Titanic.
Ticket          Ticket number.
Fare            Passenger fare.
Cabin           Cabin number.
Embarked        Port of embarkation (Cherbourg, Queenstown, or Southampton).
Exploring RAW Data and Cleaning

• Explore and analyze
• Visualize
• Handling missing values
• Normalization
• Other cleaning
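The cleaning steps above can be sketched as follows. This is a minimal, hypothetical example on a toy frame (the fill strategies — median for numeric columns, mode for categorical ones — are a common choice, not necessarily the exact ones used in the project).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame with the kinds of gaps the Titanic data actually has
# (missing Age, Fare, and Embarked values).
df = pd.DataFrame({"Age": [22.0, None, 38.0, 26.0],
                   "Fare": [7.25, 71.28, 8.05, None],
                   "Embarked": ["S", "C", None, "S"]})

# Handling missing values: median for numeric, mode for categorical.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Normalization: rescale numeric features into [0, 1].
df[["Age", "Fare"]] = MinMaxScaler().fit_transform(df[["Age", "Fare"]])
print(df.isna().sum().sum())  # 0 missing values remain
```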
Applied Algorithms

Model Creation:
• K-Nearest Neighbour (KNN)
• Decision Tree
• Support Vector Machine (SVM)
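Model creation for the three algorithms can be sketched with scikit-learn. The features here are synthetic stand-ins (via `make_classification`); in the project they would be the pre-processed Titanic columns, and the hyperparameters shown are defaults, not the project's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Synthetic binary-classification data standing in for the
# processed Titanic features and the "Survived" target.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# The three models applied in this work.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
}

# Train each model and report its accuracy on the held-out split.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```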
Statistical Analysis of Dataset

Figure 1: KNN Graph

13
Statistical Analysis of Dataset

Figure 2: Decision Tree Graph

14
Statistical Analysis of Dataset

Figure 3: SVM Graph

15
Comparison

Algorithms      Accuracy
Decision Tree   79.8 %
KNN             82.27 %
SVM             83.5 %
The most accurate result is generated by the
Support Vector Machine, with an accuracy of

83.5%
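Selecting the best model from the comparison table is simply an argmax over the reported accuracies:

```python
# Accuracies reported in the comparison table (in percent).
accuracies = {"Decision Tree": 79.8, "KNN": 82.27, "SVM": 83.5}

# Pick the algorithm with the highest accuracy.
best = max(accuracies, key=accuracies.get)
print(best, accuracies[best])  # SVM 83.5
```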
Comparison with other papers
• Many other researchers have worked on identifying the
actual causes of some passengers' survival. We use various
combinations of features and different machine learning
methods and try to obtain better results and accuracy.
Comparison with other papers

Lam and Tang: they used three algorithms (Naïve Bayes, Decision Tree
analysis, and SVM) to compare and contrast the results, and there were
no significant differences in accuracy among the three.

Our project: we use various combinations of features and different
machine learning methods (KNN, SVM, and Decision Tree), show the
comparison between the three algorithms, and try to obtain better
results and accuracy. There is a significant difference between the
algorithms; we got the highest accuracy with SVM.
Research Findings
• For this comparison we split the gender feature into males and
females, and we can clearly see the changes when moving from
Decision Tree to KNN and then to SVM: we got higher accuracy and
more precise predictions. In this comparison, SVM shows that more
male passengers died, compared to KNN and the Decision Tree,
which show lower values for male passengers' deaths.
Conclusion
• Nowadays, getting results from raw, unprocessed data by using
machine learning and feature-extraction techniques becomes more
important when real-world datasets are considered, which can
contain hundreds or thousands of features. In this paper, we
experimented with three different algorithms, namely K-Nearest
Neighbours (KNN), Support Vector Machine (SVM), and Decision Tree,
for predicting the survival of the passengers of the Titanic.
References
• Kaggle, Titanic: Machine Learning from Disaster [Online]. Available:
http://www.kaggle.com/
• STAT 479: Machine Learning Lecture Notes [Ebook]. Retrieved from
https://sebastianraschka.com/pdf/lecture-notes/stat479fs18/02_knn_notes.pdf
• CS229 Lecture Notes [Ebook]. Retrieved from
http://cs229.stanford.edu/notes/cs229-notes3.pdf
• Decision Tree Algorithm — Explained. Retrieved June 2020, from
https://towardsdatascience.com/decision-tree-algorithm-explained-83beb6e78ef4
Future Work

• In this paper, we have done our analysis with a few datasets. In the near future we will try to
analyze more datasets; it would be interesting to experiment further with the
data and introduce more attributes, which might lead to better results.
Thank You to our
honourable faculties.
