
Titanic Disaster Analysis Using Machine Learning Algorithms
Tasnim Jahan Chowdhury
(ID 2014-1-60-006)
The sinking of the Titanic is one of the most infamous shipwrecks in history. The tragedy killed
1502 of the 2224 passengers and crew aboard, and left many wondering what could have been done better.
One of the main reasons for the death toll was that there were not enough lifeboats, and although
luck certainly played a part, some groups of people were more likely to
survive than others.
Background
• Lam and Tang
• Shawn Cicoria and John Sherlock
• Kunal Vyas and Lin
Literature Review

The Big Data trend has become quite noticeable lately.
New theories and tools are becoming available for many long-unsolved questions, such as the sinking of the
Titanic in 1912. While many studies have explained the human and
hydraulic causes of the sinking, questions remain regarding the chances of survival of
the passengers (British Trade Administration, 1911).
Problem Statement
• Hypothesis: certain sources claim that the survivors belonged to one of
the following categories: women, children, or the upper class.
• Our problem is to confirm whether this hypothesis is true, using the
given sample of survivor data, and to derive conclusions using
different algorithms.
Objectives:
• Collecting datasets available on the Kaggle
website.
• Applying machine-learning algorithms to the
datasets.
• Comparing the accuracy of the algorithms.
• Analyzing the results.
Proposed Method
The pipeline: the raw data set goes through data pre-processing and feature
engineering, producing processed data that is split into training and testing
sets. Machine learning techniques are then applied: three algorithms (A, B,
and C) are trained on the training data and used to predict on the testing
data. Finally, the accuracies of the predictions are compared, and the best
model is selected for result analysis.
Data Set
• We collected the dataset from the Kaggle website. The dataset has two
subsets: training and testing data. The training set
contains 12 columns, which are the features of the data set, and 891
rows, which are the data points. The testing set
consists of 418 passengers and has all of these columns except
“Survived”, because that is what we will predict with the algorithms.
Attribute Information

Attribute Name  Description
PassengerId     Unique identification of the passenger. It shouldn't be necessary for the machine learning model.
Survived        Survival (0 = No, 1 = Yes). Binary variable that will be our target variable.
Pclass          Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
Name            Name of the passenger. We need to parse it before using it.
Age             Age in years.
SibSp           Number of siblings/spouses aboard the Titanic.

East West University
Department of Computer Science and Engineering
Attribute Information

Attribute Name  Description
Survived        Survived is the target variable.
Parch           Number of parents/children aboard the Titanic.
Ticket          Ticket number.
Fare            Passenger fare.
Cabin           Cabin number.
Embarked        Port of embarkation (Cherbourg, Queenstown, or Southampton).
Exploring RAW Data and Cleaning

• Explore and analyze
• Visualize
• Handling missing values
• Normalization
• Other cleaning
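The cleaning steps above can be sketched as follows. This is a minimal, hypothetical example on a toy frame (the fill strategies — median for numeric columns, mode for categorical ones — are a common choice, not necessarily the exact ones used in the project).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame with the kinds of gaps the Titanic data actually has
# (missing Age, Fare, and Embarked values).
df = pd.DataFrame({"Age": [22.0, None, 38.0, 26.0],
                   "Fare": [7.25, 71.28, 8.05, None],
                   "Embarked": ["S", "C", None, "S"]})

# Handling missing values: median for numeric, mode for categorical.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Normalization: rescale numeric features into [0, 1].
df[["Age", "Fare"]] = MinMaxScaler().fit_transform(df[["Age", "Fare"]])
print(df.isna().sum().sum())  # 0 missing values remain
```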
Applied Algorithms

Model Creation:
• K-Nearest Neighbour (KNN)
• Decision Tree
• Support Vector Machine (SVM)
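Model creation for the three algorithms can be sketched with scikit-learn. The features here are synthetic stand-ins (via `make_classification`); in the project they would be the pre-processed Titanic columns, and the hyperparameters shown are defaults, not the project's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Synthetic binary-classification data standing in for the
# processed Titanic features and the "Survived" target.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# The three models applied in this work.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
}

# Train each model and report its accuracy on the held-out split.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```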
Statistical Analysis of Dataset

Figure 1: KNN Graph

13
Statistical Analysis of Dataset

Figure 2: Decision Tree Graph

14
Statistical Analysis of Dataset

Figure 3: SVM Graph

15
Comparison

Algorithms      Accuracy
Decision Tree   79.8 %
KNN             82.27 %
SVM             83.5 %
The most accurate result is generated by the
Support Vector Machine, with an accuracy of

83.5%
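Selecting the best model from the comparison table is simply an argmax over the reported accuracies:

```python
# Accuracies reported in the comparison table (in percent).
accuracies = {"Decision Tree": 79.8, "KNN": 82.27, "SVM": 83.5}

# Pick the algorithm with the highest accuracy.
best = max(accuracies, key=accuracies.get)
print(best, accuracies[best])  # SVM 83.5
```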
Comparison with other papers
• Many other researchers have worked on identifying the
actual causes of some passengers' survival. We use various
combinations of features and different machine learning
methods and try to obtain better results and accuracy.
Comparison with other papers

Lam and Tang: they used three algorithms (Naïve Bayes, Decision Tree
analysis, and SVM) to compare and contrast the results, and there were
no significant differences in accuracy among the three.

Our project: we use various combinations of features and different
machine learning methods (KNN, SVM, and Decision Tree), show the
comparison between the three algorithms, and try to obtain better
results and accuracy. There is a significant difference between the
algorithms; we got the highest accuracy with SVM.
Research Findings
• For this comparison we split the gender feature into males and
females, and we can clearly see the changes when moving from
Decision Tree to KNN and then to SVM: we got higher accuracy and
more precise predictions. In this comparison, SVM shows that more
male passengers died, compared to KNN and the Decision Tree,
which show lower values for male passengers' deaths.
Conclusion
• Nowadays, getting results from raw, unprocessed data by using
machine learning and feature-extraction techniques becomes more
important when real-world datasets are considered, which can
contain hundreds or thousands of features. In this paper, we
experimented with three different algorithms, namely K-Nearest
Neighbours (KNN), Support Vector Machine (SVM), and Decision Tree,
for predicting the survival of the passengers of the Titanic.
References
• Kaggle, Titanic: Machine Learning from Disaster [Online]. Available:
http://www.kaggle.com/
• STAT 479: Machine Learning Lecture Notes [Ebook]. Retrieved from
https://sebastianraschka.com/pdf/lecture-notes/stat479fs18/02_knn_notes.pdf
• CS229 Lecture Notes [Ebook]. Retrieved from
http://cs229.stanford.edu/notes/cs229-notes3.pdf
• Decision Tree Algorithm — Explained. Retrieved June 2020, from
https://towardsdatascience.com/decision-tree-algorithm-explained-83beb6e78ef4
Future Work

• In this paper, we have done our analysis with a few datasets. In the near future we will try to
analyze more datasets; it would be interesting to experiment further with the
data and introduce more attributes, which might lead to better results.
Thank You to our
honourable faculties.
