
Exploratory Data Analysis (EDA)
Titanic Data
Group 4
Brian Maxwell Ketaren
Muhammad Ilyas Haikal
Rachmat Faisal Manurung
Riska Amylatul Askiyah
Stephanie Lisa Aryani
AGENDA
1 THE OBJECTIVES
2 DATA PREPARATION
3 ANALYSIS AND FINDINGS
4 SUMMARY & CONCLUSION


1. THE OBJECTIVES
Are You Ready?
INTRODUCTION
Data source:
https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv

It is one of the most popular datasets for learning machine learning basics. It
contains information about passengers aboard the RMS Titanic, which was
shipwrecked. This dataset can be used to predict whether a given passenger
survived or not.
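A minimal sketch of how the file could be loaded, assuming pandas is installed and the machine has network access. The helper name load_titanic is hypothetical and is not part of the original deck.

```python
import pandas as pd

# URL taken from the "Data source" line of this deck.
TITANIC_URL = (
    "https://raw.githubusercontent.com/mwaskom/"
    "seaborn-data/master/titanic.csv"
)


def load_titanic(url: str = TITANIC_URL) -> pd.DataFrame:
    """Download the CSV and return it as a DataFrame (needs network access)."""
    return pd.read_csv(url)


# Usage (left as a comment because it needs network access):
# df = load_titanic()
# print(df.shape)
```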
BACKGROUND
The Titanic dataset consists of feature variables and a target variable.

The feature variables contain all the features, such as Pclass, Age, Sex, and
Embarked, excluding the Survived column. The Survived column, on the other
hand, is the target variable, as it is the result that we want to determine,
i.e., whether a person survived.
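The split into features and target can be sketched as follows. The rows are made up for illustration, and the lowercase column names are an assumption that follows the CSV header of the data source rather than the capitalised names used on the slide.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the real dataset.
df = pd.DataFrame(
    {
        "survived": [0, 1, 1, 0],
        "pclass": [3, 1, 3, 2],
        "sex": ["male", "female", "female", "male"],
        "age": [22.0, 38.0, 26.0, 35.0],
        "embarked": ["S", "C", "S", "Q"],
    }
)

# Target: the column we want to determine.
y = df["survived"]
# Features: every column except the target.
X = df.drop(columns=["survived"])

print(list(X.columns))
print(y.tolist())
```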
GOAL
To identify the factors that influence whether passengers survived or not.
CHALLENGES AND METHODOLOGY

CHALLENGES
There are many missing values

METHODOLOGY
Data Preparation
Data Cleaning
Exploratory Data Analysis
2. DATA PREPARATION
INITIAL DATA QUALITY REPORT

Total of 891 rows

Total of 15 columns

2 columns with missing values

177 and 688 missing values in those two columns, respectively.
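A report like this one can be produced with a few pandas calls. The sketch below uses a made-up mini-sample, and its column names are illustrative assumptions; on the real file the same calls would report the row count, column count, and missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample; values and column names are for illustration only.
df = pd.DataFrame(
    {
        "age": [22.0, np.nan, 26.0, np.nan, 35.0],
        "deck": [np.nan, "C", np.nan, np.nan, "E"],
        "fare": [7.25, 71.28, 7.92, 8.05, 53.10],
    }
)

n_rows, n_cols = df.shape
# Number of missing values in each column.
missing_per_column = df.isna().sum()
# Keep only the columns that actually have gaps.
columns_with_missing = missing_per_column[missing_per_column > 0]

print(f"rows={n_rows}, columns={n_cols}")
print(columns_with_missing)
```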
DATA PREPARATION

1 Data Quality Check
2 Feature Understanding
3 Feature Selection
4 Missing Values Imputation
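The deck does not say which imputation method was used. As one possible sketch of the imputation step, the example below fills a numeric column with its median and a categorical column with its most frequent value. Both choices, and the sample rows, are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame(
    {
        "age": [22.0, np.nan, 26.0, 40.0],
        "embarked": ["S", "C", np.nan, "S"],
    }
)

clean = df.copy()
# Numeric column: fill gaps with the median, which is robust to outliers.
clean["age"] = clean["age"].fillna(clean["age"].median())
# Categorical column: fill gaps with the most frequent value (the mode).
clean["embarked"] = clean["embarked"].fillna(clean["embarked"].mode()[0])

print(clean)
```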


3. ANALYSIS AND FINDINGS
TARGET FEATURE

In the sample, 38% of the passengers on the Titanic survived and 62% did not.
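The percentages can be reproduced from the counts quoted elsewhere in this deck (340 survivors and 549 non-survivors). A minimal sketch, assuming the target is stored in a column named alive:

```python
import pandas as pd

# Counts as quoted in this deck: 340 survivors and 549 non-survivors.
alive = pd.Series(["yes"] * 340 + ["no"] * 549, name="alive")

# Share of each class in percent, rounded to one decimal place.
share = alive.value_counts(normalize=True).mul(100).round(1)
print(share)
```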
DISTRIBUTION OF PASSENGERS SURVIVED BY FARE

1 From the pie chart, we know that 38% of the Titanic's passengers (340
people) survived and 62% (549 people) died.

2 From the displot of "fare", we know that the data is positively skewed.

3 The red output ("alive=false") shows that more people died than survived;
at low fares, many people did not survive.

4 The green output ("alive=true") shows that fewer people survived than did
not survive.
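Positive skew can also be checked numerically rather than read off the plot. The fares below are made up for illustration (many cheap tickets, a few expensive ones); a skewness above zero and a mean above the median both indicate a long right tail.

```python
import pandas as pd

# Hypothetical fares for illustration only.
fare = pd.Series([7.25, 7.90, 8.05, 13.00, 26.00, 30.00, 71.30, 263.00])

# Sample skewness: positive values mean a long tail towards high fares.
skewness = fare.skew()

print(f"skewness={skewness:.2f}")
print(f"mean={fare.mean():.2f}, median={fare.median():.2f}")
```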
DISTRIBUTION OF PASSENGERS SURVIVED BY AGE

1 The red output ("alive=false") shows that many people did not survive.

2 The red output ("alive=false") covers ages 0 to 78 and shows that most of
the people who did not survive were 24 to 28 years old.

3 The green output ("alive=true") shows that many of the people who survived
were 0 to 5 years old or 24 to 28 years old.

4 The green output ("alive=true") shows that fewer people survived than did
not survive.
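Age ranges like the ones above can be compared by binning ages and counting outcomes per band. This is only a sketch: the rows are made up, and the bin edges and labels are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame(
    {
        "age": [2, 4, 25, 26, 27, 24, 45, 60],
        "alive": ["yes", "yes", "no", "no", "yes", "no", "no", "no"],
    }
)

# Bin edges and labels are assumptions; each bin includes its upper edge.
bins = [0, 5, 18, 30, 50, 80]
labels = ["0-5", "6-18", "19-30", "31-50", "51-80"]
df["age_band"] = pd.cut(df["age"], bins=bins, labels=labels).astype(str)

# Count survivors and non-survivors in each age band.
counts = pd.crosstab(df["age_band"], df["alive"])
print(counts)
```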
DISTRIBUTION OF PASSENGERS SURVIVED BY SIBSP

1 The red output at the right-hand side of the curve shows that at low SibSp
levels, the majority of passengers did not survive.

2 The green output shows that the number of passengers who survived is lower
than the number who did not.

3 The red output ranges from 0 to 6, with an outlier at 8 and a peak at 0,
which shows that most passengers who did not survive had no siblings or
spouses on board.

4 The green output ranges from 0 to 2, with no outlier and a peak at 0, which
shows that most passengers who survived had no siblings or spouses on board.
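The counts behind such a plot can be tabulated directly. A minimal sketch with made-up rows, assuming columns named sibsp and alive:

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame(
    {
        "sibsp": [0, 0, 0, 0, 1, 1, 2, 8],
        "alive": ["no", "no", "no", "yes", "yes", "no", "yes", "no"],
    }
)

# Rows: number of siblings/spouses on board; columns: outcome.
table = pd.crosstab(df["sibsp"], df["alive"])
print(table)
```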
DISTRIBUTION OF PASSENGERS SURVIVED BY PARCH

1 Both the red and green outputs are centered at the right-hand side of the
curve, which shows that most passengers, survivors and non-survivors alike,
had low parch values.

2 At parch equal to zero, which represents passengers with no parents or
children on board, the red output is higher than the green output.

3 At parch equal to one, which represents passengers with one parent or child
on board, the red output is lower than the green output.

4 At parch equal to two, which represents passengers with two parents or
children on board, the red output is lower than the green output.
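Comparing red and green heights at each parch value amounts to comparing survival rates per group. A sketch with made-up rows, chosen only so the output is easy to check, assuming a 0/1 column named survived:

```python
import pandas as pd

# Hypothetical rows for illustration only; not real passenger data.
df = pd.DataFrame(
    {
        "parch": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        "survived": [0, 0, 0, 1, 1, 1, 0, 1, 1, 0],
    }
)

# The mean of a 0/1 column is the share of survivors in each group.
rate = df.groupby("parch")["survived"].mean()
print(rate)
```

A rate below 0.5 corresponds to the red output being higher than the green one, and a rate above 0.5 to the opposite.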
SURVIVED BY WHO

Children and women had a better chance of surviving than men, which suggests
that a rescue priority procedure likely affected passengers' chances of
survival.
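One way to put numbers on this comparison is to report the group size and survival rate for each value of who. The rows below are made up for illustration, and the column names who and survived are assumptions.

```python
import pandas as pd

# Hypothetical rows for illustration only; not real passenger data.
df = pd.DataFrame(
    {
        "who": ["man", "man", "man", "woman", "woman", "child", "child", "man"],
        "survived": [0, 0, 1, 1, 1, 1, 0, 0],
    }
)

# Group size and share of survivors for each category.
summary = df.groupby("who")["survived"].agg(
    passengers="count", survival_rate="mean"
)
print(summary)
```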
SURVIVED BY PCLASS

Ticket classes differ in their facilities. Lower classes may have had
inadequate facilities compared with the higher-fare classes, which may have
affected passengers' chances of survival.
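To compare classes of different sizes, outcome shares can be normalised within each class. A sketch with made-up rows, assuming columns named pclass and alive:

```python
import pandas as pd

# Hypothetical rows for illustration only; not real passenger data.
df = pd.DataFrame(
    {
        "pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
        "alive": ["yes", "yes", "no", "yes", "no", "no", "no", "no", "yes"],
    }
)

# normalize="index" makes each row (ticket class) sum to 1.
shares = pd.crosstab(df["pclass"], df["alive"], normalize="index")
print(shares.round(2))
```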
SURVIVED BY GENDER

Women had a better chance of surviving than men, which suggests that a rescue
priority procedure may have affected passengers' chances of survival.
4. SUMMARY & CONCLUSION
Based on the analysis, we can conclude that:

Numerical variables: fare, age, sibsp, and parch affect a passenger's chances
of survival.

Categorical variables: Who, Pclass, embark_town, and alone affect a
passenger's probability of survival, while the "age" feature does not show a
significant difference.
THANK YOU!
Have a great day ahead.
