MACHINE LEARNING (LP-3) MINI PROJECT
NAME : Umesh Mali
ROLL NO : A-44
Title: Titanic Death Prediction Model
Problem Statement: Build a machine learning model that predicts which kinds of passengers
were more likely to survive the Titanic shipwreck, using passenger data (i.e. name, age, gender,
socio-economic class, etc.)
Objectives:
Learn effects of data pre-processing on the performance of machine learning
algorithms.
Develop an in-depth understanding of the implementation of regression models.
Implement and evaluate supervised and unsupervised learning algorithms.
Outcomes:
Apply various data pre-processing techniques to simplify and speed up
machine learning algorithms.
Implement variants of multi-class classifiers and measure their performance.
Design a neural network for solving engineering problems.
Requirements:
Computer System with:
i5 processor, 256 GB SSD, 8 GB RAM.
Jupyter Notebook
Python with sklearn, rake_nltk, pandas, matplotlib, etc.
Introduction to Project:
We use computers and computational statistics to make predictions that help us achieve
better results. Machine learning lets a computer perform such tasks without being explicitly
programmed to do so. With large volumes of data, extracting the relevant information by hand
becomes tedious. Search engines solve this problem to some extent, but they do not solve the
personalization problem.
Prediction in machine learning refers to the output of an algorithm after it has been trained on
a historical dataset and applied to new data when forecasting the likelihood of a particular
outcome.
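The train-then-predict cycle described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; the one-feature "historical" data is made up for the example and is not taken from the Titanic dataset, and LogisticRegression is only one possible choice of model.

```python
from sklearn.linear_model import LogisticRegression

# Made-up "historical" data (an assumption for illustration only):
# one numeric feature per row and a known binary outcome.
X_train = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# Step 1: train the algorithm on the historical dataset.
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 2: apply the trained model to new, unseen data.
X_new = [[0.5], [9.5]]
predicted_labels = model.predict(X_new)             # hard class labels
predicted_probs = model.predict_proba(X_new)[:, 1]  # likelihood of outcome 1

print(predicted_labels)
print(predicted_probs)
```

The second call returns a probability for each new row, which corresponds to "forecasting the likelihood of a particular outcome" in the definition above.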
Prediction in machine learning is commonly used for security, marketing, operations, risk,
and fraud detection.
Predictions in machine learning allow businesses to make well-founded estimates of the
likely outcome of a question based on historical data. These predictions give businesses
insights that result in tangible business value. For example, with churn, if a model predicts a
customer is likely to churn, the business can target them with specific communications and
outreach that can help prevent the loss of that customer.
Implementation
Dataset Used:
“titanic.csv”
https://www.kaggle.com/competitions/titanic/data
About Dataset:
The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS
Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for
everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.
While there was some element of luck involved in surviving, it seems some groups of
people were more likely to survive than others.
Libraries used:
pandas
matplotlib
pandas_profiling
warnings
collections
Observations
There are a total of 891 passengers in our training set.
The Age feature is missing approximately 19.8% of its values. We expect the Age
feature to be important to survival, so we should attempt to fill these
gaps.
The Cabin feature is missing approximately 77.1% of its values. Since so much of the
feature is missing, it would be hard to fill in the missing values, so we will probably drop
this feature from our dataset.
The Embarked feature is missing 0.22% of its values, which should be relatively
harmless.
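The missing-value percentages above can be computed with pandas, and the gaps handled, roughly as sketched below. The four-row frame is a made-up stand-in so the example runs without the CSV file; on the real data one would load the Kaggle training file with pd.read_csv instead. The helper names are hypothetical, and filling Age with the median and Embarked with the most frequent value is an assumption, since the report does not state which strategy was used.

```python
import pandas as pd


def missing_percentage(df):
    """Percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


def drop_and_fill(df):
    """Drop Cabin, then fill Age and Embarked (assumed strategy)."""
    out = df.drop(columns=["Cabin"])  # too sparse to fill reliably
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    return out


# Made-up stand-in rows (not real passenger records).
sample = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

missing = missing_percentage(sample)
cleaned = drop_and_fill(sample)

print(missing)
print(cleaned)
```

The median is less sensitive to a few very old or very young passengers than the mean, which is why it is a common default for a skewed feature such as Age.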
Predictions:
Sex: Females are more likely to survive.
SibSp/Parch: People traveling alone are more likely to survive.
Age: Young children are more likely to survive.
Pclass: People of higher socioeconomic class are more likely to survive.
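Each of these predictions can be checked by comparing survival rates across groups. The sketch below assumes the Kaggle column names (Survived, Sex, Pclass, SibSp, Parch); the six rows are made up so the example runs on its own, and the rates they produce say nothing about the real dataset. The helper names and the Company column are hypothetical.

```python
import pandas as pd


def survival_rate_by(df, column):
    """Mean of Survived (0/1) per group, i.e. the survival rate."""
    return df.groupby(column)["Survived"].mean()


def add_company_label(df):
    """Label passengers with no siblings, spouse, parents or children aboard."""
    family_size = df["SibSp"] + df["Parch"]
    labels = (family_size == 0).map({True: "alone", False: "with family"})
    return df.assign(Company=labels)


# Made-up toy rows (not real passenger records).
sample = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Pclass": [1, 3, 1, 3, 2, 2],
    "SibSp": [0, 1, 0, 0, 2, 0],
    "Parch": [0, 0, 1, 0, 0, 0],
})

rates_by_sex = survival_rate_by(sample, "Sex")
rates_by_class = survival_rate_by(sample, "Pclass")
rates_by_company = survival_rate_by(add_company_label(sample), "Company")

print(rates_by_sex)
print(rates_by_class)
print(rates_by_company)
```

Running the same three calls on the training set shows whether each prediction holds; Age can be checked the same way after binning it into groups, for example with pd.cut.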
Conclusion
Thus, we analyzed the Titanic dataset and implemented several basic prediction
models, of which the gradient descent based model performed best.