Professional Documents
Culture Documents
Titanic in Class Competition
Titanic in Class Competition
The data has been split into two groups: training set (ticc_train.csv) and test set (ticc_test.csv).
The training set should be used to build your machine learning models. For the training set, the
outcome (also known as the “ground truth”) is provided for each passenger. Your model will be based
on “features” like passengers’ gender and class. You can also use feature engineering to create new
features (see page 21 of the “Python Machine Learning by Example” book).
The test set should be used to see how well your model performs on unseen data. For the test set, you
are not provided with the ground truth for each passenger. It is your job to predict these outcomes. For
each passenger in the test set, use the models you trained to predict whether they survived the sinking
of the Titanic.
Variable Notes
age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5
Is the data set balanced? Data that is overwhelmed by one particular sub class is often ineffectual
when training a predictive system. What could be done in such situations?
How are you going to deal with missing data? (See page 22 of the “Python Machine Learning by
Example” book.)
What part might voting and averaging play in your final prediction? (See page 27 of the “Python
Machine Learning by Example” book.)