
An Analysis of the Probability of Survival and Age Distribution of passengers aboard the Titanic and their dependence on various factors

Fig. 2: Box plot for non-survivors and survivors

The random variable of interest is age. Let X1 denote the age of survivors and X2 denote the age of non-survivors. To answer ... α-value of 0.05). Further to the above assumptions, we assume that the CDFs of X1 and X2 have the same shape. This allows us to apply the Wilcoxon rank sum test. The calculated p-value for the Wilcoxon rank sum test (0.19) does not give enough evidence against Ho (Ho: X1 is stochastically equal to X2).
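As a rough illustration of the rank sum test applied above, the following Python sketch computes the rank-sum statistic and a two-sided p-value from the large-sample normal approximation. The function names and the two age lists are hypothetical (the passenger-level ages are not reproduced here), and the sketch leaves out the tie and continuity corrections that statistical packages usually apply, so it should not be expected to reproduce the reported p-value of 0.19.

```python
import math


def midranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no corrections).

    Returns (w, z, p) where w is the sum of the pooled ranks of sample x.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical ages, for illustration only (not the Titanic data).
survivor_ages = [4, 22, 35, 27, 48, 19]
non_survivor_ages = [30, 41, 25, 62, 33, 21, 54]
w, z, p = rank_sum_test(survivor_ages, non_survivor_ages)
print(f"W = {w}, z = {z:.3f}, two-sided p = {p:.3f}")
```

A large p-value from such a test, as in the text, means the ranks of the two samples are well mixed; it does not by itself show that the two distributions are identical.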

Fig. 3: ECDF for non-survivors and survivors

But from the histograms and ECDF ... 5+ to 15, 15+ to 30, 30+ to 45, 45+ to 60, and 60+. Now, we do a chi-square test to see if survivors and non-survivors have a homogeneous distribution across these age categories. We get a p-value of 5.477×10^-6, which supports our belief that there is a difference in the age distributions of survivors and non-survivors. Now, since the sample size of survivors is 313, and that of non-survivors is 443, we can do a Z test on the problem of proportion for each age category separately, the null hypotheses being of the form

Π0-5, Survivors = Π0-5, Non-Survivors
...
Π45-60, Survivors = Π45-60, Non-Survivors
Π60+, Survivors = Π60+, Non-Survivors

We began with two-tailed tests, and single-tailed tests were done wherever the null was refuted. On performing the Z tests, we get the p values and thus the adjoining conclusions for the age categories 0 to 5, 5+ to 15, 15+ to 30, 30+ to 45, 45+ to 60, and 60+, among them:

Π0-5, Survivors > Π0-5, Non-Survivors
Π60+, Survivors < Π60+, Non-Survivors

I. (a) Is there a significant difference in Age distribution between male survivors and male non-survivors?

We use the same approach of dividing the population into age categories to find out if there is a dependence of survival probability on age category as done in part (1), the only difference being that here the two samples come from males. The chi-square p value of 1.47e-11 implies that the population of male survivors and non-survivors is not homogeneous with respect to age categories. Thus, we go ahead with 6 separate Z tests, one for each age category, the null hypotheses being as follows:

Π0-5, Male_Survivors = Π0-5, Male_Non-Survivors
Π5-15, Male_Survivors = Π5-15, Male_Non-Survivors
...
Π60+, Male_Survivors = Π60+, Male_Non-Survivors

We began with two-tailed tests, and single-tailed tests were done wherever the null was refuted. On performing the Z tests, we get a p value, and thus a conclusion, for each age category.
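The per-category Z tests in this section compare two proportions. Assuming the usual pooled two-proportion statistic (an assumption, since the text does not state which variance form was used), with x_1 and x_2 the counts falling in a given age category among the n_1 survivors and n_2 non-survivors, the statistic would be:

```latex
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```

For the overall comparison, n_1 = 313 and n_2 = 443. Under the null hypothesis z is approximately standard normal, so the two-tailed p value is 2(1 - Φ(|z|)), and the single-tailed p value is Φ(z) or 1 - Φ(z) depending on the direction of the alternative.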

The test p-values for the Lilliefors (Kolmogorov-Smirnov) normality test and the Shapiro-Francia normality test are 0.0001661670, 0.0077707718, 0.12109238, and 0.11744076. The p-values for all the tests for the two samples suggest that the samples of survivors are not normal, whereas those of non-survivors follow a normal distribution (at an assumed α-value of 0.05).

This clearly suggests that the distributions are not the same. However, to reinforce this, we do a Kolmogorov-Smirnov two-sample test. This also suggests that the two samples come from different distributions (p value = 0.01326), implying there is a significant difference in the age distributions of female survivors and dead.

We use the same approach of dividing the population into age categories to find out if there is a dependence of ... The null hypotheses are of the form

...
Π30-45, Female_Survivors = Π30-45, Female_Non-Survivors
Π45-60, Female_Survivors = Π45-60, Female_Non-Survivors
Π60+, Female_Survivors = Π60+, Female_Non-Survivors

We began with two-tailed tests, and single-tailed tests were done wherever the null was refuted. On performing the Z tests, we get the p values (one of which is 0.666), and thus the adjoining conclusions, for the age categories 0 to 5, 5+ to 15, 15+ to 30, 30+ to 45, 45+ to 60, and 60+.
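The two-sample Kolmogorov-Smirnov test mentioned in this part compares the empirical CDFs of the two samples directly. A minimal Python sketch follows, assuming the asymptotic Kolmogorov distribution for the p-value (statistical packages may use exact small-sample formulas instead); the function name and the `terms` parameter are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y, terms=100):
    """Two-sample Kolmogorov-Smirnov test with an asymptotic p-value.

    Returns (d, p) where d is the largest vertical gap between the two ECDFs.
    """
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # The ECDFs only change at observed values, so checking those is enough.
    d = max(
        abs(bisect_right(xs, v) / n1 - bisect_right(ys, v) / n2)
        for v in xs + ys
    )
    if d == 0.0:
        return 0.0, 1.0
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    # Kolmogorov series: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return d, min(1.0, max(0.0, p))


# Hypothetical ages, for illustration only (not the Titanic data).
d, p = ks_two_sample([4, 22, 35, 27, 48, 19], [30, 41, 25, 62, 33, 21, 54])
print(f"D = {d:.3f}, asymptotic p = {p:.3f}")
```

Unlike the rank sum test, this statistic responds to any difference in shape between the two ECDFs, which is why it can flag a difference that a location-type test misses.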

The above analysis suggests that there is a significant difference in age distribution between female survivors and female non-survivors.

II. Remark on how Age affected the Survival Probability of a passenger on board the Titanic, based on consolidations of your findings in 1 and 2 above.

The findings in 1 and 2 above suggest that females had higher survival probability than their counterparts. Conditioning on gender (given that the boarders are males, and given that the boarders are females), infants and teenagers had higher survival probability and the age group of 45 to 60 had higher survival probability; however, the age group of 15 to 30 and those above 60 years had less survival probability. Possible reasons could have been that females and kids were given preference in going on the lifeboats, and that the old could have thought of sacrificing their lives for the young.

IV. Is there a significant difference in Survival Probability between the two genders?

Ho: No difference in the survival probability of the two genders, viz. male and female
Ha: Significant difference in the survival probability of the two genders, viz. male and female (two-sided)

Data: The table below displays the problem's data:

                Males   Females   Total
Survivor         142      308      450
Non-Survivor     709      154      863
Total            851      462     1313

Test adopted for testing the hypothesis: Since it is a problem of proportion and we would like to compare the survival probabilities of male and female, we can use the following tests:
1. Fisher's exact test
2. Z-test
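Both tests named above can be sketched in plain Python on the gender table (142 of 851 males and 308 of 462 females survived). The function names are illustrative. Two choices here are assumptions, since the text does not say what its software did: the Z-test uses the pooled standard error, and the two-sided Fisher p-value sums every table that is no more probable than the observed one.

```python
import math
from math import comb


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (p_less, p_two_sided), both computed for the top-left count a
    with all margins held fixed (hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, row1)
    pmf = {
        x: comb(col1, x) * comb(n - col1, row1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = pmf[a]
    p_less = sum(p for x, p in pmf.items() if x <= a)
    # Small relative tolerance so equally probable tables are both counted.
    p_two = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7))
    return p_less, min(1.0, p_two)


# Gender table from the text: rows = survivor / non-survivor,
# columns = males / females.
z, p_z = two_proportion_z_test(142, 851, 308, 462)
p_less, p_two = fisher_exact_2x2(142, 308, 709, 154)
print(f"Z-test: z = {z:.2f}, two-sided p = {p_z:.3g}")
print(f"Fisher: one-sided (males lower) p = {p_less:.3g}, two-sided p = {p_two:.3g}")
```

Here `p_less` corresponds to the one-sided alternative that the male survival probability is lower than the female one.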

Fisher's exact test is the more powerful test in this case, but we can also do a Z-test as the sample size is large.

Conclusion: On the basis of the Z-test we conclude that there is a significant difference in the survival probability of the two genders.

We have the following data:

                 Passenger Class I   Passenger Class II   Passenger Class III
Survivors               193                 119                  138
Non-Survivors           129                 161                  573

The p-value of 2.2×10^-16 suggests that there is enough evidence to reject the null hypothesis (at α-value of 0.05). It can be said that there is a significant difference between population distributions across passenger classes. We further break the data to compare different classes. We did single-tailed Fisher's tests by taking sets of two classes at a time. This helped us find which passenger class had the better chance of survival. It was observed that the survival probability is highest for Class I, followed by Class II, with Class III having the lowest probability of survival. The above conclusion agrees with the common knowledge that passengers in first class had the first option to mount the lifeboats. Passengers in third class were the last to mount the lifeboats.

VI. Is there a significant difference in Survival Probability between the two genders even after taking the effect of Passenger Class into Account?

We make three 2×2 contingency tables corresponding to each class, and do Fisher's test as follows:

Class I           Male   Female
Survivors          59     134
Non Survivors     120       9

We did a two-sided Fisher's test which yielded a p value of less than 2.2e-16, i.e., there is a significant difference in Survival Probability between the two genders for Class I. So, we did a one-sided Fisher's test ...

We did a two-sided Fisher's test which yielded a p-value of less than 2.2e-16, i.e., there is a significant difference in Survival Probability between the two genders for Class II. So, we did a one-sided ...

Class III    Survivors   Non Survivors
Male            ...           441
Female           80           132

We did a two-sided Fisher's test which yielded a p value of less than 2.2e-16, i.e., there is a significant difference in Survival Probability between the two genders for Class III. So, we did a one-sided Fisher's test with the alternate hypothesis being that males' survival probability is less than that of