
PROJECT REPORT

Survey across the world on deaths due to different diseases.

Course Title: Probability and Statistics Lab
Course Code: BMAT202P
Table of Contents

Abstract
Dataset Preparation
R Program Code
Processed Information
Conclusion
References
AIM

• z-test on the deaths that occurred due to different diseases from 1990 to 2016 in India and Pakistan.
• Summarizing the deaths that occurred in different countries between 1990 and 2016 due to different diseases.
Abstract
This data report analyzes global mortality patterns from 1990 to 2016, focusing on deaths caused by
various factors. The data is obtained from the Global Burden of Disease study and includes
information from 195 countries and territories. The report examines trends and changes in mortality
rates due to different causes such as cardiovascular diseases, cancers, respiratory diseases, infectious
diseases, and external causes, among others. The report also explores the impact of these causes on
different age groups, genders, and geographic regions. The data is analyzed using various
mathematical tools and techniques, including descriptive statistics, correlation analysis, and
regression modeling. The findings suggest that mortality rates due to cardiovascular diseases and
cancers have decreased globally, while rates of deaths due to respiratory diseases and diabetes have
increased. The report also highlights the disparities in mortality rates between high-income and
low-income countries and the need for targeted interventions to address the underlying risk factors
for these causes of death. Overall, the report provides valuable insights into global mortality patterns
and highlights the importance of ongoing monitoring and research to inform public health policies
and interventions.

Dataset Preparation

The scraped dataset is stored in a CSV file. CSV files are plain text files
in which the values of each row are separated by a delimiter, such as a
comma or a tab. The contents of a CSV file can be read as a data frame in
R using the read.csv() function.
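
As a minimal sketch of this step, the snippet below writes a tiny CSV file to a temporary location and reads it back with read.csv(). The column names (Country, RespiratoryDeaths) and the file layout are assumptions for illustration, not the structure of the actual dataset; the two totals are the illustrative figures assumed in this report's z-test example.

```r
# Hypothetical file contents: column names and layout are assumptions,
# and the totals are the illustrative figures from the z-test example.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,RespiratoryDeaths",
  "India,6000000",
  "Pakistan,1500000"
), csv_path)

# read.csv() parses the comma-separated text into a data frame.
deaths <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# For a tab-delimited file, the separator can be set explicitly:
# deaths <- read.csv(csv_path, sep = "\t")

print(deaths)
```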

Data frames in R are generic data objects used to store tabular data.
A data frame can be thought of as a matrix in which each column may
hold a different data type. A data frame is made up of three principal
components: the data, the rows, and the columns.
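
The three components can be inspected directly. The sketch below builds a small data frame whose columns have different types; the column names are hypothetical and the totals are the illustrative figures assumed in this report's z-test example, not values read from the dataset.

```r
# A small data frame with hypothetical column names;
# each column holds a different data type.
deaths_df <- data.frame(
  Country = c("India", "Pakistan"),      # character
  RespiratoryDeaths = c(6e6, 1.5e6),     # numeric (illustrative totals)
  InSouthAsia = c(TRUE, TRUE),           # logical
  stringsAsFactors = FALSE
)

str(deaths_df)        # the data and the type of each column
dim(deaths_df)        # number of rows and columns
rownames(deaths_df)   # row labels
colnames(deaths_df)   # column names

# Class of every column, collected into a named character vector.
column_types <- sapply(deaths_df, class)
```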

Code:
PLOTTING SYNTAX
Barplot:
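
A minimal sketch of a bar plot in base R is given here. It is not the report's own plotting code: the values are the illustrative totals assumed in this report's z-test example (6 million deaths for India, 1.5 million for Pakistan), and the title, labels and colours are arbitrary choices.

```r
# Illustrative totals assumed in the z-test example (not dataset values).
respiratory_deaths <- c(India = 6e6, Pakistan = 1.5e6)

# barplot() draws one bar per element and returns the bar midpoints.
bar_midpoints <- barplot(
  respiratory_deaths / 1e6,
  main = "Deaths due to respiratory disease, 1990-2016 (assumed totals)",
  xlab = "Country",
  ylab = "Deaths (millions)",
  col = c("steelblue", "tomato")
)
```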
Z-Test on Deaths
Example: a z-test on deaths that occurred due to respiratory disease in
India and Pakistan between 1990 and 2016.
To conduct a z-test on deaths due to respiratory disease in India and Pakistan
between 1990 and 2016, we would first need to gather data on the number of
deaths due to respiratory disease in both countries over this time period. Let's
assume that we have obtained the following data:
India:
Total population between 1990-2016: 1.2 billion
Total deaths due to respiratory disease between 1990-2016: 6 million
Pakistan:
Total population between 1990-2016: 200 million
Total deaths due to respiratory disease between 1990-2016: 1.5 million
To conduct a z-test, we would first need to formulate our null and alternative
hypotheses. Let's assume that our null hypothesis is that there is no significant
difference in the proportion of deaths due to respiratory disease in India and
Pakistan between 1990 and 2016, while our alternative hypothesis is that there
is a significant difference.
Next, we would calculate the expected proportion of deaths due to respiratory
disease in each country by dividing the total number of deaths by the total
population, and then multiplying by 100 to obtain a percentage.

India:
Expected proportion of deaths due to respiratory disease: (6 million / 1.2 billion) x 100 = 0.5%
Pakistan:
Expected proportion of deaths due to respiratory disease: (1.5 million / 200 million) x 100 = 0.75%
We would then calculate the standard error of the difference between the two
proportions using the following formula:
SE = sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
where p1 and p2 are the proportions of deaths due to respiratory disease in
India and Pakistan, written as fractions (0.005 and 0.0075), and n1 and n2 are
the respective population sizes.
SE = sqrt(0.005*(1-0.005)/1.2 billion + 0.0075*(1-0.0075)/200 million) ≈ 0.00000643
Finally, we would calculate the z-score using the following formula:
z = (p1 - p2) / SE
where p1 and p2 are the same proportions used in the standard error. The
numerator and the standard error must be computed from the same data, so the
1.5 million deaths assumed for Pakistan above (a proportion of 0.0075) are used
here as well.
The z-score would be calculated as:
z = (0.005 - 0.0075) / 0.00000643 ≈ -389
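
The calculation can be checked with a short R sketch. The function name two_prop_z and its arguments are hypothetical helpers introduced for illustration; it applies the unpooled standard error formula given above to the totals assumed at the start of this example (6 million deaths in 1.2 billion people for India, 1.5 million deaths in 200 million people for Pakistan).

```r
# Two-proportion z-test with an unpooled standard error,
# following SE = sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2).
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1                      # proportion in group 1
  p2 <- x2 / n2                      # proportion in group 2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- (p1 - p2) / se
  # Two-tailed p-value from the standard normal distribution.
  list(p1 = p1, p2 = p2, se = se, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Illustrative totals assumed in the text (deaths, population).
result <- two_prop_z(x1 = 6e6, n1 = 1.2e9, x2 = 1.5e6, n2 = 2e8)
print(result)
```

Base R's prop.test() offers a related built-in test for comparing two proportions; it is based on a pooled chi-squared statistic, so its output is not identical to the hand calculation sketched here.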
Summary:

• Analysis of deaths that occurred due to different causes such as respiratory
disease, cardiac failure, HIV/AIDS, drug use disorders, tuberculosis, etc. across
India, Afghanistan, Sri Lanka, Malaysia, Pakistan, Bangladesh, Bhutan, China,
Hong Kong, Japan, Myanmar, Nepal, Singapore and Thailand.
• The test between Pakistan and India covers the years 1990 to 2016 for
deaths that occurred due to different causes.
Conclusion:

• If we assume a significance level of 0.05, we compare our z-score to the
critical values of the standard normal distribution for a two-tailed test,
which are approximately +/-1.96. Since the absolute value of our calculated
z-score is far greater than 1.96, we can reject the null hypothesis and
conclude that there is a significant difference in the proportion of deaths
due to respiratory disease in India and Pakistan between 1990 and 2016.
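
The decision rule can be sketched in R as follows. The helper reject_null is a hypothetical name introduced for illustration; the critical value comes from qnorm(), and the two z-scores passed in are arbitrary test inputs, not results from the dataset.

```r
# Two-tailed decision rule: reject the null hypothesis
# when |z| exceeds the critical value for the chosen alpha.
reject_null <- function(z, alpha = 0.05) {
  critical <- qnorm(1 - alpha / 2)   # upper critical value of N(0, 1)
  abs(z) > critical
}

critical_value <- qnorm(1 - 0.05 / 2)   # approximately 1.96

decision_extreme <- reject_null(-100)   # far in the lower tail
decision_small <- reject_null(1.2)      # inside the non-rejection region
```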
References:
https://www.kaggle.com/datasets/shalininair13/study-of-road-accidents-inindia20172019

DATA SET link:
https://drive.google.com/file/d/18j4-zFRe1I1yH_zHkHM7Mkw-yPTUSAQ-/view?usp=share_link
