Abstract
Dataset Preparation
R Program Code
Processed Information
Conclusion
References
AIM
Dataset Preparation
The scraped dataset is stored in a CSV file. CSV files are plain-text files
in which the values in each row are separated by a delimiter, such as a
comma or a tab. The contents of a CSV file can be read into a data frame in
R using the read.csv() function.
Code:
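As a minimal sketch of the read.csv() step described above: the file name, column names, and rows below are placeholders invented for illustration (they are not the report's dataset), and a temporary file is written first only so that the snippet runs on its own.

```r
# Placeholder file and columns (assumptions); replace with the real scraped CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,deaths",
             "India,1990,100",      # dummy values, for illustration only
             "Pakistan,1990,50"),
           csv_path)

# read.csv() parses comma-separated text into a data frame.
# For tab-delimited files, pass sep = "\t" or use read.delim().
deaths_df <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(deaths_df)   # inspect column names and types
head(deaths_df)  # preview the first rows
```

Setting stringsAsFactors = FALSE keeps text columns as character vectors, which is usually easier to clean before plotting.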
PLOTTING SYNTAX
Barplot:
Z-Test on Deaths
Give an example of a z-test on deaths that occurred due to respiratory disease
in India and Pakistan between 1990 and 2016.
To conduct a z-test on deaths due to respiratory disease in India and Pakistan
between 1990 and 2016, we would first need to gather data on the number of
deaths due to respiratory disease in both countries over this time period. Let's
assume that we have obtained the following data:
India:
Total population between 1990-2016: 1.2 billion
Total deaths due to respiratory disease between 1990-2016: 6 million
Pakistan:
Total population between 1990-2016: 200 million
Total deaths due to respiratory disease between 1990-2016: 1.5 million
To conduct a z-test, we would first need to formulate our null and alternative
hypotheses. Let's assume that our null hypothesis is that there is no significant
difference in the proportion of deaths due to respiratory disease in India and
Pakistan between 1990 and 2016, while our alternative hypothesis is that there
is a significant difference.
Next, we would calculate the observed proportion of deaths due to respiratory
disease in each country by dividing the total number of deaths by the total
population, and then multiplying by 100 to obtain a percentage.
India:
Proportion of deaths due to respiratory disease: (6 million / 1.2 billion)
x 100 = 0.5%
Pakistan:
Proportion of deaths due to respiratory disease: (1.5 million / 200 million)
x 100 = 0.75%
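The proportion calculation can be sketched in R as follows. The figures are the assumed values from the example above, not verified data, and the variable names are chosen here for illustration.

```r
# Assumed figures from the worked example (illustrative only).
deaths     <- c(India = 6e6,   Pakistan = 1.5e6)
population <- c(India = 1.2e9, Pakistan = 200e6)

# Proportion of deaths per country, as a decimal and as a percentage.
prop <- deaths / population
pct  <- 100 * prop

print(prop)
print(pct)
```

Keeping the decimal form (prop) is useful because the standard error formula works with proportions rather than percentages.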
We would then calculate the standard error of the difference between the two
proportions using the following formula:
SE = sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
where p1 and p2 are the proportions of deaths due to respiratory disease in
India and Pakistan, expressed as decimals rather than percentages, and n1 and
n2 are the respective population sizes.
SE = sqrt(0.005*(1-0.005)/1.2 billion + 0.0075*(1-0.0075)/200 million)
≈ 0.0000064
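To check the arithmetic, the standard error formula can be evaluated directly in R. This is a sketch using the assumed figures from the example; the variable names mirror the symbols in the formula.

```r
# Proportions (as decimals) and population sizes assumed in the example.
p1 <- 0.005;  n1 <- 1.2e9   # India
p2 <- 0.0075; n2 <- 200e6   # Pakistan

# Unpooled standard error of the difference between two proportions.
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(se)
```

Because both populations are very large, the standard error is tiny, so even a small difference in proportions will produce a large z-score.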
Finally, we would calculate the z-score using the following formula:
z = (p1 - p2) / SE
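The whole calculation can be wrapped in a small helper. This is a sketch under the assumptions of the example: two_prop_z is a hypothetical function name introduced here, the inputs are the assumed figures rather than real data, and the two-sided p-value from the standard normal distribution is an added step for interpreting z.

```r
# Hypothetical helper: z-test for the difference between two proportions,
# using the unpooled standard error given in the text.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- (p1 - p2) / se
  # Two-sided p-value under the standard normal distribution.
  p_value <- 2 * pnorm(-abs(z))
  list(p1 = p1, p2 = p2, se = se, z = z, p_value = p_value)
}

# Assumed figures from the example: India first, Pakistan second.
result <- two_prop_z(x1 = 6e6, n1 = 1.2e9, x2 = 1.5e6, n2 = 200e6)
print(result$z)
print(result$p_value)
```

A negative z here means the first proportion (India) is smaller than the second (Pakistan). The null hypothesis is rejected at the 5% level when the absolute value of z exceeds 1.96, or equivalently when the p-value is below 0.05.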