Team Assignment Report Team 06 ECON1193
SGS RMIT – Business Statistic 1 – ECON1193B – TEAM 06
PART 1: DATA COLLECTION
Data were collected on the total number of deaths due to COVID-19 between April 01 and July 31,
2020, together with five other variables: average temperature (in Celsius) and average rainfall
(in mm), both based on available data from 1991 to 2016; medical doctors (per 10,000 people,
latest available); hospital beds (per 10,000 people, latest available); and population of the
country (in millions, latest available). The initial sample covered 50 countries in Region A:
Asia and 23 countries in Region B: North America. After the cleaning process, 46 countries
remain in Region A: Asia and 21 countries remain in Region B: North America. The datasets are
presented in the attached Excel file.
PART 2: DESCRIPTIVE STATISTICS
● Central Tendency Measurements
Central Tendency    Asia      North America
Mean                36.756    91.747
Median              10.031    27.09
Mode                0         0
Figure 1. Measures of Central Tendency of total number of deaths due to COVID-19 between
April 01 to July 31, 2020, in Asia and North America.
In comparing total deaths in Asia and North America using the measures of central tendency,
the mode is 0 for both regions and is not informative, so it is not considered. The mean is
also not used for interpretation because outliers are present, based on the calculations in
appendix 1.1 and appendix 1.2. Consequently, the median is the most suitable measure for the
comparison: 50 percent of the values are greater than the median and the remaining 50 percent
are lower. There is a clear difference between the median total deaths related to COVID-19 in
Asia and in North America. The median for North America is 27.09, roughly 2.7 times the median
for Asia of 10.031. Therefore, it can be concluded that North American countries typically
have more deaths related to COVID-19 than Asian countries.
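The preference for the median over the mean can be illustrated with a short sketch. The values
below are hypothetical placeholders, not the team's dataset, and the 1.5 × IQR fence is an
assumed outlier rule, since the calculations in appendix 1.1 and appendix 1.2 are not
reproduced in this part.

```python
import statistics


def iqr_outliers(values):
    """Return the values outside the 1.5 * IQR fences (an assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical deaths-per-million figures, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]

outliers = iqr_outliers(sample)
mean_value = statistics.mean(sample)
median_value = statistics.median(sample)

print("outliers:", outliers)
print("mean:", round(mean_value, 2))
print("median:", median_value)
```

In this toy sample a single extreme value pulls the mean well above the median, which is the
behaviour that makes the median the safer summary when outliers are present.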
● Box and whisker plot
Figure 2. Box-and-whisker plots of total number of deaths due to COVID-19 in Asia and North
America.
As can be seen from the box-and-whisker plots above, the data distributions of both the Asia
and North America regions are right-skewed. Moreover, the right whiskers of both regions are
longer than the left whiskers, which indicates large values, including outliers, in the upper
tail of the datasets. Consistent with the medians, half of the countries in North America have
more than 27 deaths per million population, while half of the countries in Asia have more than
10 deaths per million. In addition, 25% of the countries lie between about 1 and 10 deaths in
Asia and between about 2 and 27 deaths in North America. This demonstrates that North American
countries have a higher death rate than Asian countries.
● Measurements of variation
Variation Measurements      Asia       North America
Range                       248.04     447.099
IQR                         50.501     99.125
Variance                    3170.34    17621.706
Standard Deviation          56.306     132.747
Coefficient of Variation    86.253     192.068
Figure 3. Measures of Variation of total deaths in Asia and North America (Unit: number of
deaths except for the Coefficient of Variation).
In this scenario, the best measure of variation is the interquartile range (IQR) because
outliers are present. The standard deviation is not suitable because it can be heavily
influenced by outliers, and the coefficient of variation is also not a good choice because the
distributions of both datasets are highly right-skewed. The IQR of the Asia region (50.501) is
smaller than the IQR of North America (99.125), indicating that the middle 50% of the Asian
data are less dispersed around the median. In other words, the total number of deaths due to
COVID-19 is more consistent across countries in Asia than in North America and, together with
the lower median, this suggests that the pandemic has had less impact on the Asia region than
on North America.
PART 3: MULTIPLE REGRESSION
1. Region A: Asian countries (FINAL)
After applying backward elimination, we find that only one variable, average rainfall (in mm),
is significant at the 5% level of significance. The final regression model for Asian countries
is given below.
a. Regression output
After applying backward elimination, we find that only one variable, population (in millions),
is significant at the 5% level of significance. The final regression model for North American
countries is given below.
a. Regression Output
According to the study in Part 3, the two regions have the same number of significant
independent variables (one each), but the variable differs between them. The five candidate
variables were average rainfall (in mm), hospital beds (per 10,000 population), medical
doctors (per 10,000 population), average temperature (in Celsius) and population (in
millions). In the final regression model for Asia, the significant independent variable is
average rainfall (in mm). In the final regression model for North America, it is population
(in millions). In comparison, the North America region has markedly more total deaths
according to the findings in Part 2, which means the region has been affected more by the
pandemic than the Asia region. Moreover, from the study in Part 3, 61.1% of the total
variation in total deaths due to COVID-19 in North America can be explained by the variation
in population (in millions), which shows that population accounts for a major share of the
variation in total deaths in the North America region. Meanwhile, in Asia, only 16.3% of the
variation in the total number of deaths can be explained by the variation in average rainfall
(in mm). The influence of average rainfall on total deaths is therefore limited, and many
other relevant factors are not included in the study, which makes the Asian model less
reliable than the North American one.
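As a small worked check of this comparison, the share of variation that each final model
leaves unexplained is 1 minus its coefficient of determination. The sketch below uses only the
two R Square values quoted above; the dictionary labels are illustrative names.

```python
# R Square values reported in Part 3, written as proportions.
r_squared = {
    "Asia (average rainfall)": 0.163,
    "North America (population)": 0.611,
}

# Share of the variation in total deaths not explained by each model: 1 - R^2.
unexplained = {region: round(1 - r2, 3) for region, r2 in r_squared.items()}

for region, share in unexplained.items():
    print(f"{region}: {share:.1%} of the variation is unexplained")
```

Roughly 84% of the variation in Asia is left to factors outside the model, against about 39%
in North America, which is the basis for treating the Asian model as less reliable.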
To conclude, based on the regression models and the descriptive statistics of the two regions,
this study indicates that average rainfall can be used to forecast the total number of deaths
due to COVID-19 in Asia, while in North America the population of the country is the
independent variable that can be used to predict the total number of deaths. Also, North
American countries have been affected more severely than Asian countries, with a greater
number of deaths due to the pandemic.
After testing the hypotheses for the trend models in the Asia region (appendix 3.1), the
findings indicate that the linear, quadratic and exponential trend models are all significant
for this region.
Figure 6. Time Series outputs for Region A: Asia linear trend.
b. Formula:
Ŷ = 88.199 + 10.366 × T
where Ŷ is the estimated total number of deaths per day in Asia and T is the number of days.
● The slope b1 = 10.366 indicates that the total number of deaths due to COVID-19 between
April 01 and July 31, 2020, increased by an estimated 10.366 deaths every day.
● The intercept b0 = 88.199 is the estimate at T = 0, which means that there were an estimated
88.199 deaths on 31 March, 2020.
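The link between calendar dates and the day index T can be sketched as follows. The sketch
assumes, in line with the interpretation of b0 above, that T = 0 corresponds to 31 March 2020
and T = 1 to 1 April 2020; the function names are illustrative.

```python
from datetime import date

# Assumption: T = 0 is 31 March 2020, so T = 1 is 1 April 2020.
ORIGIN = date(2020, 3, 31)


def day_index(day):
    """Number of days elapsed since the assumed origin (T = 0)."""
    return (day - ORIGIN).days


def asia_linear_trend(t):
    """Linear trend for Asia using the coefficients reported above."""
    return 88.199 + 10.366 * t


t_first = day_index(date(2020, 4, 1))
t_last = day_index(date(2020, 7, 31))
daily_change = round(asia_linear_trend(t_last) - asia_linear_trend(t_last - 1), 3)

print(t_first, t_last)
print(daily_change)
```

The difference between two consecutive days always equals the slope, which is why b1 is read as
the estimated change in deaths per day.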
● Quadratic Trend Model
a. Regression output
Figure 8. Time Series outputs for Region A: Asia exponential trend.
b. Formula in linear format:
log(Ŷ) = 2.383 + 0.00653 × T
In non-linear format:
Ŷ = 241.546 × 1.015^T
where Ŷ is the estimated total number of deaths per day in Asia and T is the number of days.
Interpretation: (b1 - 1) × 100% = (1.015 - 1) × 100% = 1.5% is the estimated daily compound
growth rate of the total number of deaths due to COVID-19 from April 01 to July 31, 2020, in
Asia.
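The step from the linear (log) format to the non-linear format can be checked with a short
sketch. It assumes the logarithm is base 10, which is an inference from the fact that 10 raised
to 2.383 is close to the reported 241.546; the variable names are illustrative.

```python
# Coefficients of the fitted model in linear (log) format, as reported above.
log_b0 = 2.383
log_b1 = 0.00653

# Assumption: the logarithm is base 10, so the coefficients are
# back-transformed with powers of 10.
b0 = 10 ** log_b0
b1 = 10 ** log_b1

# Estimated daily compound growth rate, in percent.
growth_rate_pct = (b1 - 1) * 100

print(round(b0, 1))
print(round(b1, 3))
print(round(growth_rate_pct, 1))
```

Small differences from the reported 241.546 can arise because the log coefficients are rounded
in the text.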
Figure 9. Time Series outputs for Region B: North America linear trend.
b. Formula:
Ŷ = 2056.42 - 5.37 × T
where Ŷ is the estimated total number of deaths per day in North America and T is the number
of days.
● The slope b1 = -5.37 indicates that the total number of deaths due to COVID-19 between
April 01 and July 31, 2020, decreased by an estimated 5.37 deaths every day.
● The intercept b0 = 2056.42 is the estimate at T = 0, which means that there were an
estimated 2056.42 deaths on 31 March, 2020.
● Exponential Trend Model
a. Regression output
Figure 10. Time Series outputs for Region B: North America exponential trend.
The coefficient of determination (R Square) is used to determine the most suitable trend model
from the regression outputs. The higher the coefficient of determination, the more of the
total variation in the number of deaths the model explains, and the better it is for
estimating the number of deaths due to COVID-19.
a. Region A: Asia
Figure 11. Coefficient of determination of linear, quadratic and exponential trend models of
Asia (%).
For region A, it can be seen in the figure that the exponential trend had the highest
coefficient of determination, which means the exponential trend model will be the most
suitable in region A's situation to predict the total number of deaths due to Covid-19 as it
will produce fewer errors.
Figure 12. Coefficient of determination of linear and exponential trend models of NA (%).
For region B, the linear trend model has a slightly higher coefficient of determination than
the exponential trend model. Hence, the linear trend model is the most suitable in region B's
situation to predict the total number of deaths due to COVID-19, as it will produce smaller
errors.
3. Predict the number of deaths on September 28, September 29, and September 30.
a. Region A: Asia
Following the above conclusion, the exponential trend is the best model for predicting the
number of deaths due to COVID-19 in Asia, with the formula:
Ŷ = 241.546 × 1.015^T
where Ŷ is the estimated total number of deaths per day in Asia and T is the number of days.
Date (T)              Forecasted number of deaths
September 28 (181)    3575.64
September 29 (182)    3629.27
September 30 (183)    3683.71
Following the above conclusion, the linear trend is the best model for predicting the number
of deaths due to COVID-19 in North America, with the formula:
Ŷ = 2056.42 - 5.37 × T
where Ŷ is the estimated total number of deaths per day in North America and T is the number
of days.
Date (T)              Forecasted number of deaths
September 28 (181)    1084.45
September 29 (182)    1079.08
September 30 (183)    1073.71
Figure 14. Forecasted number of deaths on September 28, 29, 30 in North America.
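The forecasts for both regions in this part can be reproduced with a few lines of code. This
is a minimal sketch that plugs the reported T values into the two chosen trend models; the
function and variable names are illustrative, and the coefficients are the rounded ones quoted
in the formulas.

```python
def asia_exponential(t):
    """Exponential trend for Asia: 241.546 * 1.015 ** t."""
    return 241.546 * 1.015 ** t


def north_america_linear(t):
    """Linear trend for North America: 2056.42 - 5.37 * t."""
    return 2056.42 - 5.37 * t


# T values used in the forecast tables (T = 1 is 1 April 2020).
forecast_days = {"September 28": 181, "September 29": 182, "September 30": 183}

asia = {d: round(asia_exponential(t), 2) for d, t in forecast_days.items()}
north_america = {d: round(north_america_linear(t), 2) for d, t in forecast_days.items()}

for d in forecast_days:
    print(d, asia[d], north_america[d])
```

The linear forecast falls by the slope (5.37) each day, while the exponential forecast is
multiplied by 1.015 each day, so the gap between the two regions widens as T grows.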
Figure 15. Line graph of daily total number of deaths due to COVID-19 in Asia and North
America from April 01 to July 31, 2020.
b. Explanation
The line graph above presents the daily total number of deaths due to COVID-19 in Asia and
North America from April 01 to July 31, 2020. It can be seen that the number of deaths in Asia
is more stable and markedly lower than in North America, although Asia is the region where the
pandemic first spread. In Asia, irregular components appear in two periods, once on 15 April
and once on 17 June, and the number of deaths increased steadily from 24 June to 29 June.
North America, on the other hand, fluctuated heavily: the number of deaths peaked on 15 April,
then moved downward with a cyclical component of a 7-day period until the end of the
observation. The region also shows an irregular component on 24 June, when the number of
deaths was higher than in any nearby period.
Relating to Part 5.3, Asia and North America do not follow the same trend model for predicting
the number of deaths due to COVID-19: the exponential trend model fits Asia and the linear
trend model fits North America.
To reach this conclusion, our team compared the coefficients of determination (R Square),
because the higher the coefficient of determination, the smaller the error and the more of the
total variation in the number of deaths that can be explained. The R Square of the exponential
trend model of Asia is the highest for that region (80.6%); similarly, the R Square of the
linear trend model of North America is higher than that of the other model for that region
(7.9%). In conclusion, we choose the exponential trend model to predict the total number of
deaths in the world, since its R Square is larger than that of the linear trend model in North
America (80.6% > 7.9%), meaning that 80.6% of the variation in the dependent variable (number
of deaths due to COVID-19) can be explained by the exponential trend model.
associated with worse outcomes in COVID-19 and can positively impact the total number of
deaths in this pandemic (Clara et al. 2020).
Mask-wearing behaviour also affects the total number of deaths due to COVID-19. According to
the University of Cambridge (2020), wearing masks is a cheap and effective way to reduce the
transmission of the COVID-19 virus and to keep the coronavirus 'reproduction number' under
1.0. Even crude homemade masks can reduce disease spread by catching the wearer's virus
particles, which are breathed directly into the fabric, whereas inhaled air is often sucked in
around the exposed sides of the mask (University of Cambridge 2020). A study by Christopher
Leffler from Virginia Commonwealth University indicates that wearing masks can lower the
COVID-19 death rate not just by a few percent but by up to a hundred times (Kate Marino 2020).
In countries that recommended mask-wearing within 15 days or 30 days, the death rate was far
lower than in countries that waited longer or had no policy recommending masks (Kate Marino
2020). Several countries in Asia began using masks very early and still have mortality close
to 1 in 1 million or less, while in the U.S. the mortality is 1 in 2,500 people in the
population (Kate Marino 2020). In a forecast by the Institute for Health Metrics and
Evaluation (IHME), if 95% of the people in the US wore masks in public, the total number of
deaths by December would decrease from 295,011 to 228,271, reported as a 49% drop, which means
more than 66,000 lives would be saved (IHME 2020). To conclude, wearing a mask is a cheap and
effective way to reduce the total number of deaths due to COVID-19. The more people wear masks
in public, the fewer total cases and total deaths there will be in this pandemic.
In conclusion, this report compares Asia and North America in terms of the total number of
deaths due to COVID-19 from 01 April to 31 July, 2020. The findings from the regression and
time series analyses can be used to identify the significant factors that affect the number of
deaths and to forecast deaths in the two regions, and even the world, in the upcoming period.
In detail, countries in North America suffered more than countries in Asia, with a markedly
higher number of deaths due to the pandemic in the observed period. However, by the end of
September, the number of deaths per day in North America is forecast to be only about 1074,
whereas that of Asia is predicted to be more than 3 times higher, with 3684 deaths on 30
September. Moreover, the world is forecast to reach approximately 5844 deaths on 31 October.
Therefore, as the second wave of the pandemic has started and is spreading widely again, our
team recommends that everyone strictly follow preventive measures and practices, such as
continuing social distancing restrictions, while patiently waiting for the completion of a
COVID-19 vaccine.
References:
CDC 2020, COVID-19 Hospitalization and Death by Age, cdc, viewed 10 September 2020,
<https://www.cdc.gov/coronavirus/2019-ncov/covid-data/investigations-discovery/hospitalization-death-by-age.html>
Clara Bonanad et al. 2020, The Effect of Age on Mortality in Patients With COVID-19: A
Meta-Analysis With 611,583 Subjects, sciencedirect, viewed 10 September 2020,
<https://reader.elsevier.com/reader/sd/pii/S1525861020304412?token=F801601078B8F98E8ED750F6C96AE14F77C50800B9E39C232A6D7A591D7322BF95F64FE4CF56DEE65462BEA621BAF7CD>
CNBC 2020, WHO says widespread coronavirus vaccinations are not expected until mid-2021, CNBC,
viewed 10 September 2020,
<https://www.cnbc.com/2020/09/04/who-says-widespread-coronavirus-vaccinations-are-not-expected-until-mid-2021-.html>
Coronavirus (COVID-19) deaths, 2020, Total number of deaths daily due to COVID-19 from 01
April to 31 July, 2020, data file, Our World in Data, viewed 5 September 2020,
<https://ourworldindata.org/covid-deaths?fbclid=IwAR1wvjz7F6vrpzBnPmZ-hBK_33zQXJ8e4KIR8stGPvZpmd9o8sJVz278RBU>.
Hospital beds (per 10,000 population) 2020, Hospital beds (per 10,000 population), data file,
World Health Organization, viewed 28 August 2020,
<https://www.who.int/data/gho/data/indicators/indicator-details/GHO/hospital-beds-(per-10-000-population)>.
IHME 2020, New IHME COVID-19 Forecasts See Nearly 300,000 Deaths by December 1, However,
Consistent Mask-Wearing Could Save about 70,000 Lives, viewed 10 September 2020,
<https://www.prnewswire.com/news-releases/new-ihme-covid-19-forecasts-see-nearly-300-000-deaths-by-december-1--however-consistent-mask-wearing-could-save-about-70-000-lives-301107858.html>
Kate Marino 2020, Early face mask policies curbed COVID-19’s spread, according to 198-country
analysis, viewed 10 September 2020,
<https://news.vcu.edu/article/Early_face_mask_policies_curbed_COVID19s_spread_according_to>
Medical doctors (per 10,000 population) 2020, Medical doctors (per 10,000 population), data
file, World Health Organization, viewed 28 August 2020,
<https://www.who.int/data/gho/data/indicators/indicator-details/GHO/medical-doctors-(per-10-000-population)>.
Population, total 2019, Population, data file, The World Bank, viewed 28 August,
<https://data.worldbank.org/indicator/SP.POP.TOTL?view=chart>.
Rainfall n.d., Average Rainfall (in mm) from 1991 to 2016, data file, Climate Change Knowledge
Portal, viewed 28 August 2020,
<https://climateknowledgeportal.worldbank.org/download-data>.
Robert Preidt 2020, Benefits of Social Distancing Outweigh Economic Toll: Study, usnews,
viewed 10 September 2020,
<https://www.usnews.com/news/health-news/articles/2020-04-20/benefits-of-social-distancing-outweigh-economic-toll-study#:~:text=Assuming%20that%20social%20distancing%20measures,according%20to%20Thunstrom%20and%20colleagues.>
Salma Khalik 2020, Pandemic has entered new phase in Asia-Pacific: WHO, Straitstimes, viewed
10 September 2020,
<https://www.straitstimes.com/asia/pandemic-has-entered-new-phase-in-asia-pacific-who>
Temperature n.d., Average Temperature (in Celsius) from 1991 to 2016, data file, Climate
Change Knowledge Portal, viewed 28 August 2020,
<https://climateknowledgeportal.worldbank.org/download-data>.
Total confirmed COVID-19 deaths, 2020, Total number of deaths due to COVID-19 from 01 April to
31 July, 2020, data file, Our World in Data, viewed 28 August 2020,
<https://ourworldindata.org/grapher/total-covid-deaths-region?year=latest>.
University of Cambridge 2020, Widespread facemask use could shrink the 'R' number and prevent
a second COVID-19 wave, Eurek Alert, viewed 10 September 2020,
<https://www.eurekalert.org/pub_releases/2020-06/uoc-wfu060920.php>
USA today 2020, Coronavirus reopening, usatoday.com, viewed 10 September 2020,
<https://www.usatoday.com/storytelling/coronavirus-reopening-america-map/#caseload>
Appendix 2: Backward Elimination procedures at 5% level of significance
In this part, each independent variable is tested for significance at the 5% level of
significance by comparing its p-value with the significance level α (0.05).
1. Region A: Asia
Step 1: Stating the null and alternative hypotheses
H0: βj = 0 (No variables have a relationship with the total number of deaths due to COVID-19)
H1: βj ≠ 0 (At least one variable has a relationship with the total number of deaths due to
COVID-19)
j = 1, 2, 3, 4, 5
Step 2: Full model with five variables
Variable P-value Comparison Decision
β1 Average rainfall       0.02       < α          Reject H0
H0: βj = 0 (No variables have a relationship with the total number of deaths due to COVID-19)
H1: βj ≠ 0 (At least one variable has a relationship with the total number of deaths due to
COVID-19)
j = 1, 2, 3, 4, 5
Variable                  P-value    Comparison   Decision
β1 Average rainfall       0.909      > α          Do not reject H0
β2 Hospital beds          0.536      > α          Do not reject H0
β3 Medical doctors        0.925      > α          Do not reject H0
β4 Average temperature    0.191      > α          Do not reject H0
β5 Population             0.002      < α          Reject H0
Because the p-value of the medical doctors variable is the largest (0.925) and greater than α,
we eliminate this variable and continue the test.
Variable                  P-value    Comparison   Decision
β1 Average rainfall       0.904      > α          Do not reject H0
β2 Hospital beds          0.403      > α          Do not reject H0
β4 Average temperature    0.177      > α          Do not reject H0
β5 Population             0.002      < α          Reject H0
Because the p-value of the average rainfall variable is the largest (0.904) and greater than
α, we eliminate this variable and continue the test.
Variable                  P-value    Comparison   Decision
β2 Hospital beds          0.351      > α          Do not reject H0
β4 Average temperature    0.128      > α          Do not reject H0
β5 Population             0.001      < α          Reject H0
Because the p-value of the hospital beds variable is the largest (0.351) and greater than α,
we eliminate this variable and continue the test.
Variable                  P-value    Comparison   Decision
β4 Average temperature    0.119      > α          Do not reject H0
β5 Population             0.0009     < α          Reject H0
Because the p-value of the average temperature variable is the largest (0.119) and greater
than α, we eliminate this variable and continue the test.
Variable                  P-value    Comparison   Decision
β5 Population             0.0000284  < α          Reject H0
Because the p-value of the population variable is lower than α, we reject H0, which means
population has a relationship with the total number of deaths due to COVID-19.
The backward elimination procedure above therefore yields a final multiple regression model
for the North American countries dataset that includes only one variable significant at the 5%
level of significance: Population (in millions).
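The elimination loop described in this appendix can be sketched in code. This is a minimal
illustration, not the procedure the team ran in Excel: `lookup_p_values` is a hypothetical
stand-in for refitting the regression, and it simply replays the North America p-values
reported in the tables above; the short variable names are illustrative.

```python
ALPHA = 0.05

# P-values copied from the North America tables above, keyed by the set of
# variables still in the model at each step.
REPORTED_P_VALUES = {
    frozenset({"rainfall", "beds", "doctors", "temperature", "population"}): {
        "rainfall": 0.909, "beds": 0.536, "doctors": 0.925,
        "temperature": 0.191, "population": 0.002,
    },
    frozenset({"rainfall", "beds", "temperature", "population"}): {
        "rainfall": 0.904, "beds": 0.403, "temperature": 0.177, "population": 0.002,
    },
    frozenset({"beds", "temperature", "population"}): {
        "beds": 0.351, "temperature": 0.128, "population": 0.001,
    },
    frozenset({"temperature", "population"}): {
        "temperature": 0.119, "population": 0.0009,
    },
    frozenset({"population"}): {"population": 0.0000284},
}


def lookup_p_values(variables):
    """Hypothetical stand-in for refitting the regression on `variables`."""
    return REPORTED_P_VALUES[frozenset(variables)]


def backward_elimination(variables, p_value_fn, alpha=ALPHA):
    """Drop the least significant variable until every p-value is below alpha."""
    remaining = list(variables)
    removed = []
    while remaining:
        p_values = p_value_fn(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed


final_model, dropped = backward_elimination(
    ["rainfall", "beds", "doctors", "temperature", "population"], lookup_p_values
)
print(final_model)
print(dropped)
```

With a real dataset, `lookup_p_values` would be replaced by a function that refits the
regression on the remaining variables and returns their p-values.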
b. Quadratic
Step 1: H0: β2 = 0 (There is no quadratic trend)
H1: β2 ≠ 0 (There is a quadratic trend)
Step 2: p-value = 0.2817 > α (0.05)
=> Do not reject H0
Step 3: As we do not reject H0, it can be concluded that there is no quadratic trend at the 5%
level of significance.
c. Exponential
Step 1: H0: β1 = 0 (There is no exponential trend)
H1: β1 ≠ 0 (There is an exponential trend)
Step 2: p-value = 0.00269 < α (0.05)
=> Reject H0
Step 3: As we reject H0, it can be concluded that there is an exponential trend at the 5%
level of significance.