
Econometrics Final Project

Factors affecting Covid’s infection rate.

Name: Nguyen Quang Dong Nguyen


Student ID: 190071
Professor: Moinul Islam
Tuesday, 15 December, 2022
1. Introduction

COVID-19 broke out in 2019, and its consequences have affected the whole economy. Many factors can contribute to COVID infection. These include frequently going out without wearing a mask (face covering), not maintaining social distance, frequently going out to work, not using hand sanitizer, and age, since older people are more likely to test positive for Covid than young people. I therefore examine whether the Covid infection rate depends on mask wearing, employment status and age using an econometric model. In this model the infection rate is the dependent variable, and masks, employment status and age are the independent variables.

2. Methods and data source:

2.1 Econometric model

$$\text{Infection rate} = \beta_1 + \beta_2\,\text{Masks} + \beta_3\,\text{Employmentstatus} + \beta_4\,\text{Age} + u_i$$

In this model the dependent variable is the infection rate, and the independent variables are masks, employment status and age. β1 is the intercept and β2, β3 and β4 are the slope coefficients (the parameters to be estimated). u_i is the random error term, which is estimated by the OLS residual.

- β1 is the intercept: the expected infection rate when masks, employment status and age are all equal to zero.
- β2 is the slope on masks: holding employment status and age constant, a one-unit increase in masks changes the expected infection rate by β2 units.
- β3 is the slope on employment status: holding masks and age constant, a one-unit increase in employment status changes the expected infection rate by β3 units.
- β4 is the slope on age: holding masks and employment status constant, a one-year increase in age changes the expected infection rate by β4 units.

2.2 Data Source

Table 1 lists all the variables, taken from Professor Md Manir Hossain Mollah. Every respondent answered all of the following questions:
- How many times have you tested positive for Covid?
- Did you wear a mask when you went out (yes = 1, no = 0)?
- How many jobs do you have? (employment status is measured as the number of jobs)
- What is your age?

Table 1: The list of all variables

2.3 Methodology

Using basic econometric theory, I use the STATA software to analyze the data and provide empirical evidence for my hypothesis. An OLS regression is run to estimate the econometric model. It is followed by several hypothesis tests, which assess whether the results from the sample are statistically significant for the whole population. Individual t-tests assess the significance of each slope coefficient, and the F-test (ANOVA) checks the overall significance of the model. A skewness-kurtosis (chi-square) test checks whether the residuals are normally distributed. The Breusch-Pagan and White tests check for heteroscedasticity in the model. Confidence intervals, R-squared, adjusted R-squared and correlation coefficients are also reported and interpreted. Finally, the results are checked against the Ordinary Least Squares (OLS) assumptions with a dummy variable test and heteroscedasticity diagnostic tests.

3. Result and Discussion

3.1. Descriptive Statistics


3.1.1 Summary statistics: mean, median, standard deviation, etc. for all variables

Table 2: Summary statistics table


Table 2 shows that the sample size is 39. The average infection rate is 1.820513, which means that respondents were infected with Covid about 1.8 times on average. The maximum is 3 infections and the minimum is 1.
The standard deviation of the infection rate is 0.7208108. The mean of masks is 0.9230769, which means that about 92% of respondents (36 of 39) wear masks. Its standard deviation is 0.2699528. The mean of employment status is 2, so respondents hold 2 jobs on average. Some respondents have no job, and the maximum is 4 jobs. The mean age is 53.4359. The oldest respondent is 87 years old, the youngest is 32, and the standard deviation is 17.08153.
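For a 0/1 variable such as masks, the mean is simply the share of respondents coded 1, and the standard deviation follows directly from that share. The Python sketch below is only a consistency check. It assumes, although the data description does not say so, that the reported mean of 0.9230769 corresponds to 36 mask wearers out of 39. The list it builds is a reconstruction, not the survey data.

```python
import math
import statistics

# Hypothetical reconstruction: a 0/1 mask dummy with 36 ones and 3 zeros,
# the split implied by a mean of 0.9230769 with n = 39.
masks = [1] * 36 + [0] * 3

n = len(masks)
mean = statistics.mean(masks)   # share of respondents who wear masks
sd = statistics.stdev(masks)    # sample standard deviation (divides by n - 1)

# For a 0/1 variable the sample SD depends only on the share p and on n.
p = sum(masks) / n
sd_from_share = math.sqrt(p * (1 - p) * n / (n - 1))

print(f"n = {n}, mean = {mean:.7f}, sd = {sd:.7f}, sd from share = {sd_from_share:.7f}")
```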

3.1.2 Scatter Diagram


(Scatter diagrams of each independent variable against the infection rate.)

3.1.3 Correlation Coefficients for All Variables

Table 3: Correlation Coefficient

The correlation coefficients are calculated with the Stata command “cor infecrate masks employmentstatus age”. They measure the linear relationship between each pair of variables and can be interpreted as follows:

- The correlation coefficient between infecrate and masks is 0.1977. It is positive but far from 1, which indicates a weak positive linear relationship between the two variables.

- The correlation coefficient between infecrate and employmentstatus is 0.0679. It is close to 0, which indicates a very weak positive linear relationship.

- The correlation coefficient between infecrate and age is 0.047. It is also close to 0, which indicates a very weak positive linear relationship.

- The positive signs imply that the trend lines in the scatter diagrams slope upward. Each pair of variables therefore tends to move in the same direction, although only weakly.

- The correlation of each variable with itself (for example infecrate and infecrate, employmentstatus and employmentstatus, or age and age) is exactly 1.0000. This is a perfect positive correlation by construction.

- None of the correlation coefficients is exactly 0. However, the correlations with the infection rate are all small, so the linear relationships between the infection rate and the independent variables are weak.
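The correlation coefficient can be sketched in Python, together with a rough check of its strength. `pearson_r` is a plain implementation of Pearson's r. `corr_t_stat` applies the standard t statistic for testing a zero population correlation, t = r√(n - 2)/√(1 - r²). Both function names are illustrative. This significance test is an addition for illustration and is not part of the Stata output in Table 3.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corr_t_stat(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Largest correlation with the infection rate in Table 3 (infecrate vs masks), n = 39.
t = corr_t_stat(0.1977, 39)
print(f"t = {t:.2f}")  # compare with the 5% two-sided critical value t(37), roughly 2.03
```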
3.2 Regression analysis

3.2.1 Interpretation of Coefficient

- Table 4 is extracted from the regression output to interpret the estimators β1, β2, β3 and β4 of the econometric model.

Table 4: Interpretation of Variable

- Interpretation of β1: when masks, employment status and age are all equal to zero, the infection rate is 1.237793 on average. No respondent in the sample is younger than 32, so this intercept is an extrapolation and has no direct practical meaning.

- Interpretation of β2: masks is a dummy variable. Holding employment status and age constant, a respondent who wears a mask (masks = 1) has on average 0.5027887 more Covid infections than one who does not (masks = 0).

- Interpretation of β3: holding masks and age constant, if employment status increases by 1 unit (one more job), the infection rate increases on average by 0.0237498.

- Interpretation of β4: holding masks and employment status constant, if age increases by 1 year, the infection rate increases on average by 0.0013307.
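The sketch below makes the coefficient interpretations concrete by plugging the estimates from Table 4 into the fitted equation. The respondent profile (a mask wearer with 2 jobs, aged 53) is hypothetical. `predicted_infections` is an illustrative helper, not a Stata command.

```python
# Coefficient estimates as reported in Table 4.
B1, B2, B3, B4 = 1.237793, 0.5027887, 0.0237498, 0.0013307

def predicted_infections(masks, employmentstatus, age):
    """Fitted number of Covid infections from the estimated linear model."""
    return B1 + B2 * masks + B3 * employmentstatus + B4 * age

# Hypothetical respondent: 2 jobs, 53 years old, with and without a mask.
with_mask = predicted_infections(1, 2, 53)
without_mask = predicted_infections(0, 2, 53)

# The gap between the two predictions is the mask coefficient B2,
# because all other regressors are held fixed.
print(round(with_mask, 4), round(with_mask - without_mask, 7))
```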

3.2.2 Individual Hypothesis T-test

The individual t-test checks whether a single regression coefficient is significantly different from zero. In other words, it asks whether the infection rate depends linearly on one independent variable when the others are held constant. The test uses a 95% confidence level (5% significance level) with 35 degrees of freedom (df = n - k = 39 - 4 = 35). The two-sided critical value therefore satisfies

$$P(-t_{0.025,35} < t < t_{0.025,35}) = 0.95, \qquad t_{0.025,35} \approx 2.03$$

We fail to reject the null hypothesis if the calculated t value falls in the acceptance region (|t| < 2.03). We reject it if the value falls in the rejection (critical) region (|t| > 2.03). The graph below represents the acceptance and rejection (critical) regions on a t-distribution curve:
Fig 1 : T- distribution graph

- T-test statistics for β2:

Null hypothesis (H0): β2 = 0 (the infection rate does not depend linearly on masks)
Alternative hypothesis (HA): β2 ≠ 0 (the infection rate depends linearly on masks)

$$t_{\text{calculated}} = \frac{0.5027887 - 0}{0.4518876} \approx 1.11$$

Since $|t_{\text{calculated}}| = 1.11$ is below the critical value $t_{0.025,35} \approx 2.03$, it falls in the acceptance region, and we fail to reject the null hypothesis. β2 is not statistically significant, so there is no evidence that the infection rate depends linearly on masks.

- T-test statistics for β3:

Null hypothesis (H0): β3 = 0 (the infection rate does not depend linearly on employmentstatus)
Alternative hypothesis (HA): β3 ≠ 0 (the infection rate depends linearly on employmentstatus)

$$t_{\text{calculated}} = \frac{0.0237498 - 0}{0.1130416} \approx 0.21$$

Since $|t_{\text{calculated}}| = 0.21$ is below the critical value $t_{0.025,35} \approx 2.03$, it falls in the acceptance region, and we fail to reject the null hypothesis. β3 is not statistically significant, so there is no evidence that the infection rate depends linearly on employmentstatus.
- T-test statistics for β4:

Null hypothesis (H0): β4 = 0 (the infection rate does not depend linearly on age)
Alternative hypothesis (HA): β4 ≠ 0 (the infection rate depends linearly on age)

$$t_{\text{calculated}} = \frac{0.0013307 - 0}{0.0070337} \approx 0.19$$

Since $|t_{\text{calculated}}| = 0.19$ is below the critical value $t_{0.025,35} \approx 2.03$, it falls in the acceptance region, and we fail to reject the null hypothesis. β4 is not statistically significant, so there is no evidence that the infection rate depends linearly on age.
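The three t statistics and their decisions can be reproduced in a few lines of Python. The critical value 2.0301 for 35 degrees of freedom comes from standard t tables. It is an input assumed by this sketch and is not a number reported in the Stata output.

```python
# Estimates and standard errors as reported in the regression output (Table 4).
estimates = {
    "masks": (0.5027887, 0.4518876),
    "employmentstatus": (0.0237498, 0.1130416),
    "age": (0.0013307, 0.0070337),
}
T_CRIT = 2.0301  # two-sided 5% critical value of t with 35 df (standard t tables)

t_stats = {}
for name, (beta, se) in estimates.items():
    t = (beta - 0) / se  # H0: beta = 0
    t_stats[name] = t
    decision = "reject H0" if abs(t) > T_CRIT else "fail to reject H0"
    print(f"{name:17s} t = {t:5.2f}  -> {decision}")
```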

3.2.3 Overall Significance Test of the Model (F-Test)


The F-test measures the overall significance of the model using the analysis of variance (ANOVA). It tests the null hypothesis that all slope coefficients are jointly zero (β2 = β3 = β4 = 0) against the alternative that at least one of them is not zero. The test uses a 95% confidence level (5% significance level) with 3 numerator and 35 denominator degrees of freedom. If the calculated F value falls in the acceptance region (below the critical value), we fail to reject the null hypothesis. If it falls in the critical (rejection) region, the null hypothesis is rejected.

Table 5 : ANOVA TABLE

Fig 2: F- distribution graph


The explained sum of squares (ESS), the residual (unexplained) sum of squares (RSS) and their degrees of freedom are used to compute the F statistic, $F = \frac{ESS/(k-1)}{RSS/(n-k)}$. Table 5 reports $F_{\text{calculated}} = 0.50$, which is consistent with R² = 0.0411.

Since $F_{\text{calculated}} < F_{\text{critical}}$ (0.50 < 2.87, where $F_{0.05,3,35} \approx 2.87$), we fail to reject the null hypothesis. The model as a whole is not statistically significant.
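As a cross-check, the overall F statistic can also be recovered from R² alone, using F = [R²/(k - 1)] / [(1 - R²)/(n - k)]. This Python sketch uses R² = 0.0411 (Table 6), n = 39 and k = 4 parameters including the intercept. The critical value 2.87 comes from standard F tables, and `overall_f` is an illustrative name.

```python
def overall_f(r2, n, k):
    """Overall F statistic for H0: all slope coefficients are zero.
    k counts all estimated parameters, including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

F_CRIT = 2.87  # F(3, 35) at the 5% level (standard F tables)
f = overall_f(0.0411, n=39, k=4)
print(f"F(3, 35) = {f:.2f}; reject H0: {f > F_CRIT}")
```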

3.2.4 R-squared and Adjusted R-squared

Table 6 : R-squared and Adjusted R-squared

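The table above shows that R-squared is 0.0411. This means that only 4.11% of the variation in the infection rate is explained by the regression (masks, employment status and age). R-squared never decreases when independent variables are added, even irrelevant ones, while the degrees of freedom fall. Adjusted R-squared corrects for this by penalizing each additional parameter. The adjusted R-squared here is -0.0411. A negative value means that, after this penalty, the model fits no better than simply using the mean infection rate.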
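The adjusted R-squared can be reproduced from R² with the usual formula 1 - (1 - R²)(n - 1)/(n - k). This is a minimal Python sketch with n = 39 and k = 4 parameters including the intercept. `adjusted_r2` is an illustrative name.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared; k counts all parameters including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

adj = adjusted_r2(0.0411, n=39, k=4)
print(f"adjusted R-squared = {adj:.4f}")
```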

3.2.5 Confidence Interval

Table 7 : Confidence Interval

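The table shows the following 95% confidence intervals:
- β2: -0.4145918 to 1.420169
- β3: -0.2057369 to 0.2532364
- β4: -0.0129486 to 0.01561

If the sampling were repeated and 100 such intervals were constructed, about 95 of them would contain the true value of the coefficient and about 5 would not. All three intervals contain zero. This is consistent with the t-tests, which find that none of the coefficients is significantly different from zero at the 5% level.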
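Each interval is the estimate plus or minus the critical t value times the standard error. The Python sketch below recomputes the intervals from the estimates and standard errors used in the t-tests. It assumes t(0.025, 35) ≈ 2.0301 from standard tables, so small differences in the last digits come from rounding that value.

```python
T_CRIT = 2.0301  # t(0.025, 35) from standard t tables (assumed input)

# Estimates and standard errors as reported in the regression output.
estimates = {
    "masks": (0.5027887, 0.4518876),
    "employmentstatus": (0.0237498, 0.1130416),
    "age": (0.0013307, 0.0070337),
}

intervals = {}
for name, (beta, se) in estimates.items():
    lo, hi = beta - T_CRIT * se, beta + T_CRIT * se
    intervals[name] = (lo, hi)
    print(f"{name:17s} [{lo: .7f}, {hi: .7f}]  contains 0: {lo < 0 < hi}")
```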

3.2.6 Normality test for OLS residuals (Chi-squared Test)

Table 8: Skewness-kurtosis Test


The chi-square (χ²) skewness-kurtosis test checks whether the OLS residuals are normally distributed. We use the commands “predict myResiduals,r” and “sktest myResiduals” to generate the residuals and test them for normality.

Null hypothesis (H0): u_i is normally distributed
Alternative hypothesis (HA): u_i is not normally distributed

The joint statistic reported by sktest follows a χ² distribution with 2 degrees of freedom. At the 0.05 significance level, the critical value is therefore $\chi^2_{0.05,2} = 5.991$.

Fig 3: Chi-square (χ²) distribution graph

Table 8 shows that $\chi^2_{\text{calculated}} = 4.2$, which is less than the critical value 5.991. Therefore, we cannot reject the null hypothesis, and there is no evidence against the normality of u_i.
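For 2 degrees of freedom, the χ² upper-tail probability has the closed form exp(-x/2). The critical value and the p-value of the reported statistic can therefore be computed directly. The sketch also includes a Jarque-Bera function to illustrate a normality statistic built from skewness and kurtosis. Note that Jarque-Bera is a related but different test from the one Stata's sktest performs. Both helper names are illustrative.

```python
import math

def chi2_df2_pvalue(stat):
    """Upper-tail p-value of a chi-square with 2 df: P(X > x) = exp(-x / 2)."""
    return math.exp(-stat / 2)

def jarque_bera(residuals):
    """Jarque-Bera statistic; asymptotically chi-square(2) under normality."""
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((e - m) ** 2 for e in residuals) / n
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

critical = -2 * math.log(0.05)  # 5% critical value of chi-square(2), about 5.991
p = chi2_df2_pvalue(4.2)        # statistic reported in the skewness-kurtosis table
print(f"critical = {critical:.3f}, p = {p:.4f}")
```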

3.2.7 Structural Break Test (Chow Test)

A Chow test is simply a test of whether the coefficients estimated over one group of the data
are equal to the coefficients estimated over another.
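In its classic form, the Chow test compares the residual sum of squares of a pooled regression with the sum of the residual sums of squares from separate regressions on each group. The statistic is F = [(RSS_pooled - RSS_1 - RSS_2)/k] / [(RSS_1 + RSS_2)/(n1 + n2 - 2k)], with k and n1 + n2 - 2k degrees of freedom. The Python sketch below implements that formula. The residual sums of squares and group sizes are hypothetical numbers chosen for illustration, not results from this survey.

```python
def chow_f(rss_pooled, rss_group1, rss_group2, n1, n2, k):
    """Chow F statistic for equal coefficients across two groups.
    k counts all parameters (including the intercept) in each group's regression."""
    rss_split = rss_group1 + rss_group2
    numerator = (rss_pooled - rss_split) / k
    denominator = rss_split / (n1 + n2 - 2 * k)
    return numerator / denominator

# Hypothetical residual sums of squares, for illustration only.
f = chow_f(rss_pooled=10.0, rss_group1=4.0, rss_group2=4.0, n1=20, n2=20, k=2)
print(f"Chow F(2, 36) = {f:.2f}")  # Chow F(2, 36) = 4.50
```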

Dummy variable Regression Model

In regression analysis, a dummy variable takes the value 0 or 1 to indicate the absence or presence of a categorical effect that may be expected to shift the outcome. Starting from employment status alone, the infection rate model can be rewritten as:

$$\text{Infection rate} = \beta_1 + \beta_3\,\text{employmentstatus} + u_i \qquad (1)$$

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus}$$

To check whether the infection rate differs significantly between people who wear masks and people who do not, the dummy variable masks can be added to the model (masks = 1 if the respondent wears a mask, and masks = 0 otherwise).

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus} + \beta_2\,\text{masks} \qquad (2)$$


For people who wear a mask (masks = 1):

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus} + \beta_2 \cdot 1 = (\beta_1 + \beta_2) + \beta_3\,\text{employmentstatus}$$

If employment status = 0, the average infection rate is $E = \beta_1 + \beta_2$.

For people who do not wear a mask (masks = 0):

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus} + \beta_2 \cdot 0 = \beta_1 + \beta_3\,\text{employmentstatus}$$

If employment status = 0, the average infection rate is $E = \beta_1$.

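Interaction with the dummy variable (intercept and slope)

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus} + \beta_2\,\text{masks} + \beta_4\,(\text{employmentstatus} \times \text{masks}) \qquad (3)$$

Here β4 denotes the coefficient on the interaction term in model (3), not the coefficient on age in the main model.

For people who wear masks:

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus} + \beta_2 \cdot 1 + \beta_4\,\text{employmentstatus} \cdot 1 = (\beta_1 + \beta_2) + (\beta_3 + \beta_4)\,\text{employmentstatus}$$

For people who do not wear masks:

$$E(\text{infection rate}) = \beta_1 + \beta_3\,\text{employmentstatus} + \beta_2 \cdot 0 + \beta_4\,\text{employmentstatus} \cdot 0 = \beta_1 + \beta_3\,\text{employmentstatus}$$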

Graph 4: The Dummy Variable Graph

In Graph 4, the fitted lines for respondents who wear masks and those who do not appear to coincide with a single "pooled" regression line. This suggests that wearing a mask shifts neither the intercept nor the slope, so no differential effect of masks is observed in this cross-sectional sample.
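The algebra of model (3) can be checked numerically. For each value of the dummy, the intercept is β1 + β2·masks and the slope on employment status is β3 + β4·masks. The coefficient values below are hypothetical and were chosen only to illustrate the derivation. The helper names are also illustrative.

```python
def group_line(b1, b2, b3, b4, masks):
    """Intercept and slope on employmentstatus implied by model (3) for one group."""
    intercept = b1 + b2 * masks
    slope = b3 + b4 * masks
    return intercept, slope

def expected_infection(b1, b2, b3, b4, employmentstatus, masks):
    """E(infection rate) from model (3), including the interaction term."""
    return b1 + b3 * employmentstatus + b2 * masks + b4 * employmentstatus * masks

# Hypothetical coefficients (b1, b2, b3, b4), for illustration only.
b = (1.0, 0.3, 0.1, -0.05)
for d in (1, 0):
    a, s = group_line(*b, masks=d)
    print(f"masks={d}: intercept={a:.2f}, slope={s:.2f}")
```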
3.2.8 Heteroscedasticity Diagnostic Tests
Econometric model:

$$\text{Infection rate} = \beta_1 + \beta_2\,\text{Masks} + \beta_3\,\text{Employmentstatus} + \beta_4\,\text{Age} + u_i$$

After estimating the regression, we obtain the residuals û_i and square them. We then regress û_i² on the independent variables.

Table 10: Breusch-Pagan test

Null hypothesis (H0): there is no heteroscedasticity in the model.
Alternative hypothesis (HA): there is heteroscedasticity in the model.

The table shows that $F_{\text{calculated}} = 0.28$, while the critical value is $F_{0.05,3,35} = 2.8742$.

Fig 5: F-distribution table

Because $F_{\text{calculated}} < F_{\text{critical}}$ (0.28 < 2.87), we cannot reject the null hypothesis. There is no evidence of heteroscedasticity in the model.
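The Breusch-Pagan steps described above can be sketched in plain Python: estimate the model, square the residuals, regress them on the same regressors, and test the auxiliary regression with an F test with (k - 1, n - k) degrees of freedom. The OLS solver uses the normal equations. That is adequate for a handful of regressors but less numerically robust than the QR-based routines in statistical software. The data at the bottom are hypothetical, and all function names are illustrative.

```python
def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y, solved by Gauss-Jordan.
    X is a list of rows and must include a column of ones for the intercept."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def fitted(X, b):
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]

def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    rss = sum((a - f) ** 2 for a, f in zip(y, yhat))
    tss = sum((a - ybar) ** 2 for a in y)
    return 1 - rss / tss

def breusch_pagan_f(X, y):
    """F version of the Breusch-Pagan test: regress squared OLS residuals on X."""
    n, k = len(X), len(X[0])
    b = ols(X, y)
    u2 = [(a - f) ** 2 for a, f in zip(y, fitted(X, b))]
    g = ols(X, u2)
    r2 = r_squared(u2, fitted(X, g))
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))  # F(k - 1, n - k)

# Hypothetical data: intercept plus one regressor.
X = [[1.0, float(x)] for x in range(1, 9)]
y = [2.1, 3.9, 6.2, 7.8, 10.3, 11.7, 14.4, 15.6]
print(f"BP F = {breusch_pagan_f(X, y):.3f}")
```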
White test

The White test checks whether the variance of the errors in a regression model is constant (homoscedasticity). Unlike the Breusch-Pagan test above, its auxiliary regression also includes squared and product terms of the independent variables.

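Econometric model:

$$\text{Infection rate} = \beta_1 + \beta_2\,\text{Masks} + \beta_3\,\text{Employmentstatus} + \beta_4\,\text{Age} + u_i \qquad \text{model (1)}$$

Auxiliary model:

$$\hat{u}_i^2 = \alpha_1 + \alpha_2\,\text{masks} + \alpha_3\,\text{employmentstatus} + \alpha_4\,\text{age} + \alpha_5\,\text{masks}^2 + \alpha_6\,\text{employmentstatus}^2 + \alpha_7\,\text{age}^2 + \alpha_8\,(\text{masks} \times \text{employmentstatus} \times \text{age}) \qquad \text{model (2)}$$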

Null hypothesis (H0): α2 = α3 = α4 = α5 = α6 = α7 = α8 = 0. There is no heteroscedasticity in model (1).

Alternative hypothesis (HA): at least one of the parameters is not equal to zero. There is heteroscedasticity in model (1).

Table 11: White test

The table shows that $F_{\text{calculated}} = 0.72$ and $F_{0.05,5,33} = 2.5026$.


Fig 6: F-distribution graph

Since $F_{\text{calculated}} = 0.72 < F_{0.05,5,33} = 2.5026$, we cannot reject the null hypothesis. There is no evidence of heteroscedasticity in model (1).
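Model (2) has one practical complication. Because masks is a 0/1 dummy, masks² is identical to masks, so α5 cannot be estimated separately from α2, and regression software typically drops one of the two columns as collinear. The sketch below builds the auxiliary regressors exactly as model (2) writes them and flags identical columns. The textbook White test uses pairwise cross-products rather than the single three-way product, but this sketch follows the specification in the text. The respondents are hypothetical.

```python
def white_aux_row(masks, employmentstatus, age):
    """Auxiliary regressors of model (2): levels, squares and the three-way product,
    following the specification written in the text."""
    return [
        1.0,                                          # constant (alpha1)
        masks, employmentstatus, age,                 # alpha2..alpha4
        masks ** 2, employmentstatus ** 2, age ** 2,  # alpha5..alpha7
        masks * employmentstatus * age,               # alpha8
    ]

def duplicate_columns(rows):
    """Pairs of column indices that are identical in every row (perfect collinearity)."""
    cols = list(zip(*rows))
    return [(i, j) for i in range(len(cols)) for j in range(i + 1, len(cols)) if cols[i] == cols[j]]

# Hypothetical respondents (masks, employmentstatus, age); masks is a 0/1 dummy.
rows = [white_aux_row(m, e, a) for m, e, a in [(1, 2, 45), (0, 1, 60), (1, 0, 33), (1, 4, 71)]]
print(duplicate_columns(rows))  # [(1, 4)]: the masks**2 column duplicates masks
```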

4. Conclusion

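In conclusion, the regression analysis finds positive estimated relationships between the infection rate and masks, employment status and age. However, none of these coefficients is statistically significant at the 5% level, and the F-test shows that the model as a whole is not significant (R² = 0.0411). The sample of 39 respondents therefore provides no statistical evidence that mask wearing, employment status or age affects the Covid infection rate. The residuals appear normally distributed, and neither the Breusch-Pagan test nor the White test finds evidence of heteroscedasticity in the model.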
