Econometrics Final Paper 190071

Econometrics Final Project
Factors affecting Covid’s infection rate.
Name: Nguyen Quang Dong Nguyen

Student ID: 190071
Professor: Moinul Islam
Tuesday, 15 December, 2022
1. Introduction
COVID-19 broke out in 2019, and its consequences have affected the whole economy. Many factors can contribute to COVID-19 infection, including frequently going out without wearing a mask (face covering), not maintaining social distance, frequently going out to work, not using hand sanitizer, and age (older people are more likely to test positive for COVID-19 than younger people). Therefore, I would like to use an econometric model to check whether the COVID-19 infection rate depends on mask wearing, employment status and age. In this model, the infection rate is the dependent variable, and masks, employment status and age are the independent variables.
2. Methods and Data Source
2.1 Econometric model
$$\text{Infection rate}_i = \beta_1 + \beta_2\,\text{Masks}_i + \beta_3\,\text{Employmentstatus}_i + \beta_4\,\text{Age}_i + u_i$$
In this econometric model, the dependent variable is the infection rate, and the independent variables are masks, employmentstatus and age. The model has an intercept (β1), slope coefficients (β2, β3, β4) that are estimated from the sample, and a random variable (ui), known as the error term (its estimated counterpart is the residual).
- The infection rate is β1 when masks, employmentstatus and age are all equal to zero. β1 is the intercept parameter.
- When employmentstatus and age are held constant, if masks increases by 1 unit, the infection rate changes by β2 units. β2 is a slope parameter.
- When masks and age are held constant, if employmentstatus increases by 1 unit, the infection rate changes by β3 units. β3 is a slope parameter.
- When masks and employmentstatus are held constant, if age increases by 1 unit, the infection rate changes by β4 units. β4 is a slope parameter.
2.2 Data Source
Table 1 lists all the variables, which were obtained from Professor Md Manir Hossain Mollah. Respondents had to answer all of the following questions: How many times have you tested positive for COVID-19? Did you wear a mask when you went out (yes = 1, no = 0)? How many jobs do you have? What is your age?
Table 1: List of all variables
2.3 Methodology
Using the basic theoretical framework of econometrics, I use STATA software to analyse the data and provide empirical evidence for my theory. A regression analysis is carried out to give a statistical picture of the econometric model, along with several hypothesis tests that assess whether the estimates obtained from the sample are statistically significant for the entire population. The t-test is performed to check whether each independent variable has a significant linear effect on the infection rate, and the F-test (ANOVA) is performed to verify the overall significance of the model. Other measurements include the chi-square test for the normality of the residuals, and the Breusch-Pagan and White tests for heteroscedasticity in the model. In addition, the confidence intervals, R-squared, adjusted R-squared and correlation coefficients are described. Finally, the Ordinary Least Squares (OLS) assumptions are checked against these results, together with a dummy variable test and heteroscedasticity diagnostic tests.
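The regression itself is estimated in STATA. As an illustration of what OLS computes, the sketch below solves the normal equations (X′X)b = X′y in plain Python. It is a minimal sketch under stated assumptions: the helper names `solve` and `ols` are hypothetical, no external libraries are used, and the toy data are invented so that the true coefficients are known; they are not the survey data.

```python
# Minimal OLS sketch: solve (X'X) b = X'y with Gaussian elimination.
# Assumption: plain Python lists; the toy data below are hypothetical.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, regressors):
    """Return [b1, b2, ...]: intercept first, then one slope per regressor column."""
    X = [[1.0] + list(row) for row in regressors]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2,
# so OLS should recover the coefficients (1, 2, 3).
rows = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 5)]
y = [1 + 2 * a + 3 * b for a, b in rows]
print([round(c, 6) for c in ols(y, rows)])  # [1.0, 2.0, 3.0]
```

With the survey data, `y` would hold infecrate and each row of `rows` would hold (masks, employmentstatus, age); STATA's `regress` command should give the same point estimates.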
3. Results and Discussion
3.1. Descriptive Statistics

3.1.1 Summary: mean, median, standard deviation, etc. for all variables
Table 2: Summary statistics table

Table 2 shows that the sample size is 39. The average infection rate is 1.820513, which indicates that respondents tested positive for COVID-19 about 1.8 times on average; the maximum is 3 times and the minimum is 1 time. The standard deviation of the infection rate is 0.7208108. The average of masks is 0.9230769, which means that almost all respondents in the survey wear masks; its standard deviation is 0.2699528. The mean of employment status is 2, which indicates that respondents have 2 jobs on average; some respondents do not have a job, and the maximum number of jobs is 4. The mean age is 53.4659; the oldest respondent is 87 years old and the youngest is 32. The standard deviation of age is 17.08153.
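Some of these summary figures can be cross-checked from the reported means alone. Multiplying the mean of masks by the sample size gives 0.9230769 × 39 = 36, so masks can be rebuilt as 36 ones and 3 zeros, and its sample standard deviation (divisor n − 1, as STATA's `summarize` reports) should reproduce the value in Table 2. A minimal sketch, assuming Python's standard `statistics` module:

```python
import statistics

n = 39  # sample size reported in Table 2

# Reported mean of masks times n gives the number of mask wearers: 0.9230769 * 39 = 36.
wearers = round(0.9230769 * n)
masks = [1] * wearers + [0] * (n - wearers)  # rebuilt 0/1 dummy

mean_masks = statistics.mean(masks)  # 36 / 39
sd_masks = statistics.stdev(masks)   # sample standard deviation (divisor n - 1)

# The same idea applied to the infection rate: total positive tests in the sample.
total_infections = round(1.820513 * n)

print(wearers, round(mean_masks, 7), round(sd_masks, 7), total_infections)
```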
3.1.2 Scatter Diagram

Scatter diagrams of the dependent variable (infection rate) against each independent variable (masks, employmentstatus and age).
3.1.3 Correlation Coefficients for all variables and interpretation
Table 3: Correlation Coefficient
The correlation coefficients are calculated with the correlation command “cor infecrate masks employmentstatus age” to find out the linear relationship between the variables. The correlations can be evaluated as follows:
- The correlation coefficient between infecrate and masks is 0.1977. This is much closer to 0 than to 1, which means that there is a weak positive relationship between the two variables.
- The correlation coefficient between infecrate and employmentstatus is 0.0679. This is close to 0, which means that there is a very weak positive relationship between the two variables.
- The correlation coefficient between infecrate and age is 0.047. This is also close to 0, which means that there is a very weak positive relationship between the two variables.
- The positive signs imply that the trend lines in the scatter diagrams slope upward, i.e. the variables tend to change in the same direction.
- The correlation of each variable with itself, such as infecrate and infecrate, employmentstatus and employmentstatus, or age and age, is exactly 1.0000, which means a perfect positive correlation.
- None of the correlations is exactly 0, but the correlations between infecrate and the independent variables are all close to 0, indicating that these linear relationships are weak.
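The correlation coefficients above follow the Pearson formula r = Sxy / √(Sxx·Syy). A minimal Python sketch is shown below; the helper `describe_strength` and its cut-offs (0.3 and 0.7) are a common rule of thumb assumed here for illustration, not a fixed standard, and the toy inputs are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Rough verbal label for |r| (assumed rule-of-thumb cut-offs)."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"


# A variable correlated with itself always gives exactly 1 (the diagonal of Table 3).
toy = [1, 3, 2, 5, 4]
print(pearson_r(toy, toy))

# The three correlations with infecrate reported in Table 3.
for r in (0.1977, 0.0679, 0.047):
    print(r, describe_strength(r))
```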
3.2 Regression analysis
3.2.1 Interpretation of Coefficient
- Table 4 is extracted from the regression model to interpret the estimators β1, β2, β3 and β4 of each variable in the econometric model.
Table 4: Interpretation of Variable
- Interpretation of β1: When masks, employmentstatus and age are all equal to zero, the infection rate is on average 1.237793. Since the youngest respondent is 32 years old, this intercept is an extrapolation with no direct practical meaning.
- Interpretation of β2: Holding employmentstatus and age constant, respondents who wear a mask (masks = 1) have, on average, an infection rate 0.5027887 higher than respondents who do not (masks = 0).
- Interpretation of β3: Holding masks and age constant, if employmentstatus increases by 1 job, the infection rate on average increases by 0.0237498.
- Interpretation of β4: Holding masks and employmentstatus constant, if age increases by 1 year, the infection rate on average increases by 0.0013307.
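Because the model is linear, each slope is simply the change in the predicted infection rate when one variable changes and the others are held fixed. The sketch below plugs in the estimates reported in this section (with β3 = 0.0237498 as in the t-test below); the function name `predicted_infection_rate` and the profile (2 jobs, age 53) are hypothetical choices for illustration.

```python
# Estimated coefficients reported in Section 3.2 (Table 4 and the t-tests).
B1, B2, B3, B4 = 1.237793, 0.5027887, 0.0237498, 0.0013307


def predicted_infection_rate(masks, employmentstatus, age):
    """Fitted value of the estimated model for one respondent profile."""
    return B1 + B2 * masks + B3 * employmentstatus + B4 * age


# Illustrative profile: 2 jobs, age 53; compare a mask wearer with a non-wearer.
mask_effect = predicted_infection_rate(1, 2, 53) - predicted_infection_rate(0, 2, 53)
# One extra year of age for the same profile.
age_effect = predicted_infection_rate(1, 2, 54) - predicted_infection_rate(1, 2, 53)

print(round(mask_effect, 7), round(age_effect, 7))
```

The mask effect equals β2 and the age effect equals β4 whatever profile is chosen, which is what "holding the other variables constant" means in a linear model.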
3.2.2 Individual Hypothesis T-test
The individual hypothesis test, or t-test, is used to determine whether each estimated slope coefficient is significantly different from zero, i.e. whether the corresponding independent variable has a linear effect on the infection rate in the population. The test uses a 95% confidence level (5% level of significance) with 35 degrees of freedom (df = n − k = 39 − 4 = 35). The two-tailed critical (tabulated) value is therefore ±t(0.025, 35) ≈ ±2.03, so that
$$P(-t_{0.025} < t < t_{0.025}) = 0.95$$
The null hypothesis is not rejected if the calculated t value falls in the acceptance region, and it is rejected if it falls in the rejection (critical) region. The graph below represents the acceptance and rejection (critical) regions on a t-distribution curve:
Fig 1: T-distribution graph
- T-test statistics for β2:
Null Hypothesis (H0): β2 = 0 (the infection rate does not depend linearly on masks)
Alternative Hypothesis (HA): β2 ≠ 0 (the infection rate depends linearly on masks)
$$t_{\text{calculated}} = \frac{\hat{\beta}_2 - 0}{\operatorname{se}(\hat{\beta}_2)} = \frac{0.5027887 - 0}{0.4518876} = 1.11$$
As t_calculated = 1.11 falls in the acceptance region (-2.03 < 1.11 < 2.03), the null hypothesis is not rejected. β2 is not statistically significant, so there is no significant evidence that the infection rate depends linearly on masks.
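- T-test statistics for β3:
Null Hypothesis (H0): β3 = 0 (the infection rate does not depend linearly on employmentstatus)
Alternative Hypothesis (HA): β3 ≠ 0 (the infection rate depends linearly on employmentstatus)
$$t_{\text{calculated}} = \frac{\hat{\beta}_3 - 0}{\operatorname{se}(\hat{\beta}_3)} = \frac{0.0237498 - 0}{0.1130416} = 0.21$$
As t_calculated = 0.21 falls in the acceptance region, the null hypothesis is not rejected. β3 is not statistically significant, so there is no significant evidence that the infection rate depends linearly on employmentstatus.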
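- T-test statistics for β4:
Null Hypothesis (H0): β4 = 0 (the infection rate does not depend linearly on age)
Alternative Hypothesis (HA): β4 ≠ 0 (the infection rate depends linearly on age)
$$t_{\text{calculated}} = \frac{\hat{\beta}_4 - 0}{\operatorname{se}(\hat{\beta}_4)} = \frac{0.0013307 - 0}{0.0070337} = 0.19$$
As t_calculated = 0.19 falls in the acceptance region, the null hypothesis is not rejected. β4 is not statistically significant, so there is no significant evidence that the infection rate depends linearly on age.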
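The three t-tests can be reproduced from the reported estimates and standard errors. The critical value 2.0301 for 35 degrees of freedom is taken from standard t tables (it is an input here, not computed); the dictionary layout is just one convenient choice.

```python
# Estimates and standard errors as reported in Section 3.2.2.
estimates = {
    "masks": (0.5027887, 0.4518876),
    "employmentstatus": (0.0237498, 0.1130416),
    "age": (0.0013307, 0.0070337),
}
T_CRITICAL = 2.0301  # two-tailed 5% critical value of t with 35 df, from t tables

results = {}
for name, (b, se) in estimates.items():
    t = (b - 0) / se  # t statistic under H0: beta = 0
    reject = abs(t) > T_CRITICAL
    results[name] = (round(t, 2), reject)
    print(f"{name}: t = {t:.2f}, reject H0: {reject}")
```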
3.2.3 Overall Significance Test of the model (F-Test)
The F-test measures the overall significance of the model based on the analysis of variance (ANOVA). It assesses whether the independent variables jointly explain the infection rate, so that it can be determined whether the null hypothesis should be rejected in favour of the alternative hypothesis:
Null Hypothesis (H0): β2 = β3 = β4 = 0 (the model is not significant)
Alternative Hypothesis (HA): at least one of β2, β3, β4 is not equal to zero
The test uses a 95% confidence level (5% level of significance) with 3 and 35 degrees of freedom. If the calculated F value falls in the acceptance region, the null hypothesis is not rejected; if it falls in the critical (rejection) region, the null hypothesis is rejected.
Table 5: ANOVA TABLE
Fig 2: F-distribution graph
The explained sum of squares (ESS), the unexplained or residual sum of squares (RSS) and the degrees of freedom are used to compute the F statistic:
$$F_{\text{calculated}} = \frac{ESS/(k-1)}{RSS/(n-k)}$$
The F statistic shown in Table 5 is F_calculated = 0.05. Since F_calculated < F_critical (0.05 < 2.87), the null hypothesis is not rejected, and the model is not statistically significant.
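With k parameters and n observations, the F statistic can also be written in terms of R²: F = [R²/(k − 1)] / [(1 − R²)/(n − k)]. A minimal sketch follows, using the R² of 0.0411 reported in Section 3.2.4 and the critical value 2.8742 quoted in Section 3.2.8. This formula gives F ≈ 0.50 rather than 0.05, so the value in Table 5 is worth double-checking; either way F is far below the critical value and the conclusion does not change.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic from R^2 with n observations and k parameters (including the intercept)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


F_CRITICAL = 2.8742  # F(0.05; 3, 35), as quoted in Section 3.2.8

f_stat = f_from_r2(0.0411, n=39, k=4)
print(round(f_stat, 2), f_stat < F_CRITICAL)
```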
3.2.4 R-squared and Adjusted R-squared
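Table 6: R-squared and Adjusted R-squared
The table above shows that R-squared is 0.0411, which means that only 4.11% of the variation in the infection rate is explained by the regression (masks, employmentstatus and age). R-squared never decreases as more independent variables are added, even though degrees of freedom are lost, so the adjusted R-squared, which corrects for the number of independent variables, is also reported. The adjusted R-squared is -0.0411; a negative value means that, once degrees of freedom are taken into account, the independent variables add essentially no explanatory power.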
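The adjusted R-squared applies the degrees-of-freedom correction 1 − (1 − R²)(n − 1)/(n − k). Plugging in R² = 0.0411, n = 39 and k = 4 reproduces the negative value reported in Table 6, as the short sketch below shows (the function name is illustrative).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k parameters (including the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


adj = adjusted_r2(0.0411, n=39, k=4)
print(round(adj, 4))  # -0.0411
```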
3.2.5 Confidence Interval
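Table 7: Confidence Interval
The table shows that the 95% confidence interval for β2 is -0.4145918 to 1.420169, for β3 is -0.2057369 to 0.2532364, and for β4 is -0.0129486 to 0.01561. This means that if 100 such confidence intervals were constructed from repeated samples, about 95 of them would be expected to contain the true values of β2, β3 and β4, and about 5 would not. All three intervals contain 0, which is consistent with the t-tests: none of the coefficients is statistically significant at the 5% level.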
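Each interval is the estimate plus or minus the critical t value times its standard error, b ± t(0.025, 35)·se(b). The sketch below recomputes the intervals from the reported estimates and standard errors, with 2.0301 taken from t tables as an assumed input; small differences in the last digits come from rounding that critical value.

```python
T_CRITICAL = 2.0301  # t(0.025, 35) from standard t tables

estimates = {
    "masks": (0.5027887, 0.4518876),
    "employmentstatus": (0.0237498, 0.1130416),
    "age": (0.0013307, 0.0070337),
}

intervals = {}
for name, (b, se) in estimates.items():
    margin = T_CRITICAL * se
    intervals[name] = (b - margin, b + margin)
    low, high = intervals[name]
    contains_zero = low <= 0 <= high
    print(f"{name}: [{low:.7f}, {high:.7f}], contains 0: {contains_zero}")
```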
3.2.6 Normality test for OLS residuals (Chi-squared Test)
Table 8: Skewness-kurtosis Test

The chi-square (χ2) statistic reported by the skewness-kurtosis test measures whether the OLS residuals are consistent with a normal distribution. We use the commands “predict myResiduals,r” and “sktest myResiduals” to generate the residuals and test whether they are normally distributed.
Null Hypothesis (H0): ui is normally distributed
Alternative Hypothesis (HA): ui is not normally distributed
The joint statistic of the skewness-kurtosis test has 2 degrees of freedom, so at the 0.05 significance level the critical value is χ2(0.05, 2) = 5.991.
Fig 3: Chi-square (χ2) distribution graph
Table 9 shows that χ2_calculated = 4.2, which is less than the critical value 5.991. Therefore, we cannot reject the null hypothesis, which means that ui can be treated as normally distributed.
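STATA's `sktest` combines skewness and kurtosis into a single χ² statistic with its own small-sample adjustment. As a related, simpler illustration, the sketch below swaps in the Jarque-Bera statistic, JB = n/6 · [S² + (K − 3)²/4], which is also compared with a χ² distribution with 2 degrees of freedom (critical value 5.991 at the 5% level). It will not reproduce the value in the table exactly, and the toy residuals are hypothetical.

```python
def jarque_bera(residuals):
    """Jarque-Bera normality statistic from sample skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


CHI2_CRITICAL_2DF = 5.991  # chi-square(0.05) with 2 degrees of freedom

# Hypothetical residuals, symmetric around zero.
toy_residuals = [-1.0, 1.0, -1.0, 1.0]
jb = jarque_bera(toy_residuals)
print(round(jb, 4), jb < CHI2_CRITICAL_2DF)
```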
3.2.7 Structural Break Test (Chow Test)
A Chow test is simply a test of whether the coefficients estimated over one group of the data
are equal to the coefficients estimated over another.
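In its classic form, the Chow test compares the residual sum of squares of a pooled regression (RSS_P) with the sum of those from separate regressions on the two groups (RSS_1 + RSS_2): F = [(RSS_P − RSS_1 − RSS_2)/k] / [(RSS_1 + RSS_2)/(n − 2k)], compared with F(k, n − 2k). The sketch below implements that formula; the RSS values in the example are hypothetical, since the paper does not report separate group regressions.

```python
def chow_f(rss_pooled, rss_group1, rss_group2, n, k):
    """Chow F statistic; k = parameters per regression (including the intercept)."""
    rss_unrestricted = rss_group1 + rss_group2
    numerator = (rss_pooled - rss_unrestricted) / k
    denominator = rss_unrestricted / (n - 2 * k)
    return numerator / denominator


# Hypothetical example: pooled RSS = 10, each group RSS = 4, n = 20, k = 2.
print(chow_f(10.0, 4.0, 4.0, n=20, k=2))  # 2.0
```

The dummy variable approach that follows tests the same idea within a single regression.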
Dummy variable Regression Model
In regression analysis, a dummy variable is one that takes the values 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. The infection rate model can be rewritten as:
$$\text{Infection rate}_i = \beta_1 + \beta_3\,\text{employmentstatus}_i + u_i \tag{1}$$
$$E(\text{Infection rate}_i) = \beta_1 + \beta_3\,\text{employmentstatus}_i$$
To find out whether wearing or not wearing a mask makes a difference that might significantly affect the infection rate, the dummy variable “masks” can be added to the model (masks = 1 if the respondent wears a mask and masks = 0 if not):
$$\text{Infection rate}_i = \beta_1 + \beta_3\,\text{employmentstatus}_i + \beta_2\,\text{masks}_i + u_i \tag{2}$$

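For people who wear a mask (masks = 1):
$$E(\text{Infection rate}_i) = \beta_1 + \beta_3\,\text{employmentstatus}_i + \beta_2 \cdot 1 = (\beta_1 + \beta_2) + \beta_3\,\text{employmentstatus}_i$$
If employmentstatus = 0, the average infection rate is E = β1 + β2.
For people who do not wear a mask (masks = 0):
$$E(\text{Infection rate}_i) = \beta_1 + \beta_3\,\text{employmentstatus}_i + \beta_2 \cdot 0 = \beta_1 + \beta_3\,\text{employmentstatus}_i$$
If employmentstatus = 0, the average infection rate is E = β1. Therefore, β2 measures the difference in the average infection rate between mask wearers and non-wearers with the same employment status.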
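Interaction with dummy variable (intercept and slope)
$$\text{Infection rate}_i = \beta_1 + \beta_3\,\text{employmentstatus}_i + \beta_2\,\text{masks}_i + \beta_4\,(\text{employmentstatus}_i \times \text{masks}_i) + u_i \tag{3}$$
In this interaction model, β4 denotes the coefficient of the interaction term (not the coefficient of age in the main model).
For people who wear masks:
$$E(\text{Infection rate}_i) = \beta_1 + \beta_3\,\text{employmentstatus}_i + \beta_2 \cdot 1 + \beta_4\,\text{employmentstatus}_i \cdot 1 = (\beta_1 + \beta_2) + (\beta_3 + \beta_4)\,\text{employmentstatus}_i$$
For people who do not wear masks:
$$E(\text{Infection rate}_i) = \beta_1 + \beta_3\,\text{employmentstatus}_i + \beta_2 \cdot 0 + \beta_4\,\text{employmentstatus}_i \cdot 0 = \beta_1 + \beta_3\,\text{employmentstatus}_i$$
Here β2 shifts the intercept and β4 shifts the slope of employmentstatus for mask wearers.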
Graph 4: The Dummy Variable Graph
In Graph 4, the fitted lines for respondents with and without masks are straight lines that essentially coincide with a single “pooled” regression line, which implies that no marginal (differential) effect of wearing a mask on the intercept or the slope is observed in the sample collected.
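Model (3) implies a separate intercept and slope for each mask group. The sketch below derives them from the four coefficients and checks that they agree with the full expression; the coefficient values are hypothetical placeholders, since the paper does not report estimates of model (3).

```python
def expected_rate(employmentstatus, masks, b1, b2, b3, b4):
    """E(infection rate) under model (3) with the interaction term."""
    return b1 + b3 * employmentstatus + b2 * masks + b4 * employmentstatus * masks


def group_lines(b1, b2, b3, b4):
    """(intercept, slope on employmentstatus) for each mask group."""
    return {
        0: (b1, b3),            # no mask
        1: (b1 + b2, b3 + b4),  # mask
    }


coefs = (1.0, 0.5, 0.2, -0.1)  # hypothetical b1, b2, b3, b4
lines = group_lines(*coefs)
for masks, (intercept, slope) in lines.items():
    # The group line must reproduce the full model at any employment level.
    for jobs in range(5):
        assert abs(intercept + slope * jobs - expected_rate(jobs, masks, *coefs)) < 1e-12
    print(f"masks = {masks}: intercept {intercept:.2f}, slope {slope:.2f}")
```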
3.2.8 Heteroscedasticity Diagnostic Tests
Econometrics model:
$$\text{Infection rate}_i = \beta_1 + \beta_2\,\text{Masks}_i + \beta_3\,\text{Employmentstatus}_i + \beta_4\,\text{Age}_i + u_i$$
After estimating the regression, we obtain the residuals ûi and their squares ûi², and regress ûi² on the independent variables.
Table 10: Breusch-Pagan test
Null Hypothesis (H0): There is no heteroscedasticity in the model.
Alternative Hypothesis (HA): There is heteroscedasticity in the model.
The table shows that F_calculated = 0.28 and F(0.05; 3, 35) = 2.8742.
Fig 5: F-distribution table
Because F_calculated < F_critical (0.28 < 2.87), we cannot reject the null hypothesis, which means that there is no evidence of heteroscedasticity in the model.
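The auxiliary-regression procedure described above can be sketched in plain Python: square the residuals, regress them on the independent variables, and form either the F statistic used in Table 10 or the n·R² (LM) form, which is compared with a χ² distribution with k − 1 degrees of freedom. This is a minimal sketch; the helper names are hypothetical and the toy numbers are invented, not the survey residuals. STATA's built-in `estat hettest` uses its own variant of the test, so its output may differ from this hand-rolled version.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, regressors):
    """R^2 of an OLS regression of y on a constant plus the given regressor rows."""
    X = [[1.0] + list(row) for row in regressors]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


def breusch_pagan(residuals, regressors):
    """Return (LM = n*R^2, F) from regressing squared residuals on the regressors."""
    u2 = [e * e for e in residuals]
    n = len(u2)
    k = len(regressors[0]) + 1  # parameters in the auxiliary regression
    r2 = r_squared(u2, regressors)
    lm = n * r2
    f = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    return lm, f


# Hypothetical toy data: one regressor, residuals whose squares are 1, 3, 2, 5, 4.
x = [(1,), (2,), (3,), (4,), (5,)]
resid = [1.0, -3 ** 0.5, 2 ** 0.5, -5 ** 0.5, -2.0]
lm, f = breusch_pagan(resid, x)
print(round(lm, 4), round(f, 4))
```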
White test
The White test is a statistical test that establishes whether the variance of the errors in a regression model is constant; it is used to check for homoscedasticity.
Econometrics model:
$$\text{Infection rate}_i = \beta_1 + \beta_2\,\text{Masks}_i + \beta_3\,\text{Employmentstatus}_i + \beta_4\,\text{Age}_i + u_i \tag{1}$$
Auxiliary model:
$$\hat{u}_i^2 = \alpha_1 + \alpha_2\,\text{masks}_i + \alpha_3\,\text{employmentstatus}_i + \alpha_4\,\text{age}_i + \alpha_5\,\text{masks}_i^2 + \alpha_6\,\text{employmentstatus}_i^2 + \alpha_7\,\text{age}_i^2 + \alpha_8\,(\text{masks}_i \times \text{employmentstatus}_i \times \text{age}_i) \tag{2}$$
Null hypothesis (H0): α2 = α3 = α4 = α5 = α6 = α7 = α8 = 0. There is no heteroscedasticity in model (1).
Alternative hypothesis (HA): At least one of the parameters is not equal to zero. There is heteroscedasticity in model (1).
Table 11: White test
The table shows that F_calculated = 0.72 < F(0.05; 5, 33) = 2.5026.
Fig 6: F-distribution graph
Therefore, the null hypothesis is not rejected, and there is no evidence of heteroscedasticity in model (1).
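For reference, the standard White auxiliary regression uses the levels, squares and pairwise cross-products of the regressors (the auxiliary model above uses a single triple product instead). The sketch below builds those terms for one observation; the function name is illustrative. Because masks is a 0/1 dummy, masks² equals masks, so that squared term is perfectly collinear with the level and is normally dropped from the auxiliary regression.

```python
from itertools import combinations


def white_terms(row):
    """Levels, squares and pairwise cross-products for one observation (no intercept)."""
    levels = list(row)
    squares = [value * value for value in row]
    cross_products = [a * b for a, b in combinations(row, 2)]
    return levels + squares + cross_products


# Illustrative observation: masks = 1, employmentstatus = 2, age = 53.
terms = white_terms((1, 2, 53))
print(terms)
# With masks = 1 (or 0), masks**2 == masks, so that column duplicates the level.
print(terms[0] == terms[3])
```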
4. Conclusion
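In conclusion, the regression analysis shows that the estimated coefficients of masks, employmentstatus and age all have a positive sign, so the infection rate is positively associated with these variables in the sample. However, none of the coefficients is statistically significant at the 5% level, the model as a whole is not significant according to the F-test, and R-squared is only 4.11%, so the data provide no significant evidence that the infection rate depends on masks, employmentstatus or age. There is no evidence of heteroscedasticity in the model.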
