
Factors that affect AQI

FOREIGN TRADE UNIVERSITY

FACULTY OF INTERNATIONAL ECONOMICS




ECONOMETRIC REPORT
Topic: Factors that affect AQI

Class : KTEE 309.1


Student Name – ID : Khuat Duc Hung - 1816150082 – 20%
Phan Van Phuc - 1811150116 – 20%
Nguyen Hoang Linh - 1811150093 – 20%
Tran Ha Trang - 1811150133 – 20%
Pham Phuong Thao - 1817150125 – 20%
Supervisor : PhD. Dinh Thi Thanh Binh

Hanoi 10/2019

Table of Contents

I. Introduction
II. Literature review
   1. Question of interest
   2. Procedure and program used
III. Economic model
   1. Specifying the object for modeling
   2. Defining the target for modeling by the choice of the variables to analyze, denoted {𝑥𝑖}
   3. Embedding that target in a general unrestricted model (GUM)
IV. Econometric model
V. Data collection
   1. Data overview
   2. Data description
VI. Estimation of econometric model
   1. Checking the correlation among variables
   2. Regression run
VII. Diagnosing the model problem
   1. Normality
   2. Multicollinearity
   3. Heteroskedasticity
VIII. Hypothesis postulated
IX. Result analysis & Policy implication
X. Conclusion
XI. References

Exhibit 1: Definition of variables in the AQI model
Exhibit 2: Statistic indicators of variables in the AQI model
Exhibit 3: Correlation matrix
Exhibit 4: Scatterplot of variables in the AQI model
Exhibit 5: Regression model
Exhibit 6: Histogram plot indicating normality
Exhibit 7: Skewness/Kurtosis tests for normality
Exhibit 8: Multicollinearity test
Exhibit 9: Heteroskedasticity test
Exhibit 10: Residual-versus-fitted plot of the AQI model
Exhibit 11: Correcting heteroskedasticity


I. Introduction
Economics is a science that studies social development in general and
national growth in particular; econometrics is the use of statistical
techniques to understand those issues and to test economic theories. Without
evidence, economic theories are abstract and might have no bearing on reality
(even if they are completely rigorous). Econometrics is a set of tools we can
use to confront theory with real-world data.
Given the data set, our group of five members (Tran Ha Trang, Khuat Duc
Hung, Phan Van Phuc, Nguyen Hoang Linh, and Pham Phuong Thao) follows an
eight-step econometric methodology to analyze the data. Note that because
information about the data set is limited, all interpretations of
abbreviations and other details are based on assumptions and our own
research. We hope to have shown our logic and reasoning clearly throughout
the analysis.
Given the limits of our purpose and resources, this report still has
deficiencies, but we look forward to giving readers a sound overview of the
data set and of the knowledge that we have gained through Dr. Dinh Thanh
Binh’s Econometrics course.

II. Literature review


1. Question of interest
Recently, Hanoi and Ho Chi Minh City have been facing a serious problem:
heavily polluted air. Both cities have been in the top 4 of the Air Quality
and Pollution Ranking for one week. The ranking is based on the Air Quality
Index (AQI), which measures the quality of the air. Both Hanoi and Ho Chi
Minh City have a very high AQI of 150-170, a level that could cause serious
harm to human health.

Vietnamese people have recently become familiar with the term “AQI”, but most
of us know it only as an index that measures air quality; very few really
understand it. The government has said that the AQI of Hanoi is not objective
enough to classify Hanoi as the most polluted city in the world. This
declaration has raised a question among people: “What is AQI, really?” People
are now deeply confused by the ranking and the government’s declaration.
For all those reasons, our group chose to analyze the AQI of cities in China
to find out what affects the AQI and to help others understand more about it.
We run a regression model and carry out the necessary tests to understand the
factors behind AQI. Since the data set contains 323 observations, the sample
is large enough for the results to be reasonably reliable.

2. Procedure and program used


• Procedure
Step 1: Questions of interest
Step 2: Economic model
Step 3: Econometric model
Step 4: Data collection
Step 5: Estimation of econometric model
Step 6: Check multicollinearity and heteroscedasticity
Step 7: Hypothesis postulated
Step 8: Result analysis & Policy implication
• Program: Stata is the main program used to analyze the data and run the
regression.


III. Economic model


As the data are provided up front, the economic model used in this report is
an empirical one. Note that the fundamental model is mathematical; with an
empirical model, however, data are gathered for the variables and, using
accepted statistical techniques, are used to provide estimates of the model's
values.
Empirical model discovery and theory evaluation are suggested to involve
five key steps, but given the limits of purpose and resources, this part of
the report follows only three of them:
(1) Specifying the object for modeling.
(2) Defining the target for modeling.
(3) Embedding that target in a general unrestricted model.
1. Specifying the object for modeling
𝐴𝑄𝐼 = 𝑓(𝑥)
As such, this report examines the relationship between AQI, which is the
object for modeling, and each of the related factors, including location
factors, environmental factors and human factors.
2. Defining the target for modeling by the choice of the variables to
analyze, denoted {𝑥𝑖}
As mentioned above, three main categories are expected to affect AQI:
location, environment and human. Hence, the choices of {𝑥𝑖} are the
variables that constitute them. After thorough research, the factors were
narrowed down to eight significant ones: (location) altitude, coastal,
(environment) precipitation, temperature, green coverage, (human) GDP,
population density and the amount of incinerated waste.
3. Embedding that target in a general unrestricted model (GUM)
In its simplest acceptable representation (which will later be specified in
the econometric model), the GUM is determined to be:
𝐴𝑄𝐼 = 𝑓(𝑎𝑙𝑡𝑖, 𝑐𝑜𝑎𝑠𝑡𝑎𝑙, 𝑝𝑟𝑒𝑐, 𝑡𝑒𝑚𝑝, 𝑔𝑟𝑒𝑒𝑛, 𝑔𝑑𝑝, 𝑝𝑜𝑝𝑢, 𝑖𝑛𝑐𝑖)
A brief description of each variable is given in Exhibit 1.


Exhibit 1: Definition of variables in the AQI model


Variable   Definition
AQI        median air quality index
alti       altitude, measured in meters
coastal    = 1 if the city is coastal, 0 otherwise
prec       precipitation, measured in mm
temp       temperature, measured in degrees Celsius
green      percentage of the city covered with trees
gdp        GDP, measured in yuan
popu       population density, measured in people per square kilometer
inci       incinerated waste, measured in 10,000 tons

IV. Econometric model


To demonstrate the relationship between AQI and other factors, the
regression function can be constructed as follows:
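• (PRF):
$$\mathit{aqi} = \beta_0 + \beta_1\,\mathit{alti} + \beta_2\,\mathit{coastal} + \beta_3\,\mathit{prec} + \beta_4\,\mathit{temp} + \beta_5\,\mathit{green} + \beta_6\,\mathit{gdp} + \beta_7\,\mathit{popu} + \beta_8\,\mathit{inci} + u$$
• (SRF):
$$\mathit{aqi} = \hat{\beta}_0 + \hat{\beta}_1\,\mathit{alti} + \hat{\beta}_2\,\mathit{coastal} + \hat{\beta}_3\,\mathit{prec} + \hat{\beta}_4\,\mathit{temp} + \hat{\beta}_5\,\mathit{green} + \hat{\beta}_6\,\mathit{gdp} + \hat{\beta}_7\,\mathit{popu} + \hat{\beta}_8\,\mathit{inci} + \hat{u}$$

where:
• $\beta_0$ is the intercept of the regression model
• $\beta_i$ is the slope coefficient of the independent variable $x_i$
• $u$ is the disturbance of the regression model
• $\hat{\beta}_0$ is the estimator of $\beta_0$
• $\hat{\beta}_i$ is the estimator of $\beta_i$
• $\hat{u}$ is the residual (the estimator of $u$)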
From this model, this report is interested in explaining AQI in terms of
each of the eight independent variables:
(𝑎𝑙𝑡𝑖, 𝑐𝑜𝑎𝑠𝑡𝑎𝑙, 𝑝𝑟𝑒𝑐, 𝑡𝑒𝑚𝑝, 𝑔𝑟𝑒𝑒𝑛, 𝑔𝑑𝑝, 𝑝𝑜𝑝𝑢, 𝑖𝑛𝑐𝑖).


V. Data collection
1. Data overview
This data set is secondary data, collected from a given source.
Data source: https://www.kaggle.com/maxwellnee/china-aqi-test
The data were collected in 2015 and comprise 323 observations, one for each
of 323 cities in China. They record each city's AQI in 2015 together with
related factors, including the factors mentioned above in our model.
The structure of the economic data: cross-sectional data.

2. Data description
To obtain summary statistics of the variables, the following Stata command
is used:
sum aqi alti coastal prec temp green gdp popu inci

The result is shown in Exhibit 2.


Exhibit 2: Statistic indicators of variables in the AQI model

Variable |  Obs       Mean   Std. Dev.      Min        Max
---------+--------------------------------------------------------
     aqi |  323   75.19195   43.11274        12        296
    alti |  323   382.2505   743.2191       -12       4505
 coastal |  323    .247678   .4323335         0          1
    prec |  323   1081.209   584.6341      56.1     2478.1
    temp |  323   15.98943   5.029369      -2.5   27.44794
---------+--------------------------------------------------------
   green |  323    38.3831   6.321973       7.6      76.49
     gdp |  323   2394.094   3263.974      22.5   24964.99
    popu |  323    2596.56   2913.836         1      25900
    inci |  323   52.41839   91.97647      1.53     686.67

where:
• Obs is the number of observations.
• Std. Dev is the standard deviation of the variable.
• Min is the minimum value of the variable.
• Max is the maximum value of the variable.


VI. Estimation of econometric model


1. Checking the correlation among variables
First of all, the correlation of aqi and alti, coastal, prec, temp, green, gdp,
popu, inci is checked by calculating the correlation coefficient among these
variables. The correlation coefficient 𝑟 measures the strength and direction of a
linear relationship between two variables on a scatterplot. In Stata, the
correlation matrix is generated with the command:
corr aqi alti coastal prec temp green gdp popu inci

The result is shown in Exhibit 3.


Exhibit 3: Correlation matrix

. corr aqi alti coastal prec temp green gdp popu inci
(obs=323)

        |     aqi    alti coastal    prec    temp   green     gdp    popu    inci
--------+-------------------------------------------------------------------------
    aqi |  1.0000
   alti | -0.2025  1.0000
coastal | -0.1560 -0.2716  1.0000
   prec | -0.4044 -0.3236  0.2597  1.0000
   temp | -0.2874 -0.4594  0.3059  0.6856  1.0000
  green | -0.0988 -0.1824  0.2644  0.1532  0.2166  1.0000
    gdp |  0.1568 -0.2090  0.1742  0.1768  0.1458 -0.0392  1.0000
   popu | -0.0348 -0.0314 -0.0342  0.0661  0.1449  0.0212  0.2294  1.0000
   inci |  0.0958 -0.1222  0.1589  0.2013  0.1736 -0.0291  0.8996  0.2836  1.0000

From the correlation matrix, it can be inferred that the correlation between
aqi and each of the independent variables is sufficient to run the
regression model. Specifically:
- aqi and alti have a weak downhill relationship (r = -0.2025).
- aqi and coastal have a weak downhill relationship (r = -0.1560).
- aqi and prec have a moderate downhill relationship (r = -0.4044).
- aqi and temp have a weak downhill relationship (r = -0.2874).
- aqi and green have a weak downhill relationship (r = -0.0988).
- aqi and gdp have a weak uphill relationship (r = 0.1568).
- aqi and popu have a weak downhill relationship (r = -0.0348).
- aqi and inci have a weak uphill relationship (r = 0.0958).
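The verbal labels in the list above can be reproduced mechanically from the first column of Exhibit 3. The following is a minimal Python sketch, not part of the original Stata workflow; the function name `describe` and the cut-offs (below 0.3 weak, below 0.7 moderate, otherwise strong) are assumptions chosen for illustration, since the report does not state its thresholds.

```python
# Correlations with aqi, copied from the first column of Exhibit 3.
R_WITH_AQI = {
    "alti": -0.2025, "coastal": -0.1560, "prec": -0.4044, "temp": -0.2874,
    "green": -0.0988, "gdp": 0.1568, "popu": -0.0348, "inci": 0.0958,
}


def describe(r, weak_below=0.3, strong_from=0.7):
    """Label a correlation coefficient.

    The cut-offs are assumed for illustration; they are not from the report.
    """
    size = abs(r)
    if size < weak_below:
        strength = "weak"
    elif size < strong_from:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "downhill" if r < 0 else "uphill"
    return f"{strength} {direction}"


for name, r in R_WITH_AQI.items():
    print(f"aqi and {name}: {describe(r)} (r = {r:+.4f})")
```

Under these assumed cut-offs, only prec is labeled moderate, which agrees with the list above.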
The relationship between aqi and each of these variables can be visualized
with scatter plots in Stata.
The result is shown in Exhibit 4.

Exhibit 4: Scatterplot of variables in the AQI model

[Figure: scatterplots of AQI (vertical axis, 0 to 300) against each of the
following: Altitude (m); Coastal (0 for non-coastal and 1 for coastal);
Precipitation (mm); Temperature (Celsius degree); GDP (yuan); Population
Density (People/km2); Incinerator (10000 tons).]


However, the correlation coefficient between gdp and inci is especially
high (0.8996, greater than 0.8). To avoid multicollinearity between these
two regressors, we should drop one of them.
We therefore run the regression model in Stata in both cases.
• Case 1: With gdp and without inci:
. reg aqi alti coastal prec temp green gdp popu

  Source |       SS       df       MS              Number of obs =     323
---------+------------------------------           F(  7,   315) =   25.16
   Model |  214653.49      7  30664.7843           Prob > F      =  0.0000
Residual | 383850.609    315  1218.57336           R-squared     =  0.3586
---------+------------------------------           Adj R-squared =  0.3444
   Total | 598504.099    322  1858.70838           Root MSE      =  34.908

• Case 2: With inci and without gdp:


. reg aqi alti coastal prec temp green popu inci

  Source |       SS       df       MS              Number of obs =     323
---------+------------------------------           F(  7,   315) =   24.83
   Model | 212821.762      7  30403.1089           Prob > F      =  0.0000
Residual | 385682.337    315  1224.38837           R-squared     =  0.3556
---------+------------------------------           Adj R-squared =  0.3413
   Total | 598504.099    322  1858.70838           Root MSE      =  34.991

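Both models use seven regressors, so their R-squared values are directly
comparable. We have
$R^2_{\text{without inci}} = 0.3586 > R^2_{\text{without gdp}} = 0.3556$,
so we drop inci from the model and keep gdp.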
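The comparison above can be checked from the sums of squares printed in the two Stata outputs. The following Python sketch recomputes R-squared, adjusted R-squared and the overall F statistic from those numbers; the helper name `fit_summary` is hypothetical, and the formulas are the standard OLS identities rather than anything specific to Stata.

```python
# Sums of squares copied from the two Stata outputs (Case 1 and Case 2).
# `fit_summary` is a hypothetical helper introduced for illustration.

def fit_summary(ess, rss, n, k):
    """Return (R-squared, adjusted R-squared, overall F statistic).

    ess: explained (model) sum of squares
    rss: residual sum of squares
    n:   number of observations
    k:   number of regressors, excluding the constant
    """
    tss = ess + rss
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (ess / k) / (rss / (n - k - 1))
    return r2, adj_r2, f_stat


N_OBS, N_REGRESSORS = 323, 7

case1 = fit_summary(214653.49, 383850.609, N_OBS, N_REGRESSORS)   # gdp kept
case2 = fit_summary(212821.762, 385682.337, N_OBS, N_REGRESSORS)  # inci kept

for label, (r2, adj_r2, f_stat) in (("gdp kept", case1), ("inci kept", case2)):
    print(f"{label}: R2={r2:.4f}  adj R2={adj_r2:.4f}  F={f_stat:.2f}")
```

Because both cases have the same number of regressors, the ranking by R-squared and by adjusted R-squared is the same.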
2. Regression run
Having checked the required condition of correlation among variables,
the regression model is ready to run. In Stata, this is done by using the
command:
reg aqi alti coastal prec temp green gdp popu

The result is shown in Exhibit 5.


Exhibit 5: Regression model
. reg aqi alti coastal prec temp green gdp popu

  Source |       SS       df       MS              Number of obs =     323
---------+------------------------------           F(  7,   315) =   25.16
   Model |  214653.49      7  30664.7843           Prob > F      =  0.0000
Residual | 383850.609    315  1218.57336           R-squared     =  0.3586
---------+------------------------------           Adj R-squared =  0.3444
   Total | 598504.099    322  1858.70838           Root MSE      =  34.908

--------------------------------------------------------------------------
    aqi |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
--------+-----------------------------------------------------------------
   alti | -.0241589   .0030309   -7.97   0.000    -.0301223   -.0181955
coastal | -13.84192   4.968798   -2.79   0.006    -23.61814   -4.065688
   prec | -.0307408    .004615   -6.66   0.000    -.0398209   -.0216607
   temp | -1.429899   .5802437   -2.46   0.014    -2.571542   -.2882556
  green | -.1994903   .3257578   -0.61   0.541    -.8404266    .4414459
    gdp |   .002662   .0006408    4.15   0.000     .0014012    .0039229
   popu | -.0006881   .0006986   -0.98   0.325    -.0020627    .0006865
  _cons |  147.0262    14.1379   10.40   0.000     119.2096    174.8429

From the result, it can be inferred that:


➢ We have the estimated regression function:
$$\mathit{aqi} = 147.0262 - 0.0242\,\mathit{alti} - 13.8419\,\mathit{coastal} - 0.0307\,\mathit{prec} - 1.4299\,\mathit{temp} - 0.1995\,\mathit{green} + 0.0027\,\mathit{gdp} - 0.0007\,\mathit{popu} + \hat{u}_i$$
in which the regression coefficients are interpreted as follows (each slope
holding the other variables constant):
❖ $\hat{\beta}_0 = 147.0262$: When all the independent variables are zero,
the expected value of AQI is 147.0262.
❖ $\hat{\beta}_1 = -0.0242$: When altitude increases by one meter, the
expected value of AQI decreases by 0.0242.
❖ $\hat{\beta}_2 = -13.8419$: The expected value of AQI in a coastal city is
13.8419 units lower than that in a non-coastal city.
❖ $\hat{\beta}_3 = -0.0307$: When precipitation increases by one mm, the
expected value of AQI decreases by 0.0307.
❖ $\hat{\beta}_4 = -1.4299$: When temperature increases by one degree
Celsius, the expected value of AQI decreases by 1.4299.
❖ $\hat{\beta}_5 = -0.1995$: When the percentage of the city covered with
trees increases by one percentage point, the expected value of AQI decreases
by 0.1995.
❖ $\hat{\beta}_6 = 0.0027$: When GDP increases by 1 yuan, the expected value
of AQI increases by 0.0027.
❖ $\hat{\beta}_7 = -0.0007$: When population density increases by 1 person
per square kilometer, the expected value of AQI decreases by 0.0007.

➢ The coefficient of determination R-squared = 0.3586:
❖ All independent variables (alti, coastal, prec, temp, green, gdp, popu)
jointly explain 35.86% of the variation in the dependent variable (aqi).
❖ Other factors that are not included in the model account for the
remaining 64.14% of the variation in aqi.


➢ Other indicators:
❖ Adjusted coefficient of determination adj R-squared = 0.3444
❖ Total Sum of Squares TSS = 598504.099
❖ Explained Sum of Squares ESS = 214653.49
❖ Residual Sum of Squares RSS = 383850.609
❖ The degree of freedom of Model Dfm = 7
❖ The degree of freedom of residual Dfr = 315
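The estimated regression function from Exhibit 5 can also be read as a prediction rule. The Python sketch below evaluates it for one city and shows that switching the coastal dummy from 0 to 1 shifts the fitted AQI by exactly the coastal coefficient. The function name `predict_aqi` and the two example cities are hypothetical; their values are illustrative and are not taken from the data set.

```python
# OLS coefficients copied from Exhibit 5.
COEFFICIENTS = {
    "_cons": 147.0262,
    "alti": -0.0241589,
    "coastal": -13.84192,
    "prec": -0.0307408,
    "temp": -1.429899,
    "green": -0.1994903,
    "gdp": 0.002662,
    "popu": -0.0006881,
}


def predict_aqi(city):
    """Fitted AQI: intercept plus the sum of coefficient * regressor value."""
    return COEFFICIENTS["_cons"] + sum(
        COEFFICIENTS[name] * value for name, value in city.items()
    )


# Hypothetical cities: illustrative values, not observations from the data.
inland_city = {
    "alti": 100, "coastal": 0, "prec": 1000, "temp": 15,
    "green": 40, "gdp": 2000, "popu": 2500,
}
coastal_city = dict(inland_city, coastal=1)

fitted_inland = predict_aqi(inland_city)
coastal_effect = predict_aqi(coastal_city) - fitted_inland

print(f"fitted AQI, inland city: {fitted_inland:.2f}")
print(f"change when coastal = 1: {coastal_effect:.4f}")
```

Since R-squared is only 0.3586, such fitted values should be read as rough expected levels rather than precise forecasts.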


VII. Diagnosing the model problem


1. Normality
We have the following hypotheses:
$H_0$: $u_i$ is normally distributed
$H_1$: $u_i$ is not normally distributed
To examine this hypothesis visually, we can plot a histogram of the
residuals in Stata, which is generated using these commands:
predict resid, residual
histogram resid, normal

The result is shown in Exhibit 6:


Exhibit 6: Histogram plot indicating normality

[Figure: histogram of the residuals with a normal density overlaid.
Horizontal axis: Residuals (about -100 to 150); vertical axis: Density.]

We can also test normality formally with the skewness/kurtosis test for
normality, using the command:
sktest resid

The result is shown in Exhibit 7.


Exhibit 7: Skewness/Kurtosis tests for normality

. sktest resid
Skewness/Kurtosis tests for Normality
------- joint ------
Variable | Obs Pr(Skewness) Pr(Kurtosis) adj chi2(2) Prob>chi2
---------+---------------------------------------------------------------
resid | 323 0.0000 0.0001 33.59 0.0000


At the 5% significance level, the p-values for skewness (0.0000) and
kurtosis (0.0001), as well as the joint test (Prob>chi2 = 0.0000), are all
smaller than 0.05, so we have enough evidence to reject H0.
However, our sample has 323 observations, which is large enough that, even
though ui is not normally distributed, the OLS estimators are approximately
normally distributed and the model can still be used for statistical
analysis.
2. Multicollinearity
Multicollinearity is a high degree of correlation among the explanatory
variables, which may make it difficult to separate out the effects of the
individual regressors; standard errors may be inflated and t-values
depressed. Multicollinearity can be detected by examining the correlation
matrix of the regressors and by carrying out auxiliary regressions among
them. In Stata, the vif command is used, which stands for variance inflation
factor.
Exhibit 8 shows the result.
Exhibit 8: Multicollinearity test

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
        temp |      2.25    0.444373
        prec |      1.92    0.519860
        alti |      1.34    0.745784
     coastal |      1.22    0.820078
         gdp |      1.16    0.865040
       green |      1.12    0.892279
        popu |      1.10    0.913165
-------------+----------------------
    Mean VIF |      1.44

All VIF values here are well below 10 (the largest is 2.25, for temp),
indicating that multicollinearity is not a worrisome problem for this set
of data.
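The VIF of a regressor is linked to the auxiliary regression mentioned above by VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that variable on the other regressors. The Python sketch below inverts this identity for the values in Exhibit 8 and recomputes the mean VIF; the helper name `auxiliary_r2` is hypothetical, and because the VIFs are copied at two decimals, the results are approximate.

```python
# VIF values copied from Exhibit 8 (rounded to two decimals in the output).
VIF = {
    "temp": 2.25, "prec": 1.92, "alti": 1.34, "coastal": 1.22,
    "gdp": 1.16, "green": 1.12, "popu": 1.10,
}
THRESHOLD = 10  # rule of thumb used in the report


def auxiliary_r2(vif):
    """Invert VIF_j = 1 / (1 - R_j^2) to recover the auxiliary R-squared."""
    return 1 - 1 / vif


mean_vif = sum(VIF.values()) / len(VIF)
largest = max(VIF, key=VIF.get)

for name, vif in VIF.items():
    print(f"{name:8s} VIF={vif:.2f}  1/VIF={1 / vif:.3f}  "
          f"auxiliary R2={auxiliary_r2(vif):.3f}")
print(f"mean VIF = {mean_vif:.2f}; largest VIF: {largest}")
print("all below threshold:", all(v < THRESHOLD for v in VIF.values()))
```

Read this way, even the largest VIF (temp) corresponds to roughly 56% of the variation in temp being explained by the other regressors, which is well short of near-perfect collinearity.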
3. Heteroskedasticity
Heteroskedasticity means that the variance of the error term is not
constant, which makes the least squares estimates no longer efficient, and
the results of t tests and F tests may be misleading. Heteroskedasticity can
be detected by plotting the residuals against each of the regressors or by a
formal test, most popularly White’s test. It can be remedied by respecifying
the model, for example by looking for missing variables. In Stata, the
imtest, white command is used, where imtest stands for information matrix
test.
Exhibit 9 shows the result.
Exhibit 9: Heteroskedasticity test

. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(34)     =     82.54
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      82.54     34    0.0000
            Skewness |      27.61      7    0.0003
            Kurtosis |       8.85      1    0.0029
---------------------+-----------------------------
               Total |     119.01     42    0.0000
---------------------------------------------------

At the 5% significance level, the p-value (0.0000) is smaller than 0.05, so
there is enough evidence to reject the null hypothesis of homoskedasticity
and conclude that this set of data suffers from heteroskedasticity.
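The 34 degrees of freedom reported for White's test can be understood by counting the terms in its auxiliary regression, which regresses the squared residuals on the regressors, their squares and their pairwise cross products. The Python sketch below is only a counting exercise with a hypothetical helper `white_test_df`; it assumes that the only term dropped for redundancy is the square of the 0/1 variable coastal, which equals the variable itself.

```python
from itertools import combinations

REGRESSORS = ["alti", "coastal", "prec", "temp", "green", "gdp", "popu"]
DUMMIES = {"coastal"}  # 0/1 variable: its square is identical to itself


def white_test_df(regressors, dummies):
    """Count the distinct terms in White's auxiliary regression.

    The constant is excluded. Assumption: only squares of dummy variables
    are redundant; no other term is dropped for collinearity.
    """
    levels = list(regressors)
    squares = [f"{x}^2" for x in regressors if x not in dummies]
    cross_products = [f"{a}*{b}" for a, b in combinations(regressors, 2)]
    return len(levels) + len(squares) + len(cross_products)


df = white_test_df(REGRESSORS, DUMMIES)
print("degrees of freedom:", df)
```

With seven levels, six non-redundant squares and twenty-one cross products, the count agrees with the chi2(34) line in Exhibit 9.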
Another way to check for heteroskedasticity is to draw the
residual-versus-fitted plot, which can be generated using the
rvfplot, yline(0) command in Stata.
The result is shown in Exhibit 10.
Exhibit 10: Residual-versus-fitted plot of the AQI model

[Figure: residuals (vertical axis, about -100 to 150) plotted against
fitted values (horizontal axis, 0 to 200).]


From the graph, we can see that the spread of the residuals increases
rather than staying constant, which means this set of data has a
heteroskedasticity problem.
To address the problem, robust standard errors are used, which relax the
assumption that the errors are both independent and identically
distributed. In Stata, the regression is rerun with the robust option,
using the command:
reg aqi alti coastal prec temp green gdp popu, robust

Exhibit 11 shows the result.


Exhibit 11: Correcting heteroskedasticity

Linear regression                               Number of obs =     323
                                                F(  7,   315) =   25.57
                                                Prob > F      =  0.0000
                                                R-squared     =  0.3586
                                                Root MSE      =  34.908
---------------------------------------------------------------------------
        |              Robust
    aqi |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
--------+------------------------------------------------------------------
   alti | -.0241589   .0023214  -10.41   0.000    -.0287262   -.0195916
coastal | -13.84192   5.245334   -2.64   0.009    -24.16223   -3.521597
   prec | -.0307408   .0048155   -6.38   0.000    -.0402154   -.0212663
   temp | -1.429899   .6778547   -2.11   0.036    -2.763594   -.0962037
  green | -.1994903   .2773035   -0.72   0.472    -.7450916    .3461109
    gdp |   .002662   .0011476    2.32   0.021     .0004042    .0049199
   popu | -.0006881     .00067   -1.03   0.305    -.0020063    .0006301
  _cons |  147.0262   15.80144    9.30   0.000     115.9365     178.116
---------------------------------------------------------------------------

Note that, compared with the earlier regression, none of the coefficient
estimates changed, but the standard errors, and hence the t values, are
different, which gives more reliable p-values in the presence of
heteroskedasticity.
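To make the comparison concrete, the t statistic for H0: beta = 0 is the coefficient divided by its standard error, so the same coefficient gives a different t value under the two sets of standard errors. The Python sketch below recomputes both from the numbers in Exhibits 5 and 11; the helper name `t_statistic` is hypothetical, and the results match the Stata output only up to the rounding of the printed values.

```python
# (coefficient, OLS std. err., robust std. err.) copied from
# Exhibit 5 and Exhibit 11.
ESTIMATES = {
    "alti": (-0.0241589, 0.0030309, 0.0023214),
    "coastal": (-13.84192, 4.968798, 5.245334),
    "prec": (-0.0307408, 0.004615, 0.0048155),
    "temp": (-1.429899, 0.5802437, 0.6778547),
    "green": (-0.1994903, 0.3257578, 0.2773035),
    "gdp": (0.002662, 0.0006408, 0.0011476),
    "popu": (-0.0006881, 0.0006986, 0.00067),
}


def t_statistic(coef, std_err):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


t_values = {
    name: (t_statistic(coef, se_ols), t_statistic(coef, se_robust))
    for name, (coef, se_ols, se_robust) in ESTIMATES.items()
}

for name, (t_ols, t_robust) in t_values.items():
    print(f"{name:8s} t (OLS) = {t_ols:7.2f}   t (robust) = {t_robust:7.2f}")
```

The largest change is for gdp, whose robust standard error is almost twice the OLS one, so its t value falls from about 4.15 to about 2.32 while remaining significant at the 5% level.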


VIII. Hypothesis postulated


The question of interest, in multiple regression model:
𝑎𝑞𝑖 = 𝛽0 + 𝛽1 𝑎𝑙𝑡𝑖 + 𝛽2 𝑐𝑜𝑎𝑠𝑡𝑎𝑙 + 𝛽3 𝑝𝑟𝑒𝑐 + 𝛽4 𝑡𝑒𝑚𝑝
+ 𝛽5 𝑔𝑟𝑒𝑒𝑛 + 𝛽6 𝑔𝑑𝑝 + 𝛽7 𝑝𝑜𝑝𝑢 + 𝑢
(Full model)
Which independent variables among alti, coastal, prec, temp, green, gdp,
popu contribute to explaining/ predicting aqi and which ones should be dropped
to reduce the model?
From this question, the following hypotheses are postulated for each
independent variable $x_i$:
$H_0$: The independent variable $x_i$ does not contribute to explaining aqi
$H_1$: The independent variable $x_i$ is useful in explaining aqi
which is expressed as:
$H_0$: $\beta_i = 0$
$H_1$: $\beta_i \neq 0$
For this, we use p-value method: if p-value is smaller than significance
level 𝛼 = 5%, H0 is rejected.
We have (p-values from the robust regression in Exhibit 11):
Variable   p-value             Conclusion
alti       0.000 < α = 0.05    alti is statistically significant in explaining aqi
coastal    0.009 < α = 0.05    coastal is statistically significant in explaining aqi
prec       0.000 < α = 0.05    prec is statistically significant in explaining aqi
temp       0.036 < α = 0.05    temp is statistically significant in explaining aqi
green      0.472 > α = 0.05    green is not statistically significant in explaining aqi
gdp        0.021 < α = 0.05    gdp is statistically significant in explaining aqi
popu       0.305 > α = 0.05    popu is not statistically significant in explaining aqi

As a result, there is enough evidence to reject the null hypothesis for
alti, coastal, prec, temp and gdp and conclude that they have explanatory or
predictive power on aqi. Meanwhile, green and popu are not statistically
significant at the 5% level, so we can reduce the model by dropping them:
𝑎𝑞𝑖 = 𝛽0 + 𝛽1 𝑎𝑙𝑡𝑖 + 𝛽2 𝑐𝑜𝑎𝑠𝑡𝑎𝑙 + 𝛽3 𝑝𝑟𝑒𝑐 + 𝛽4 𝑡𝑒𝑚𝑝 + 𝛽5 𝑔𝑑𝑝 + 𝑢
(Reduced model)
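The decision rule used above (reject H0 when the p-value is below the significance level) can be written as a short Python sketch that takes the robust p-values from Exhibit 11 and returns the variables of the reduced model. The function name `split_by_significance` is hypothetical and is introduced only for illustration.

```python
ALPHA = 0.05  # significance level used in the report

# Robust p-values copied from Exhibit 11.
P_VALUES = {
    "alti": 0.000, "coastal": 0.009, "prec": 0.000, "temp": 0.036,
    "green": 0.472, "gdp": 0.021, "popu": 0.305,
}


def split_by_significance(p_values, alpha):
    """Return (kept, dropped): variables with p < alpha and the rest."""
    kept = [name for name, p in p_values.items() if p < alpha]
    dropped = [name for name, p in p_values.items() if p >= alpha]
    return kept, dropped


kept, dropped = split_by_significance(P_VALUES, ALPHA)
print("reduced model: aqi ~ " + " + ".join(kept))
print("dropped:", ", ".join(dropped))
```

This applies seven separate t tests; it does not by itself test whether green and popu are jointly insignificant, which would require a joint F test.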


IX. Result analysis & Policy implication


From the data analysis in the preceding sections, we have gained an overall
view of the data set in terms of the statistical evidence for the
relationship between AQI and each of the proposed factors. As mentioned at
the beginning of this report, we aim to learn how location, environmental
and human factors are associated with AQI.
Following the data analysis, regression run and hypothesis testing, it can
be concluded that location (altitude, coastal), environmental
(precipitation, temperature) and human (GDP) factors have a statistically
significant association with AQI. Therefore, governments, scientists, urban
planners and environmentalists should take all of these factors into
account when deciding how to improve air quality.


X. Conclusion
This report was completed thanks to the dedicated contribution of each
member and the knowledge gained from our study of Econometrics. It also gave
us a good opportunity to practice what we have learned and to gain a deeper
understanding of data analysis and the relevant tests. We hope that our work
offers readers a closer look at AQI and helps them truly understand it.
Again, due to the limits of our understanding and resources, our report may
contain misinterpretations. We hope that PhD. Dinh Thi Thanh Binh and
readers can give us constructive comments on the report so that we can
improve and do better in the future.

Sincerely.


XI. References
1. https://www.kaggle.com/maxwellnee/china-aqi-test
2. Damodar N. Gujarati and Dawn C. Porter, Basic Econometrics (Fifth Edition)
3. Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach (Fifth
Edition)
