
Assignment 3

Group 8: Jack SiQi, Wayne ShengWei Hu

In the previous assignment, we analyzed the relationship between electricity consumption, the
average temperature, and the growth value of the hydropower, coal and power industry, and
conjectured that there is some collinearity between the two independent variables. In this
assignment, we adjust the regression to account for that collinearity.

1. Data screening procedures


(1) Data used

The data we use are Taiwan's monthly electricity sales from 2017 to 2021, the year-on-year
growth rate of those monthly sales, Taiwan's monthly average temperature over the same period
(the average temperature of Kaohsiung, Taipei, Taichung, and Hualien), and Taiwan's industrial
growth value from 2017 to 2021.

The electricity sales data come from the Department of Statistics.

The temperature data comes from the "Main Temperature Data Query Page"
https://stat.motc.gov.tw/mocdb/stmain.jsp?sys=100&funid=b8101

The data on the industrial growth value of the electricity and gas supply industry comes
from the Department of Statistics, https://dmz26.moea.gov.tw/

(2) Data processing

We found no outliers during data collection, and the collected data contain no missing values.

We denote the average monthly electricity sales by AE, the average monthly temperature by AT,
and the growth value of the hydropower, coal and power industry by WG.
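A screening step like the one above can be sketched in code. This is only an illustration: the z-score cutoff and the toy series are our own assumptions, not the procedure actually applied to the data.

```python
# Sketch of a data-screening pass: flag missing values and flag outliers
# with a simple z-score rule (the cutoff is a hypothetical choice).
import numpy as np

def screen(series, z_cut=3.0):
    """Return boolean masks (missing, outliers) for a 1-D series."""
    x = np.asarray(series, dtype=float)
    missing = np.isnan(x)
    z = (x - np.nanmean(x)) / np.nanstd(x)   # z-scores, ignoring NaNs
    outliers = np.abs(z) > z_cut             # NaN entries compare False
    return missing, outliers
```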

2. Scatter Plots of the Raw Data


(1) AE and AT
From the scatter plot, we preliminarily conjecture a positive linear relationship
between electricity sales and average temperature: the higher the average
temperature, the higher the electricity sales.
(2) AE and WG

From the scatter plot, we preliminarily conjecture that electricity sales are
heteroskedastic with respect to WG: as WG gradually grows, AE becomes larger,
and the fluctuation of AE also grows.

3. Multicollinearity Analysis
(1) Multicollinearity Detection for the Regression
a. Pearson Correlation

Correlations (N = 60 for all pairs)

                                              Total electricity   Average       Growth value of hydropower,
                                              sales (10M kWh)     temperature   coal and power industry
Pearson correlation
  Total electricity sales (10M kWh)                 1.000             .780          .465
  Average temperature                                .780            1.000          .321
  Growth value of hydropower, coal and power         .465             .321         1.000
Significance (one-tailed)
  Total electricity sales (10M kWh)                  .                .000          .000
  Average temperature                                .000             .             .006
  Growth value of hydropower, coal and power         .000             .006          .
The Pearson correlation between AT and WG is 0.321, so collinearity is not obvious from the correlations alone.
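A correlation matrix like the one above can be reproduced with NumPy. The series below are synthetic stand-ins for AE, AT, and WG, not the actual data:

```python
# Sketch: pairwise Pearson correlations among AE, AT, and WG.
# The generated series are hypothetical placeholders for the real data.
import numpy as np

rng = np.random.default_rng(0)
AT = rng.normal(24, 4, 60)                        # monthly average temperature (toy)
WG = rng.normal(100, 10, 60)                      # industry growth value (toy)
AE = 30 * AT + 10 * WG + rng.normal(0, 50, 60)    # electricity sales (toy)

X = np.column_stack([AE, AT, WG])
corr = np.corrcoef(X, rowvar=False)               # 3x3 Pearson correlation matrix
print(np.round(corr, 3))
```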
b. VIF

Coefficients (dependent variable: total electricity sales, ten million kWh)

                                              Unstandardized            Standardized   Collinearity statistics
Model                                         B           Std. error    Beta           Tolerance   VIF
1  (Constant)                                 -181.235    390.470
   Average temperature                        32.495      3.766         .704           .897        1.115
   Growth value of hydropower, coal
   and power industry                         11.816      4.036         .239           .897        1.115
Since the VIF of both regressors is 1.12 < 10, collinearity is not obvious by this criterion either.
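The VIF can be computed directly as 1/(1 - R_j^2), where R_j^2 comes from regressing regressor j on the remaining regressors. With only two regressors this reduces to 1/(1 - r^2), and 1/(1 - 0.321^2) ≈ 1.115, which matches the table. A NumPy-only sketch:

```python
# Sketch: variance inflation factors via auxiliary regressions.
import numpy as np

def vif(X):
    """X: (n, k) matrix of regressors, without a constant column."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # constant + other regressors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        tss = (y - y.mean()) @ (y - y.mean())
        r2 = 1 - (resid @ resid) / tss                # R^2 of the auxiliary regression
        out.append(1.0 / (1.0 - r2))
    return out
```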
c. Collinearity Diagnosis

Collinearity diagnostics (dependent variable: total electricity sales, ten million kWh)

                                                Variance proportions
Model  Dimension  Eigenvalue  Condition index   (Constant)  Average      Growth value of hydropower,
                                                            temperature  coal and power industry
1      1          2.983       1.000             .00         .00          .00
       2          .016        13.703            .02         .95          .01
       3          .001        68.895            .98         .05          .99
From the table, the condition indices of dimensions 2 and 3 are both greater than 10, so a
certain degree of collinearity remains in the regression.
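Condition indices of this kind can be reproduced from the scaled cross-product matrix: scale each column of the design matrix (including the constant) to unit length, take the eigenvalues of Z'Z, and report sqrt(lambda_max / lambda_i) per dimension. A sketch, assuming this standard construction:

```python
# Sketch: condition indices from the unit-length-scaled design matrix.
import numpy as np

def condition_indices(X):
    """X: (n, k) regressor matrix; a constant column is prepended."""
    Z = np.column_stack([np.ones(len(X)), X])
    Z = Z / np.linalg.norm(Z, axis=0)                 # scale columns to unit length
    eigvals = np.linalg.eigvalsh(Z.T @ Z)[::-1]       # eigenvalues, descending
    return np.sqrt(eigvals[0] / eigvals)              # one index per dimension
```

Indices above about 10 to 30 are commonly read as signaling a collinearity problem, which is the rule applied in the text above.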

To sum up, although the Pearson and VIF tests revealed no collinearity problem, the
collinearity diagnosis shows that the condition indices of dimensions 2 and 3 cannot be
ignored. Therefore, we discuss the collinearity of the regression further.

4. LASSO / Elastic Net Method Selection


(1) Ridge Regression Result
Dependent Variable: AE
Method: Elastic Net Regularization
Date: 10/02/22 Time: 18:02
Sample: 2017M01 2021M12
Included observations: 60
Penalty type: Ridge (alpha = 0) *analytic
Lambda at minimum error: 33.88
Regressor transformation: None
Cross-validation method: K-Fold (number of folds = 5), rng=kn,
seed=798684039
Selection measure: Mean Squared Error

              (minimum)    (+1 SE)      (+2 SE)

Lambda        33.88        316          552.2

Variable      Coefficients

C             -143.3961    148.1997     345.5108
WG            11.70978     10.48241     9.437262
AT            31.39567     24.62027     20.92083

df            2            2            2
L1 Norm       186.5015     183.3024     375.8689
R-squared     0.659498     0.627346     0.587235
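For reference, the ridge fit has a closed form, beta = (X'X + lambda*I)^(-1) X'y, with the intercept left unpenalized by centering the data first. A sketch on synthetic data (not the actual series):

```python
# Sketch: closed-form ridge regression with an unpenalized intercept.
import numpy as np

def ridge(X, y, lam):
    """Return (intercept, coefficients) for ridge with penalty lam >= 0."""
    Xc = X - X.mean(axis=0)                   # center so the intercept is not shrunk
    yc = y - y.mean()
    k = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta
```

Setting lam = 0 recovers ordinary least squares; increasing lam shrinks the coefficient vector toward zero, which is the behavior visible across the (minimum), (+1 SE), and (+2 SE) columns above.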

(2) LASSO Regression

Dependent Variable: AE
Method: Elastic Net Regularization
Date: 10/02/22 Time: 18:08
Sample: 2017M01 2021M12
Included observations: 60
Penalty type: Lasso (alpha = 1)
Lambda at minimum error: 0.05647
Regressor transformation: None
Cross-validation method: K-Fold (number of folds = 5), rng=kn,
seed=886243792
Selection measure: Mean Squared Error

              (minimum)    (+1 SE)      (+2 SE)

Lambda        0.05647      116.1        184.9

Variable      Coefficients

C             -180.8419    627.3063     1106.191
WG            11.81235     5.170837     1.235270
AT            32.49279     27.07702     23.86779

df            2            2            2
L1 Norm       225.1470     659.5542     1131.294
R-squared     0.660103     0.618183     0.553820

(3) Elastic Net Method

Dependent Variable: AE
Method: Elastic Net Regularization
Date: 10/02/22 Time: 18:10
Sample: 2017M01 2021M12
Included observations: 60
Penalty type: Elastic Net (alpha = 0.5)
Lambda at minimum error: 0.3449
Regressor transformation: None
Cross-validation method: K-Fold (number of folds = 5), rng=kn,
seed=1985126259
Selection measure: Mean Squared Error
              (minimum)    (+1 SE)      (+2 SE)

Lambda        0.3449       5.621        8.155

Variable      Coefficients

C             -168.4865    19.68725     103.0054
WG            11.77563     11.02430     10.62623
AT            32.14264     27.60236     25.86179

df            2            2            2
L1 Norm       212.4047     58.31391     139.4934
R-squared     0.660040     0.647532     0.636673

We found that under the Elastic Net method, R^2 remains high and fluctuates little across
the candidate lambdas, so we choose this method to deal with the collinearity problem of
the regression.
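The elastic-net estimates above can be reproduced, in principle, with a small coordinate-descent solver for the objective (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2), with alpha = 0.5 as in the report. This is a minimal sketch without an intercept, not the software actually used:

```python
# Sketch: coordinate descent for the elastic-net penalty (alpha mixes L1/L2).
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; exact zero inside [-t, t]."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    n, k = X.shape
    b = np.zeros(k)
    for _ in range(n_iter):
        for j in range(k):
            r = y - X @ b + X[:, j] * b[j]            # partial residual excluding j
            rho = X[:, j] @ r / n
            denom = X[:, j] @ X[:, j] / n + lam * (1 - alpha)
            b[j] = soft_threshold(rho, lam * alpha) / denom
    return b
```

Small lam reproduces near-OLS coefficients (as in the "(minimum)" column); large lam drives coefficients toward zero.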

5. Elastic Net Method Results


(1) Estimation
Dependent Variable: AE
Method: Elastic Net Regularization
Date: 10/02/22 Time: 18:10
Sample: 2017M01 2021M12
Included observations: 60
Penalty type: Elastic Net (alpha = 0.5)
Lambda at minimum error: 0.3449
Regressor transformation: None
Cross-validation method: K-Fold (number of folds = 5), rng=kn,
seed=1985126259
Selection measure: Mean Squared Error

              (minimum)    (+1 SE)      (+2 SE)

Lambda        0.3449       5.621        8.155

Variable      Coefficients

C             -168.4865    19.68725     103.0054
WG            11.77563     11.02430     10.62623
AT            32.14264     27.60236     25.86179

df            2            2            2
L1 Norm       212.4047     58.31391     139.4934
R-squared     0.660040     0.647532     0.636673
From the table, both WG and AT influence AE positively, but the influence of AT is clearly
larger than that of WG. With R^2 = 0.660, the model has high explanatory power.
(2) Residual Table

obs Actual Fitted Residual Residual Plot


2017M01 1558.11 1662.53 -104.420 | * . |
2017M02 1464.27 1502.11 -37.8419 | *. |
2017M03 1619.31 1678.98 -59.6662 | *. |
2017M04 1610.21 1736.47 -126.257 | * . |
2017M05 1732.82 1916.90 -184.075 | * . |
2017M06 1760.15 1923.31 -163.156 | * . |
2017M07 1918.16 2015.63 -97.4672 | * . |
2017M08 1986.96 2033.53 -46.5701 | *. |
2017M09 2008.66 1964.69 43.9691 | .* |
2017M10 1980.06 1904.66 75.3931 | . * |
2017M11 1849.98 1757.50 92.4793 | . * |
2017M12 1679.06 1691.43 -12.3722 | * |
2018M01 1650.90 1640.00 10.9052 | * |
2018M02 1431.86 1481.99 -50.1307 | *. |
2018M03 1691.80 1742.84 -51.0311 | *. |
2018M04 1667.95 1793.43 -125.478 | * . |
2018M05 1815.37 1985.49 -170.118 | * . |
2018M06 1843.36 1948.24 -104.883 | * . |
2018M07 2000.70 1995.36 5.34731 | * |
2018M08 2028.51 1976.31 52.1990 | .* |
2018M09 1995.17 1926.63 68.5490 | .* |
2018M10 1896.06 1804.80 91.2579 | . * |
2018M11 1772.90 1718.33 54.5745 | .* |
2018M12 1682.10 1730.25 -48.1438 | *. |
2019M01 1659.84 1674.41 -14.5729 | *. |
2019M02 1457.55 1585.30 -127.746 | * . |
2019M03 1657.74 1711.04 -53.3022 | *. |
2019M04 1664.88 1818.75 -153.863 | * . |
2019M05 1798.06 1874.15 -76.0923 | * . |
2019M06 1800.52 1949.35 -148.828 | * . |
2019M07 1946.40 2009.90 -63.4993 | *. |
2019M08 2010.23 1987.49 22.7450 | .* |
2019M09 2007.42 1913.66 93.7669 | . * |
2019M10 1928.99 1893.57 35.4169 | .* |
2019M11 1791.65 1773.23 18.4198 | .* |
2019M12 1682.07 1728.62 -46.5537 | *. |
2020M01 1571.83 1662.40 -90.5670 | * . |
2020M02 1618.01 1585.39 32.6162 | .* |
2020M03 1719.37 1759.89 -40.5204 | *. |
2020M04 1641.45 1730.49 -89.0377 | * . |
2020M05 1762.30 1927.61 -165.316 | * . |
2020M06 1857.72 1990.71 -132.993 | * . |
2020M07 2087.55 2046.07 41.4782 | .* |
2020M08 2082.67 2007.48 75.1842 | . * |
2020M09 2110.41 1941.25 169.164 | . * |
2020M10 1989.88 1921.02 68.8636 | .* |
2020M11 1845.16 1824.95 20.2065 | .* |
2020M12 1733.66 1734.85 -1.19543 | * |
2021M01 1728.53 1596.16 132.363 | . * |
2021M02 1567.98 1562.30 5.68253 | * |
2021M03 1830.88 1728.61 102.264 | . * |
2021M04 1773.28 1711.24 62.0429 | .* |
2021M05 1901.90 1901.68 0.21966 | * |
2021M06 1947.84 1890.71 57.1315 | .* |
2021M07 2096.18 2008.83 87.3530 | . * |
2021M08 2109.96 1973.96 136.000 | . * |
2021M09 2141.99 1952.13 189.858 | . * |
2021M10 2148.22 1887.28 260.938 | . *|
2021M11 2001.30 1707.01 294.293 | . *|
2021M12 1860.25 1675.23 185.016 | . * |

From the residuals, we believe their mean is approximately zero, and there is no obvious
autocorrelation such as periodicity. The residuals therefore conform to the Gauss-Markov
assumptions.
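The two checks described above, a near-zero mean and little lag-1 autocorrelation, can be sketched as follows (the test series here is hypothetical white noise, not the actual residuals):

```python
# Sketch: residual diagnostics — sample mean and lag-1 autocorrelation.
import numpy as np

def residual_checks(resid):
    """Return (mean, lag-1 autocorrelation) of a residual series."""
    resid = np.asarray(resid, dtype=float)
    mean = resid.mean()
    r = resid - mean
    lag1 = (r[:-1] @ r[1:]) / (r @ r)   # lag-1 sample autocorrelation
    return mean, lag1
```

Values of both statistics near zero are consistent with the informal reading of the residual plot above; a formal check would use a Durbin-Watson or Ljung-Box test.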
(3) Graph
We believe the fitted and actual series move closely together, so the independent variables
WG and AT have a certain explanatory and predictive power for the dependent variable AE.
