Statistical Learning 452 Project 2

In this report, we fit six models to our training data and use them to make predictions: the
multiple linear regression (OLS) model, partial least squares (PLS), ridge regression, LASSO
regression, a single regression tree, and a random forest.

To compare these models, we use the mean squared prediction error (MSPE), which measures the
predictive accuracy of a model. A lower MSPE indicates better predictive performance.
See line 207 of the R code listed below. Overall, the OLS regression has an MSPE of around 1.80,
and the LASSO regression has a smaller MSPE than OLS. This is because ridge and LASSO regression
shrink the coefficients of weak regressors (LASSO can set some of them exactly to zero), which
reduces overfitting and improves predictive performance. However, at line 267, in the next table
in the R code (the ordering is slightly inconsistent because I ran the random forest before the
ridge and PLS models, but this does not affect the results), the random forest has an MSPE of
around 1.56, which is clearly lower than that of the regressions. Thus, among the models we have
chosen, the random forest provides the best predictive performance. In addition, the single tree
shows that X4 is the most influential regressor, followed by X3 and X6, and so on.
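
To make the comparison criterion concrete, the following is a minimal sketch of how an MSPE can be
computed on held-out data. It is written in Python for illustration and is not the R code used in
this report; the function name mspe and the small vectors are hypothetical and are not our data.

```python
def mspe(y_true, y_pred):
    """Mean squared prediction error: the average of (y - prediction)^2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out responses and predictions from two models (not the report's data).
y_test = [3.0, 1.0, 4.0, 2.0]
pred_a = [2.0, 1.0, 5.0, 2.0]   # errors: 1, 0, -1, 0
pred_b = [3.0, 1.0, 4.0, 4.0]   # errors: 0, 0, 0, -2

mspe_a = mspe(y_test, pred_a)   # (1 + 0 + 1 + 0) / 4
mspe_b = mspe(y_test, pred_b)   # (0 + 0 + 0 + 4) / 4

# The model with the lower MSPE is preferred.
better = "A" if mspe_a < mspe_b else "B"
print(mspe_a, mspe_b, better)
```

In this toy case model B is exact on three points but makes one large error, so the squared-error
criterion prefers model A.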

Intuitively, the LASSO is itself a regression, but it places an L1 penalty on the total size of
the coefficients. This shrinks the coefficients of weak regressors, setting some of them exactly
to zero, and gives the model better predictive ability. A random forest, by contrast, is an
ensemble method that consists of many single trees, each grown on a bootstrap sample of the data
(bagging), and averages their predictions.
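
The difference between the two penalties can be sketched for a single coefficient. Assuming an
orthonormal design (an assumption made only for this illustration, not a property of our data),
the L1-penalized solution is a soft-thresholding of the OLS coefficient, while the L2-penalized
solution is a proportional shrinkage. The function names and numbers below are hypothetical, and
the sketch is in Python rather than the R used for the report.

```python
def lasso_shrink(b_ols, lam):
    """Minimizer of 0.5*(b - b_ols)^2 + lam*|b|: soft-thresholding at lam."""
    shrunk = max(abs(b_ols) - lam, 0.0)
    if shrunk == 0.0:
        return 0.0                      # coefficient is removed from the model
    return shrunk if b_ols > 0 else -shrunk


def ridge_shrink(b_ols, lam):
    """Minimizer of 0.5*(b - b_ols)^2 + 0.5*lam*b^2: proportional shrinkage."""
    return b_ols / (1.0 + lam)


# Hypothetical OLS coefficients: one strong regressor and two weak ones.
ols_coefs = [2.0, -0.5, 0.25]
lam = 1.0

lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
print(lasso_coefs)
print(ridge_coefs)
```

With these values the L1 penalty sets both weak coefficients exactly to zero, whereas the L2
penalty halves every coefficient but keeps all of them in the model.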

For the random forest, we tune the tuning parameter M by refitting the model for each value from
M=1 to M=10. The results show that the random forest MSPE is lowest when M=3, at 1.554.
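
The tuning procedure amounts to a small grid search: score each candidate value and keep the one
with the lowest MSPE. The sketch below shows only that selection logic, in Python. The function
tune and the scoring function placeholder_mspe are hypothetical stand-ins; in practice the score
would come from fitting the forest at each M, and the placeholder curve is not our result.

```python
def tune(candidates, score):
    """Score every candidate and return (best_candidate, all_scores).

    The best candidate is the one with the lowest score; ties go to the
    candidate that appears first.
    """
    results = {m: score(m) for m in candidates}
    best = min(results, key=results.get)
    return best, results


def placeholder_mspe(m):
    # Placeholder validation curve, NOT the report's results. A real version
    # would fit the model with tuning parameter m and return its MSPE.
    return 10.0 + (m - 4) ** 2


best_m, all_scores = tune(range(1, 11), placeholder_mspe)
print(best_m, all_scores[best_m])
```

Keeping the full table of scores, not just the winner, makes it easy to check how sensitive the
MSPE is to the choice of M.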

Based on the results above, we conclude that the random forest is the best-performing model. The
predicted y values for the test data have been appended to the CSV file.

R Code:
