Regression in R
Jeffrey Strickland, Ph.D.
12-04-2022
L1 Regularization
Lasso regression performs 𝐿1 regularization, which adds a penalty
equal to the absolute value of the magnitude of the coefficients.
This type of regularization can produce sparse models with few
coefficients: some coefficients shrink all the way to zero and are
eliminated from the model. Larger penalties push coefficient values
closer to zero, which is ideal for producing simpler models. On the
other hand, 𝐿2 regularization (e.g., ridge regression) does not
eliminate coefficients or produce sparse models. This makes the
lasso far easier to interpret than the ridge.
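To see the difference concretely, here is a small illustrative sketch (not from the chapter's data) that fits a lasso and a ridge model with glmnet on simulated data; the variable names and the lambda value are assumptions for illustration only.

```r
# Illustrative sketch: lasso (alpha = 1) zeroes out irrelevant coefficients,
# ridge (alpha = 0) only shrinks them toward zero.
library(glmnet)

set.seed(123)
n <- 100; p <- 8
X <- matrix(rnorm(n * p), n, p)
# Only the first two predictors truly matter.
y <- 1.5 * X[, 1] - 2 * X[, 2] + rnorm(n)

lasso_fit <- glmnet(X, y, alpha = 1)  # L1 penalty
ridge_fit <- glmnet(X, y, alpha = 0)  # L2 penalty

# At a moderate lambda the lasso coefficient vector is sparse,
# while the ridge coefficients are all small but nonzero.
coef(lasso_fit, s = 0.5)
coef(ridge_fit, s = 0.5)
```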
Data Preprocessing
Refresh the Data and Impute Missing Values
For Model 5, we first refresh the data used in the ridge regression
model (Model 4) and impute the missing values in the train dataset.
For numerical features, we use the mean; for categorical features,
we replace NA values with the lowest category. We then use the
glmnet function from the glmnet package, which fits a generalized
linear model (GLM) with lasso or elasticnet regularization via
penalized maximum likelihood. The regularization path is computed
for the lasso or elasticnet penalty at a grid of values for the
regularization parameter lambda. It can deal with all shapes of
data, including very large sparse data matrices.
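The imputation step described above might be sketched as follows; this is an assumed implementation using dplyr (the chapter does not show its own code for this step), applied to a data frame named `train`.

```r
# Sketch of the imputation described above (assumed implementation).
library(dplyr)

train <- train %>%
  mutate(
    # Numerical features: replace NA with the column mean.
    across(where(is.numeric),
           ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)),
    # Categorical features: replace NA with the lowest (first) level.
    across(where(is.factor), ~ {
      x <- .x
      x[is.na(x)] <- levels(x)[1]
      x
    })
  )
```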
   Mean   : 0.0000      Mean   : 0.0000            Mean   : 0.0000
   3rd Qu.: 0.7485      3rd Qu.: 0.9482            3rd Qu.: 0.4989
   Max.   : 2.0141      Max.   : 2.6430            Max.   : 5.2957
 RSO_MRP             Orbit_Establishment_Year   Orbit_Height
 Min.   :-1.78821    Min.   :-1.62365           Min.   :-0.8048
 1st Qu.:-0.76257    1st Qu.:-1.38111           1st Qu.:-0.8048
 Median : 0.04356    Median : 0.07416           Median :-0.8048
 Mean   : 0.00000    Mean   : 0.00000           Mean   : 0.0000
 3rd Qu.: 0.74644    3rd Qu.: 0.68052           3rd Qu.: 0.6311
 Max.   : 2.31959    Max.   : 1.28688           Max.   : 2.0669
 Stealth_Type       RSO_Type           Survivability
 Min.   :-1.7418    Min.   :-0.6509    Min.   :-1.2637
 1st Qu.:-0.2129    1st Qu.:-0.6509    1st Qu.:-0.7851
 Median :-0.2129    Median :-0.6509    Median :-0.2237
 Mean   : 0.0000    Mean   : 0.0000    Mean   : 0.0000
 3rd Qu.:-0.2129    3rd Qu.: 0.2665    3rd Qu.: 0.5373
 Max.   : 4.3738    Max.   : 2.1014    Max.   : 6.3959
Define Lambda
We also need to define the initial value of lambda, along with the
stop and step values, so that we can iterate through the sequence
and plot the values.
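One way to build such a sequence is on a log scale; the exact start, stop, and step bounds below are assumptions for illustration.

```r
# Define a descending lambda sequence on a log scale and plot it
# (bounds are illustrative assumptions).
lambdas_to_try <- 10^seq(3, -5, length.out = 100)

plot(lambdas_to_try, pch = 20,
     xlab = "Index", ylab = "Lambda",
     main = "Lambda values vs. index")
```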
Figure 0-1. Scatterplot of lambda values vs index.
set.seed(567)
part <- sample(2, nrow(X), replace = TRUE, prob = c(0.7, 0.3))
X_train <- X[part == 1,]
X_cv <- X[part == 2,]
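The fitted objects `lasso_reg` and `lasso_reg_cv` plotted below are not defined in this excerpt; a minimal sketch, assuming a response vector `Y` split the same way as `X`, would be:

```r
# Minimal sketch of the lasso fits used in the plots below
# (assumes Y is the response aligned with the rows of X).
library(glmnet)

Y_train <- Y[part == 1]
Y_cv    <- Y[part == 2]

lasso_reg    <- glmnet(X_train, Y_train, alpha = 1)      # full path
lasso_reg_cv <- cv.glmnet(X_train, Y_train, alpha = 1)   # cross-validated
```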
par(mfrow = c(1, 2))
plot(lasso_reg, lwd = 2)
plot(lasso_reg, xlim =c(-5,0), xvar = "lambda", label = TRUE,
lwd = 2)
par(mfrow = c(1,1))
plot(lasso_reg_cv)
Coefficients Barplot
Now, we construct a barplot of the values of the lasso regression
coefficients.
plotlabels <- c("Intercept", "Intercept", names(train[1:8]))
par(mar = c(10, 4, 2, 1))
barplot(coef(lasso_reg_cv)[1:10],
        main = "Model 1 Coefficients",
        ylab = "Coefficients",
        las = 2, cex = .9, cex.lab = 1, cex.main = 1.25,
        cex.sub = .75, cex.axis = .75,
        col = "green2", names = plotlabels)
Figure 0-4. Lasso regression coefficient barplot.
Orbit_Height 0.01318699
Stealth_Type 0.43825862
RSO_Type -0.12284756
[1] 0.4573362
[1] 0.3512174
Here, we calculate the cross-validation error on the hold-out set.
sum((Y_cv - predict(lasso_reg_cv, X_cv)) ^ 2)/nrow(X_cv)
[1] 0.4736844
[1] 0.6691864
[1] 0.6760167
# A tibble: 64 x 6
lambda estimate std.error conf.low conf.high nzero
<dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 0.612 0.988 0.0222 0.966 1.01 0
2 0.558 0.926 0.0216 0.905 0.948 1
3 0.508 0.859 0.0197 0.839 0.878 2
4 0.463 0.796 0.0182 0.778 0.814 2
5 0.422 0.744 0.0168 0.727 0.761 2
6 0.385 0.701 0.0157 0.685 0.716 2
7 0.350 0.665 0.0147 0.650 0.679 2
8 0.319 0.635 0.0140 0.621 0.649 2
9 0.291 0.610 0.0133 0.597 0.623 2
10 0.265 0.590 0.0128 0.577 0.602 2
# ... with 54 more rows
# A tibble: 1 x 3
lambda.min lambda.1se nobs
<dbl> <dbl> <int>
1 0.00174 0.0376 5956
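Tibbles like the two above can be obtained from a cv.glmnet object with the broom package; this is a sketch, and the fitted object name is assumed.

```r
# Sketch: summarizing a cv.glmnet fit with broom
# (the object name lasso_reg_cv is assumed).
library(broom)

tidy(lasso_reg_cv)    # one row per lambda: estimate, std.error,
                      # conf.low, conf.high, nzero
glance(lasso_reg_cv)  # lambda.min, lambda.1se, nobs
```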
Figure 0-5. Scatterplot of the predicted values vs. the index.
Figure 0-6. MSE plot of the logarithm of the regularization parameter
for 𝛾 = 0.0, 0.25, 0.5, 0.75, 1.0.
Figure 0-7. MSE plot of the logarithm of regularization parameter.
Model Evaluation
Now that we have the predictions, we evaluate the following
measures from the cvms package.
MSE = (1/n) ∑ᵢ₌₁ⁿ (ŷᵢ − yᵢ)²

RMSE = √[ (1/n) ∑ᵢ₌₁ⁿ (ŷᵢ − yᵢ)² ]

MAPE = (1/n) ∑ᵢ₌₁ⁿ |(ŷᵢ − yᵢ) / yᵢ|

RAE = ∑ᵢ₌₁ⁿ |ŷᵢ − yᵢ| / ∑ᵢ₌₁ⁿ |yᵢ − ȳ|
library(cvms)
print(paste("MSE =", mse(Y_cv, predict(cv.lasso_reg,
                                       s = bestlam, newx = X_cv))))
print(paste("RMSE =", rmse(Y_cv, predict(cv.lasso_reg,
                                         s = bestlam, newx = X_cv))))
print(paste("MAPE =", mape(Y_cv, predict(cv.lasso_reg,
                                         s = bestlam, newx = X_cv))))
print(paste("RAE =", rae(Y_cv, predict(cv.lasso_reg,
                                       s = bestlam, newx = X_cv))))
Model 2
Using cv.glmnet() we calculate prediction accuracy for the lasso
regression by taking the 𝜆 values and creating a grid for caret to
search with the train() function. We set 𝛼 = 1 in this grid, since
glmnet can actually tune over the 𝛼 parameter as well, but the
lasso requires 𝛼 = 1.
library(caret)
lasso_reg_cv <- cv.glmnet(X_train, Y_train, alpha = 1)
cv_5 = trainControl(method = "cv", number = 5)
lasso_grid = expand.grid(alpha = 1,
    lambda = c(lasso_reg_cv$lambda.min, lasso_reg_cv$lambda.1se))
lasso_grid
  alpha  lambda
1     1 0.00174
2     1 0.04124
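The caret search itself is not shown in this excerpt; a minimal sketch of the train() call described above, using the grid and control object already defined, would be:

```r
# Sketch: running caret's grid search over the two lambda values
# (the object name lasso_fit_caret is an assumption).
lasso_fit_caret <- train(x = X_train, y = Y_train,
                         method = "glmnet",
                         trControl = cv_5,
                         tuneGrid = lasso_grid)
lasso_fit_caret$results  # RMSE and R-squared for each lambda in the grid
```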
As we can see, the RMSE for the minimum 𝜆 is 0.67808, which is
close to what we calculated above. The R-squared for the minimum
𝜆 is 0.5475; the lasso model therefore explains about 55% of the
variance.
$s
[1] 4.1

$fraction
[1] 0.9132485

$mode
[1] "lambda"

$coefficients
             (Intercept)               RSO_Weight
              0.00000000               0.00000000
             RSO_Density           RSO_Visibility
              0.00000000              -0.00070929
                 RSO_MRP Orbit_Establishment_Year
              0.35704578              -0.05081050
            Orbit_Height             Stealth_Type
              0.01877038               0.41788910
                RSO_Type
             -0.09796110
Figure 0-9. Plot of the standardized coefficients vs. |𝛽|/max|𝛽|.
Next, we print the lasso object and observe an R-squared of 0.549,
indicating that this lasso model explains about 55% of the variance.
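The lars fit printed below, and the coefficient listing shown earlier, can be produced as follows; this is a sketch assuming `X_train` is a numeric matrix and `Y_train` the matching response.

```r
# Sketch: fitting the LARS lasso path and extracting coefficients
# at a chosen penalty (s = 4.1, mode = "lambda", as in the output above).
library(lars)

lasso_obj <- lars(x = X_train, y = Y_train, type = "lasso")
coef_at_s <- predict(lasso_obj, s = 4.1,
                     type = "coefficients", mode = "lambda")
```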
lasso_obj
Call:
lars(x = X_train, y = Y_train, type = "lasso")
R-squared: 0.549
Sequence of LASSO moves:
Stealth_Type RSO_MRP RSO_Type Orbit_Height
Var 8 5 9 7
Step 1 2 3 4
Orbit_Establishment_Year RSO_Visibility RSO_Density
Var 6 4 3
Step 5 6 7
RSO_Weight Orbit_Height Orbit_Height
Var 2 -7 7
Step 8 9 10
Figure 0-10. Lasso regression model with various alpha levels.
Variable Importance
One last thing to do is look at variable importance. We'll use a
plot to do this, since a visual is very powerful.
Variable Reduction
Before we make a plot of variable importance, we'll reduce the
number of variables. Recall that because lasso regression
regularizes the coefficients, fewer variables are needed to explain
the variability in the response, which reduces the chance of
overfitting.
V = varImp(lasso_reg, lambda = optimal_lambda, scale = TRUE)
# Remove insignificant Overall importance values.
# Insignificant values < median value.
# Transform from numerical to logical.
V_log <- V > median(V$Overall)
V1_log <- V_log == TRUE
# Transform to (0,1).
V2 = V1_log - FALSE
# Transform to numerical with insignificant = 0.
V3 = V * V2
# Convert to data frame.
V4 <- as.data.frame(V3)
# Remove rows containing 0 overall values.
V5 <- V4[!(V4$Overall == 0), ]
# Convert to data frame.
V5 <- as.data.frame(V5)
# Insert new column.
s <- nrow(V5)
new <- seq(s)
# Rename new column.
V5$Variables <- new
# Rename "V5" column to "Overall".
names(V5)[1] <- "Overall"
# Count variable reduction.
nrow(V)
nrow(V) - nrow(V5)
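The `my_ggp` object styled below is not constructed in this excerpt; one plausible sketch, built from the reduced importance table `V5`, is:

```r
# Sketch (assumed construction): a variable importance barplot from V5.
library(ggplot2)

my_ggp <- ggplot(V5, aes(x = reorder(Variables, Overall), y = Overall)) +
  geom_col(fill = "green2") +
  coord_flip() +  # horizontal bars for readable labels
  labs(title = "Variable Importance, Model 2",
       x = "Variables", y = "Overall importance")
```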
my_ggp + theme_light() +
theme(axis.title = element_text(size = 14)) +
theme(axis.text = element_text(size = 12)) +
theme(plot.title = element_text(size = 14)) +
theme(legend.title = element_text(size = 13)) +
theme(legend.text = element_text(size = 11))
Figure 0-11. Variable importance plot for the lasso regression Model 2.
print(V)
Variable Overall
Intercept 0.000939219
RSO_Density 0.000000000
RSO_Weight 0.001196625
RSO_Visibility 0.001708460
Orbit_Establishment_Year 0.154577505
Stealth_Type 0.474689714
RSO_MRP 0.389425473
RSO_Type 0.167249226
RSO_Height 0.032785041
Chapter Summary
We have seen that lasso regression is a type of linear regression
that uses shrinkage, where data values are shrunk towards a central
point, like the mean. The procedure encourages simple, sparse
models, i.e., models with fewer parameters. This type of regression
is well-suited for:
• Models showing high levels of multicollinearity.
• Models where we want to automate certain parts of model
selection, like variable selection/parameter elimination.
Lasso regression performs 𝐿1 regularization, which adds a penalty
to the coefficients, resulting in sparse models.
We use two R functions for lasso regression: glmnet and
cv.glmnet. We also saw that we can use the Least Angle
Regression (LARS) algorithm (lars) for high dimensional data.