Satellite Survivability using Lasso Regression in R
Jeffrey Strickland, Ph.D.

12-04-2022

Satellite Survivability using Lasso Regression in R © 2022 by Jeffrey Strickland, Ph.D. is licensed under CC BY-NC-SA 4.0
Table of Contents

LASSO REGRESSION
   LASSO REGRESSION PRELIMINARIES
   L1 REGULARIZATION
   MODEL 1: LASSO REGRESSION
   DATA PREPROCESSING
      Refresh the Data and Impute Missing Values
   LOAD THE DATA
   STANDARDIZE THE DATA
      Define the Features and Response Sets
      Matricize the Data
      Define Lambda
      Split the Data into Subsets
   LASSO REGRESSION MODEL
      Coefficients Barplot
      Lasso Reg & Plots
      MSE over log(lambda) Values
      CV Plot with Relax = TRUE
      CV Plot with Relax = FALSE
      Model Evaluation
   MODEL 2
   MODEL 3: LASSO REGRESSION WITH LARS
   VARIABLE IMPORTANCE
      Variable Reduction
      Plot Variable Importance
   CHAPTER SUMMARY
   TEST YOUR KNOWLEDGE
REFERENCES
Lasso Regression
LASSO (Least Absolute Shrinkage and Selection Operator) is quite similar to
ridge regression, but we'll try to understand the difference between them by
implementing it in our satellite survivability problem. Let's first
discuss some preliminaries.

Lasso Regression Preliminaries


Lasso regression is a type of linear regression that uses shrinkage.
Shrinkage is where data values are shrunk towards a central point,
like the mean. The lasso procedure encourages simple, sparse
models (i.e., models with fewer parameters). This type of regression
is well-suited for models showing high levels of multicollinearity or
when you want to automate certain parts of model selection, like
variable selection/parameter elimination.

L1 Regularization
Lasso regression performs 𝐿1 regularization, which adds a penalty
equal to the absolute value of the magnitude of the coefficients. This
type of regularization can result in sparse models with few
coefficients: some coefficients can become exactly zero and be eliminated
from the model. Larger penalties push coefficient values closer to
zero, which is ideal for producing simpler models. On the other
hand, 𝐿2 regularization (e.g., ridge regression) does not eliminate
coefficients or produce sparse models. This makes the LASSO
far easier to interpret than the ridge.

The goal of the algorithm is to minimize:


$$\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^{2} + \lambda \sum_{j=1}^{p}\left|\beta_j\right|$$

This is the same as minimizing the sum of squares subject to the constraint
𝛴|𝛽𝑗| ≤ 𝑠. Some of the 𝛽s are shrunk to exactly zero, resulting in a
regression model that is easier to interpret. So, the basic difference
between ridge and lasso is the way we penalize the regression.
Recall that our ridge penalty term was built from the sum of the squared
coefficients, while the LASSO penalty is the sum of the absolute values of
the coefficients. So LASSO is more likely than ridge to perform feature
selection; in other words, it is more likely to zero out certain coefficients
completely. A tuning parameter, 𝜆, controls the strength of the 𝐿1 penalty;
𝜆 is essentially the amount of shrinkage:

• When 𝜆 = 0, no parameters are eliminated; the estimate is equal to the one found with ordinary linear regression.

• As 𝜆 increases, more and more coefficients are set to zero and eliminated (theoretically, when 𝜆 = ∞, all coefficients are eliminated).

• As 𝜆 increases, bias increases.

• As 𝜆 decreases, variance increases.

If an intercept is included in the model, it is usually left unchanged.
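
To make the effect of 𝜆 concrete, here is a tiny illustration on simulated data (a sketch only; the variables here are made up and are not part of the satellite data, and the glmnet package is assumed to be installed):

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)   # five standardized features
y <- 2 * x[, 1] + rnorm(100)          # only the first feature truly matters
fit <- glmnet(x, y, alpha = 1)        # alpha = 1 selects the lasso penalty
coef(fit, s = 0.01)                   # small lambda: most coefficients survive
coef(fit, s = 0.5)                    # larger lambda: more coefficients are exactly zero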

So, we denote the 𝐿1 norm by ‖𝒙‖1 , where 𝒙 is the matrix of


features, and in R, we use:
library(pracma)
Norm(X,1)

[1] 65.20802
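
As a quick sanity check on the definition, the 𝐿1 norm of a vector is simply the sum of the absolute values of its entries. A minimal sketch, using a small hypothetical vector rather than the feature matrix (pracma is assumed to be loaded as above):

v <- c(3, -4, 5)
sum(abs(v))   # 12, the L1 norm by definition
Norm(v, 1)    # pracma's Norm should agree for a vector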

MODEL 1: Lasso Regression


So, let’s revisit the satellite survivability problem.

Data Preprocessing
Refresh the Data and Impute Missing Values
For this model, we first refresh the data used in the ridge regression
model and impute the missing values in the train dataset. For numerical
features, we use the mean; for categorical features, we replace NA values
with the lowest category. We then fit the model with the glmnet
function from the glmnet package. The glmnet function fits a
generalized linear model (GLM) with lasso or elasticnet regularization
via penalized maximum likelihood. The regularization path is computed
for the lasso or elasticnet penalty at a grid of values for the regularization
parameter lambda. It can deal with all shapes of data, including very
large sparse data matrices.
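
The imputation step itself is not shown in this chapter. A minimal sketch of the rule described above (mean for numeric columns, the lowest category for character columns), applied after the data are loaded in the next step, might look like this:

for (col in names(train)) {
  if (is.numeric(train[[col]])) {
    # Numeric feature: replace NAs with the column mean
    train[[col]][is.na(train[[col]])] <- mean(train[[col]], na.rm = TRUE)
  } else {
    # Categorical feature: replace NAs with the lowest (first sorted) category
    lowest <- sort(unique(na.omit(train[[col]])))[1]
    train[[col]][is.na(train[[col]])] <- lowest
  }
}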

Load the Data


library(glmnet)
train <-
read.csv("https://raw.githubusercontent.com/stricje1/Data/mast
er/survive2.csv")

Standardize the Data


train$RSO_Weight <- (train$RSO_Weight -
  mean(train$RSO_Weight))/sd(train$RSO_Weight)
train$RSO_Density <- (train$RSO_Density -
  mean(train$RSO_Density))/sd(train$RSO_Density)
train$RSO_Visibility <- (train$RSO_Visibility -
  mean(train$RSO_Visibility))/sd(train$RSO_Visibility)
train$RSO_MRP <- (train$RSO_MRP -
  mean(train$RSO_MRP))/sd(train$RSO_MRP)
train$Orbit_Establishment_Year <-
  (train$Orbit_Establishment_Year -
  mean(train$Orbit_Establishment_Year))/
  sd(train$Orbit_Establishment_Year)
train$Orbit_Height <- (train$Orbit_Height -
  mean(train$Orbit_Height))/sd(train$Orbit_Height)
train$Stealth_Type <- (train$Stealth_Type -
  mean(train$Stealth_Type))/sd(train$Stealth_Type)
train$RSO_Type <- (train$RSO_Type -
  mean(train$RSO_Type))/sd(train$RSO_Type)
train$Survivability <- (train$Survivability -
  mean(train$Survivability))/sd(train$Survivability)
summary(train)

 RSO_Identifier      RSO_Weight        RSO_Density       RSO_Visibility
 Length:8523        Min.   :-1.9588   Min.   :-0.7467   Min.   :-1.3683
 Class :character   1st Qu.:-0.8340   1st Qu.:-0.7467   1st Qu.:-0.7629
 Mode  :character   Median :-0.0264   Median :-0.7467   Median :-0.1591
                    Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000
                    3rd Qu.: 0.7485   3rd Qu.: 0.9482   3rd Qu.: 0.4989
                    Max.   : 2.0141   Max.   : 2.6430   Max.   : 5.2957
    RSO_MRP          Orbit_Establishment_Year   Orbit_Height
 Min.   :-1.78821    Min.   :-1.62365          Min.   :-0.8048
 1st Qu.:-0.76257    1st Qu.:-1.38111          1st Qu.:-0.8048
 Median : 0.04356    Median : 0.07416          Median :-0.8048
 Mean   : 0.00000    Mean   : 0.00000          Mean   : 0.0000
 3rd Qu.: 0.74644    3rd Qu.: 0.68052          3rd Qu.: 0.6311
 Max.   : 2.31959    Max.   : 1.28688          Max.   : 2.0669
  Stealth_Type        RSO_Type         Survivability
 Min.   :-1.7418    Min.   :-0.6509   Min.   :-1.2637
 1st Qu.:-0.2129    1st Qu.:-0.6509   1st Qu.:-0.7851
 Median :-0.2129    Median :-0.6509   Median :-0.2237
 Mean   : 0.0000    Mean   : 0.0000   Mean   : 0.0000
 3rd Qu.:-0.2129    3rd Qu.: 0.2665   3rd Qu.: 0.5373
 Max.   : 4.3738    Max.   : 2.1014   Max.   : 6.3959
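
The block above standardizes each column explicitly, which makes the operation easy to follow. A more compact equivalent (a sketch, assuming the listed columns are numeric in this version of the data) uses scale(), which centers each column to mean 0 and rescales it to unit standard deviation:

num_cols <- c("RSO_Weight", "RSO_Density", "RSO_Visibility", "RSO_MRP",
              "Orbit_Establishment_Year", "Orbit_Height",
              "Stealth_Type", "RSO_Type", "Survivability")
# Center and scale all of the listed columns in one step
train[num_cols] <- scale(train[num_cols])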

Define the Features and Response Sets


Now that we have imputed the missing values, we need to define the
set of features and the response, Y.

train <- train[c(-1)]


Y <- train[c(11)]

Matricize the Data


So, we take the response and the features and form a model matrix, X.

X <- model.matrix(Survivability~., train)

Define Lambda
We also need to define a grid of lambda values, from a starting value down to a
stopping value in a fixed number of steps, that we'll iterate through, and then plot the grid.

lambda <- 10^seq(10, -2, length = 100)


plot(lambda)

Figure 0-1. Scatterplot of lambda values vs index.
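
Because the grid spans many orders of magnitude, a log-scale view can be easier to read (an optional tweak, not part of the original figure):

plot(log10(lambda), ylab = "log10(lambda)")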

Split the Data into Subsets


Finally, we split the set into the training set and the cross-validation
set, which will complete our data preprocessing.

set.seed(567)
part <- sample(2, nrow(X), replace = TRUE, prob = c(0.7, 0.3))
X_train <- X[part == 1,]
X_cv <- X[part == 2,]

Y_train <- Y[part == 1,]


Y_cv <- Y[part == 2,]

Lasso Regression Model


We now fit the lasso, which can be done with glmnet() using alpha
= 1, over the grid of lambda values defined above.

lasso_reg <- glmnet(X_train, Y_train, alpha = 1,
                    lambda = lambda)

Note that the error-minimizing value of 𝜆 (lambda.min) is produced by
cv.glmnet(), not glmnet(), so we define bestlam after cross-validating below.

Like ridge, lasso is not scale invariant.


The two plots illustrate how much the coefficients are penalized for
different values of λ. Notice some of the coefficients are forced to be
zero.

par(mfrow = c(1, 2))
plot(lasso_reg, lwd = 2)
plot(lasso_reg, xlim =c(-5,0), xvar = "lambda", label = TRUE,
lwd = 2)

Figure 0-2. Coefficient plots vs the L1 Norm and the logarithm of 𝝀.


Again, to actually pick a λ, we will use cross-validation. The plot is
similar to the ridge plot. Notice that the numbers along the top give the
number of features in the model, which changes along the path.
cv.glmnet() returns several details of the fit for both λ values marked in
the plot. Notice the penalty terms are again smaller than for the full
linear regression, as we would expect, and some coefficients are exactly 0.
lasso_reg_cv <- cv.glmnet(X_train, Y_train, alpha = 1)
bestlam <- lasso_reg_cv$lambda.min
coef(lasso_reg_cv)

10 x 1 sparse Matrix of class "dgCMatrix"


s1
(Intercept) -0.00126571
(Intercept) .
RSO_Weight .
RSO_Density .
RSO_Visibility -0.01226748
RSO_MRP 0.36876628
Orbit_Establishment_Year -0.08799010
Orbit_Height 0.01318699
Stealth_Type 0.43825862
RSO_Type -0.12284756

par(mfrow = c(1,1))
plot(lasso_reg_cv)

Figure 0-3. MSE plot of the logarithm of the regularization parameter.

Coefficients Barplot
Now, we construct a barplot of the values of the lasso regression
coefficients.
plotlabels <- c("Intercept", "Intercept", names(train[1:8]))
par(mar = c(10,4,2,1))
barplot(coef(lasso_reg_cv)[1:10],
        main = "Model 1 Coefficients",
        ylab = "Coefficients",
        las = 2, cex = .9, cex.lab = 1, cex.main = 1.25,
        cex.sub = .75, cex.axis = .75,
        col = "green2", names = plotlabels)

Figure 0-4. Lasso regression coefficient barplot.

Lasso Reg & Plots


Here, we calculate the fitted coefficients using the 1-SE rule lambda,
which is the default behavior of coef() for a cv.glmnet fit.
coef(lasso_reg_cv)

10 x 1 sparse Matrix of class "dgCMatrix"


s1
(Intercept) -0.00126571
(Intercept) .
RSO_Weight .
RSO_Density .
RSO_Visibility -0.01226748
RSO_MRP 0.36876628
Orbit_Establishment_Year -0.08799010

Orbit_Height 0.01318699
Stealth_Type 0.43825862
RSO_Type -0.12284756

Next, we calculate the fitted coefficients, using minimum lambda.


coef(lasso_reg_cv, s = "lambda.min")

10 x 1 sparse Matrix of class "dgCMatrix"


s1
(Intercept) -0.0009578853
(Intercept) .
RSO_Weight -0.0101071585
RSO_Density 0.0105583387
RSO_Visibility -0.0394406199
RSO_MRP 0.3960969645
Orbit_Establishment_Year -0.1744814324
Orbit_Height 0.0001819979
Stealth_Type 0.4853874896
RSO_Type -0.1806360048

Now, we calculate the penalty term using minimum lambda.


sum(coef(lasso_reg_cv, s = "lambda.min")[-1] ^ 2)

[1] 0.4573362
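
For comparison, the lasso's own penalty uses absolute values rather than squares, so the L1 penalty term at the minimum lambda can also be summed directly (a quick check; this result is not shown in the original):

sum(abs(coef(lasso_reg_cv, s = "lambda.min")[-1]))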

Then, we calculate the fitted coefficients, using 1-SE rule lambda.


coef(lasso_reg_cv, s = "lambda.1se")

10 x 1 sparse Matrix of class "dgCMatrix"


s1
(Intercept) -0.00126571
(Intercept) .
RSO_Weight .
RSO_Density .
RSO_Visibility -0.01226748
RSO_MRP 0.36876628
Orbit_Establishment_Year -0.08799010
Orbit_Height 0.01318699
Stealth_Type 0.43825862
RSO_Type -0.12284756

Next, we calculate the penalty term using 1-SE rule lambda.


sum(coef(lasso_reg_cv, s = "lambda.1se")[-1] ^ 2)

[1] 0.3512174

Here, we calculate the mean squared prediction error on the held-out (cross-validation) set.
sum((Y_cv - predict(lasso_reg_cv, X_cv)) ^ 2)/nrow(X_cv)

[1] 0.4736844

Next, we calculate the CV-RMSEs.


sqrt(lasso_reg_cv$cvm)

 [1] 0.9938588 0.9624557 0.9266329 0.8921775 0.8624796 0.8370232 0.8152849
 [8] 0.7967865 0.7810956 0.7678249 0.7566302 0.7472085 0.7392949 0.7326597
[15] 0.7271048 0.7224538 0.7179864 0.7130534 0.7088168 0.7051387 0.7015489
[22] 0.6983831 0.6957245 0.6928502 0.6890875 0.6859214 0.6832799 0.6810139
[29] 0.6790393 0.6773941 0.6760167 0.6748707 0.6739177 0.6731256 0.6724672
[36] 0.6719203 0.6714669 0.6710916 0.6707776 0.6705167 0.6703029 0.6701354
[43] 0.6700057 0.6699022 0.6698130 0.6697287 0.6696470 0.6695719 0.6695067
[50] 0.6694502 0.6694027 0.6693632 0.6693305 0.6693028 0.6692801 0.6692608
[57] 0.6692447 0.6692311 0.6692199 0.6692104 0.6692026 0.6691960 0.6691905
[64] 0.6691864

Next, we calculate the CV-RMSEs using minimum lambda.


sqrt(lasso_reg_cv$cvm[lasso_reg_cv$lambda ==
lasso_reg_cv$lambda.min])

[1] 0.6691864

Finally, we calculate the CV-RMSE using the 1-SE lambda.


sqrt(lasso_reg_cv$cvm[lasso_reg_cv$lambda ==
lasso_reg_cv$lambda.1se])

[1] 0.6760167

Sometimes, the output from glmnet() can be overwhelming. The


broom package can help with that.
library(broom)
tidy(lasso_reg_cv)

# A tibble: 64 x 6
lambda estimate std.error conf.low conf.high nzero
<dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 0.612 0.988 0.0222 0.966 1.01 0
2 0.558 0.926 0.0216 0.905 0.948 1
3 0.508 0.859 0.0197 0.839 0.878 2
4 0.463 0.796 0.0182 0.778 0.814 2
5 0.422 0.744 0.0168 0.727 0.761 2
6 0.385 0.701 0.0157 0.685 0.716 2
7 0.350 0.665 0.0147 0.650 0.679 2
8 0.319 0.635 0.0140 0.621 0.649 2
9 0.291 0.610 0.0133 0.597 0.623 2
10 0.265 0.590 0.0128 0.577 0.602 2
# ... with 54 more rows

Here are the two lambda values of interest.


glance(lasso_reg_cv)

# A tibble: 1 x 3
lambda.min lambda.1se nobs
<dbl> <dbl> <int>
1 0.00174 0.0376 5956

Here, we predict and plot the predicted values using the minimum lambda.
pred <- predict(lasso_reg_cv, X_cv, s = "lambda.min")
plot(pred, col = 'dodgerblue', pch = 20)

Figure 0-5. Scatterplot of the predicted values vs the index.
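
A useful companion plot (a sketch, not part of the original output) compares the predictions to the held-out responses, with a 45-degree reference line:

plot(as.numeric(Y_cv), as.numeric(pred), col = "dodgerblue", pch = 20,
     xlab = "Observed Survivability (standardized)",
     ylab = "Predicted Survivability")
abline(0, 1, col = "red", lwd = 2)  # perfect-prediction reference line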

MSE over log(lambda) Values


cv.glmnet() returns several details of the fit for 𝜆 values in the
plots.

CV Plot with Relax = TRUE


With relax set to TRUE, cv.glmnet() also fits the relaxed lasso, and the plot
shows a curve for each of the default gamma values: 𝛾 = (0.0, 0.25, 0.50, 0.75, 1.0).
cv.lasso_reg <- cv.glmnet(X_train, Y_train, alpha = 1,
    nfolds = 5, type.measure = "mse", trace.it = 1,
    relax = TRUE)
plot(cv.lasso_reg)

Figure 0-6. MSE plot of the logarithm of the regularization parameter for 𝛾 = 0.0, 0.25, 0.5, 0.75, 1.0.

CV Plot with Relax = FALSE


With relax set to FALSE, we get the same plot as Figure 0-3, which
corresponds to 𝛾 = 1 in Figure 0-6.
cv2.lasso_reg <- cv.glmnet(X_train, Y_train, alpha = 1,
    nfolds = 5, type.measure = "mse", trace.it = 1,
    relax = FALSE)
plot(cv2.lasso_reg)

Figure 0-7. MSE plot of the logarithm of regularization parameter.

Model Evaluation
Now that we have the predictions, we evaluate the following
measures from the cvms package.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}|\hat{y}_i - y_i|}{\sum_{i=1}^{n}|y_i - \bar{y}|}$$
library(cvms)
print(paste("MSE =", mse(Y_cv, predict(cv.lasso_reg,
      s = bestlam, newx = X_cv))))
print(paste("RMSE =", rmse(Y_cv, predict(cv.lasso_reg,
      s = bestlam, newx = X_cv))))
print(paste("MAPE =", mape(Y_cv, predict(cv.lasso_reg,
      s = bestlam, newx = X_cv))))
print(paste("RAE =", rae(Y_cv, predict(cv.lasso_reg,
      s = bestlam, newx = X_cv))))

[1] "MSE = 0.48588832218464"


[1] "RMSE = 0.69705690024892"
[1] "MAPE = 2.02541630625507"
[1] "RAE = 0.65203023241388"

Model 2
Using cv.glmnet(), we obtain the 𝜆 values of interest and build a grid
for caret to search with the train() function in order to assess prediction
accuracy. We fix 𝛼 = 1 in this grid, even though glmnet can actually tune
over the 𝛼 parameter as well.
library(caret)
lasso_reg_cv <- cv.glmnet(X_train, Y_train, alpha = 1)
cv_5 = trainControl(method = "cv", number = 5)
lasso_grid = expand.grid(alpha = 1,
    lambda = c(lasso_reg_cv$lambda.min, lasso_reg_cv$lambda.1se))
lasso_grid

  alpha  lambda
1     1 0.00174
2     1 0.04124

The train() function in caret uses the type of the variable 𝑦 to
determine the family to use, and since this is a regression problem it
chooses family = "gaussian".
fit_lasso = train(
Survivability ~ ., data = as.data.frame(train),
method = "glmnet",
trControl = cv_5,
tuneGrid = lasso_grid
)
fit_lasso$results

  alpha  lambda      RMSE  Rsquared       MAE      RMSESD RsquaredSD       MAESD
1     1 0.00174 0.6729195 0.5474546 0.5029826 0.006820233 0.01353016 0.003490523
2     1 0.04124 0.6808688 0.5405118 0.5146499 0.006286171 0.01556848 0.002472874

As we can see, the RMSE for the minimum 𝜆 is 0.6729, which is close
to what we calculated above. The R-squared for the minimum 𝜆 is
0.5475, so the lasso model explains about 55% of the variance.
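
To confirm which of the two candidate 𝜆 values caret selected, we can inspect the fitted object directly (a quick check; this output is not shown in the original):

fit_lasso$bestTune           # the winning (alpha, lambda) pair
min(fit_lasso$results$RMSE)  # its cross-validated RMSE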

Model 3: Lasso Regression with LARS


Least Angle Regression (LARS) is an algorithm used in regression
for high-dimensional data (i.e., data with a large number of
attributes). LARS is described in detail in Efron, Hastie, Johnstone
and Tibshirani (2004). With the "lasso" option, it computes the
complete lasso solution simultaneously for ALL values of the
shrinkage parameter at the same computational cost as a least
squares fit. We again construct a lasso regression model, but with
different lambda value settings and a search for the optimal lambda.
library(lars)
lasso_obj <- lars(x = X_train, y = Y_train, type = "lasso")
fits <- predict.lars(lasso_obj, newx = X_cv, type = "fit")
coef4.1 <- predict(lasso_obj, s = 4.1, type = "coef",
mode = "lambda")
coef4.1

$s
[1] 4.1

$fraction
[1] 0.9132485

$mode
[1] "lambda"

$coefficients
(Intercept) RSO_Weight
0.00000000 0.00000000
RSO_Density RSO_Visibility
0.00000000 -0.00070929
RSO_MRP Orbit_Establishment_Year
0.35704578 -0.05081050
Orbit_Height Stealth_Type
0.01877038 0.41788910
                RSO_Type
             -0.09796110
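
The value s = 4.1 above was chosen manually. As a sketch of an alternative (using the lars package's cross-validation helper cv.lars(), with K = 10 folds and mode = "fraction" as assumed settings), the shrinkage level can instead be selected by cross-validating over the fraction of the L1 norm:

set.seed(567)
cv_lars <- cv.lars(x = X_train, y = Y_train, type = "lasso",
                   K = 10, mode = "fraction", plot.it = FALSE)
# Fraction of the full L1 norm that minimizes the cross-validated error
best_frac <- cv_lars$index[which.min(cv_lars$cv)]
best_frac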

plotlabels <- names(coef4.1$coefficients)


par(mar = c(10,4,2,1))
barplot(as.matrix(coef4.1$coefficients)[1:9],
main= "Model 2 Coefficients",
ylab = "Coefficients", las = 2, cex = .75,
cex.lab = .75, cex.main = 1.25, cex.sub = .75,
cex.axis = .75, las = 2,
col = "#acfffd", names = plotlabels)

Figure 0-8. Lasso regression coefficients barplot.

plot(lasso_obj, xvar = "norm", breaks = TRUE,


plottype = "coefficients", omit.zeros = F,
eps = 1e-10, lwd = 2)

Figure 0-9. Plot of the standardized coefficients vs |𝜷|/𝒎𝒂𝒙|𝜷|.
Next, we print the lasso object and observe an R-squared of 55%,
indicating that this lasso model explains 55% of the variance.
lasso_obj

Call:
lars(x = X_train, y = Y_train, type = "lasso")
R-squared: 0.549
Sequence of LASSO moves:
     Stealth_Type RSO_MRP RSO_Type Orbit_Height
Var             8       5        9            7
Step            1       2        3            4
     Orbit_Establishment_Year RSO_Visibility RSO_Density
Var                         6              4           3
Step                        5              6           7
     RSO_Weight Orbit_Height Orbit_Height
Var           2           -7            7
Step          8            9           10

Figure 0-10. Lasso regression model with various alpha levels.

Variable Importance
One last thing to do is look at variable importance. We'll us a plot to
do this, since a visual is very powerful.

Variable Reduction
Before we make a plot of variable importance, we'll reduce the
number of variables. Recall that when lasso regression regularizes
the coefficients, fewer variables are needed to explain the
variability in the response, which reduces the chance of overfitting.
V = varImp(lasso_reg, lambda = bestlam, scale = TRUE)
# Remove insignificant Overall importance values.
# Insignificant values < median value.
# Transform from numerical to logical.
V_log <- V > median(V$Overall)

V1_log <- V_log==TRUE
# Transform to (0,1).
V2 = V1_log-FALSE
# Transform to numerical with insignificant = 0.
V3 = V*V2
# Convert to data frame.
V4 <- as.data.frame(V3)
# Remove rows containing 0 overall values.
V5 <- V4[!(V4$Overall == 0),]
# Convert to data frame.
V5 <- as.data.frame(V5)
# Insert new column.
s <- nrow(V5)
new <- seq(s)
# Rename new column.
V5$Variables <- new
# Rename "V5" column to "Overall".
names(V5)[1] <- paste('Overall')
# Count variable reduction.
nrow(V)
nrow(V) - nrow(V5)

Plot Variable Importance


Now we have reduced the number of significant variables from 10
to 5. We'll plot all of them here.
my_ggp <- ggplot2::ggplot(V, aes(x = reorder(rownames(V),
Overall), y = Overall)) +
geom_point(aes(color = (factor(rownames(V)))), size = 5,
alpha = 0.6) +
geom_segment(aes(x = rownames(V), y = 0, xend = rownames(V),
yend = Overall), color = 'skyblue', size = 1.5) +
ggtitle("Variable Importance using Lasso Regression") +
guides(color = guide_legend(
title = "Important Variables")) +
xlab('') + ylab('Overall Importance') + coord_flip()

my_ggp + theme_light() +
theme(axis.title = element_text(size = 14)) +
theme(axis.text = element_text(size = 12)) +
theme(plot.title = element_text(size = 14)) +
theme(legend.title = element_text(size = 13)) +
theme(legend.text = element_text(size = 11))

Figure 0-11. Variable importance plot for the lasso regression Model 2.
print(V)

Variable Overall
Intercept 0.000939219
RSO_Density 0.000000000
RSO_Weight 0.001196625
RSO_Visibility 0.001708460
Orbit_Establishment_Year 0.154577505
Stealth_Type 0.474689714
RSO_MRP 0.389425473
RSO_Type 0.167249226
RSO_Height 0.032785041

Chapter Summary
We have seen that lasso regression is a type of linear regression
that uses shrinkage, where data values are shrunk towards a central
point, like the mean. The procedure encourages simple, sparse
models with fewer parameters. This type of regression is well-suited for:
• Models showing high levels of multicollinearity.
• Models where we want to automate certain parts of model
selection, like variable selection/parameter elimination.
Lasso regression performs 𝐿1 regularization, which adds a penalty
to the coefficients, resulting in sparse models.
We use two R functions for lasso regression: glmnet and
cv.glmnet. We also saw that we can use the Least Angle
Regression (LARS) algorithm (lars) for high dimensional data.

Test Your Knowledge


1. Why do we use lasso regression?
2. What is the 𝐿1 Norm?
3. Describe 𝐿1 regularization in relation to lasso regression.
4. Do we need to impute missing values before using the lasso
procedure? If so, how do we impute missing values in R?
5. What is meant by matricizing the data, and how do we do it in R?
6. How can we simplify lasso regression output?
7. What performs variable reduction in lasso regression?

References
Bergstra, J., & Bengio, Y. (2012). Random Search for Hyper-
Parameter Optimization. Journal of Machine Learning
Research, 13, 281–305.
Besliu-Ionescu, D., & Mierla, M. (2021). Geoeffectiveness prediction
of cmes. Frontiers in Astronomy and Space Sciences, 8.
Besliu-Ionescu, D., Talpeanu, D. C., Mierla, M., & Muntean, G. M.
(2019). On the prediction of geoeffectiveness of cmes during
the ascending phase of sc24 using a logistic regression
method. Journal of Atmospheric and Solar-Terrestrial Physics,
193.
Blustin, A. J., Band, D., Barthelmy, S., Boyd, P., Capalbi, M., Holland, S.
T., . . . Beardmore, A. (2006). Swift Panchromatic
Observations of the Bright Gamma-Ray Burst GRB 050525a.
The Astrophysical Journal, 637, 901–913. Retrieved from
https://iopscience.iop.org/article/10.1086/498425
Buonsanto, M. J. (1999). Ionospheric storms - a review. Space
Science Reviews, 88(3), 563–601.
Burrows, D. N., Romano, P., Falcone, A., Kobayashi, S., Zhang, B.,
Moretti, A., . . . Gehrels, N. (2005, Sep 16). Bright X-ray Flares
in Gamma-Ray Burst Afterglows. Science, 309(5742), 1833-
1835. doi:DOI: 10.1126/science.1116168
Chen, T. (2022, April 16). xgboost: eXtreme Gradient Boosting.
Retrieved from
http://127.0.0.1:56991/help/library/xgboost/doc/xgboost.
pdf
Chen, Y., & Yang, Y. (2021). The One Standard Error Rule for Model
Selection: Does It Work? 4(4), 868-892.
doi:https://doi.org/10.3390/stats4040051
Claesen, M., & DeMoor, B. (2015). Hyperparameter Search in
Machine Learning. Computer and information sciences.
doi:doi.10.48550/ARXIV.1502.02127
Cole, D. G. (2003). Space weather: its effects and predictability.
Advances in Space Environment Research, 107(1/2), 295–
302.
Davidian, M., & Giltinan, D. (2003). Nonlinear models for repeated
measurement data: An overview and update. Journal of
Agricultural, Biological, and Environmental Statistics, 1-42.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least
Angle Regression. The Annals of Statistics, 32(2), 407–499.
Retrieved from https://www.jstor.org/stable/3448465

EHT. (2019, April 10). Astronomers Capture First Image of a Black
Hole . Retrieved from Event Horizon Telescope:
https://eventhorizontelescope.org/press-release-april-10-
2019-astronomers-capture-first-image-black-hole
Feurer, M., & Hutter, F. (n.d.). Hyperparameter optimization. In
AutoML: Methods, Systems, Challenges (pp. 3–38). Retrieved
from
https://link.springer.com/content/pdf/10.1007%2F978-3-
030-05318-5_1.pdf
Friedman, J. H. (2001). Greedy function approximation: a gradient
boosting machine. Annals of Statistics, 1189–1232.
Friendly, M. (2002). Corrgrams: Exploratory Displays for
Correlation Matrices. The American Statistician, 56(4), 316–
324.
Gruber, M. (1998). Improving Efficiency by Shrinkage: The James--
Stein and Ridge Regression Estimators. CRC Press.
Hall, P. B., Anderson, S. F., Strauss, M. A., York, D. G., Richards, G. T., Fan, X.,
. . . Schneider, D. P. (2002). Unusual Broad Absorption Line
Quasars from the Sloan Digital Sky Survey. The Astrophysical
Journal Supplement Series, 141(2), 267--309.
doi:doi.10.1086/340546
Hilt, D. E., & Seegrist, D. W. (1977). Ridge, a computer program for
calculating ridge regression estimates. USDA Forest Service
research note NE, 236. doi:doi:10.5962/bhl.title.68934
Irons, J. R., Dwyer, J. L., & Barsi, J. A. (2012). The next Landsat
satellite: The Landsat data continuity mission. Remote Sens
Environ, 122, 11–21.
Kennedy, J., & Eberhart, R. (1995). Particle Swarm Optimization.
Proceedings of IEEE International Conference on Neural
Networks, IV, 1942–1948.
doi:doi:10.1109/ICNN.1995.488968
Kitsionas, S., Hatziminaoglou, E., Georgakakis, A., &
Georgantopoulos, I. (2005). On the use of photometric

redshifts for X-ray selected AGNs. Astrophysics, 434(2), 475-
482. doi:doi.10.1051/0004-6361:20041916
Li, J., & Chen, B. (2021). Optimal Solar Zenith Angle Definition for
Combined Landsat-8 and Sentinel-2A/2B Data Angular
Normalization Using Machine Learning Methods. Remote
Sens., 13(13), 2598. Retrieved from
https://doi.org/10.3390/rs13132598
LIGO. (2016, February 11). Gravitational Waves Detected 100 Years
After Einstein's Prediction. Retrieved from LIGO Caltech:
https://www.ligo.caltech.edu/news/ligo20160211
Lindstrom, M., & Bates, D. (1990). Nonlinear Mixed Effects Models
for Repeated Measures Data. Biometrics, 46, 673-687.
Lorr, M., & Klett, C. J. (1966). Inpatient Multidimensional Psychiatric
Scale: Manual. Palo Alto: Consulting Psychologists Press.
Mavromichalaki, H., & Paouris, E. (2017). Effective acceleration
model for the arrival time of interplanetary shocks driven by
coronal mass ejections. Solar Physics, 292(12).
Mészáros, P., & Rees, M. J. (1997). Optical and Long-Wavelength
Afterglow from Gamma-Ray Bursts. The Astrophysical
Journal, 476(1), 232-237. doi:DOI: 10.1086/303625
Mittlböck, M., & Heinzl, H. (2004). Proceedings of 1st European
workshop on the assessment of diagnostic performance, 71-8.
Möstl, C., Isavnin, A., & Boakes, P. D. (2017). Modeling observations
of solar coronal mass ejections with heliospheric imagers
verified with the heliophysics system observatory. Space
Weather, 15(7), 955–970.
NASA. (2020, September 8). What Are Black Holes? Retrieved from
Black Holes:
https://www.nasa.gov/vision/universe/starsgalaxies/black
_hole_description.html
NASA. (2021, December 22). Discoveries - Highlights | Realizing
Monster Black Holes Are Everywhere. Retrieved from Hubble
Space Telescope:

https://www.nasa.gov/content/discoveries-highlights-
realizing-monster-black-holes-are-everywhere
Pearson, K. (1900). On the Criterion that a given System of
Deviations from the Probable in the Case of a Correlated
System of Variables is such that it can be reasonably
supposed to have arisen from Random Sampling.
Philosophical Magazine Series 5, 50(302), 157–175.
doi:doi:10.1080/14786440009463897
Piro, L., De Pasquale, M., Soffitta, P., Lazzati, D., Amati, L., Costa, E., .
. . Nicastro, L. (2005, April). Probing the Environment in
Gamma-Ray Bursts: The Case of an X-Ray Precursor,
Afterglow Late Onset, and Wind Versus Constant Density
Profile in GRB 011121 and GRB 011211. The Astrophysical
Journal, 623(1), 314-324. doi:DOI 10.1086/428377
Poedts, S., Lani, L., & Scolini, C. (2020). European heliospheric
forecasting information asset 2.0. Journal of Space Weather
and Space Climate, 10, 57.
Richards, G. T., Fan, X., Newberg, H. J., Strauss, M. A., Berk, D. E.,
Schneider, D., . . . Sto. (2002). Spectroscopic Target Selection
in the Sloan Digital Sky Survey: The Quasar Sample.
American Astronomical Society, 123(6), 2945--2975.
doi:doi.10.1086/340187
Richardson, I. G., & Cane, H. V. (2010). Near-earth interplanetary
coronal mass ejections during solar cycle 23 (1996 – 2009):
catalog and summary of properties. Solar Physics, 264(1),
189–237.
Sagar, C. (2017). Building Regression Models in R using Support
Vector Regression. Retrieved from KD nuggets:
https://www.kdnuggets.com/2017/03/building-
regression-models-support-vector-regression.html
Santosa, F., & Symes, W. W. (1986). Linear inversion of band-limited
reflection seismograms. SIAM Journal on Scientific and
Statistical Computing, 7(4), 1307–1330.
doi:doi:10.1137/0907087

Schneider, D. P., Fan, X., Hall, P. B., Jester, S., Richards, G. T.,
Stoughton, C., . . . Yanny, B. (2003). The Sloan Digital Sky
Survey Quasar Catalog. {II}. First Data Release. The
Astronomical Journal, 126(6), 2579--2593.
doi:doi.10.1086/379174
Shi, Y. -R., Chen, Y. -H., & Liu, S. -Q. (2021). Predicting the cme arrival
time based on the recommendation algorithm. Research in
Astronomy and Astrophysics, 21(8), 190.
Shi, Y., Wang, J., Chen, Y., Liu, S., Cui, Y., & Ao, X. (2022, April 22).
How scientist applied the recommendation algorithm to
anticipate CMEs' arrival times. Science & Technology.
doi:https://doi.org/10.34133/2022/9852185
Siscoe, G. L. (1975). Geomagnetic storms and substorms. Reviews of
Geophysics, 13(3), 990.
Specht, D. F. (1991, 11 01). A general regression neural network.
IEEE Transactions on Neural Networks, 2(6), 568–576.
doi:doi:10.1109/72.97934
Srivastava, N. (2005). A logistic regression model for predicting the
occurrence of intense geomagnetic storms. Annales
Geophysicae, 23(9), 2969–2974.
Stanton, J. M. (2001). Galton, Pearson, and the Peas: A Brief History
of Linear Regression for Statistics Instructors. Journal of
Statistics Education, 9(3). doi:DOI:
10.1080/10691898.2001.11910537
Stigler, S. M. (1989). Francis Galton's Account of the Invention of
Correlation. Statistical Science., 4(2), 73–79.
doi:doi:10.1214/ss/1177012580
Strickland, J. (2017). Logistic Regression Inside-Out. Lulu, Inc.
Retrieved from
https://www.lulu.com/spotlight/strickland_jeffrey
Tayo, B. O. (2018). Machine Learning: Dimensionality Reduction via
Principal Component Analysis. Towards AI. Retrieved from
https://pub.towardsai.net/machine-learning-

dimensionality-reduction-via-principal-component-
analysis-1bdc77462831
USGS. (2022, October 11). Landsat Collection 2 Data Dictionary.
Retrieved from USGS:
https://www.usgs.gov/centers/eros/science/landsat-
collection-2-data-dictionary#wrs_type
Vršnak, B., Žic, T., & Vrbanec, D. (2013). Propagation of
interplanetary coronal mass ejections: the drag-based
model. Solar Physics, 285(1-2), 295–315.
Wang, P., Zhang, Y., & Feng, L. (2019). A new automatic tool for cme
detection and tracking with machine-learning techniques.
The Astrophysical Journal Supplement Series, 244(1), 9.
Weinstein, M. A., Richards, G. T., Schneider, D. P., Younger, J. D.,
Strauss, M. A., Hall, P. B., . . . Brinkmann, J. (2004). An
Empirical Algorithm for Broad-band Photometric Redshifts
of Quasars from the Sloan Digital Sky Survey. The
Astrophysical Journal Supplement Series, 155(2), 243–256.
doi:DOI 10.1086/425355
Yashiro, S., Michalek, G., & Gopalswamy, N. (2008). A comparison of
coronal mass ejections identified by manual and automatic
methods. Annales Geophysicae, 26(10), 3103–3112.
Yeh, C. (1998). Modeling of strength of high-performance concrete
using artificial neural networks. Cement and Concrete
Research, 28(12).
York, D. G., Adelman, J., Anderson, J. E., Anderson, S. F., Annis, J.,
Bahcall, N. A., . . . Carey, L. (2000). The Sloan Digital Sky
Survey: Technical Summary. The Astronomical Journal,
120(3), 1579--1587. doi:doi.10.1086/301513
Zakamska, N. L., Schmidt, G. D., Smith, P. S., Strauss, M. A., Krolik, J.
H., Hall, P. B., . . . Szokoly, G. P. (2005). Candidate Type II
Quasars from the SDSS: III. Spectropolarimetry Reveals
Hidden Type I Nuclei. The Astronomical Journal, 129(3), 1212-1224.
doi:10.1086/427543

Zhang, H. K., Roy, D. P., & Kovalskyy, V. (2016). Optimal Solar
Geometry Definition for Global Long-Term Landsat Time-
Series Bidirectional Reflectance Normalization. IEEE Trans.
Geosci. Remote Sens, 54(3), 1410–1418.
