
3/6/2017 Homework__3.html

Problem 1
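Consider two curves, g1 and g2, defined by

$$g_1 = \arg\min_{g} \left( \sum_{i=1}^{n} \big(y_i - g(x_i)\big)^2 + \lambda \int \big[g^{(3)}(x)\big]^2 \, dx \right),$$

$$g_2 = \arg\min_{g} \left( \sum_{i=1}^{n} \big(y_i - g(x_i)\big)^2 + \lambda \int \big[g^{(4)}(x)\big]^2 \, dx \right),$$

where g^(m) represents the mth derivative of g.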

a. As λ → ∞, will g1 or g2 have the smaller training RSS?

g2 will have the smaller training RSS. As λ → ∞, both curves are forced to be polynomials, and g2 (whose penalty is on a higher derivative) is the higher-order one. With one more degree of freedom, it can follow the training data more closely.
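A quick numerical check of this reasoning: assuming, as in the standard version of this exercise, that g1 penalizes the third derivative and g2 the fourth, the λ → ∞ limits are the least-squares quadratic and the least-squares cubic. Because the quadratic is nested in the cubic, the cubic's training RSS can never be larger. The simulated data below (true curve, noise level and seed) are arbitrary assumptions, not part of the homework.

```r
# Hypothetical simulated data: the true curve and noise level are arbitrary
set.seed(1)
x <- runif(100, -2, 2)
y <- sin(2 * x) + rnorm(100, sd = 0.3)

# lambda -> infinity limits: g1 -> least-squares quadratic, g2 -> least-squares cubic
g1_limit <- lm(y ~ poly(x, 2))
g2_limit <- lm(y ~ poly(x, 3))

# training RSS of each limiting fit
rss_g1 <- sum(resid(g1_limit)^2)
rss_g2 <- sum(resid(g2_limit)^2)
print(c(g1 = rss_g1, g2 = rss_g2))
```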

b. As λ → ∞, will g1 or g2 have the smaller test RSS?

g1 will likely have the smaller test RSS, because g2's extra degree of freedom makes it more likely to overfit the data (unless the true relationship is itself closer to a cubic).

c. For λ = 0, will g1 or g2 have the smaller training and test RSS?

When λ = 0, the penalty term drops out. Because the loss function (the RSS term) is the same for g1 and g2, both minimize the same objective, so they will have the same training and test RSS.

Problem 2
Suppose that we carry out backward stepwise, forward stepwise, and best subset all on the same data set. Each
approach will yield a sequence of models with k = 0 up through k = p predictors.

a. Which approach with k predictors will have the smallest test residual sum of squares? Explain.

Forward Stepwise Selection

While best subset selection is able to filter through all potential models, it is less likely to find a model that fits the test data well, because searching that many models raises the risk of overfitting, especially as p increases. Forward and backward stepwise selection evaluate fewer models, making them less likely to overfit the data. However, when looking at the number of parameters p, if p > n, forward stepwise selection is the only viable method of the three, and so the only one able to provide the most accurate test RSS.

b. Which approach with k predictors will have the smallest training residual sum of squares? Explain.

Best Subset Selection

For each k, best subset selection picks the model with the lowest training RSS among all models with k predictors, so its training RSS can be no larger than that of either stepwise method. Its larger search space covers all 2^p possible models, unlike forward and backward stepwise selection, which only fit 1 + p(p+1)/2 models.
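The difference in search-space size is easy to tabulate. This sketch simply evaluates the two counts, 2^p for best subset and 1 + p(p+1)/2 for forward (or backward) stepwise, for a few illustrative values of p:

```r
# Number of models fitted by each approach for p candidate predictors
p <- c(5, 10, 20, 30)
model_counts <- data.frame(
  p = p,
  best_subset = 2^p,              # every subset of the p predictors
  stepwise = 1 + p * (p + 1) / 2  # null model plus p + (p - 1) + ... + 1 fits
)
print(model_counts)
```

For p = 10 this is 1,024 models for best subset versus 56 for stepwise selection, and the gap widens rapidly as p grows.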

c. True or False: i. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection.

False. The predictors chosen by forward stepwise are not necessarily among those chosen by backward stepwise, because the two methods search in opposite directions and neither evaluates all possible models.

ii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection.

False. The predictors chosen by backward stepwise are not necessarily among those chosen by forward stepwise, because these methods do not evaluate all possible models.

iii. The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k+1)-variable model identified by best subset selection.
le://localhost/Users/alexnutkiewicz/Desktop/STATS%20216/Homework__3.html 1/26

False. Best subset selection chooses the k-variable and (k+1)-variable models independently of one another, so the best k-variable model need not be a subset of the best (k+1)-variable model.

iv. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection.

True. Backward stepwise obtains the k-variable model from the (k+1)-variable model by dropping the single predictor whose removal increases RSS the least, so every predictor in the k-variable model is also in the (k+1)-variable model.

v. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection.

True. The (k+1)-variable model keeps all k chosen predictors and adds the one that most improves the fit.
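The nesting property in (v) follows directly from how forward stepwise is built. The sketch below is a minimal base-R implementation that ranks candidates by training RSS on simulated data; the data, the coefficient vector and the helper name rss_of are hypothetical, and a real analysis would still choose among the k-variable models with CV or an information criterion:

```r
# Hypothetical simulated data: 6 predictors, 4 of which matter
set.seed(2)
n <- 100
p <- 6
Xsim <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
ysim <- drop(Xsim %*% c(3, -2, 1.5, 0, 0, 0.5)) + rnorm(n)

# Training RSS of the least-squares fit (with intercept) on the chosen columns
rss_of <- function(vars) {
  fit <- lm(ysim ~ Xsim[, vars, drop = FALSE])
  sum(resid(fit)^2)
}

# Forward stepwise: at each step, add the predictor that lowers training RSS the most
selected <- character(0)
path <- vector("list", p)
for (k in 1:p) {
  candidates <- setdiff(colnames(Xsim), selected)
  candidate_rss <- sapply(candidates, function(v) rss_of(c(selected, v)))
  selected <- c(selected, candidates[which.min(candidate_rss)])
  path[[k]] <- selected  # the k-variable model
}
print(path)
```

Because each model is the previous one plus a single predictor, every k-variable model is a subset of the (k+1)-variable model by construction.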

Problem 3: Christine Yeh, Gerardo Sanz

This question uses the variables dis (the weighted mean of distances to five Boston employment centers) and nox (nitrogen oxides concentration in parts per 10 million) from the Boston data (in the MASS library). We will treat dis as the predictor and nox as the response.

a. Use the poly() function to fit a cubic polynomial regression to predict nox using dis. Report the regression output, and plot the resulting data and polynomial fits.

require(MASS)

library(splines)
attach(Boston)

polyfit = lm(nox ~ poly(dis, 3), data=Boston)

summary(polyfit)


##
## Call:
## lm(formula = nox ~ poly(dis, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121130 -0.040619 -0.009738 0.023385 0.194904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.554695 0.002759 201.021 < 2e-16 ***
## poly(dis, 3)1 -2.003096 0.062071 -32.271 < 2e-16 ***
## poly(dis, 3)2 0.856330 0.062071 13.796 < 2e-16 ***
## poly(dis, 3)3 -0.318049 0.062071 -5.124 4.27e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06207 on 502 degrees of freedom
## Multiple R-squared: 0.7148, Adjusted R-squared: 0.7131
## F-statistic: 419.3 on 3 and 502 DF, p-value: < 2.2e-16

# range of dis, used to build the prediction grid
dislims = range(dis)
# grid of x-axis points at which we want to predict
dis.grid = seq(from = dislims[1], to = dislims[2], by = 0.1)
# predicted values (with standard errors) at each grid point
polypreds = predict(polyfit, newdata = list(dis = dis.grid), se = TRUE)

require(ggplot2)

# overlay the fitted cubic curve on the scatterplot of the data
ggplot(Boston, aes(x = dis, y = nox)) + geom_point() +
  geom_line(data = data.frame(dis = dis.grid, nox = polypreds$fit), colour = "red")


Based on the results above, the linear (1), quadratic (2), and cubic (3) coefficients are all significant (p < 0.001).
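One caveat when plotting this fit by hand: poly() uses an orthogonal polynomial basis by default, so its coefficients cannot be plugged into a formula of the form a + b·dis + c·dis² + d·dis³. A minimal sketch, assuming only the MASS package, showing that poly(dis, 3, raw = TRUE) gives coefficients on that raw scale while producing the same fitted values:

```r
library(MASS)  # Boston data

# orthogonal basis (the default) versus raw powers dis, dis^2, dis^3
orth_fit <- lm(nox ~ poly(dis, 3), data = Boston)
raw_fit  <- lm(nox ~ poly(dis, 3, raw = TRUE), data = Boston)

# the coefficients differ because the bases differ...
print(coef(orth_fit))
print(coef(raw_fit))
# ...but the fitted curve is the same
print(all.equal(fitted(orth_fit), fitted(raw_fit)))
```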

b. Plot the polynomial fits for a range of different polynomial degrees (say, from 1 to 10), and report the associated residual sum of squares.

rss = rep(NA, 10)

for(i in 1:10){
  polyfit = lm(nox ~ poly(dis, i), data=Boston)
  # training RSS for the degree-i polynomial
  rss[i] = sum(resid(polyfit)^2)
}
plot(1:10, rss, xlab = "Degree", ylab = "RSS", type = "b")
min(rss)


## [1] 1.832171

Based on the plot, training RSS decreases as the polynomial degree increases, as expected, since each higher-degree model contains the lower-degree ones. The minimum RSS, 1.832171, therefore occurs at degree 10.

c. Perform cross-validation or another approach to select the optimal degree for the polynomial, and explain your results.

require(boot)

set.seed(36)
cv.error = rep(0,10)
for (i in 1:10){
  polyfit = glm(nox ~ poly(dis, i), data=Boston)
  # delta[1] is the raw 10-fold CV estimate of test MSE;
  # delta[2] is a bias-corrected version of the same estimate
  cv.error[i] = cv.glm(Boston, polyfit, K=10)$delta[1]
}
plot(1:10, cv.error, xlab = "Degree", ylab = "Test MSE", type = "b", col="darkgreen")


Looking at our test MSE curve, we see the typical U-shape, bottoming out at degree 3. As we increase the degree further, the model likely starts overfitting the data, which is why we see peaks in test MSE at degrees 7 and 9. I used 10-fold cross-validation (K=10) rather than LOOCV to reduce computation time.
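To make explicit what cv.glm(..., K = 10) estimates, here is a hand-rolled version of 10-fold CV for the same degrees. It is a sketch: the fold assignment differs from cv.glm's internal sampling, so the numbers will not exactly match the curve above, but like delta[1] it averages the fold MSEs weighted by fold size. The names folds and cv_mse are ours:

```r
library(MASS)  # Boston data

set.seed(36)
K <- 10
n <- nrow(Boston)
# randomly assign each row to one of K folds
folds <- sample(rep(1:K, length.out = n))

cv_mse <- sapply(1:10, function(d) {
  fold_mse <- sapply(1:K, function(k) {
    train <- Boston[folds != k, ]
    test  <- Boston[folds == k, ]
    fit <- lm(nox ~ poly(dis, d), data = train)
    mean((test$nox - predict(fit, newdata = test))^2)
  })
  # weight each fold's MSE by its share of the observations
  sum(fold_mse * tabulate(folds, K)) / n
})
print(cv_mse)
which.min(cv_mse)
```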

d. Use the bs() function to fit a regression spline to predict nox using dis. Report the output for the fit using four degrees of freedom. How did you choose the knots? Plot the resulting fit.

library(splines)
# degree belongs inside bs(); a cubic spline with one knot has 4 basis functions
bs.fit = lm(nox ~ bs(dis, knots = c(6), degree = 3), data=Boston)

summary(bs.fit)

##
## Call:
## lm(formula = nox ~ bs(dis, knots = c(6), degree = 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.12387 -0.04012 -0.01033 0.02308 0.19446
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.76037 0.01018 74.667 < 2e-16 ***
## bs(dis, knots = c(6), degree = 3)1 -0.23672 0.02321 -10.200 < 2e-16 ***
## bs(dis, knots = c(6), degree = 3)2 -0.36177 0.02548 -14.200 < 2e-16 ***
## bs(dis, knots = c(6), degree = 3)3 -0.33337 0.04044 -8.244 1.47e-15 ***
## bs(dis, knots = c(6), degree = 3)4 -0.36220 0.05105 -7.095 4.45e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06208 on 501 degrees of freedom
## Multiple R-squared: 0.7152, Adjusted R-squared: 0.7129
## F-statistic: 314.6 on 4 and 501 DF, p-value: < 2.2e-16

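bs.pred = predict(bs.fit, newdata=list(dis=dis.grid), se=TRUE)
# approximate 95% pointwise confidence bands: fit +/- 2 standard errors
bs.bands = cbind(bs.pred$fit + 2 * bs.pred$se.fit, bs.pred$fit - 2 * bs.pred$se.fit)

plot(dis, nox, col = "darkblue")

lines(dis.grid, bs.pred$fit, col="darkblue", lwd=2)
matlines(dis.grid, bs.bands, lty="dashed", col="darkblue")
# mark the knot at dis = 6
abline(v=c(6), lty=2, col="darkblue")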


Our goal in choosing the knot was for all terms to be significant. To select a value for the knot effectively, we looked for inputs that give every term importance, judging by the coefficients and t-statistics. I kept degree 3 (cubic), based on the CV error results from part (c); with a single knot at dis = 6 this gives the four degrees of freedom (basis functions) requested, resulting in the fitted curve above.
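An alternative to choosing the knot by hand is to let bs() place it from df. With df = 4 and the default cubic degree, bs() uses 4 - 3 = 1 interior knot and puts it at a quantile of dis, which here is the median. A short check, assuming the MASS and splines packages:

```r
library(MASS)     # Boston data
library(splines)  # bs()

# df = 4 with the default degree = 3 leaves room for one interior knot
basis <- bs(Boston$dis, df = 4)
print(attr(basis, "knots"))  # knot chosen by bs()
print(median(Boston$dis))    # median of dis, for comparison
```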

e. Now fit a regression spline for a range of degrees of freedom, and plot the resulting fits and report the resulting RSS. Describe the results obtained.

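rss = rep(NA, 10)

for(i in 1:10){
  # one knot at dis = 6; the degree-i spline uses i + 1 basis functions
  splineFit = lm(nox ~ bs(dis, knots=c(6), degree=i), data=Boston)
  # training RSS for this spline
  rss[i] = sum(resid(splineFit)^2)
  plot(dis, nox, col="darkgrey", main=paste("Degree", i))
  lines(dis.grid, predict(splineFit, list(dis=dis.grid)), col="darkblue", lwd=2)
  abline(v=c(6), lty=2, col="darkgreen")
}
rss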


Referring to the plots above, the fits get smoother and then bumpier as the model's flexibility increases. As mentioned in lecture, regression splines can behave erratically near the boundaries (the tails of dis), especially as the degrees of freedom increase.
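In the loop above, the degrees of freedom are varied through the degree argument: with one interior knot, bs() returns degree + 1 basis columns (excluding the intercept), so the degree-i spline uses i + 1 basis functions. A quick check of that count, assuming the MASS and splines packages:

```r
library(MASS)     # Boston data
library(splines)  # bs()

# one interior knot at dis = 6; vary the polynomial degree of the spline
degrees <- 1:10
n_basis <- sapply(degrees, function(d) ncol(bs(Boston$dis, knots = 6, degree = d)))
# number of basis columns = degree + number of interior knots
print(data.frame(degree = degrees, basis_columns = n_basis))
```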

f. Perform cross-validation or another approach in order to select the best degrees of freedom for a
regression spline. Describe your results.

set.seed(36)
reg.cv.error = rep(NA,10)
for (i in 1:10){
  regPolyfit = glm(nox ~ bs(dis, knots = c(6), degree = i), data=Boston)
  # delta[1]: raw 10-fold CV estimate of the test MSE
  reg.cv.error[i] = cv.glm(Boston, regPolyfit, K=10)$delta[1]
}

## Warning in bs(dis, degree = 1L, knots = 6, Boundary.knots = c(1.1296,
## 10.7103)): some 'x' values beyond boundary knots may cause ill-conditioned
## bases

The same warning is repeated for every degree from 1 to 10, with boundary knots of c(1.1296, 10.7103), c(1.137, 10.7103), c(1.137, 12.1265), or c(1.1691, 12.1265) depending on the fold. It arises because bs() takes its boundary knots from the range of dis in each training fold, so held-out observations whose dis lies outside that range fall beyond the boundary knots.

plot(1:10, reg.cv.error, xlab = "Degree", ylab = "Test MSE", type = "b", col="darkred")

reg.cv.error[1]

## [1] 0.004429047


Looking at our test MSE plot, the test MSE slowly increases as the degree increases, so we conclude that degree 1 is the best option to fit our data. The minimum test MSE, at degree 1, is 0.004429047.
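The boundary-knot warnings above come from each training fold recomputing its own boundary knots. One way to avoid them, sketched below, is to fix Boundary.knots at the range of the full dis variable so that every fold shares the same basis. This is an alternative setup, not the one used above: the resulting CV errors may differ somewhat from those reported, and we have not checked whether degree 1 still comes out best. The names bk and fixed.cv.error are ours:

```r
library(MASS)     # Boston data
library(splines)  # bs()
library(boot)     # cv.glm()

# boundary knots fixed at the full-data range, shared by every CV fold
bk <- range(Boston$dis)

set.seed(36)
fixed.cv.error <- rep(NA, 10)
for (i in 1:10) {
  fit <- glm(nox ~ bs(dis, knots = 6, degree = i, Boundary.knots = bk), data = Boston)
  # delta[1]: raw 10-fold CV estimate of the test MSE
  fixed.cv.error[i] <- cv.glm(Boston, fit, K = 10)$delta[1]
}
print(fixed.cv.error)
```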

Problem 4: Christine Yeh, Gerardo Sanz

This problem works with the body dataset, which you can download from the homework folder on the class website. The goal of this problem is to perform and compare Principal Components Regression and Partial Least Squares on the problem of trying to predict someone's weight.

a. Read the body dataset into R using the load() function. This dataset contains: X - A dataframe containing 21 different types of measurements on the human body and Y - A dataframe that contains the age, weight (kg), height (cm), and the gender of each person in the sample. Let's say we forgot how the gender is coded in this dataset. Using a simple visualization, explain how you can tell which gender is which.

genderCode = as.factor(Y$Gender)

par(mfrow = c(3,1))
plot(Y$Weight, Y$Gender, col = "darkblue")
plot(X$Bicep.Girth, Y$Gender, col = "darkgreen")
plot(X$Forearm.Girth, Y$Gender, col = "darkgrey")

The plots above show weight, bicep girth, and forearm girth against the gender code, which lets us intuitively figure out whether 0 or 1 is male. Men are likely to be heavier and to have larger girth measurements than women, so the code whose points sit at the higher values is likely male. We can therefore assume that 1 is coded as male.


b. Reserve 200 observations from your dataset to act as a test set and use the remaining 307 as a training set. On the training set, use both pcr and plsr to fit models to predict a person's weight based on the variables in X. Use the options scale = TRUE and validation = "CV". Why does it make sense to scale our variables in this case?

set.seed(36)
testing = sort(sample(1:nrow(X), 200))
training = (1:nrow(X))[-testing]
library(pls)

##
## Attaching package: 'pls'

##

pcrFit = pcr(Y$Weight ~ ., data=X, subset=training, scale=TRUE, validation="CV")

plsrFit = plsr(Y$Weight ~., data=X, subset=training, scale=TRUE, validation="CV")

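It makes sense to scale our variables because the 21 measurements can have very different spreads, and it is much easier to compare predictors when they are on the same scale (e.g., not mixing measurements in mm with measurements in cm). PCR and PLSR build their components from the variances and covariances of the predictors, so without scaling, the highest-variance measurements would dominate the components regardless of how useful they are for predicting weight. We also use cross-validation (validation = "CV"), which takes place within the PCR/PLSR model fitting: it cross-validates our choice of the number of components on the 307 training observations, which helps protect us against overfitting the data.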
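A small illustration of why scaling matters for PCR: on simulated data with two hypothetical variables, girth_mm and diam_m (recorded in mm and m, so their variances differ enormously), the unscaled first principal component is essentially just the high-variance variable, while after scaling both variables contribute equally. This sketch uses prcomp(), which performs the same kind of SVD-based decomposition that pcr() relies on:

```r
# Hypothetical simulated measurements driven by a shared latent "body size"
set.seed(216)
n <- 300
size <- rnorm(n)
girth_mm <- 900 + 80 * size + rnorm(n, sd = 20)      # large numbers, large variance
diam_m   <- 0.30 + 0.02 * size + rnorm(n, sd = 0.01) # small numbers, tiny variance
Xpc <- cbind(girth_mm, diam_m)

unscaled <- prcomp(Xpc, scale. = FALSE)
scaled   <- prcomp(Xpc, scale. = TRUE)

# first-component loadings: unscaled is dominated by girth_mm
print(round(unscaled$rotation[, 1], 3))
print(round(scaled$rotation[, 1], 3))
```

With two standardized variables, the first component always has equal-magnitude loadings (1/√2 each), so scaling removes the dependence on the units of measurement.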

c. Run summary() on each of the objects calculated above, and compare the training % variance explained
from the pcr output to the plsr output. Do you notice any consistent patterns (in comparing the two)? Is that
pattern surprising? Explain why or why not.

summary(pcrFit)


## Data: X dimension: 307 21

## Y dimension: 307 1
## Fit method: svdpc
## Number of components considered: 21
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 12.95 3.370 3.193 2.976 2.940 2.936 2.936
## adjCV 12.95 3.369 3.191 2.974 2.937 2.933 2.932
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 2.915 2.895 2.892 2.896 2.913 2.925 2.924
## adjCV 2.911 2.888 2.886 2.888 2.906 2.916 2.916
## 14 comps 15 comps 16 comps 17 comps 18 comps 19 comps
## CV 2.943 2.909 2.861 2.850 2.821 2.832
## adjCV 2.934 2.895 2.850 2.815 2.809 2.820
## 20 comps 21 comps
## CV 2.842 2.843
## adjCV 2.829 2.831
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 62.49 74.20 79.27 83.69 86.21 88.29 89.98
## Y$Weight 93.23 93.97 94.81 94.99 95.07 95.09 95.15
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## X 91.41 92.66 93.81 94.92 95.87 96.75
## Y$Weight 95.26 95.31 95.35 95.35 95.39 95.42
## 14 comps 15 comps 16 comps 17 comps 18 comps 19 comps
## X 97.53 98.06 98.53 98.93 99.32 99.59
## Y$Weight 95.42 95.66 95.76 95.92 95.92 95.93
## 20 comps 21 comps
## X 99.82 100.00
## Y$Weight 95.93 95.94

summary(plsrFit)


## Data: X dimension: 307 21

## Y dimension: 307 1
## Fit method: kernelpls
## Number of components considered: 21
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 12.95 3.273 2.956 2.859 2.843 2.811 2.807
## adjCV 12.95 3.272 2.954 2.855 2.832 2.802 2.796
## 7 comps 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## CV 2.802 2.80 2.801 2.804 2.805 2.804 2.805
## adjCV 2.792 2.79 2.791 2.793 2.794 2.793 2.794
## 14 comps 15 comps 16 comps 17 comps 18 comps 19 comps
## CV 2.805 2.804 2.804 2.804 2.804 2.804
## adjCV 2.794 2.794 2.794 2.794 2.794 2.794
## 20 comps 21 comps
## CV 2.804 2.804
## adjCV 2.794 2.794
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps
## X 62.48 72.47 78.75 80.70 83.45 86.13 87.99
## Y$Weight 93.67 94.99 95.43 95.77 95.87 95.92 95.94
## 8 comps 9 comps 10 comps 11 comps 12 comps 13 comps
## X 89.31 90.48 91.65 92.79 93.58 94.61
## Y$Weight 95.94 95.94 95.94 95.94 95.94 95.94
## 14 comps 15 comps 16 comps 17 comps 18 comps 19 comps
## X 95.37 96.13 96.81 97.47 98.07 98.81
## Y$Weight 95.94 95.94 95.94 95.94 95.94 95.94
## 20 comps 21 comps
## X 99.61 100.00
## Y$Weight 95.94 95.94

The two models explain a similar share of the training variance overall, but there is a consistent pattern: for every number of components below 21 (where the two fits coincide), PLSR explains slightly more of the variance in Weight, and slightly less of the variance in X, than PCR does. For example, with 2 components PLSR explains 94.99% of Weight versus 93.97% for PCR, but only 72.47% of X versus 74.20%. This pattern is not surprising. Both methods are used to model a response when p is large, especially when the p predictors are highly correlated, and both create linear combinations of the original predictors. PCR chooses its linear combinations to capture as much variance in X as possible, without consideration for the response (its dimension-reduction step is unsupervised). PLSR, in contrast, takes the response into account when building its components (it is supervised), which is why it typically needs fewer components to explain the response, at the cost of explaining a little less of X.
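The X side of this pattern is guaranteed for PCR: its first component captures the most X variance possible for any single direction. The base-R sketch below compares one-component shares on simulated data (not the body dataset), using the first principal component direction for PCR and the normalized X'y direction, which is the first PLS weight vector for a single centered response. The helper names x_share and y_share are ours; the PCR ≥ PLS ordering for X is guaranteed, while the ordering for y is typical but not guaranteed:

```r
# Hypothetical simulated data: 5 correlated, standardized predictors and one response
set.seed(3)
n <- 200
Z <- matrix(rnorm(n * 5), n, 5)
Xs <- scale(Z %*% matrix(runif(25), 5, 5))
ys <- drop(Xs %*% c(0.2, -1, 0.5, 0, 1)) + rnorm(n)
yc <- ys - mean(ys)

# share of X variance reproduced by the one-component score Xs %*% w
x_share <- function(w) {
  score <- drop(Xs %*% w)
  sum(crossprod(Xs, score)^2) / sum(score^2) / sum(Xs^2)
}
# share of y variance explained by regressing y on that score (R^2)
y_share <- function(w) {
  score <- drop(Xs %*% w)
  sum(score * yc)^2 / (sum(score^2) * sum(yc^2))
}

w_pcr <- svd(Xs)$v[, 1]           # first principal component direction
w_pls <- drop(crossprod(Xs, yc))  # first PLS weight, proportional to X'y
w_pls <- w_pls / sqrt(sum(w_pls^2))

shares <- rbind(PCR = c(X = x_share(w_pcr), y = y_share(w_pcr)),
                PLS = c(X = x_share(w_pls), y = y_share(w_pls)))
print(round(100 * shares, 2))
```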

d. For each of the models, pick a number of components that you would use to predict future values of weight
from X. Please include any further analysis you use to decide on the number of components.

par(mfrow = c(1,2), ask = FALSE)

validationplot(pcrFit, val.type = "RMSEP", type = "b", main = "PCR Fit")
validationplot(plsrFit, val.type = "RMSEP", type = "b", main = "PLSR Fit")


Using validationplot() from the pls library, we found that most of the drop in RMSEP occurs after adding just 1 component (from 12.95 to 3.370 for PCR and to 3.273 for PLSR), so we'll move forward in our analysis of the body dataset with one component for each model.

e. Practically speaking, it might be nice if we could guess a person's weight without measuring 21 different quantities. Do either of the methods performed above allow us to do that? If not, pick another method that will, and fit it on the training data.

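No. Every PCR or PLSR component is a linear combination of all 21 measurements, so neither method lets us skip any of them. If we want to reduce the number of predictors, i.e., simplify the model via feature selection, the lasso seems like a good option, since its penalty can shrink some coefficients exactly to zero!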

library(ISLR)
library(glmnet)

## Loaded glmnet 2.0-5


grid = 10^seq(10, -2, length=100)

lassoX = scale(model.matrix(Y$Weight ~ ., data = X)[, -1])
lassoY = Y$Weight
lassoFit = glmnet(lassoX[training,], lassoY[training], alpha = 1, lambda = grid)
plot(lassoFit)

set.seed(36)
lassoCV = cv.glmnet(lassoX[training,], lassoY[training], alpha = 1)
bestLambda = lassoCV$lambda.1se
predict(lassoFit, type = "coefficients", s = bestLambda)


## 22 x 1 sparse Matrix of class "dgCMatrix"

## 1
## (Intercept) 69.00221166
## Wrist.Diam 0.04076614
## Wrist.Girth .
## Forearm.Girth 1.35906177
## Elbow.Diam 0.54778799
## Bicep.Girth .
## Shoulder.Girth 1.62958068
## Biacromial.Diam 0.42140884
## Chest.Depth 0.47952509
## Chest.Diam 0.21704888
## Chest.Girth 1.39778183
## Navel.Girth .
## Waist.Girth 3.26613218
## Bitrochanteric.Diam .
## Hip.Girth 1.45081886
## Thigh.Girth 0.82597085
## Knee.Diam 0.32318685
## Knee.Girth 1.13763065
## Calf.Girth 0.84002676
## Ankle.Diam 0.44355954
## Ankle.Girth 0.19631757

bestLambda

## [1] 0.5396431

Using the lasso with the one-standard-error λ, we've eliminated 4 of the 21 predictors (Wrist.Girth, Bicep.Girth, Navel.Girth, and Bitrochanteric.Diam)!

f. Compare all 3 methods in terms of performance on the test set. Keep in mind that you should only run one
version of each model on the test set. Any necessary selection of parameters should be done only with the
training set.

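# without ncomp, predict() on a pcr/plsr fit returns predictions for every
# number of components (1 to 21), so this MSE averages over all of them
pcrPredict = predict(pcrFit, X[testing,])

mean((pcrPredict - lassoY[testing])^2)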

## [1] 8.562787

# as above, this averages the test MSE over all 21 component counts
plsrPredict = predict(plsrFit, X[testing,])

mean((plsrPredict - lassoY[testing])^2)

## [1] 7.952771

#lasso with 1-standard error lambda

lassoPredict = predict(lassoFit, s = bestLambda, newx = lassoX[testing,])
mean((lassoPredict - lassoY[testing])^2)


## [1] 8.141433

These results show that, using the one-standard-error rule for the lasso, PLSR has the lowest test MSE (7.952771), followed by the lasso (8.141433) and PCR (8.562787).
