
Homework 3: Further Multiple Linear Regression & Tutorial in R

Q1. The following shows residual plots for some predictor variable after fitting a linear regression model. Comment on
whether the scatterplots are ideal or not, and if not ideal, what is the problem (make sure to use any formal
terminology) and how you reached that conclusion. (10 points total; two for each part)

a.

[Residual plot: residuals vs. x]
The residual plot is not ideal because the points are not randomly scattered. The spread of the residuals grows as x
increases, which indicates heteroskedasticity (non-constant variance of the error term).
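The funnel pattern described above can be reproduced with simulated data. The following is a minimal sketch, not the data behind the homework plot: the sample size, coefficients, and the name `spread_slope` are assumptions chosen for illustration. It fits a linear model to data whose error spread grows with x, then regresses the absolute residuals on x as a rough numeric check.

```r
# Simulated data (hypothetical values): the error standard deviation grows with x.
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

fit <- lm(y ~ x)
res <- resid(fit)

# Residuals vs. predictor: a funnel shape suggests non-constant variance.
# The plot is drawn only in an interactive session.
if (interactive()) {
  plot(x, res, xlab = "x", ylab = "residual")
  abline(h = 0, lty = 2)
}

# Rough numeric check: regress |residual| on x.
# A clearly positive slope agrees with the visual funnel.
spread_fit <- lm(abs(res) ~ x)
spread_slope <- unname(coef(spread_fit)["x"])
print(spread_slope)
```

A slope near zero in the last step would instead be consistent with constant variance. This check is informal and does not replace a formal test.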

b.

[Residual plot: residuals vs. time]
The residual plot appears mostly random, although a few outliers seem to be present (circled in red). Aside from those
outliers, there is a somewhat cyclical pattern: the residuals move from negative to positive and back to negative
through roughly time period 65. This suggests that the error terms may be correlated (autocorrelation). From roughly
time period 65 to 100, the residuals show uniform variance and appear random, aside from the outlier.
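Correlation of the error terms over time can also be checked numerically. The sketch below uses simulated data, not the homework's: the AR(1) coefficient of 0.8 and the variable names are assumptions. It computes the lag-1 correlation of the residuals and the Durbin-Watson statistic by hand; values of that statistic well below 2 point to positive autocorrelation.

```r
# Simulated time-ordered data with AR(1) errors (hypothetical coefficient 0.8).
set.seed(2)
n <- 100
time <- seq_len(n)
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
y <- 1 + 0.5 * time + e

fit <- lm(y ~ time)
res <- resid(fit)

# Lag-1 correlation: each residual against the one before it.
lag1_cor <- cor(res[-1], res[-n])

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals.
dw <- sum(diff(res)^2) / sum(res^2)

print(c(lag1_cor = lag1_cor, dw = dw))
```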

c.
[Residual plot]

The residual plot is clearly not random and not ideal, as there is a positive trend in the plot. There is also one
outlier present (circled in red).

d. Note: the below is for a simple linear regression model

[Residual plot]

The residual plot is clearly not random and not ideal, as there is a positive trend in the plot.

e.

[Residual plot]

The residual plot appears completely random and is ideal. There are no clear outliers, and the trend appears flat
across the x-axis.
Q2. This exercise involves the Auto data set studied in the lab. Answer the following questions using R. For those
questions requiring coding, copy and paste your code below, in addition to answering the questions. Recall the
definitions of the recorded data columns:
mpg: miles per gallon
cylinders: Number of cylinders between 4 and 8
displacement: Engine displacement (cu. inches)
horsepower: Engine horsepower
weight: Vehicle weight (lbs.)
acceleration: Time to accelerate from 0 to 60 mph (sec.)
year: Model year (modulo 100)
origin: Origin of car (1. American, 2. European, 3. Japanese)
name: Vehicle name

a. Which of the variables are quantitative, and which are qualitative? (1 point)
Code:
Auto <- read.csv("Auto.csv", na.strings = "?", stringsAsFactors = TRUE)
Auto <- na.omit(Auto)  # drop rows with missing values so mean() and var() do not return NA
head(Auto)

All variables in the dataset are quantitative except for ‘name’ and ‘origin’, which are qualitative. The variable
‘name’ is qualitative because it is a string giving the make and model of the vehicle. The variable ‘origin’ is also
qualitative: although stored as a number, it is a categorical code for the region of origin (1 = American,
2 = European, 3 = Japanese).
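One way to make this distinction explicit in R is to inspect the column classes and convert the numeric code to a labelled factor. The sketch below uses a small made-up data frame, `toy`, with a few of the Auto column names. Its values are hypothetical and serve only to keep the example self-contained.

```r
# Toy rows with some of the Auto column names (values are made up for illustration).
toy <- data.frame(
  mpg = c(18, 26, 31),
  cylinders = c(8, 4, 4),
  horsepower = c(130, 46, 65),
  origin = c(1, 2, 3),
  name = c("car a", "car b", "car c"),
  stringsAsFactors = TRUE
)

# Before conversion, origin is reported as numeric even though it is a category code.
print(sapply(toy, class))

# Recode origin as a labelled factor so R treats it as qualitative.
toy$origin <- factor(toy$origin, levels = 1:3,
                     labels = c("American", "European", "Japanese"))

# Columns that R now treats as qualitative.
is_qualitative <- sapply(toy, is.factor)
print(names(toy)[is_qualitative])
```

The same `factor()` call could be applied to `Auto$origin`, assuming the column holds the codes 1 to 3 as described in the variable definitions.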

b. What is the mean and standard deviation of horsepower and weight? (1 point)
Code:
mean(Auto$horsepower)  # 104.4694
mean(Auto$weight)      # 2977.584
sd(Auto$horsepower)    # 38.49116
sd(Auto$weight)        # 849.4026

The means for horsepower and weight were 104.4694 and 2,977.584 respectively.
The standard deviations for horsepower and weight were 38.49116 and 849.4026 respectively.

c. Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of
horsepower and weight in the subset of the data that remains? (For range, you can use the range() function) (2
points)
Code:
The ranges for horsepower and weight were (46 to 230) and (1613 to 5140) respectively.
The means for horsepower and weight were 104.0692 and 2,972.469 respectively.
The standard deviations for horsepower and weight were 38.17633 and 848.5121 respectively.
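No code was recorded for this step. The following is a sketch of one way to drop a block of rows and summarise the remaining data. The helper `summarise_after_drop` is a hypothetical name, and the demonstration runs on R's built-in `mtcars` data as a stand-in, so its output does not reproduce the figures above.

```r
# Hypothetical helper: drop the given rows, then report min, max, mean and sd
# for the selected columns.
summarise_after_drop <- function(df, drop_rows, cols) {
  kept <- df[-drop_rows, , drop = FALSE]
  sapply(kept[, cols, drop = FALSE], function(v) {
    c(min = min(v), max = max(v), mean = mean(v), sd = sd(v))
  })
}

# With the homework data, the call would look like this (assuming Auto is loaded):
# summarise_after_drop(Auto, 10:85, c("horsepower", "weight"))

# Stand-in demonstration on the built-in mtcars data set.
demo <- summarise_after_drop(mtcars, 10:20, c("hp", "wt"))
print(demo)
```

The negative index `-drop_rows` removes every row in the range, so `10:85` removes 76 rows, not just rows 10 and 85.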

d. Using the full data set, create scatterplots to explore relationships between all data. (1 point)
(Assuming "full data set" means the data without the removals made in step c, the Auto dataset used in the next code
step includes all rows except those with missing values.)
From the plots generated in R for all variables, several pairs of variables are visibly correlated with one another,
which suggests multicollinearity (horsepower & displacement, weight & displacement, acceleration & displacement,
horsepower & weight, acceleration & horsepower). The plots also show that displacement, horsepower, and weight each
have a relatively strong relationship with mpg, though those three variables are also correlated with each other.
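A scatterplot matrix and a correlation matrix are a common way to carry out this kind of exploration. The sketch below uses the built-in `mtcars` data as a stand-in for Auto, with columns chosen because they loosely correspond to mpg, displacement, horsepower, weight, and acceleration. It is an illustration of the approach, not the code used for the answer above.

```r
# Stand-in data: mtcars columns analogous to the Auto variables discussed above.
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Scatterplot matrix of every pair of variables (drawn only in an interactive session).
if (interactive()) {
  pairs(vars)
}

# Pairwise correlations put numbers on what the scatterplots show.
cor_mat <- round(cor(vars), 2)
print(cor_mat)

# With the homework data, the equivalent call would be (assuming Auto is loaded):
# pairs(Auto[, sapply(Auto, is.numeric)])
```

Large off-diagonal values among the predictors, such as displacement and weight here, are the numeric counterpart of the correlated pairs listed above.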
