Alex Harutunian

Multiple Regression: Car Data


1. Explain in layman’s terms why you dropped a particular variable for statistical significance.
a. I dropped the variable “Liter” because it was not statistically significant: its
p-value was too large. In plain terms, once the other variables were in the model,
“Liter” added too little extra information about “Price” to be told apart from
chance, most likely because “Liter” and “Cylinder” are highly correlated and
so carry nearly the same information.
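
One way to put a number on the overlap between “Liter” and “Cylinder” is a variance inflation factor (VIF). The sketch below is only a rough illustration: it uses the 0.957897 correlation reported in question 3 and assumes the simplified two-predictor case, where VIF = 1 / (1 - r^2). The function name pairwise_vif is made up for this example, and a VIF computed against all of the other predictors in the model would be at least this large.

```python
# Correlation between Cylinder and Liter, taken from the reported correlation matrix.
r_cylinder_liter = 0.957897


def pairwise_vif(r: float) -> float:
    """Variance inflation factor in the simplified case of two predictors
    whose correlation with each other is r (an assumption of this sketch)."""
    return 1.0 / (1.0 - r * r)


vif = pairwise_vif(r_cylinder_liter)

# The standard error of a coefficient grows by roughly sqrt(VIF), which is
# what pushes the p-value of one of the two overlapping variables up.
se_inflation = vif ** 0.5

print(f"VIF: {vif:.2f}")
print(f"Standard error inflation: {se_inflation:.2f}x")
```

A larger standard error makes the coefficient harder to tell apart from zero, which is consistent with one of the two variables showing a large p-value when both are in the model.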
2. Identify your chosen variable transformation, describe its effect on the data, and hypothesize
why such a transformation would make sense to transform pricing data.
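a. My chosen variable transformation is the log(Price) transformation of the Price
data. Taking the log compresses large values more than small ones, so it pulls
the high-priced outliers in toward the rest of the data. I hypothesize that this
is the best transformation for pricing data because prices tend to have a few very
expensive cars far above the rest, and shrinking that spread makes a linear
relationship between the dependent and independent variables more likely.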
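
The effect of the log transformation on outliers can be sketched with a small example. The prices below are invented for illustration and are not the car data; the skewness helper is a plain moment-based formula written for this example. The natural log is used here, but any log base gives the same skewness because changing the base only rescales the values.

```python
import math

# Hypothetical prices (NOT the assignment's data): mostly modest, one very expensive car.
prices = [9000, 11000, 12500, 14000, 15500, 17000, 19000, 22000, 30000, 70000]


def skewness(values):
    """Moment-based skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)

# How far the most expensive car sits above the cheapest, before and after the log.
raw_ratio = max(prices) / min(prices)
log_ratio = max(log_prices) / min(log_prices)

print(f"Skewness of Price:      {raw_skew:.2f}")
print(f"Skewness of log(Price): {log_skew:.2f}")
print(f"Max/min ratio, raw vs. log: {raw_ratio:.2f} vs. {log_ratio:.2f}")
```

On these made-up numbers the logged prices are less skewed than the raw prices, and the gap between the cheapest and most expensive car shrinks considerably.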
3. Report your correlation matrix and comment on Cylinder and Liter as explanatory variables.
Report your best regression.
a. Cylinder and Liter have a correlation of 0.957897 with each other, and
correlations of 0.569086 and 0.558146 with price, respectively. Each is a
reasonable explanatory variable for price on its own, but because they are so
highly correlated with each other, only one of the two should be included in a
multiple regression model.
b. The best regression was the log(Price) regression with the explanatory variables
Mileage, Doors, Cylinder, Liter, Cruise, Sound, and Leather. It has an adjusted
R^2 of 0.48176186, a standard error of 0.128218578, and a Significance F of
1.1811E-110.
The regression equation is: log(Price) = -3.23578E-06(Mileage) - 0.013045897(Doors) + 0.033597475(Cylinder)
+ 0.030500047(Liter) + 0.13610044(Cruise) - 0.03905682(Sound) + 0.051873437(Leather)
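
Because the dependent variable is log(Price), each coefficient describes a percentage change in price rather than a dollar change. The sketch below converts the reported coefficients into approximate percent changes per one-unit increase in each variable. Two things are assumptions here: the write-up does not say whether the log is base 10 or natural, so both are shown, and no intercept is reported, so only relative effects (not predicted prices) can be computed. The function name percent_change is made up for this example.

```python
import math

# Coefficients as reported in the regression equation above.
coefficients = {
    "Mileage": -3.23578e-06,
    "Doors": -0.013045897,
    "Cylinder": 0.033597475,
    "Liter": 0.030500047,
    "Cruise": 0.13610044,
    "Sound": -0.03905682,
    "Leather": 0.051873437,
}


def percent_change(coef: float, base: float = 10.0) -> float:
    """Percent change in Price for a one-unit increase in a variable when the
    model is log_base(Price) = ... + coef * variable. The base is an assumption."""
    return (base ** coef - 1.0) * 100.0


for name, coef in coefficients.items():
    if_base_10 = percent_change(coef, base=10.0)
    if_natural = percent_change(coef, base=math.e)
    print(f"{name:>8}: {if_base_10:+.4f}% if log10, {if_natural:+.4f}% if natural log")
```

Read this way, a positive coefficient such as the one on Cruise means cars with cruise control are priced higher by some percentage, holding the other variables fixed; the exact percentage depends on which log base was used.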
