The Boston data set records the median house value (medv) for 506 census tracts in Boston. There are 12 predictors for
medv, some of which are:
CRIM - per capita crime rate by town
INDUS - proportion of non-retail business acres per town.
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
For each of the four predictors above, fit a simple linear regression model to predict the response using Excel.
i) For each model, paste the line fit plot and write down the simple linear regression model. Is there a statistically
significant association between the predictor and the response?
Regression on CRIM
MEDV = 24.0331 - 0.4152 × CRIM
The p-value is < 0.05, which implies that CRIM has a statistically significant association with MEDV.
Regression on INDUS
MEDV = 29.7549 - 0.6485 × INDUS
The p-value is < 0.05, which implies that INDUS has a statistically significant association with MEDV.
Regression on NOX
MEDV = 41.3459 - 33.9161 × NOX
The p-value is < 0.05, which implies that NOX has a statistically significant association with MEDV.
Regression on RM
MEDV = -34.6706 + 9.1021 × RM
The p-value is < 0.05, which implies that RM has a statistically significant association with MEDV.
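The four fits above were produced in Excel. As a cross-check, the same least-squares intercept and slope can be computed by hand. The sketch below is a minimal illustration, not the method used for the answers: the function name fit_simple_ols and the toy values are hypothetical and are not Boston data. It returns the t statistic of the slope rather than a p-value, since the Python standard library has no t distribution.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, t_stat), where t_stat is the slope divided by
    its standard error.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor values are all identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept

    # Residual sum of squares and the standard error of the slope.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)

    # The t statistic is not finite for a perfect fit.
    t_stat = b1 / se_b1 if se_b1 > 0 else float("nan")
    return b0, b1, t_stat


# Made-up toy values for illustration only (NOT taken from the Boston data).
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
value = [10.0, 14.0, 21.0, 27.0, 38.0]
b0, b1, t_stat = fit_simple_ols(rooms, value)
print(f"value = {b0:.4f} + {b1:.4f} * rooms, t = {t_stat:.2f}")
```

With a sample as large as the 506 tracts here, an absolute t statistic above roughly 1.96 corresponds to a two-sided p-value below 0.05. For small samples such as the toy data, the cutoff is larger and should be taken from a t table with n - 2 degrees of freedom.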
ii) For each model, paste the residual plot. Comment on whether or not the observed plot is completely uniformly
random.
Residual on CRIM Regression
The residuals for CRIM do not appear to be completely uniformly random: they appear to increase as CRIM gets larger. The majority of the data is also clustered at low CRIM values rather than being uniformly distributed across the range.
Residual on RM Regression
The residuals for RM appear to be mostly random, with no overall trend. Among the few data points with larger residual values a negative trend can be seen, but these points are few and the majority of the data appears to be random.
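The visual judgments above can be backed by a rough numeric check. The sketch below computes residuals from a fitted line and compares their spread in the lower and upper halves of the predictor. It is an informal illustration under stated assumptions, not a formal test: the helper names, the intercept and slope, and the toy values are all hypothetical and are not Boston data.

```python
import statistics


def residuals(x, y, intercept, slope):
    """Observed minus fitted value for each data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_by_half(x, resid):
    """Standard deviation of the residuals in the lower and upper half of x."""
    pairs = sorted(zip(x, resid))   # order the points by predictor value
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return statistics.pstdev(low), statistics.pstdev(high)


# Made-up toy values for illustration only (NOT taken from the Boston data).
# They follow the hypothetical line y = 1 + 2x with errors that grow with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.9, 7.1, 7.0, 13.0, 11.0]

resid = residuals(x, y, intercept=1.0, slope=2.0)
low_sd, high_sd = spread_by_half(x, resid)
print(f"residual spread: lower half {low_sd:.3f}, upper half {high_sd:.3f}")
```

A clearly larger spread in one half suggests the residuals are not uniformly random across the predictor, which is the kind of pattern described for CRIM. Similar spreads in both halves are consistent with the mostly random pattern described for RM.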