a). Find the mean and standard deviation of both x and y, and find the
correlation between these two variables.
> mean(my_data$x)
[1] 52.25
> mean(my_data$y)
[1] 181.2
> sd(my_data$x)
[1] 30.02083
> sd(my_data$y)
[1] 115.3957
> cor(my_data$x,my_data$y)
[1] 0.9741692
b). Build the linear model that can be used to predict y if a corresponding x is
known.
> model <- lm(y ~ x, data = my_data)
> model
Call:
lm(formula = y ~ x, data = my_data)
Coefficients:
(Intercept)            x
    -14.454        3.745
So the fitted regression line is y = -14.454 + 3.745 * x
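As a quick check of how the fitted line is used for prediction, the sketch below plugs a value of x into the equation. The coefficients are copied from the lm() output above; the helper name predict_y and the input x = 50 are illustrative assumptions, not part of the assignment.

```r
# Coefficients copied from the lm() output above
intercept <- -14.454
slope <- 3.745

# Hypothetical helper: predicted y for a given x under the fitted line
predict_y <- function(x) intercept + slope * x

# x = 50 is an arbitrary illustrative input
predict_y(50)
```

With the fitted model object still in the session, predict(model, newdata = data.frame(x = 50)) should give the same value up to the rounding of the printed coefficients.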
c). Using an R function, show whether there is a significant relationship
between yield and temperature. Also write the null hypothesis and the
alternative hypothesis, and interpret your results.
> cor.test(my_data$x, my_data$y)
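The output of cor.test() is not shown here, so the sketch below only illustrates how the hypotheses map onto the test and how the result could be read. The data frame demo holds placeholder values, not the assignment's my_data, and the 5% significance level is an assumed convention.

```r
# Placeholder data: NOT the assignment's my_data, only illustrative values
demo <- data.frame(
  x = c(10, 20, 30, 40, 50, 60, 70, 80),
  y = c(25, 60, 95, 140, 170, 215, 240, 290)
)

# H0: the true correlation between x and y is 0 (no linear relationship)
# H1: the true correlation between x and y is not 0
res <- cor.test(demo$x, demo$y)  # Pearson correlation test by default

res$estimate  # sample correlation coefficient
res$p.value   # p-value for testing H0
res$conf.int  # 95% confidence interval for the true correlation

# Reject H0 at the assumed 5% level when the p-value is below 0.05
res$p.value < 0.05
```

If the p-value is below the chosen level, H0 is rejected and the linear relationship is taken to be statistically significant; otherwise the data do not give enough evidence against H0.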
Q2)
In a study of pollution in a water stream, the concentration of pollution is
measured at different locations. The locations are at different distances to the
pollution source. In the table below, these distances and the average pollution
are given:
First we have to create a data frame:-
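The table of distances and concentrations is not reproduced here, so the following is only a sketch of the step with placeholder values. The column names Distance and Concentration and all numbers in the data frame are assumptions for illustration; the fitted coefficients will therefore differ from those of the assignment.

```r
# Placeholder values: the original table is not reproduced in this document
pollution <- data.frame(
  Distance = c(2, 4, 6, 8, 10),
  Concentration = c(11.5, 10.2, 10.3, 9.7, 9.6)
)

# Inspect the structure of the data frame
str(pollution)

# Fit concentration as a linear function of distance
fit <- lm(Concentration ~ Distance, data = pollution)

# Intercept and slope of the fitted line
coef(fit)
```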
The change in concentration with distance is described by the fitted
regression line:-
Concentration = 11.40000 - 0.06364 * Distance
Q3)
Use the Boston dataset from the MASS library in R, which records medv (median
house value) for 506 neighborhoods around Boston, using 13 predictors such as
rm (average number of rooms per house), age (average age of houses), and
lstat (percent of households with low socioeconomic status).
a). Fit the regression model, with medv as the response and lstat as the
predictor.
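A minimal sketch of this step, assuming the MASS package is installed (it ships with standard R distributions). The object name fit is an arbitrary choice.

```r
library(MASS)  # provides the Boston data frame

# Simple linear regression: medv as response, lstat as predictor
fit <- lm(medv ~ lstat, data = Boston)

# Print the call and the estimated coefficients
fit

# Extract the intercept and slope as a named vector
coef(fit)
```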
b). Find all the detailed information, such as the coefficients with their
standard errors and p-values, as well as the R² statistic and F-statistic for
the model.
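One way to obtain these quantities is through summary(), sketched below. The model is refitted so that the block runs on its own; the object names fit and s are arbitrary.

```r
library(MASS)  # provides the Boston data frame

fit <- lm(medv ~ lstat, data = Boston)
s <- summary(fit)

# Full report: coefficients, standard errors, p-values, R-squared, F-statistic
s

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
s$coefficients

# R-squared of the model
s$r.squared

# F-statistic with its numerator and denominator degrees of freedom
s$fstatistic
```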
Prediction Plot:-
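The original figure is not available, so the following is a sketch of how such a plot could be produced with base R graphics. The 95% level, the grid of 100 lstat values and the plot title are assumptions.

```r
library(MASS)  # provides the Boston data frame

fit <- lm(medv ~ lstat, data = Boston)

# Evenly spaced lstat values covering the observed range
grid <- data.frame(
  lstat = seq(min(Boston$lstat), max(Boston$lstat), length.out = 100)
)

# Matrix with columns fit, lwr, upr: 95% prediction intervals for new observations
pred <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

# Scatter plot of the data with the fitted line and the prediction band
plot(Boston$lstat, Boston$medv, xlab = "lstat", ylab = "medv",
     main = "Fitted line with 95% prediction interval")
lines(grid$lstat, pred[, "fit"], lwd = 2)
lines(grid$lstat, pred[, "lwr"], lty = 2)
lines(grid$lstat, pred[, "upr"], lty = 2)
```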
Confidence interval plot:-
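The original figure is not available either; a sketch along the same lines, using interval = "confidence" to get the band for the mean response, is given below. The 95% level and the grid of lstat values are again assumptions.

```r
library(MASS)  # provides the Boston data frame

fit <- lm(medv ~ lstat, data = Boston)

# Evenly spaced lstat values covering the observed range
grid <- data.frame(
  lstat = seq(min(Boston$lstat), max(Boston$lstat), length.out = 100)
)

# Matrix with columns fit, lwr, upr: 95% confidence intervals for the mean of medv
conf <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Scatter plot of the data with the fitted line and the confidence band
plot(Boston$lstat, Boston$medv, xlab = "lstat", ylab = "medv",
     main = "Fitted line with 95% confidence interval")
matlines(grid$lstat, conf, lty = c(1, 2, 2), col = "black")
```

The confidence band describes uncertainty in the mean response only, so it is narrower than a prediction band at the same level, which also accounts for the scatter of individual observations.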