R Assignment 8

Q1)
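1. The yield y of a chemical process is a random variable whose value is
considered to be a linear function of the temperature x. The following data of
corresponding values of x and y are found:
Temperature in °C (x): 0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
80, 85, 90, 95, 100
Yield in grams (y): 14, 38, 54, 76, 98, 102, 114, 128, 137, 145, 152, 175, 189,
221, 245, 289, 310, 354, 378, 405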
First we have to create a data frame:-

> my_data<-data.frame(
+ x=c(0,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100),
+ y=c(14,38,54,76,98,102,114,128,137,145,152,175,189,221,245,289,310,354,378,405),
+ stringsAsFactors=FALSE
+ )
> View(my_data)
a). Find the average and standard deviation for both x and y and also find the
correlation between these two variables.
> mean(my_data$x)
[1] 52.25
> mean(my_data$y)
[1] 181.2
> sd(my_data$x)
[1] 30.02083
> sd(my_data$y)
[1] 115.3957
> cor(my_data$x,my_data$y)
[1] 0.9741692
b). Build the linear model that can be used to predict y if a corresponding x is
known.
> model<-lm(y~x,data=my_data)
> model
Call:
lm(formula = y ~ x, data = my_data)
Coefficients:
(Intercept)            x
    -14.454        3.745
So the fitted regression line is y = -14.454 + 3.745 * x
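As a cross-check on the coefficients above, the sketch below recomputes the slope and intercept from the summary statistics of part (a), using the standard simple-regression identities slope = r * sd(y) / sd(x) and intercept = mean(y) - slope * mean(x). The variable names `slope` and `intercept` are chosen here for illustration only.

```r
# Temperature (x) and yield (y) values from the data frame in Q1.
x <- c(0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100)
y <- c(14, 38, 54, 76, 98, 102, 114, 128, 137, 145, 152, 175, 189, 221, 245, 289,
       310, 354, 378, 405)

# Least-squares estimates written in terms of the statistics from part (a).
slope     <- cor(x, y) * sd(y) / sd(x)
intercept <- mean(y) - slope * mean(x)

# The same estimates as returned by lm().
model <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(model))
```

Both print statements should show the same pair of numbers, which is a quick way to confirm that the correlation and the regression slope describe the same linear relationship.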
c. Using an R function, show whether there is a significant relationship between
yield and temperature. Also write the null hypothesis and alternative hypothesis
and interpret your results.
> cor.test(my_data$x,my_data$y)
Pearson's product-moment correlation
data: my_data$x and my_data$y

t = 18.302, df = 18, p-value = 4.428e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9345038 0.9899377
sample estimates:
cor
0.9741692
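Null hypothesis:- the true correlation between temperature and yield is equal to zero
Alternative hypothesis:- the true correlation is not equal to zero
The p-value (4.428e-13) is far below 0.05, so we reject the null hypothesis: there is
a significant positive linear relationship between yield and temperature
(r = 0.974, 95 percent confidence interval 0.935 to 0.990).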
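The same test can also be read from the regression itself. In a simple linear regression the t-test on the slope is equivalent to the correlation test, so the sketch below (with illustrative names `slope_row` and `ct`) compares the two.

```r
# Temperature (x) and yield (y) values from the data frame in Q1.
x <- c(0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100)
y <- c(14, 38, 54, 76, 98, 102, 114, 128, 137, 145, 152, 175, 189, 221, 245, 289,
       310, 354, 378, 405)

model <- lm(y ~ x)

# Row of the coefficient table for the slope: estimate, std. error, t value, p-value.
slope_row <- summary(model)$coefficients["x", ]

# Pearson correlation test, as used in the answer above.
ct <- cor.test(x, y)

print(slope_row)
print(c(t = unname(ct$statistic), p = ct$p.value))
```

The t value and p-value in both printouts should agree, so either function can be used to answer this part.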
d. What is the t-statistic and p-value?

t-statistic = 18.302
p-value = 4.428e-13 (that is, 4.428 × 10^-13)
e. If significant, predict the value of y for an unknown dataset defined by you.

> new.temp<-data.frame(x=c(32,48,77))
> predict(model,newdata=new.temp)
       1        2        3
105.3726 165.2856 273.8780
f. Draw the scatterplot showing regression line using ggplot.

> library(ggplot2)
> ggplot(my_data,aes(x,y))+
+ geom_point()+
+ stat_smooth(method=lm)
g. Give the 95% confidence interval of the expected yield at a temperature of
xnew = 80 °C.
> new.temp<-data.frame(x=c(80))
> predict(model,newdata = new.temp,interval='confidence')
       fit      lwr      upr
1 285.1117 267.7778 302.4455
So the 95% confidence interval for the expected yield at 80 °C is 267.78 to 302.45
grams.
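h. What is the upper quartile of the residuals?
> quantile(my_data$y)
0% 25% 50% 75% 100%
14.0 101.0 148.5 256.0 405.0
This gives the quartiles of the observed yield y (upper quartile 256 grams), not of
the residuals. The residuals belong to the fitted model, so their upper quartile is
obtained with quantile(residuals(model),0.75); the same value is listed as 3Q in the
Residuals block of summary(model).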
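The upper quartile of the residuals can be computed directly from the fitted model. A minimal sketch, with `res` and `upper_q` as illustrative names:

```r
# Temperature (x) and yield (y) values from the data frame in Q1.
x <- c(0, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100)
y <- c(14, 38, 54, 76, 98, 102, 114, 128, 137, 145, 152, 175, 189, 221, 245, 289,
       310, 354, 378, 405)

model <- lm(y ~ x)

# Residuals = observed yield minus fitted yield.
res <- residuals(model)

# 75th percentile (upper quartile) of the residuals.
upper_q <- quantile(res, 0.75)

print(quantile(res))  # all five quartile points of the residuals
print(upper_q)
```

Because the residuals of a least-squares fit with an intercept average to zero, their quartiles are on a very different scale from the quartiles of y itself.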
Q2)
In a study of pollution in a water stream, the concentration of pollution is
measured at different locations. The locations are at different distances to the
pollution source. In the table below, these distances and the average pollution
are given:
First we have to create a data frame:-
a. Build the linear model that can be used to predict y if a corresponding x is
known. What are the parameter estimates for the three unknown parameters
in the usual linear regression model:
1) The intercept (β0),
2) the slope (β1) and
3) error standard deviation (σ)?
From this we get that :-
β0 :- 11.40000
β1 :- -0.06364
σ :- 3.138
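The three estimates can be extracted from a fitted model as sketched below. The assignment's data table is not reproduced in this document, so the values in `pollution` are PLACEHOLDERS chosen only to make the example run, and the column names `distance` and `concentration` are assumptions; the numbers printed will therefore not match the estimates listed above.

```r
# PLACEHOLDER data: not the assignment's table, only an illustration.
pollution <- data.frame(
  distance      = c(2, 4, 6, 8, 10, 12, 14, 16),
  concentration = c(11.9, 10.6, 11.2, 10.1, 10.9, 10.2, 10.8, 9.9)
)

fit <- lm(concentration ~ distance, data = pollution)

beta0     <- coef(fit)[["(Intercept)"]]  # intercept estimate
beta1     <- coef(fit)[["distance"]]     # slope estimate
sigma_hat <- sigma(fit)                  # residual standard error (estimate of sigma)

print(c(beta0 = beta0, beta1 = beta1, sigma = sigma_hat))
```

The value returned by `sigma()` is the "Residual standard error" line in `summary(fit)`, computed with n - 2 degrees of freedom.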
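b. How large a part of the variation in concentration can be explained by the
distance?
The share of the variation in concentration explained by the distance is the
coefficient of determination R², reported as "Multiple R-squared" by summary() of
the fitted model. The fitted line
Concentration = 11.40000 + (-0.06364 * Distance)
predicts the concentration at a given distance but does not by itself measure the
explained variation.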
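The proportion of variation explained is the coefficient of determination R², which can be read from the model summary or computed as 1 - SSE/SST. The sketch below uses PLACEHOLDER data with assumed column names `distance` and `concentration`, since the assignment's table is not reproduced in this document.

```r
# PLACEHOLDER data: not the assignment's table, only an illustration.
pollution <- data.frame(
  distance      = c(2, 4, 6, 8, 10, 12, 14, 16),
  concentration = c(11.9, 10.6, 11.2, 10.1, 10.9, 10.2, 10.8, 9.9)
)

fit <- lm(concentration ~ distance, data = pollution)

# R-squared as reported by summary().
r_squared <- summary(fit)$r.squared

# The same quantity from its definition: 1 - (residual SS) / (total SS).
sse <- sum(residuals(fit)^2)
sst <- sum((pollution$concentration - mean(pollution$concentration))^2)
r_squared_manual <- 1 - sse / sst

print(c(summary = r_squared, manual = r_squared_manual))
```

With a single predictor, R² also equals the squared correlation between distance and concentration.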
c. What is a 90 %-confidence interval for the expected pollution concentration
7 km from the pollution source?
d. Draw the scatter plot showing prediction interval as well as regression line.
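Parts c and d can be sketched as follows. The data are again PLACEHOLDERS with assumed column names, distances are assumed to be in km, and the 90% level used for the prediction band is an assumption (the question only fixes the level for part c). Base graphics are used here so that no extra package is needed.

```r
# PLACEHOLDER data: not the assignment's table, only an illustration.
pollution <- data.frame(
  distance      = c(2, 4, 6, 8, 10, 12, 14, 16),
  concentration = c(11.9, 10.6, 11.2, 10.1, 10.9, 10.2, 10.8, 9.9)
)

fit <- lm(concentration ~ distance, data = pollution)

# c. 90% confidence interval for the expected concentration at 7 km.
ci_7km <- predict(fit, newdata = data.frame(distance = 7),
                  interval = "confidence", level = 0.90)
print(ci_7km)

# d. Prediction interval along a grid of distances.
grid <- data.frame(distance = seq(min(pollution$distance),
                                  max(pollution$distance), length.out = 50))
band <- predict(fit, newdata = grid, interval = "prediction", level = 0.90)

# Scatter plot, regression line (solid) and prediction interval (dashed).
plot(concentration ~ distance, data = pollution,
     ylim = range(band, pollution$concentration),
     xlab = "Distance (km)", ylab = "Concentration")
lines(grid$distance, band[, "fit"])
lines(grid$distance, band[, "lwr"], lty = 2)
lines(grid$distance, band[, "upr"], lty = 2)
```

Switching `interval = "prediction"` to `"confidence"` in the second call would draw the narrower band for the mean concentration instead of the band for a single new observation.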
Q3)
Use the Boston dataset from MASS library in R which records medv (median
house value) for 506
neighborhoods around Boston using 13 predictors such as rm (average number
of rooms per house), age (average age of houses), and lstat (percent of
households with low socioeconomic status).
a). Fit the regression model, with medv as the response and lstat as the
predictor.
b). Find all the detailed information like coefficients, p-values and standard
errors for the coefficients, as well as the R² statistic and F-statistic for the
model.
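Parts a and b can be sketched as below. The object names `fit`, `coef_table`, `r2` and `f_stat` are illustrative; `Boston` and its columns `medv` and `lstat` come from the MASS package named in the question.

```r
library(MASS)  # provides the Boston data set

# a) Simple linear regression of medv on lstat.
fit <- lm(medv ~ lstat, data = Boston)

# b) Full summary: coefficients, standard errors, t values, p-values,
#    R-squared and F-statistic.
fit_summary <- summary(fit)
print(fit_summary)

# Individual pieces, if they are needed separately.
coef_table <- fit_summary$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)
r2         <- fit_summary$r.squared     # R-squared
f_stat     <- fit_summary$fstatistic    # F value with its two degrees of freedom

print(coef_table)
print(r2)
print(f_stat)
```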
c. Use the predict() function to produce confidence intervals and prediction
intervals for a defined unknown value of lstat and plot them.
Prediction Plot:-
Confidence interval plot:-
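The intervals and plots for part c can be sketched as follows. The new lstat values 5, 10 and 15 are an assumption made for illustration (the question leaves the choice open), and base graphics are used instead of ggplot to keep the example free of extra packages.

```r
library(MASS)  # provides the Boston data set

fit <- lm(medv ~ lstat, data = Boston)

# Hypothetical new lstat values chosen for illustration.
new_lstat <- data.frame(lstat = c(5, 10, 15))

# Interval for the mean medv at each lstat (confidence) and for a single
# new neighbourhood at each lstat (prediction); both at the default 95% level.
conf_int <- predict(fit, newdata = new_lstat, interval = "confidence")
pred_int <- predict(fit, newdata = new_lstat, interval = "prediction")
print(conf_int)
print(pred_int)

# Bands over the observed range of lstat for plotting.
grid <- data.frame(lstat = seq(min(Boston$lstat), max(Boston$lstat), length.out = 100))
conf_band <- predict(fit, newdata = grid, interval = "confidence")
pred_band <- predict(fit, newdata = grid, interval = "prediction")

plot(medv ~ lstat, data = Boston, pch = 20, col = "grey50")
lines(grid$lstat, conf_band[, "fit"], lwd = 2)                               # regression line
matlines(grid$lstat, conf_band[, c("lwr", "upr")], lty = 2, col = "blue")   # confidence band
matlines(grid$lstat, pred_band[, c("lwr", "upr")], lty = 3, col = "red")    # prediction band
legend("topright", legend = c("fit", "confidence", "prediction"),
       lty = c(1, 2, 3), col = c("black", "blue", "red"))
```

The prediction band is wider than the confidence band because it also accounts for the scatter of individual observations around the regression line.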
