
Simple Linear Regression

Excel

Girth Yield
50 480
45 375
62 500
78 650
55 440
40 400
52 468
57 513
45 408
66 540

RStudio

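# Copy the Girth/Yield cells (including the header row) in Excel, then read them from the clipboard
Reg_data=read.table("clipboard",header=TRUE)
# Yield is on the x-axis, so the 40-80 range belongs to the Girth (y) axis
plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",ylim=c(40,80))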
plot(Reg_data)

lm1.fit=lm(Girth~Yield,data=Reg_data)
lm1.fit

summary(aov(lm1.fit))

par(mfrow=c(2,2))
plot(lm1.fit)

pred=predict(lm1.fit,interval = "predict")
pred

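# Return to a single panel and redraw the scatter plot so the fitted line lands on it
par(mfrow=c(1,1))
plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth")
abline(lm1.fit,col="Blue")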

library(ggplot2)
ggplot(Reg_data, aes(x = Yield, y = Girth)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(x = "Yield", y = "Girth", title = "Scatter Plot with Regression Line")
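The script above reads the data from the clipboard, which only works interactively right after the cells have been copied. As a minimal sketch, the same ten observations from the Excel table can be entered directly so the fit can be rerun anywhere. The object names follow the script; entering the data by hand is an assumption made for reproducibility, not part of the original workflow.

```r
# Same ten observations as the Excel table, entered directly
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)

# Fit Girth on Yield, as in the script
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# Intercept and slope, rounded to the precision of the printed output
round(coef(lm1.fit), 4)
```

The rounded coefficients should agree with the values reported in the results (-8.7523 and 0.1335).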

RStudio Results
> Reg_data=read.table("clipboard",header=1)
> plot(Girth~Yield,data=Reg_data,xlab="Yield",ylab="Girth",xlim=c(40,80))
> plot(Reg_data)
>
> lm1.fit=lm(Girth~Yield,data=Reg_data)
> lm1.fit

Call:
lm(formula = Girth ~ Yield, data = Reg_data)

Coefficients:
(Intercept)        Yield
    -8.7523       0.1335

>
> summary(aov(lm1.fit))
            Df Sum Sq Mean Sq F value   Pr(>F)
Yield        1 1039.2  1039.2   67.71 3.56e-05 ***
Residuals    8  122.8    15.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> par(mfrow=c(2,2))
> plot(lm1.fit)
>
> pred=predict(lm1.fit,interval = "predict")
Warning message:
In predict.lm(lm1.fit, interval = "predict") :
predictions on current data refer to _future_ responses

> pred
        fit      lwr      upr
1  55.34721 45.87153 64.82288
2  41.32544 31.10463 51.54625
3  58.01802 48.50517 67.53087
4  78.04911 66.58164 89.51659
5  50.00558 40.42758 59.58358
6  44.66396 34.75591 54.57200
7  53.74472 44.26301 63.22642
8  59.75405 50.18566 69.32243
9  45.73228 35.90759 55.55697
10 63.35964 53.59914 73.12015
>
> abline(lm1.fit,col="Blue")
>
> ggplot(Reg_data, aes(x = Yield, y = Girth)) +
+ geom_point() +
+ geom_smooth(method = "lm", se = FALSE, color = "red") +
+ labs(x = "Yield", y = "Girth", title = "Scatter Plot with Regression Line")
`geom_smooth()` using formula = 'y ~ x'

Interpretation
[Figure: diagnostic plots from plot(lm1.fit) in a 2 x 2 layout. Panels: Residuals vs Fitted (Residuals against Fitted values), Q-Q Residuals (Standardized residuals against Theoretical Quantiles), Scale-Location (against Fitted values), and Residuals vs Leverage (Standardized residuals against Leverage, with Cook's distance contours).]

According to the residual analysis, there is no reason to reject the model: the underlying assumptions
are not violated.
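The visual reading of the diagnostic plots can be supplemented with numerical checks. The sketch below is not part of the original analysis: it applies a Shapiro-Wilk test to the residuals and reports the largest absolute standardized residual. With only ten observations the test has little power, so it should be read alongside the plots rather than in place of them. The data are re-entered by hand so the block runs on its own.

```r
# Data re-entered from the Excel table so the block is self-contained
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# Raw residuals from the fitted model
res <- residuals(lm1.fit)

# Shapiro-Wilk normality test (a supplementary check, assumed here for illustration)
sw <- shapiro.test(res)
sw

# Largest absolute standardized residual; values well beyond 2 would merit a closer look
max_std_res <- max(abs(rstandard(lm1.fit)))
max_std_res
```

A large p-value from the test is consistent with the normality assumption but does not prove it.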

> summary(aov(lm1.fit))
            Df Sum Sq Mean Sq F value   Pr(>F)
Yield        1 1039.2  1039.2   67.71 3.56e-05 ***
Residuals    8  122.8    15.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

According to these results, the p-value for Yield (3.56e-05) is less than 0.05; therefore, the Yield variable
is significant.
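The ANOVA table tests only the Yield term. As a sketch, summary() on the same fit also gives a t-test for each coefficient, including the intercept, together with R-squared. In simple linear regression the F statistic equals the square of the slope's t statistic, which the last lines check. The variable names fit_summary and t_slope are introduced here for illustration.

```r
# Data re-entered from the Excel table so the block is self-contained
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

fit_summary <- summary(lm1.fit)

# Estimate, standard error, t value and p-value for the intercept and for Yield
fit_summary$coefficients

# Proportion of the variation in Girth explained by Yield
fit_summary$r.squared

# With a single predictor, F equals the squared t statistic of the slope
t_slope <- fit_summary$coefficients["Yield", "t value"]
c(t_squared = t_slope^2, F = fit_summary$fstatistic[["value"]])
```

The F value here should match the 67.71 shown in the ANOVA table.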

Therefore, the regression model is as follows:

Girth = -8.7523 + 0.1335 × Yield
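Using the coefficients reported in the results (intercept -8.7523, slope 0.1335), the fitted model can be applied to a new Yield value. The sketch below computes the point prediction by hand and with predict(), which also returns a 95% prediction interval. The value Yield = 500 and the names new_obs, manual_fit and pred_new are chosen for illustration.

```r
# Data re-entered from the Excel table so the block is self-contained
Reg_data <- data.frame(
  Girth = c(50, 45, 62, 78, 55, 40, 52, 57, 45, 66),
  Yield = c(480, 375, 500, 650, 440, 400, 468, 513, 408, 540)
)
lm1.fit <- lm(Girth ~ Yield, data = Reg_data)

# Hypothetical new observation
new_obs <- data.frame(Yield = 500)

# Point prediction from the fitted equation: intercept + slope * Yield
b <- coef(lm1.fit)
manual_fit <- b[["(Intercept)"]] + b[["Yield"]] * new_obs$Yield

# Same prediction via predict(), with a 95% prediction interval
pred_new <- predict(lm1.fit, newdata = new_obs, interval = "prediction")
pred_new
```

Because observation 3 in the data also has Yield = 500, the result should agree with row 3 of the pred table in the results (fit 58.01802, interval 48.50517 to 67.53087).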
