
Assignment-1(Applied Regression Analysis)

Shashwat Patel (MM19B053)

2023-02-16

Import Libraries

library('MPV')
library(ggplot2)
library(ggthemes)

Plot theme

# Setting up a theme for the plots


my_theme<-theme_fivethirtyeight()+theme(plot.title = element_text(hjust = 0.5,size=20),
axis.title = element_text(size=20),
axis.text = element_text(size=14),
plot.subtitle = element_text(hjust=0.5),
legend.position = "top",
legend.title = element_blank(),
legend.text = element_text(size=14))

Q 2.14

Data

q1_data<-p2.14 #Getting data from MPV package


q1_data

## ratio visc
## 1 1.0 0.45
## 2 0.9 0.20
## 3 0.8 0.34
## 4 0.7 0.58
## 5 0.6 0.70
## 6 0.5 0.57
## 7 0.4 0.55
## 8 0.3 0.44

a) Scatter-Plot

ggplot(q1_data)+geom_point(aes(ratio,visc),color='blue',size=2.5)+
scale_x_continuous(breaks=c(0,0.2,0.4,0.6,0.8,1),limits=c(0.2,1))+
scale_y_continuous(limits = c(0,0.8))+
labs(x='Ratio',y='Viscosity',title='Scatterplot')+
my_theme

[Figure: Scatterplot of Viscosity (y-axis) against Ratio (x-axis)]

b) Prediction Equation

Fitting a linear model between ratio (x) and viscosity (y)

q1_model<-lm(visc~.,data=q1_data) # Linear model fit between Viscosity and Ratio


q1_model

##
## Call:
## lm(formula = visc ~ ., data = q1_data)
##
## Coefficients:
## (Intercept) ratio
## 0.6714 -0.2964

The prediction equation is:
$\hat{y}_i = 0.6714 - 0.2964\,x_i$

c) Analysis

summary(q1_model) ## Gives summary of the model

##
## Call:
## lm(formula = visc ~ ., data = q1_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20464 -0.10634 0.02196 0.08527 0.20643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6714 0.1595 4.209 0.00563 **
## ratio -0.2964 0.2314 -1.281 0.24754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.15 on 6 degrees of freedom
## Multiple R-squared: 0.2147, Adjusted R-squared: 0.08382
## F-statistic: 1.64 on 1 and 6 DF, p-value: 0.2475

The $R^2$ value is 0.2147, so the model explains only about 21% of the variability in viscosity; the fit is poor.

Hypothesis testing of the intercept and ratio with 5% significance level

For ratio, $t_0 = -1.281$ and $t_{\alpha/2,6} = 2.447$.

Since $-t_{\alpha/2,6} < t_0 < t_{\alpha/2,6}$, the null hypothesis $H_0: \beta_1 = 0$ is not rejected. The data give no evidence of a linear relationship between ratio and viscosity.
For the intercept, $t_0 = 4.209$ and $t_{\alpha/2,6} = 2.447$.
Since $t_0 > t_{\alpha/2,6}$, the null hypothesis $H_0: \beta_0 = 0$ is rejected.
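The critical value and the decision for the slope can be reproduced from the t distribution. The following is a minimal sketch that hard-codes the statistic reported by `summary(q1_model)`; the variable names (`t0_slope`, `t_crit`, and so on) are illustrative and not part of the original analysis.

```r
# Two-sided t-test for the slope at the 5% level.
# Inputs are copied from the summary(q1_model) output.
alpha    <- 0.05
df_resid <- 6        # n - 2 = 8 - 2 residual degrees of freedom
t0_slope <- -1.281   # reported t value for ratio

# Upper alpha/2 quantile of the t distribution with 6 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df_resid)

# Reject H0: beta_1 = 0 only if |t0| exceeds the critical value
reject_slope <- abs(t0_slope) > t_crit

# Two-sided p-value for comparison with the Pr(>|t|) column
p_slope <- 2 * pt(-abs(t0_slope), df = df_resid)

print(round(t_crit, 3))
print(reject_slope)
print(round(p_slope, 4))
```

Comparing `abs(t0_slope)` with `t_crit` is equivalent to checking whether the p-value is below 0.05, so either form of the rule leads to the same decision.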

95% Confidence interval for intercept and ratio

confint(q1_model) ## Gives 95% confidence interval for both intercept and regressor (x)

## 2.5 % 97.5 %
## (Intercept) 0.2811246 1.061733
## ratio -0.8627412 0.269884
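As a cross-check, the same intervals can be rebuilt by hand as estimate ± $t_{\alpha/2,n-2}$ × standard error. This sketch uses the rounded estimates and standard errors from the coefficient table, so it should agree with `confint()` only to about three decimals; the names `est`, `se` and `ci` are illustrative.

```r
# Estimates and standard errors copied (rounded) from summary(q1_model)
est <- c(intercept = 0.6714, ratio = -0.2964)
se  <- c(intercept = 0.1595, ratio = 0.2314)

# 97.5% quantile of the t distribution with n - 2 = 6 degrees of freedom
t_crit <- qt(0.975, df = 6)

# 95% confidence limits: estimate -/+ t_crit * standard error
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
print(round(ci, 3))
```

The interval for ratio contains zero, which is consistent with the t-test not rejecting the null hypothesis for the slope.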

ANOVA (For testing Significance of Regression)

anova(q1_model) ## Gives the ANOVA table

## Analysis of Variance Table


##
## Response: visc
## Df Sum Sq Mean Sq F value Pr(>F)
## ratio 1 0.036905 0.036905 1.6405 0.2475
## Residuals 6 0.134982 0.022497

The F-statistic is close to 1, so the null hypothesis is not rejected: ratio does not explain a significant part of the variability in viscosity.
$F_0 = 1.6405$ and $F_{0.05,1,6} = 5.9874$, so $F_0 < F_{\alpha,1,n-2}$ and the null hypothesis is not rejected.
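The F critical value can be obtained with `qf()`. In simple linear regression the F-test is equivalent to the two-sided t-test on the slope, because $F_0 = t_0^2$ and $F_{\alpha,1,n-2} = t_{\alpha/2,n-2}^2$. A short sketch with values copied from the output above (variable names are illustrative):

```r
# Values copied from the anova() and summary() output for q1_model
f0 <- 1.6405    # F value for ratio
t0 <- -1.281    # t value for ratio (rounded in the printed output)

# 5% critical values with 1 and 6 degrees of freedom
f_crit <- qf(0.95, df1 = 1, df2 = 6)
t_crit <- qt(0.975, df = 6)

print(round(f_crit, 4))            # F critical value
print(round(t_crit^2, 4))          # equals the F critical value
print(round(t0^2, 4))              # close to f0, up to rounding of t0
print(f0 > f_crit)                 # TRUE would mean rejecting H0
```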

d) 95% Confidence and Prediction band

ci_band<-as.data.frame(predict(q1_model,q1_data,interval='confidence',level=0.95))
## 95% Confidence level for every point present in the data

## The code here first predicts y_hat for every x present in the data using the
## model we fitted, then finds the 95% confidence interval for each of them and then that table is
## converted to a data frame.

pe_band<-as.data.frame(predict(q1_model,q1_data,interval='prediction',level=0.95))

## The code here first predicts y_hat for every x present in the data, assuming these
## data points were not present while training then using the model we fitted, it finds
## the 95% prediction interval for each of them and then that table is
## converted to a data frame.

ci_band$ratio<-q1_data$ratio ## 'x' variable added to the data frame of CI and PE band dataframe
pe_band$ratio<-q1_data$ratio

colors<-c('95% Confidence Band'='red','95% Prediction Band'='green','Fit'='blue') # For setting legend

### Plot of 95% Prediction and Confidence Interval band


ggplot(q1_data)+geom_point(aes(ratio,visc),color='black',size=2.5)+
geom_ribbon(data=ci_band,aes(x=ratio,ymin=lwr,ymax=upr,color='95% Confidence Band'),
alpha=0,size=1)+
geom_ribbon(data=pe_band,aes(x=ratio,ymin=lwr,ymax=upr,color='95% Prediction Band'),
alpha=0,size=1)+
geom_line(data=pe_band,aes(x=ratio,y=fit,color='Fit'),size=1)+
labs(x='Ratio',y='Viscosity',title='CI & PI Band',color="Legend")+
scale_color_manual(values=colors)+
my_theme

[Figure: CI & PI Band. Viscosity (y-axis) against Ratio (x-axis), showing the fitted line, the 95% confidence band and the 95% prediction band]
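For reference, the two bands correspond to the standard interval formulas of simple linear regression. The notation below is textbook notation introduced here, not taken from the code: $x_0$ is the point at which the band is evaluated, $MS_{Res}$ is the residual mean square and $S_{xx} = \sum_i (x_i - \bar{x})^2$.

```latex
% 95% confidence band for the mean response at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-2}
  \sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}

% 95% prediction band for a new observation at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-2}
  \sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}
```

The extra 1 inside the second square root accounts for the variance of a new observation, which is why the prediction band is always wider than the confidence band. Both bands are narrowest at $x_0 = \bar{x}$.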

Q-2.15

q2_data<-p2.15 ## Question 2.15 data


q2_data

## temp visc
## 1 24.9 1.1330
## 2 35.0 0.9772
## 3 44.9 0.8532
## 4 55.1 0.7550
## 5 65.2 0.6723
## 6 75.2 0.6021
## 7 85.2 0.5420
## 8 95.2 0.5074

Prediction Equation

q2_model<-lm(visc~.,data=q2_data) # Linear model fit using Temperature to predict Viscosity


q2_model

##
## Call:
## lm(formula = visc ~ ., data = q2_data)
##
## Coefficients:
## (Intercept) temp
## 1.281511 -0.008758

The prediction equation is:
$\hat{y}_i = 1.2815 - 0.00876\,x_i$

Analysis

summary(q2_model)

##
## Call:
## lm(formula = visc ~ ., data = q2_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.043955 -0.035863 -0.009305 0.019900 0.069559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2815107 0.0468683 27.34 1.58e-07 ***
## temp -0.0087578 0.0007284 -12.02 2.01e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04743 on 6 degrees of freedom
## Multiple R-squared: 0.9602, Adjusted R-squared: 0.9535
## F-statistic: 144.6 on 1 and 6 DF, p-value: 2.007e-05

$R^2$ is 0.9602, indicating a very good fit.

Hypothesis testing of the intercept and temperature with 5% significance level

For temperature, $t_0 = -12.02$ and $t_{\alpha/2,6} = 2.447$.

Since $t_0 < -t_{\alpha/2,6}$, the null hypothesis is rejected. There is evidence of a linear relationship between temperature and viscosity.
For the intercept, $t_0 = 27.34$ and $t_{\alpha/2,6} = 2.447$.
Since $t_0 > t_{\alpha/2,6}$, the null hypothesis is rejected.

95% Confidence interval for intercept and temperature

confint(q2_model)

## 2.5 % 97.5 %
## (Intercept) 1.16682797 1.396193340
## temp -0.01054005 -0.006975593

ANOVA (Testing significance of Regression)

The F-statistic is far greater than 1. $F_0 = 144.6$ and $F_{0.05,1,6} = 5.9874$, so $F_0 > F_{\alpha,1,n-2}$ and the null hypothesis is rejected.
The regression is significant: temperature explains a significant part of the variability in viscosity.

anova(q2_model)

## Analysis of Variance Table


##
## Response: visc
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 0.32529 0.32529 144.58 2.007e-05 ***
## Residuals 6 0.01350 0.00225
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
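Both $R^2$ and the F-statistic follow from the two sums of squares in this table. The sketch below recomputes them from the printed (rounded) values, so small differences in the last digit are expected; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_reg <- 0.32529   # regression sum of squares (temp), 1 df
ss_res <- 0.01350   # residual sum of squares, 6 df
df_res <- 6

# R^2 is the share of total variability explained by the regression
r2 <- ss_reg / (ss_reg + ss_res)

# F0 is the ratio of the regression and residual mean squares
ms_res <- ss_res / df_res
f0 <- (ss_reg / 1) / ms_res

print(round(r2, 4))
print(round(f0, 2))
```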

95% Confidence and Prediction interval band

ci_band2<-as.data.frame(predict(q2_model,q2_data,interval='confidence',level=0.95))

## The code here first predicts y_hat for every x present in the data using the
## model we fitted, then finds the 95% confidence interval for each of them and then that table is
## converted to a data frame.

pe_band2<-as.data.frame(predict(q2_model,q2_data,interval='prediction',level=0.95))

## The code here first predicts y_hat for every x present in the data, assuming these
## data points were not present while training then using the model we fitted, it finds
## the 95% prediction interval for each of them and then that table is
## converted to a data frame.

ci_band2$temp<-q2_data$temp ## 'x' variable added to the data frame of CI and PE band dataframe
pe_band2$temp<-q2_data$temp

colors<-c('95% Confidence Band'='red','95% Prediction Band'='green','Fit'='blue')

ggplot(q2_data)+geom_point(aes(temp,visc),size=2.5,color='black')+
geom_ribbon(data=ci_band2,aes(x=temp,ymin=lwr,ymax=upr,color='95% Confidence Band'),

alpha=0,size=1)+
geom_ribbon(data=pe_band2,aes(x=temp,ymin=lwr,ymax=upr,color='95% Prediction Band'),
alpha=0,size=1)+
geom_line(data=pe_band2,aes(x=temp,y=fit,color='Fit'),size=1)+
labs(x='Temperature',y='Viscosity',title='CI & PI Band',color="Legend")+
scale_color_manual(values=colors)+
my_theme

[Figure: CI & PI Band. Viscosity (y-axis) against Temperature (x-axis), showing the fitted line, the 95% confidence band and the 95% prediction band]

Q-2.16

q3_data<-p2.16 ## Question-2.16
q3_data

## volume pressure
## 1 2084 4599
## 2 2084 4600
## 3 2273 5044
## 4 2273 5043
## 5 2273 5044
## 6 2463 5488

## 7 2463 5487
## 8 2651 5931
## 9 2652 5932
## 10 2652 5932
## 11 2842 6380
## 12 2842 6380
## 13 3030 6818
## 14 3031 6817
## 15 3031 6818
## 16 3221 7266
## 17 3221 7268
## 18 3409 7709
## 19 3410 7710
## 20 3600 8156
## 21 3600 8158
## 22 3788 8597
## 23 3789 8599
## 24 3789 8600
## 25 3979 9048
## 26 3979 9048
## 27 4167 9484
## 28 4168 9487
## 29 4168 9487
## 30 4358 9936
## 31 4358 9938
## 32 4546 10377
## 33 4547 10379

Analysis

q3_model<-lm(pressure~.,q3_data) # Linear model fit to predict Pressure using volume


q3_model

##
## Call:
## lm(formula = pressure ~ ., data = q3_data)
##
## Coefficients:
## (Intercept) volume
## -290.707 2.346

Prediction equation: $\hat{y}_i = -290.707 + 2.346\,x_i$

summary(q3_model) ## Summary of the model

##
## Call:
## lm(formula = pressure ~ ., data = q3_data)
##
## Residuals:
## Min 1Q Median 3Q Max

## -4.3276 -0.9227 0.0773 1.2676 2.9577
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.907e+02 1.355e+00 -214.6 <2e-16 ***
## volume 2.346e+00 4.007e-04 5855.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.741 on 31 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 3.429e+07 on 1 and 31 DF, p-value: < 2.2e-16

The $R^2$ value rounds to 1, indicating a very good fit.

Hypothesis testing of the intercept and volume with 5% significance level

For volume, $t_0 = 5855.4$ and $t_{\alpha/2,31} = 2.04$.

Since $t_0 > t_{\alpha/2,31}$, the null hypothesis is rejected. There is evidence of a linear relationship between pressure and volume.
For the intercept, $t_0 = -214.6$ and $t_{\alpha/2,31} = 2.04$.
Since $t_0 < -t_{\alpha/2,31}$, the null hypothesis is rejected.

95% Confidence interval for intercept and volume

confint(q3_model)

## 2.5 % 97.5 %
## (Intercept) -293.469711 -287.943397
## volume 2.345614 2.347249

ANOVA (Significance of Regression)

anova(q3_model)

## Analysis of Variance Table


##
## Response: pressure
## Df Sum Sq Mean Sq F value Pr(>F)
## volume 1 103947022 103947022 34286009 < 2.2e-16 ***
## Residuals 31 94 3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-statistic is far greater than 1, so the null hypothesis is rejected.

$F_0 = 34286009$ and $F_{0.05,1,31} = 4.16$, so $F_0 > F_{\alpha,1,n-2}$ and the null hypothesis is rejected.
The regression is significant: volume explains a significant part of the variability in pressure.
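With 33 observations the residual degrees of freedom are 31, a value that many printed tables skip. The critical values for this question can be checked directly in R; this is a small sketch with illustrative variable names.

```r
# Residual degrees of freedom for q3_model: n - 2 = 33 - 2
df_resid <- 31

# 5% two-sided t critical value and 5% F critical value (1 and 31 df)
t_crit <- qt(0.975, df = df_resid)
f_crit <- qf(0.95, df1 = 1, df2 = df_resid)

print(round(t_crit, 4))
print(round(f_crit, 4))

# Statistics copied from the summary() and anova() output
t0_volume    <- 5855.4
t0_intercept <- -214.6
f0           <- 34286009

print(abs(t0_volume) > t_crit)      # slope test decision
print(abs(t0_intercept) > t_crit)   # intercept test decision
print(f0 > f_crit)                  # significance of regression
```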

Plot of 95% Confidence and Prediction interval band

ci_band3<-as.data.frame(predict(q3_model,q3_data,interval='confidence',level=0.95))

## The code here first predicts y_hat for every x present in the data using the
## model we fitted, then finds the 95% confidence interval for each of them and then that table is
## converted to a data frame.

pe_band3<-as.data.frame(predict(q3_model,q3_data,interval='prediction',level=0.95))

## The code here first predicts y_hat for every x present in the data, assuming these
## data points were not present while training then using the model we fitted, it finds
## the 95% prediction interval for each of them and then that table is
## converted to a data frame.

ci_band3$volume<-q3_data$volume ## 'x' variable added to the data frame of CI and PE band dataframe
pe_band3$volume<-q3_data$volume

colors<-c('95% Confidence Band'='red','95% Prediction Band'='green','Fit'='blue')

ggplot(q3_data)+geom_point(aes(volume,pressure),size=2.5,color='black')+
geom_ribbon(data=ci_band3,aes(x=volume,ymin=lwr,ymax=upr,color='95% Confidence Band')
,alpha=0,size=1)+
geom_ribbon(data=pe_band3,aes(x=volume,ymin=lwr,ymax=upr,color='95% Prediction Band')
,alpha=0,size=1)+
geom_line(data=pe_band3,aes(x=volume,y=fit,color='Fit'),size=1)+
labs(x='Volume',y='Pressure',title='CI & PI Band',color="Legend")+
scale_color_manual(values=colors)+
my_theme

[Figure: CI & PI Band. Pressure (y-axis) against Volume (x-axis), showing the fitted line, the 95% confidence band and the 95% prediction band]

Q-2.17
Data Setup

## Data reading
bp<-c(199.5,199.3,197.9,198.4,199.4,199.9,200.9,201.1,201.9,201.3,203.6,204.6,209.5,
208.6,210.7,211.9,212.2)
press<-c(20.79,20.79,22.4,22.67,23.15,23.35,23.89,23.99,24.02,24.01,25.14,26.57,
28.49,27.76,29.04,29.88,30.06)
q4_data<-data.frame(bp=bp,press=press)
q4_data

## bp press
## 1 199.5 20.79
## 2 199.3 20.79
## 3 197.9 22.40
## 4 198.4 22.67
## 5 199.4 23.15
## 6 199.9 23.35
## 7 200.9 23.89

## 8 201.1 23.99
## 9 201.9 24.02
## 10 201.3 24.01
## 11 203.6 25.14
## 12 204.6 26.57
## 13 209.5 28.49
## 14 208.6 27.76
## 15 210.7 29.04
## 16 211.9 29.88
## 17 212.2 30.06

Prediction equation

q4_model<-lm(bp~.,q4_data)
q4_model

##
## Call:
## lm(formula = bp ~ ., data = q4_data)
##
## Coefficients:
## (Intercept) press
## 163.333 1.606

The prediction equation is: $\hat{y}_i = 163.333 + 1.606\,x_i$

Analysis

summary(q4_model)

##
## Call:
## lm(formula = bp ~ ., data = q4_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4013 -0.9267 -0.1009 0.5989 2.7839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 163.3333 2.7316 59.79 < 2e-16 ***
## press 1.6057 0.1083 14.83 2.28e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.308 on 15 degrees of freedom
## Multiple R-squared: 0.9362, Adjusted R-squared: 0.9319
## F-statistic: 219.9 on 1 and 15 DF, p-value: 2.279e-10

The $R^2$ is 0.9362, indicating a good fit.

Hypothesis testing of the intercept and pressure with 5% significance level

For pressure, $t_0 = 14.83$ and $t_{\alpha/2,15} = 2.131$.

Since $t_0 > t_{\alpha/2,15}$, the null hypothesis is rejected. There is evidence of a linear relationship between pressure and boiling point.
For the intercept, $t_0 = 59.79$ and $t_{\alpha/2,15} = 2.131$.
Since $t_0 > t_{\alpha/2,15}$, the null hypothesis is rejected.

95% Confidence interval for intercept and Pressure

confint(q4_model)

## 2.5 % 97.5 %
## (Intercept) 157.510998 169.155543
## press 1.374942 1.836487

ANOVA (Significance of Regression)

anova(q4_model)

## Analysis of Variance Table


##
## Response: bp
## Df Sum Sq Mean Sq F value Pr(>F)
## press 1 376.27 376.27 219.95 2.279e-10 ***
## Residuals 15 25.66 1.71
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-statistic is far greater than 1, so the null hypothesis is rejected.
$F_0 = 219.95$ and $F_{0.05,1,15} = 4.5431$, so $F_0 > F_{\alpha,1,n-2}$ and the null hypothesis is rejected.
The regression is significant: pressure is a good variable for predicting boiling point.

Plot of 95% Confidence and Prediction interval band

ci_band4<-as.data.frame(predict(q4_model,q4_data,interval='confidence',level=0.95))

## The code here first predicts y_hat for every x present in the data using the
## model we fitted, then finds the 95% confidence interval for each of them and then that table is
## converted to a data frame.

pe_band4<-as.data.frame(predict(q4_model,q4_data,interval='prediction',level=0.95))

## The code here first predicts y_hat for every x present in the data, assuming these
## data points were not present while training then using the model we fitted, it finds
## the 95% prediction interval for each of them and then that table is
## converted to a data frame.

ci_band4$press<-q4_data$press ## 'x' variable (press) added to the CI and PI band data frames
pe_band4$press<-q4_data$press

colors<-c('95% Confidence Band'='red','95% Prediction Band'='green','Fit'='blue')

ggplot(q4_data)+geom_point(aes(press,bp),size=2.5,color='black')+
geom_ribbon(data=ci_band4,aes(x=press,ymin=lwr,ymax=upr,color='95% Confidence Band'),alpha=0,size=1)+
geom_ribbon(data=pe_band4,aes(x=press,ymin=lwr,ymax=upr,color='95% Prediction Band'),alpha=0,size=1)+
geom_line(data=pe_band4,aes(x=press,y=fit,color='Fit'),size=1)+
labs(x='Pressure',y='Boiling Point',title='CI & PI Band',color="Legend")+
scale_color_manual(values=colors)+
my_theme

[Figure: CI & PI Band. Boiling Point (y-axis) against Pressure (x-axis), showing the fitted line, the 95% confidence band and the 95% prediction band]

Q 2.18
Data Setup

firms<-c('Miller Lite','Pepsi',"Stroh's","Fed'l Express","Burger King","Coco-Cola","McDonald's","MCI",
"Diet Coke","Ford","Levi's","Bud Lite","ATT/Bell","Calvin Klein","Wendy's","Polaroid","Shasta","Meow Mix",
"Oscar Meyer","Crest","Kibbles 'n Bits")

spent<-c(50.1,74.1,19.3,22.9,82.4,40.1,185.9,26.9,20.4,166.2,27,45.6,154.9,5,49.7,26.9,5.7,7.6,9.2,32.4,
6.1)

impressions<-c(32.1,99.6,11.7,21.9,60.8,78.6,92.4,50.7,21.4,40.1,40.8,10.4,88.9,12,29.2,38,10,12.3,23.4,
71.1,4.4)

q5_data<-data.frame(firms=firms,spent=spent,impressions=impressions)
q5_data$firms<-as.character(q5_data$firms)
q5_data

## firms spent impressions


## 1 Miller Lite 50.1 32.1
## 2 Pepsi 74.1 99.6
## 3 Stroh's 19.3 11.7
## 4 Fed'l Express 22.9 21.9
## 5 Burger King 82.4 60.8
## 6 Coco-Cola 40.1 78.6
## 7 McDonald's 185.9 92.4
## 8 MCI 26.9 50.7
## 9 Diet Coke 20.4 21.4
## 10 Ford 166.2 40.1
## 11 Levi's 27.0 40.8
## 12 Bud Lite 45.6 10.4
## 13 ATT/Bell 154.9 88.9
## 14 Calvin Klein 5.0 12.0
## 15 Wendy's 49.7 29.2
## 16 Polaroid 26.9 38.0
## 17 Shasta 5.7 10.0
## 18 Meow Mix 7.6 12.3
## 19 Oscar Meyer 9.2 23.4
## 20 Crest 32.4 71.1
## 21 Kibbles 'n Bits 6.1 4.4

Prediction equation

q5_model<-lm(impressions~spent,data=q5_data)
q5_model

##
## Call:
## lm(formula = impressions ~ spent, data = q5_data)
##
## Coefficients:
## (Intercept) spent
## 22.1627 0.3632

The prediction equation is: $\hat{y}_i = 22.1627 + 0.3632\,x_i$

Analysis

summary(q5_model)

##
## Call:
## lm(formula = impressions ~ spent, data = q5_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.422 -12.623 -8.171 8.832 50.526
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.16269 7.08948 3.126 0.00556 **
## spent 0.36317 0.09712 3.739 0.00139 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.5 on 19 degrees of freedom
## Multiple R-squared: 0.424, Adjusted R-squared: 0.3936
## F-statistic: 13.98 on 1 and 19 DF, p-value: 0.001389

The $R^2$ is 0.424, so the model explains only about 42% of the variability in retained impressions; the fit is not very good.

Hypothesis testing of the intercept and amount spent with 5% significance level

For amount spent, $t_0 = 3.739$ and $t_{\alpha/2,19} = 2.093$.

Since $t_0 > t_{\alpha/2,19}$, the null hypothesis is rejected. There is evidence of a linear relationship between amount spent and retained impressions.
For the intercept, $t_0 = 3.126$ and $t_{\alpha/2,19} = 2.093$.
Since $t_0 > t_{\alpha/2,19}$, the null hypothesis is rejected.

95% Confidence interval for intercept and amount spent

confint(q5_model)

## 2.5 % 97.5 %
## (Intercept) 7.324244 37.0011425
## spent 0.159899 0.5664492

ANOVA (Significance of Regression)

anova(q5_model)

## Analysis of Variance Table


##
## Response: impressions
## Df Sum Sq Mean Sq F value Pr(>F)
## spent 1 7723.3 7723.3 13.983 0.001389 **
## Residuals 19 10494.1 552.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-statistic is well above 1.

$F_0 = 13.983$ and $F_{0.05,1,19} = 4.3807$, so $F_0 > F_{\alpha,1,n-2}$ and the null hypothesis is rejected.
The regression is significant: amount spent appears to be a useful variable for predicting retained impressions. However, this conclusion is not reflected in the $R^2$ value. This might be because most of the x-values (amounts spent) are confined to a limited region, as seen in the plot.
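One way to see how the two quantities are linked: in simple linear regression $F_0 = (n-2)R^2/(1-R^2)$, so for a fixed sample size the F-test only asks whether $R^2$ exceeds a threshold. The sketch below recomputes both from the ANOVA sums of squares and derives that threshold for $n = 21$; the variable names (for example `r2_needed`) are illustrative.

```r
# Sums of squares and sample size copied from the anova(q5_model) output
ss_reg <- 7723.3
ss_res <- 10494.1
n      <- 21

# Coefficient of determination and the F statistic it implies
r2        <- ss_reg / (ss_reg + ss_res)
f_from_r2 <- (n - 2) * r2 / (1 - r2)

# 5% critical value with 1 and n - 2 = 19 degrees of freedom
f_crit <- qf(0.95, df1 = 1, df2 = n - 2)

# Smallest R^2 that would still be significant at the 5% level
r2_needed <- f_crit / (f_crit + n - 2)

print(round(c(R2 = r2, F0 = f_from_r2, F_crit = f_crit, R2_needed = r2_needed), 4))
```

Under these numbers any $R^2$ above `r2_needed` is significant with 21 observations, so a significant regression and a moderate $R^2$ can occur together.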

Plot of 95% Confidence and Prediction interval band

ci_band5<-as.data.frame(predict(q5_model,q5_data,interval='confidence',level=0.95))

## The code here first predicts y_hat for every x present in the data using the
## model we fitted, then finds the 95% confidence interval for each of them and then that table is
## converted to a data frame.

pe_band5<-as.data.frame(predict(q5_model,q5_data,interval='prediction',level=0.95))

## The code here first predicts y_hat for every x present in the data, assuming these
## data points were not present while training then using the model we fitted, it finds
## the 95% prediction interval for each of them and then that table is
## converted to a data frame.

ci_band5$spent<-q5_data$spent
pe_band5$spent<-q5_data$spent

colors<-c('95% Confidence Band'='red','95% Prediction Band'='green','Fit'='blue')

ggplot(q5_data)+geom_point(aes(spent,impressions),size=2.5,color='black')+
geom_ribbon(data=ci_band5,aes(x=spent,ymin=lwr,ymax=upr,color='95% Confidence Band'),alpha=0,size=1)+
geom_ribbon(data=pe_band5,aes(x=spent,ymin=lwr,ymax=upr,color='95% Prediction Band'),alpha=0,size=1)+
geom_line(data=pe_band5,aes(x=spent,y=fit,color='Fit'),size=1)+
labs(x='Amount Spent(Millions)',y='Retained Impressions(Millions)',title='CI & PI Band',color="Legend")+
scale_color_manual(values=colors)+
my_theme

[Figure: CI & PI Band. Retained Impressions (Millions, y-axis) against Amount Spent (Millions, x-axis), showing the fitted line, the 95% confidence band and the 95% prediction band]

Retained Impressions for MCI

MCI_data<-q5_data[8,]

MCI_CI<-predict(q5_model,MCI_data,interval = 'confidence',level=0.95)
MCI_PI<-predict(q5_model,MCI_data,interval = 'prediction',level=0.95)

Prediction interval

PI: (−18.64084, 82.50499)

MCI_PI

## fit lwr upr


## 8 31.93208 -18.64084 82.50499

Confidence interval

CI: (20.18314, 43.68102)

MCI_CI

## fit lwr upr


## 8 31.93208 20.18314 43.68102
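Both MCI intervals can also be computed by hand from the standard formulas, which makes the difference between them explicit. The following is a self-contained sketch that re-enters the data from the table above and refits the model; the names `fit`, `x0`, `se_mean`, `se_pred`, `ci` and `pi_new` are illustrative and not part of the original code.

```r
# Data re-entered from the q5_data table (amount spent, retained impressions)
spent <- c(50.1, 74.1, 19.3, 22.9, 82.4, 40.1, 185.9, 26.9, 20.4, 166.2, 27,
           45.6, 154.9, 5, 49.7, 26.9, 5.7, 7.6, 9.2, 32.4, 6.1)
impressions <- c(32.1, 99.6, 11.7, 21.9, 60.8, 78.6, 92.4, 50.7, 21.4, 40.1,
                 40.8, 10.4, 88.9, 12, 29.2, 38, 10, 12.3, 23.4, 71.1, 4.4)

fit <- lm(impressions ~ spent)
n   <- length(spent)
x0  <- 26.9                                   # amount spent by MCI (row 8)

ms_res <- sum(resid(fit)^2) / (n - 2)         # residual mean square
sxx    <- sum((spent - mean(spent))^2)        # corrected sum of squares of x
y0_hat <- unname(coef(fit)[1] + coef(fit)[2] * x0)
t_crit <- qt(0.975, df = n - 2)

# Standard error of the estimated mean response at x0
se_mean <- sqrt(ms_res * (1 / n + (x0 - mean(spent))^2 / sxx))
# Standard error for a single new observation at x0 (extra "1 +" term)
se_pred <- sqrt(ms_res * (1 + 1 / n + (x0 - mean(spent))^2 / sxx))

ci     <- y0_hat + c(-1, 1) * t_crit * se_mean   # 95% confidence interval
pi_new <- y0_hat + c(-1, 1) * t_crit * se_pred   # 95% prediction interval

print(round(c(fit = y0_hat, ci_lwr = ci[1], ci_upr = ci[2]), 3))
print(round(c(fit = y0_hat, pi_lwr = pi_new[1], pi_upr = pi_new[2]), 3))
```

The two intervals share the same centre; the prediction interval is much wider because it includes the variability of an individual observation around the regression line.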
