4/27/23, 1:01 PM IE451 Midterm Examination Spring 2023
Answer the following questions. Write your responses below each part in the qmd file. Compress
qmd, html, and any data files you may create into a single zip file. Name the zip file with your Bilkent
student ID (e.g., 21400023.zip). Upload the zip file to the IE 451 moodle page.
You may use your notes, your textbook, or internet.

You may NOT seek help from anybody inside or outside the classroom in any form.
Be sure that the qmd file and the final version of the html file are submitted.
Questions
pollution.csv contains data on average pollution amounts per unit time (PollutantAmnt) at different
locations, along with several potential risk factors. We would like to use the data to come up with a
statistical model that sheds light on the relations between the risk factors and the pollution amount.
Exploration [suggested time: 15 minutes]

1. Read data, print the first six rows, and explore the relations between variables (graphical and
tabular summaries). For full credit, you should comment on the plots and tables.
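library(tidyverse)  # read_csv(), dplyr verbs, %>%
library(magrittr)   # set_colnames()
library(car)        # scatterplotMatrix(), powerTransform(), bcPower(), ncvTest(), residualPlots()
library(corrplot)   # corrplot()
library(pander)     # pander()
# Read the data, drop the first column, and turn text columns into factors
d <- suppressMessages(read_csv("pollution.csv", show_col_types = FALSE) %>%
  select(-1)) %>%
  mutate(across(where(is.character), factor))
d %>% head()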
# A tibble: 6 × 7
  Convection Habitants HumidDays HumidityAmnt IndustrySize PollutantAmnt Region
       <dbl>     <dbl>     <dbl>        <dbl>        <dbl>         <dbl> <fct>
1      103.        315       171         24.9          907            33 A
2       98.4      1501       309         69.0         2013           129 A
3      101         351       177         29.3          273            55 B
4      111.       1163       209         70.8         1549           111 A
5       93.2       925       331         71.2          781            21 C
6       97.2       159       253         85.7          823           111 A
file:///home/sdayanik/usr/savas/bilkent/teaching/IE 451 Applied Data Analysis/IE451_git/Spring 2023/Midterm/submissions/Den… 1/22
d %>% summary() %>% pander()
         Convection  Habitants  HumidDays  HumidityAmnt  IndustrySize  PollutantAmnt
Min.          86.0        141       71.0         13.10          69.0           15.0
1st Qu.      100.2        597      205.0         60.92         361.0           25.0
Median       108.2       1029      229.0         76.48         693.0           51.0
Mean         110.5       1216      226.8         72.54         925.2           59.1
3rd Qu.      117.6       1433      255.0         85.22         923.0           69.0
Max.         150.0       6737      331.0        118.60        6687.0          219.0

Region: A: 14, B: 16, C: 11
There are 7 variables: Convection, Habitants, HumidDays, HumidityAmnt, IndustrySize, PollutantAmnt,
and Region. Region is a qualitative (categorical) variable with three levels (A: 14, B: 16, and C: 11
locations, 41 in total), while the others are quantitative variables.
Convection ranges from 86 to 150, Habitants from 141 to 6737, HumidDays from 71 to 331,
HumidityAmnt from 13.10 to 118.60, IndustrySize from 69 to 6687, and PollutantAmnt from 15 to 219.
# Pairwise scatterplots of the numeric variables, with a red smoother and no regression line
scatterplotMatrix(~ Convection + Habitants + HumidDays + HumidityAmnt + IndustrySize + PollutantAmnt,
                  data = d, regLine = FALSE,
                  smooth = list(spread = FALSE, lty.smooth = "solid", col.smooth = "red"))

# The same plot, grouped by Region
scatterplotMatrix(~ Convection + Habitants + HumidDays + HumidityAmnt + IndustrySize + PollutantAmnt | Region,
                  data = d, regLine = FALSE,
                  smooth = list(spread = FALSE, lty.smooth = "solid", col.smooth = "red"))

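Habitants, IndustrySize, Convection, and PollutantAmnt have right-skewed distributions: their means
exceed their medians, and their maxima lie far above their third quartiles.
d %>%
  select(where(is.numeric)) %>%
  cor() %>%
  corrplot(diag = FALSE, type = "upper", method = "pie", order = "hclust")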

No two variables are very highly correlated. The most positively correlated pair is IndustrySize and
PollutantAmnt. Other pairs are also positively correlated, such as Convection and Habitants, and some
pairs are negatively correlated, such as IndustrySize and HumidDays.
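Reading the strongest pairs off the pie chart can also be done directly from the correlation matrix. The base-R sketch below uses a hypothetical helper, extreme_pairs(), and a small made-up matrix; with the exam data it could be applied to `d %>% select(where(is.numeric)) %>% cor()`, but its output on pollution.csv is not shown here.

```r
# Hypothetical helper (not part of the original answer): given a correlation
# matrix such as the one passed to corrplot(), return the most positively and
# the most negatively correlated pairs of distinct variables.
extreme_pairs <- function(cm) {
  up <- cm
  up[lower.tri(up, diag = TRUE)] <- NA  # keep each pair once and drop the diagonal
  pick <- function(value) {
    idx <- which(up == value, arr.ind = TRUE)[1, ]  # first cell holding that value
    c(rownames(up)[idx[1]], colnames(up)[idx[2]])
  }
  list(most_positive = pick(max(up, na.rm = TRUE)),
       most_negative = pick(min(up, na.rm = TRUE)))
}

# Small made-up correlation matrix for illustration
cm <- matrix(c( 1.0,  0.2,  0.6,
                0.2,  1.0, -0.4,
                0.6, -0.4,  1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
res <- extreme_pairs(cm)
res$most_positive  # "a" "c"
res$most_negative  # "b" "c"
```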
Let's see how each variable relates to PollutantAmnt on its own, using simple linear regressions.
Habitants_lm <- lm(PollutantAmnt ~ Habitants, data = d)
IndustrySize_lm <- lm(PollutantAmnt ~ IndustrySize, data = d)
Convection_lm <- lm(PollutantAmnt ~ Convection, data = d)
HumidDays_lm <- lm(PollutantAmnt ~ HumidDays, data = d)
HumidityAmnt_lm <- lm(PollutantAmnt ~ HumidityAmnt, data = d)
Habitants_lm
Call:
lm(formula = PollutantAmnt ~ Habitants, data = d)
Coefficients:
(Intercept) Habitants
55.793948 0.002716
IndustrySize_lm
Call:
lm(formula = PollutantAmnt ~ IndustrySize, data = d)
Coefficients:
(Intercept) IndustrySize
34.24801 0.02686
Convection_lm
Call:
lm(formula = PollutantAmnt ~ Convection, data = d)
Coefficients:
(Intercept) Convection
214.734 -1.408
HumidDays_lm
Call:
lm(formula = PollutantAmnt ~ HumidDays, data = d)
Coefficients:
(Intercept) HumidDays
-15.1267 0.3273
HumidityAmnt_lm
Call:
lm(formula = PollutantAmnt ~ HumidityAmnt, data = d)
Coefficients:
(Intercept) HumidityAmnt
51.2444 0.1083
The simple-regression slopes show that Habitants, IndustrySize, HumidDays, and HumidityAmnt are each
positively associated with PollutantAmnt, while Convection is negatively associated with it.
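Reading the direction of association off the simple slopes works because, for a simple regression, slope = r × sd(y) / sd(x), so each slope has the same sign as the corresponding correlation. A quick base-R check on simulated data (made up, not the pollution data):

```r
# Simulated data (made up): one predictor with a negative slope
set.seed(451)
x <- rnorm(41, mean = 110, sd = 12)
y <- 215 - 1.4 * x + rnorm(41, sd = 30)

b1 <- unname(coef(lm(y ~ x))["x"])  # least-squares slope
r <- cor(x, y)                      # Pearson correlation
b1_from_r <- r * sd(y) / sd(x)      # slope implied by the correlation

all.equal(b1, b1_from_r)  # TRUE: the two agree up to rounding
sign(b1) == sign(r)       # TRUE: slope and correlation share a sign
```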
Let's fit a model with all five numeric risk factors (Region is left out).
total_lm <- lm(PollutantAmnt ~ Habitants + IndustrySize + HumidDays + HumidityAmnt + Convection,
               data = d)
total_lm %>% summary()
Call:
lm(formula = PollutantAmnt ~ Habitants + IndustrySize + HumidDays +
HumidityAmnt + Convection, data = d)
Residuals:
Min 1Q Median 3Q Max
-56.077 -21.538 -3.405 19.525 108.500

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 154.789101 82.063921 1.886 0.0676 .
Habitants -0.004374 0.005051 -0.866 0.3924
IndustrySize 0.025353 0.004996 5.074 1.28e-05 ***
HumidDays -0.022045 0.180982 -0.122 0.9037
HumidityAmnt 0.556838 0.408719 1.362 0.1818
Convection -1.350079 0.621210 -2.173 0.0366 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 32.86 on 35 degrees of freedom
Multiple R-squared: 0.5714, Adjusted R-squared: 0.5102
F-statistic: 9.333 on 5 and 35 DF, p-value: 1.019e-05
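IndustrySize is highly significant (p = 1.28e-05), and Convection is also significant at the 5% level
(p = 0.0366), although less strongly. So, holding the other variables fixed, there is evidence of a
relation between PollutantAmnt and both IndustrySize (positive) and Convection (negative); Habitants,
HumidDays, and HumidityAmnt are not significant. The multiple R-squared of 0.5714 means that the model
explains about 57% of the variation in PollutantAmnt; the adjusted R-squared (0.5102) is lower because
it penalizes the number of predictors. The p-value of the overall F-test (1.019e-05) is below 0.05, so
we reject the null hypothesis that all five slope coefficients are zero.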
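The adjusted R-squared and the F-statistic both follow from the multiple R-squared, with n = 41 locations (14 + 16 + 11 by Region) and p = 5 predictors. A short R check of this arithmetic, using the rounded values printed in summary(total_lm):

```r
# Values copied from the summary(total_lm) output above
r2 <- 0.5714  # Multiple R-squared
n <- 41       # locations: 14 + 16 + 11
p <- 5        # slope coefficients (intercept excluded)

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))   # overall F-test that all slopes are 0
p_val <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

round(adj_r2, 4)  # 0.5102, as printed
round(f_stat, 2)  # 9.33, as printed (up to rounding of R-squared)
```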
Model [suggested time: 55 minutes]

plot(total_lm)


2. Fit the best statistical model. Check diagnostics to make sure model assumptions are satisfied. If
not, try to improve your model. Check diagnostics again. Repeat until a satisfactory model is
obtained. Explain your reasoning.
The diagnostic plots suggest that this is not the best model: the residual variance may not be
constant. Therefore, we may need a power transformation to bring the variables closer to normality.
X <- d %>%
  select(PollutantAmnt, Habitants, IndustrySize, HumidDays, HumidityAmnt, Convection) %>%
  as.matrix()
res_pt <- powerTransform(X ~ 1)
res_pt %>% summary()
bcPower Transformations to Multinormality

Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
PollutantAmnt -0.2558 0 -0.7067 0.1951
Habitants 0.0176 0 -0.2436 0.2787
IndustrySize 0.1288 0 -0.1017 0.3592
HumidDays 0.1557 0 -0.3244 0.6358
HumidityAmnt 1.6367 2 1.0931 2.1803
Convection -0.7290 0 -2.2045 0.7464
Likelihood ratio test that transformation parameters are equal to 0
 (all log transformations)
LRT df pval
LR test, lambda = (0 0 0 0 0 0) 47.95536 6 1.2061e-08

Likelihood ratio test that no transformations are needed
LRT df pval
LR test, lambda = (1 1 1 1 1 1) 153.7129 6 < 2.22e-16
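Both likelihood ratio tests reject their null hypotheses (p = 1.2e-08 and p < 2.2e-16): the variables
need transforming, but taking logs of all of them is not adequate either. The rounded powers suggest
squaring HumidityAmnt (power = 2) and taking logs of the other variables.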
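The transformed values in head(Xpt) can be checked by hand with the Box-Cox formula, which bcPower() applies (as I understand the car documentation, with no shift parameter) as (x^lambda - 1)/lambda for lambda != 0 and log(x) for lambda = 0. box_cox() below is a hypothetical helper written only for this check; the inputs are the first-row values from head(d).

```r
# Hypothetical helper: Box-Cox power transformation of a positive number x
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

box_cox(33, 0)    # 3.496508 -> pt_PollutantAmnt, row 1 of head(Xpt)
box_cox(315, 0)   # 5.752573 -> pt_Habitants, row 1
box_cox(24.9, 2)  # 309.505  -> pt_HumidityAmnt, row 1
```

Because (x^2 - 1)/2 is a linear function of x^2, using it instead of plain x^2 only rescales the pt_HumidityAmnt coefficient and leaves the fitted values unchanged.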
Xpt <- bcPower(X, coef(res_pt, round = TRUE)) %>%
  set_colnames(paste("pt", colnames(X), sep = "_"))
head(Xpt)
pt_PollutantAmnt pt_Habitants pt_IndustrySize pt_HumidDays pt_HumidityAmnt
[1,] 3.496508 5.752573 6.810142 5.141664 309.5050
[2,] 4.859812 7.313887 7.607381 5.733341 2378.6202
[3,] 4.007333 5.860786 5.609472 5.176150 429.9178
[4,] 4.709530 7.058758 7.345365 5.342334 2504.4042
[5,] 3.044522 6.829794 6.660575 5.802118 2535.6442
[6,] 4.709530 5.068904 6.712956 5.533389 3675.1738
pt_Convection
[1,] 4.632785
[2,] 4.589041
[3,] 4.615121
[4,] 4.707727
[5,] 4.534748
[6,] 4.576771
d_pt <- Xpt %>%
  as_tibble()
res_lm_sat <- lm(pt_PollutantAmnt ~ pt_Habitants + pt_IndustrySize + pt_HumidDays +
                   pt_HumidityAmnt + pt_Convection,
                 data = d_pt)
res_lm_sat
Call:
lm(formula = pt_PollutantAmnt ~ pt_Habitants + pt_IndustrySize +
pt_HumidDays + pt_HumidityAmnt + pt_Convection, data = d_pt)
Coefficients:
(Intercept) pt_Habitants pt_IndustrySize pt_HumidDays
1.336e+01 -1.978e-01 2.635e-01 4.225e-01
pt_HumidityAmnt pt_Convection
8.514e-05 -2.641e+00
par(mfrow = c(2, 2))
plot(res_lm_sat)

The diagnostic plots look better, but still not perfect. Let us check whether the residual variance is constant.
ncvTest(res_lm_sat)
Non-constant Variance Score Test

Variance formula: ~ fitted.values
Chisquare = 3.475041, Df = 1, p = 0.062301
The non-constant variance test cannot reject constant variance at the 5% level (p = 0.062), although the p-value is close to 0.05. Let me also check normality.
shapiro.test(rstudent(res_lm_sat))
Shapiro-Wilk normality test
data: rstudent(res_lm_sat)
W = 0.96323, p-value = 0.2038
The Shapiro-Wilk test cannot reject normality of the studentized residuals (p = 0.20).
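ncvTest() reports a score test of whether the residual variance changes with the fitted values. A base-R sketch of that statistic, assuming the Cook-Weisberg form (squared residuals scaled by RSS/n, regressed on the fitted values, half the regression sum of squares compared with a chi-squared distribution on 1 df); ncv_score_test() is a hypothetical helper and the data are simulated, not pollution.csv:

```r
# Hypothetical helper: score test for non-constant variance against the fitted
# values (Cook-Weisberg form, 1 degree of freedom). Weights are not handled.
ncv_score_test <- function(fit) {
  e2 <- residuals(fit)^2
  u <- e2 / mean(e2)                         # squared residuals scaled by RSS / n
  aux <- lm(u ~ fitted(fit))                 # auxiliary regression on fitted values
  reg_ss <- sum((fitted(aux) - mean(u))^2)   # regression sum of squares
  chisq <- reg_ss / 2                        # score statistic
  c(chisq = chisq, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Simulated data (made up): constant spread vs. spread growing with x
set.seed(1)
x <- runif(200, 1, 10)
y_const <- 2 + 3 * x + rnorm(200, sd = 2)
y_fan <- 2 + 3 * x + rnorm(200, sd = x)

ncv_score_test(lm(y_const ~ x))  # typically a large p-value
ncv_score_test(lm(y_fan ~ x))    # large statistic, very small p-value
```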
residualPlots(res_lm_sat)

Test stat Pr(>|Test stat|)

pt_Habitants 2.1667 0.037354 *
pt_IndustrySize 2.9202 0.006171 **
pt_HumidDays 0.2634 0.793801
pt_HumidityAmnt -1.6387 0.110505
pt_Convection -2.4081 0.021607 *
Tukey test -0.0397 0.968348
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
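In the residualPlots() table, each "Test stat" for a predictor is, according to the car documentation, a curvature test: a t-test for adding that predictor's square to the model (the Tukey test does the same for the squared fitted values). A base-R sketch of the per-predictor version on simulated data; curvature_test() is a hypothetical helper:

```r
# Simulated data (made up): y is curved in x1 but linear in x2
set.seed(2)
n <- 41
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
y <- 2 + x1 + 0.3 * x1^2 + x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Hypothetical helper: refit with var^2 added and return the t-test of that term
curvature_test <- function(base_formula, data, var) {
  sq <- paste0("I(", var, "^2)")
  fit <- lm(update(base_formula, as.formula(paste(". ~ . +", sq))), data = data)
  summary(fit)$coefficients[sq, c("t value", "Pr(>|t|)")]
}

curvature_test(y ~ x1 + x2, dat, "x1")  # large |t|, tiny p: curvature in x1
curvature_test(y ~ x1 + x2, dat, "x2")
```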
res_lm_sat %>% summary()
Call:
lm(formula = pt_PollutantAmnt ~ pt_Habitants + pt_IndustrySize +
pt_HumidDays + pt_HumidityAmnt + pt_Convection, data = d_pt)
Residuals:
Min 1Q Median 3Q Max
-1.41295 -0.40593 -0.07447 0.43447 0.86185

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.336e+01 1.249e+01 1.070 0.2918
pt_Habitants -1.978e-01 1.432e-01 -1.382 0.1759
pt_IndustrySize 2.635e-01 1.164e-01 2.264 0.0299 *
pt_HumidDays 4.225e-01 7.944e-01 0.532 0.5982
pt_HumidityAmnt 8.514e-05 1.556e-04 0.547 0.5878
pt_Convection -2.641e+00 1.919e+00 -1.376 0.1775
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5717 on 35 degrees of freedom
F-statistic: 5.636 on 5 and 35 DF, p-value: 0.0006509
total_lm %>% summary()
Call:
lm(formula = PollutantAmnt ~ Habitants + IndustrySize + HumidDays +
HumidityAmnt + Convection, data = d)
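Residuals:
Min 1Q Median 3Q Max
-56.077 -21.538 -3.405 19.525 108.500

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 154.789101 82.063921 1.886 0.0676 .
Habitants -0.004374 0.005051 -0.866 0.3924
IndustrySize 0.025353 0.004996 5.074 1.28e-05 ***
HumidDays -0.022045 0.180982 -0.122 0.9037
HumidityAmnt 0.556838 0.408719 1.362 0.1818
Convection -1.350079 0.621210 -2.173 0.0366 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 32.86 on 35 degrees of freedom
Multiple R-squared: 0.5714, Adjusted R-squared: 0.5102
F-statistic: 9.333 on 5 and 35 DF, p-value: 1.019e-05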
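So the tests give no evidence that the assumptions of the transformed model are violated. The residual
standard errors of the two summaries are not directly comparable, because res_lm_sat models
log(PollutantAmnt). The choice of variables could probably be refined further, but given the time
limit, this is the best model I could find at this point.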
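In the residualPlots output, the curvature tests for pt_Habitants, pt_IndustrySize, and pt_Convection
are significant (p < 0.05). These tests check whether a squared term is missing, so they point to
possible nonlinearity rather than to the significance of the predictors themselves. Still, let us try
a smaller model with these three variables on the original scale.
new_model <- lm(PollutantAmnt ~ Habitants + IndustrySize + Convection, data = d)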
plot(new_model)


summary(new_model)

Call:
lm(formula = PollutantAmnt ~ Habitants + IndustrySize + Convection,
data = d)
Residuals:
Min 1Q Median 3Q Max
-53.395 -24.191 -2.792 18.938 120.742

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 150.140364 43.684024 3.437 0.00147 **
Habitants -0.001695 0.004999 -0.339 0.73650
IndustrySize 0.024881 0.005135 4.845 2.27e-05 ***
Convection -1.013342 0.391323 -2.590 0.01366 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 33.9 on 37 degrees of freedom
Habitants is not significant in this model (p = 0.737), so let us drop it and fit the model again.
new_model2 <- lm(PollutantAmnt ~ IndustrySize + Convection, data = d)
plot(new_model2)


summary(new_model2)
Call:
lm(formula = PollutantAmnt ~ IndustrySize + Convection, data = d)
Residuals:
Min 1Q Median 3Q Max
-54.339 -25.098 -3.225 15.730 121.659

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 152.449670 42.644264 3.575 0.000974 ***
IndustrySize 0.024304 0.004788 5.076 1.05e-05 ***
Convection -1.048053 0.373268 -2.808 0.007831 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 33.5 on 38 degrees of freedom
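3. How good does your model fit data? Explain.
The fit is only moderate. The final model (new_model2, with IndustrySize and Convection) has a residual
standard error of 33.5 on 38 degrees of freedom, which is large compared with a typical pollution
amount (median 51, mean 59.1). I tried several models, and this last one is the best I found: both of
its predictors are significant, and its residual standard error is close to that of the full
five-variable model (32.86).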
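The R-squared of the final model is not printed, but it can be approximated from numbers shown above: the total sum of squares follows from total_lm (RSS = 32.86^2 × 35 and R-squared = 0.5714), and the RSS of the IndustrySize + Convection model is 42655 in the anova table of question 4. A rough R calculation (the inputs are rounded, so the results are approximate):

```r
# Numbers copied from the outputs above (rounded as printed)
rse_full <- 32.86; df_full <- 35; r2_full <- 0.5714  # total_lm
rss_full <- rse_full^2 * df_full
tss <- rss_full / (1 - r2_full)  # total sum of squares of PollutantAmnt

rss_final <- 42655  # new_model2 / lm_all (anova table, question 4)
rss_ind <- 51505    # lm_ind (IndustrySize only)
n <- 41; p_final <- 2

r2_final <- 1 - rss_final / tss                                   # about 0.52
adj_r2_final <- 1 - (1 - r2_final) * (n - 1) / (n - p_final - 1)  # about 0.49
r2_ind <- 1 - rss_ind / tss                                       # about 0.42
```

As a consistency check, the IndustrySize-only value agrees with t^2/(t^2 + df) = 5.268^2/(5.268^2 + 39) ≈ 0.416 from summary(lm_ind).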
Analysis [suggested time: 50 minutes]

4. Does presence of strong air current alleviate industry size effect on pollution amount in a typical
location?
lm_all <- lm(PollutantAmnt ~ Convection + IndustrySize, data = d)
lm_ind <- lm(PollutantAmnt ~ IndustrySize, data = d)
summary(lm_all)
Call:
lm(formula = PollutantAmnt ~ Convection + IndustrySize, data = d)
Residuals:
Min 1Q Median 3Q Max
-54.339 -25.098 -3.225 15.730 121.659

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 152.449670 42.644264 3.575 0.000974 ***
Convection -1.048053 0.373268 -2.808 0.007831 **
IndustrySize 0.024304 0.004788 5.076 1.05e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm_ind)
Call:
lm(formula = PollutantAmnt ~ IndustrySize, data = d)
Residuals:
Min 1Q Median 3Q Max
-53.951 -25.936 -6.989 13.420 134.354

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.248007 7.379913 4.641 3.86e-05 ***
IndustrySize 0.026859 0.005099 5.268 5.36e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm_all,lm_ind)
Analysis of Variance Table

Model 1: PollutantAmnt ~ Convection + IndustrySize
Model 2: PollutantAmnt ~ IndustrySize
Res.Df RSS Df Sum of Sq F Pr(>F)
1 38 42655
2 39 51505 -1 -8849.4 7.8836 0.007831 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
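The F-test (p = 0.0078) shows that adding Convection to the model with IndustrySize significantly reduces the residual sum of squares, so Convection has an effect on PollutantAmnt beyond IndustrySize: at a fixed industry size, more convection is associated with less pollution (-1.05 per unit in lm_all), while the IndustrySize slope changes only slightly (0.0269 without Convection, 0.0243 with it). Strictly, this compares additive models; whether a strong air current weakens the industry-size effect itself would be tested with an IndustrySize:Convection interaction term.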
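To test directly whether strong convection weakens the industry-size effect, one could add an interaction term, for example lm(PollutantAmnt ~ IndustrySize * Convection, data = d), and compare it with the additive model; a negative IndustrySize:Convection coefficient would mean that the industry slope shrinks as convection grows. A sketch of that test on simulated data (not pollution.csv), with made-up variable names ind and conv:

```r
# Simulated data (made up): the industry effect weakens as convection grows
set.seed(3)
n <- 200
ind <- runif(n, 0, 10)
conv <- runif(n, 0, 10)
y <- 5 + 3 * ind - 0.2 * ind * conv + rnorm(n)
dat <- data.frame(y, ind, conv)

fit_add <- lm(y ~ ind + conv, data = dat)  # additive: one industry slope
fit_int <- lm(y ~ ind * conv, data = dat)  # industry slope depends on conv
anova(fit_add, fit_int)                    # F-test for the ind:conv term

b <- coef(fit_int)
b["ind:conv"]                              # negative: convection dampens the industry effect
b["ind"] + b["ind:conv"] * mean(dat$conv)  # industry slope at average convection
```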
5. How does pollution amount change with convection? Does this relation differ from region to
region?
lm_interact <- lm(PollutantAmnt ~ Convection, data = d)
lm_interact2 <- lm(PollutantAmnt ~ Convection + Region + (Convection:Region), data = d)
lm_interact %>% summary()
Call:
lm(formula = PollutantAmnt ~ Convection, data = d)
Residuals:
Min 1Q Median 3Q Max
-62.496 -23.660 -6.610 8.912 145.361

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 214.7340 52.2226 4.112 0.000196 ***
Convection -1.4081 0.4686 -3.005 0.004624 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lm_interact2 %>% summary()
Call:
lm(formula = PollutantAmnt ~ Convection + Region + (Convection:Region),
data = d)
Residuals:
Min 1Q Median 3Q Max
-57.798 -26.568 -5.644 14.337 159.131

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 236.9276 88.4927 2.677 0.0112 *
Convection -1.5634 0.7846 -1.993 0.0542 .
RegionB 97.2158 144.4482 0.673 0.5054
RegionC -92.4553 122.2767 -0.756 0.4546
Convection:RegionB -0.8812 1.2798 -0.689 0.4956
Convection:RegionC 0.7190 1.1098 0.648 0.5213
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm_interact,lm_interact2)
Analysis of Variance Table
Model 1: PollutantAmnt ~ Convection
Model 2: PollutantAmnt ~ Convection + Region + (Convection:Region)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 39 71578
2 35 66858 4 4720.4 0.6178 0.6528
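Pollution amount decreases as convection increases: in lm_interact, each one-unit increase in
Convection is associated with a decrease of about 1.41 in PollutantAmnt (p = 0.0046). The interaction
model gives a different estimated slope for each region (about -1.56 in A, -2.44 in B, and -0.84 in C),
all negative. However, none of the Region or interaction terms is significant, and the F-test comparing
the two models (p = 0.65) gives no evidence that the relation between convection and pollution differs
from region to region.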
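The region-specific slopes come from adding each interaction coefficient to the baseline (region A) Convection slope. The arithmetic in R, with coefficients copied from summary(lm_interact2), together with the p-value of the model-comparison F-test:

```r
# Coefficients copied from summary(lm_interact2) above
b_conv <- -1.5634    # Convection slope in the baseline region A
b_conv_B <- -0.8812  # Convection:RegionB (difference from A)
b_conv_C <- 0.7190   # Convection:RegionC (difference from A)

slopes <- c(A = b_conv, B = b_conv + b_conv_B, C = b_conv + b_conv_C)
slopes  # A: -1.5634, B: -2.4446, C: -0.8444

# p-value of the printed F-statistic (0.6178 on 4 and 35 df)
pf(0.6178, df1 = 4, df2 = 35, lower.tail = FALSE)  # about 0.65, as in the anova table
```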
6. Predict pollution amount for a location in region C having average risk factors. Give 80%
confidence interval.
lm4 <- lm(PollutantAmnt ~ Region, data = d)
lm4 %>%
  predict(newdata = data.frame(Region = "C"), interval = "confidence", level = 0.80)
fit lwr upr
1 54.81818 35.91388 73.72248
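So we are 80% confident that the mean pollution amount in region C lies between 35.914 and 73.723
(point estimate 54.818). Note that lm4 uses Region only, so this interval is for the average location
in region C without adjusting for the risk factors, and it is a confidence interval for the mean
rather than a prediction interval for a single location.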
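To follow the question more closely, the risk factors could be included in the model and set to their sample averages in newdata. A sketch of that idea on simulated data: the data frame sim and its values are made up, and including Region alongside IndustrySize and Convection is an assumption, not the model chosen above. It also contrasts the 80% confidence interval for the mean with the wider 80% prediction interval for a single location.

```r
# Simulated data (made up) with a Region factor and two risk factors
set.seed(4)
n <- 41
sim <- data.frame(
  Region = factor(rep(c("A", "B", "C"), length.out = n)),
  IndustrySize = runif(n, 70, 2000),
  Convection = runif(n, 85, 150)
)
sim$PollutantAmnt <- 150 + 0.025 * sim$IndustrySize - 1.0 * sim$Convection +
  rnorm(n, sd = 30)

fit <- lm(PollutantAmnt ~ IndustrySize + Convection + Region, data = sim)

# New location: region C, risk factors at their sample averages
new_loc <- data.frame(
  IndustrySize = mean(sim$IndustrySize),
  Convection = mean(sim$Convection),
  Region = factor("C", levels = levels(sim$Region))
)

ci80 <- predict(fit, newdata = new_loc, interval = "confidence", level = 0.80)  # mean response
pi80 <- predict(fit, newdata = new_loc, interval = "prediction", level = 0.80)  # one location
ci80
pi80
```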
