
BA1 Homework 4

Group 23

2022-11-14

Task 1a)

plot(site$number_ads, site$impressions,
xlab = "Number of ads", ylab = "Number of Impressions")
[Figure: scatterplot of Number of Impressions (y-axis) against Number of ads (x-axis)]

plot(site$impressions,site$clicks,
xlab = "Number of Impressions", ylab = "Number of Clicks")

[Figure: scatterplot of Number of Clicks (y-axis) against Number of Impressions (x-axis)]

Task 1b)

coradim<-cor(site$number_ads, site$impressions)
coradim

## [1] 0.6975701

corimc<-cor(site$clicks, site$impressions)
corimc

## [1] 0.6717794

Task 1c) Both correlations are positive and fairly strong: the number of ads correlates with impressions
(r = 0.70), and clicks correlate with impressions (r = 0.67). Correlation alone, however, tells us nothing
about whether the variables are causally related, so we do not know whether the number of ads affects
impressions or vice versa. The graphs in 1a add some detail. For ads and impressions the scatterplot has an
unusual shape: impressions are close to 0 for several ad counts and jump up sharply only at 14 ads. For
impressions and clicks the increase is more gradual, so a linear relationship seems to fit that pair better.
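The point that correlation does not indicate direction can be illustrated with a minimal sketch: the Pearson coefficient is symmetric in its two arguments. The data below are simulated purely for illustration, since the `site` data set is not reproduced here; the names `ads` and `impr` are hypothetical.

```r
# Simulated stand-in data; NOT the homework's `site` data set.
set.seed(1)
ads  <- sample(2:14, 200, replace = TRUE)
impr <- 5e5 * ads + rnorm(200, sd = 2e6)

# Pearson correlation is symmetric: swapping the arguments changes nothing,
# so the coefficient cannot say which variable drives the other.
r_xy <- cor(ads, impr)
r_yx <- cor(impr, ads)
isTRUE(all.equal(r_xy, r_yx))
```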
Task 2d)

model_adim<-lm(site$impressions~site$number_ads)
summary(model_adim)

##
## Call:
## lm(formula = site$impressions ~ site$number_ads)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2589730 -1499885 -218481 805872 9582015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2153615 100183 -21.50 <2e-16 ***
## site$number_ads 474641 13998 33.91 <2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 1730000 on 1213 degrees of freedom
## Multiple R-squared: 0.4866, Adjusted R-squared: 0.4862
## F-statistic: 1150 on 1 and 1213 DF, p-value: < 2.2e-16

model_imc<-lm(site$clicks~site$impressions)
summary(model_imc)

##
## Call:
## lm(formula = site$clicks ~ site$impressions)
##
## Residuals:
## Min 1Q Median 3Q Max
## -172.80 -1.75 -1.69 -0.76 828.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.684e+00 1.687e+00 0.998 0.318
## site$impressions 2.097e-05 6.640e-07 31.585 <2e-16 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 55.85 on 1213 degrees of freedom
## Multiple R-squared: 0.4513, Adjusted R-squared: 0.4508
## F-statistic: 997.6 on 1 and 1213 DF, p-value: < 2.2e-16

Task 2e) For the model of impressions on ads, the estimated slope beta_1 equals 474,641, meaning that each
additional ad is associated with about 474,641 more impressions on average. Both the slope and the intercept
are statistically significant. However, R^2 is only about 0.49, so only about 49% of the variation in
impressions is explained by the number of ads, and the residual standard error is very high (about 1.73
million impressions). The intercept is negative (-2,153,615), so the model predicts a negative number of
impressions for zero ads; this is not meaningful, and the intercept should not be interpreted literally.
For the model of clicks on impressions, the estimated slope beta_1 equals about 0.00002 (2.097e-05), meaning
that each additional impression is associated with only a minimal change in clicks, roughly 21 clicks per
million impressions. R^2 is about 0.45, so only about 45% of the variation in clicks is explained by
impressions. The intercept suggests around 1.7 clicks at zero impressions, but it is not significantly
different from 0 (p = 0.318).
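As a quick consistency check, in a regression with a single predictor R-squared equals the squared Pearson correlation. The sketch below only reuses the two correlations printed in Task 1b; the variable names are chosen here for illustration.

```r
# Values copied from the output of Task 1b.
r_adim <- 0.6975701   # cor(number_ads, impressions)
r_imc  <- 0.6717794   # cor(clicks, impressions)

# In a one-predictor linear regression, R-squared is the squared correlation.
r2_adim <- r_adim^2
r2_imc  <- r_imc^2
round(c(ads_impressions = r2_adim, impressions_clicks = r2_imc), 4)
```

The rounded results, 0.4866 and 0.4513, agree with the Multiple R-squared values in the two summaries of Task 2d.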
Task 2f)

beta1_adim<-summary(model_adim)$coefficients[2, 1]
beta0_adim<-summary(model_adim)$coefficients[1, 1]

impressions_per_ad<-function(x) {beta0_adim + beta1_adim * x}


impressions_per_ad(5)

## [1] 219592

beta1_imc<-summary(model_imc)$coefficients[2, 1]
beta0_imc<-summary(model_imc)$coefficients[1, 1]

clicks_per_impression<-function(x) {beta0_imc + beta1_imc * x}


clicks_per_impression(1000000)

## [1] 22.65623

The model predicts about 219,592 impressions for 5 ads, and a bit less than 23 clicks for 1,000,000
impressions.
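An alternative to hand-written prediction functions is `predict()`, which can also return a prediction interval. This is a sketch on simulated data, not on the homework's `site` data; `site_demo` and `fit_demo` are hypothetical names. It assumes the model is fitted with the `data =` argument, because `predict()` can only match `newdata` columns to plain column names in the formula.

```r
# Simulated stand-in data; NOT the homework's `site` data set.
set.seed(42)
site_demo <- data.frame(number_ads = sample(2:14, 300, replace = TRUE))
site_demo$impressions <- -2e6 + 4.7e5 * site_demo$number_ads +
  rnorm(300, sd = 1.7e6)

# Fitting with `data =` keeps plain column names in the formula,
# which is what lets predict() match `newdata` columns to predictors.
fit_demo <- lm(impressions ~ number_ads, data = site_demo)

# Point prediction for 5 ads, with a 95% prediction interval.
pred <- predict(fit_demo, newdata = data.frame(number_ads = 5),
                interval = "prediction")
pred

# Same point prediction computed by hand from the coefficients.
manual <- unname(coef(fit_demo)[1] + coef(fit_demo)[2] * 5)
```

The `fit` column of `pred` equals the hand-computed value, while `lwr` and `upr` show how uncertain a single prediction is.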
Task 3g)

model_adc<-lm(site$clicks~site$impressions + site$number_ads)
summary(model_adc)

##
## Call:
## lm(formula = site$clicks ~ site$impressions + site$number_ads)
##
## Residuals:
## Min 1Q Median 3Q Max
## -170.32 -3.36 -1.44 -0.01 830.50
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.687e+00 3.800e+00 -0.444 0.657
## site$impressions 2.033e-05 9.267e-07 21.941 <2e-16 ***
## site$number_ads 6.243e-01 6.305e-01 0.990 0.322
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 55.85 on 1212 degrees of freedom
## Multiple R-squared: 0.4517, Adjusted R-squared: 0.4508
## F-statistic: 499.3 on 2 and 1212 DF, p-value: < 2.2e-16

Task 3h) Our model has not really improved. Multiple R-squared stayed at almost the same level as in d)
(0.4517 versus 0.4513), and adjusted R-squared is unchanged (0.4508). The model explains around 45% of the
variation in clicks, mainly through impressions; the number of ads is not significant (p = 0.322), so we
cannot rely on it. The estimated coefficient for impressions has also stayed pretty much the same
(2.033e-05 versus 2.097e-05).
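The adjusted R-squared values can be reproduced from the reported R-squared values with the standard formula. This is a sketch using rounded inputs from the two summaries; `adj_r2` is a helper defined here for illustration, and n = 1215 is inferred from the 1213 residual degrees of freedom of the simple model.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1215  # residual df 1213 + 2 estimated coefficients in the simple model

adj_simple   <- adj_r2(0.4513, n, p = 1)  # clicks ~ impressions
adj_multiple <- adj_r2(0.4517, n, p = 2)  # clicks ~ impressions + number_ads
round(c(simple = adj_simple, multiple = adj_multiple), 4)
```

Both values round to 0.4508: the small gain in R-squared from adding the number of ads is offset by the penalty for the extra predictor.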
Task 3i) Based on our model, impressions are significantly associated with the number of clicks, so we would
advise the manager to increase the number of impressions. A direct positive relationship between the number
of ads and clicks is not observed once impressions are included. In our simple regression model, however,
the number of ads is positively and significantly related to impressions. So we could suggest increasing
the number of ads in order to increase impressions, which in turn should bring the page more clicks. Further
analysis is needed to see whether any of the other variables in our dataset can explain more of the clicks.
If there is a way to increase impressions directly, we should do that instead of going through ads.
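The indirect path from ads to clicks can be put into rough numbers by multiplying the two simple-regression slopes from Task 2d. This is only a back-of-the-envelope sketch: it assumes both fitted associations are causal and hold together, which the analysis does not establish. The variable names are chosen here for illustration.

```r
# Slopes copied from the two simple regressions in Task 2d.
slope_ads_to_impr    <- 474641      # impressions per additional ad
slope_impr_to_clicks <- 2.097e-05   # clicks per additional impression

# Chained estimate: clicks per additional ad, IF both fitted associations
# were causal and held together (an assumption, not a result of the analysis).
clicks_per_extra_ad <- slope_ads_to_impr * slope_impr_to_clicks
clicks_per_extra_ad
```

Under these assumptions one additional ad corresponds to roughly 10 additional clicks.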
