
Problem 1

Questions:
We want to assess whether daytime power consumption in the summer differs between weekdays and
weekends. We also want to know how daytime power consumption changes with temperature in the
summer, and whether that relationship differs between weekdays and weekends.
Approach:
I fit a linear model of COAST (our response variable) with main effects for temperature and weekday
and an interaction between the two (temperature : weekday). I then bootstrapped the model with
10000 simulations to compute confidence intervals for its coefficients. From there I created a scatter
plot to visualize the data, with temperature on the ‘x’ axis and COAST/power consumption on the ‘y’
axis, faceted by weekday (1 being a weekday and 0 being a weekend day), in order to visualize any
difference between the two. Lastly, I created an anova table to assess how much each term improves
the model.
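The model-plus-bootstrap step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the code used for this report: the data are synthetic, and the helper names `fit_ols` and `bootstrap_ci`, the simulated coefficients, and the reduced number of resamples are all assumptions made for the example.

```python
import numpy as np

def fit_ols(temp, weekday, y):
    """Least-squares fit of y ~ temp + weekday + temp:weekday."""
    X = np.column_stack([np.ones_like(temp), temp, weekday, temp * weekday])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # order: intercept, temp, weekday, temp:weekday

def bootstrap_ci(temp, weekday, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals from resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, 4))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of row indices
        draws[b] = fit_ols(temp[idx], weekday[idx], y[idx])
    alpha = (1 - level) / 2
    return np.quantile(draws, [alpha, 1 - alpha], axis=0)

# Synthetic stand-in data (hypothetical values, NOT the COAST data set)
rng = np.random.default_rng(1)
n = 500
temp = rng.uniform(20, 35, size=n)
weekday = rng.integers(0, 2, size=n).astype(float)
y = 1000 + 500 * temp + 900 * weekday + rng.normal(0, 1000, size=n)

lower, upper = bootstrap_ci(temp, weekday, y)
names = ["intercept", "temp", "weekday", "temp:weekday"]
for name, lo, hi in zip(names, lower, upper):
    print(f"{name:>13}: ({lo:.2f}, {hi:.2f})")
```

Each resample refits the whole model, so the interval for the interaction term reflects the same row-to-row variability as the intervals for the main effects.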
Results:
With 95% confidence, daytime power consumption is higher on weekdays than on weekends by an
estimated 962.098 megawatts, with a confidence interval for the weekday coefficient of (406.42,
1509.11). The 95% confidence interval for the temperature coefficient is (496.64, 528.04), with an
estimated increase of 512.214 megawatts for each one-unit increase in temperature.

Above is a scatter plot, faceted on weekday, displaying the relationship between power consumption and temperature.
x axis: Temperature Outside; y axis: Power Consumption/COAST
               Df     R2      R2_improve   sd       sd_improve   pval
Intercept      1      0.000                2120.9
temp           1      0.659   0.659        1238.4   882.52       0.000
weekday        1      0.702   0.431        1157.6   80.86        0.000
temp:weekday   1      0.702   0.000        1157.7   -0.10        0.963
Residuals      5816

The anova table shows that R2 did not improve for the interaction term.
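The idea behind the R2_improve column can be sketched by fitting nested models and recording how much R2 rises as each term is added. The snippet below is an illustration on synthetic data generated with no true interaction; the helper `r_squared` and all simulated values are assumptions for the example, not the COAST data or the software used for the table.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X (intercept included in X)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

# Synthetic data with NO true interaction (hypothetical values)
rng = np.random.default_rng(2)
n = 1000
temp = rng.uniform(20, 35, size=n)
weekday = rng.integers(0, 2, size=n).astype(float)
y = 1000 + 500 * temp + 900 * weekday + rng.normal(0, 1000, size=n)

ones = np.ones(n)
# Each entry adds one term to the model before it
models = {
    "temp": np.column_stack([ones, temp]),
    "weekday": np.column_stack([ones, temp, weekday]),
    "temp:weekday": np.column_stack([ones, temp, weekday, temp * weekday]),
}

previous = 0.0
improvements = {}
for term, X in models.items():
    r2 = r_squared(X, y)
    improvements[term] = r2 - previous
    print(f"{term:>13}: R2 = {r2:.3f}, improvement = {improvements[term]:.3f}")
    previous = r2
```

Because the models are nested, R2 can never go down when a term is added; an improvement close to zero for the last term is what an unimportant interaction looks like.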

Conclusion:
With 95% confidence, power consumption is higher on weekdays than on weekends by an estimated
962.098 megawatts, with a confidence interval of (406.42, 1509.11). In regards to temperature, we are
95% confident that a one-unit increase in temperature is associated with a 512.124 megawatt increase
in power consumption, with a confidence interval of (496.64, 528.04). However, the bootstrapped 95%
confidence interval for the interaction term was (-17.98, 19.10). Because this interval contains zero,
there is no significant evidence that the effect of temperature on power consumption differs between
weekdays and weekends. Furthermore, the anova table shows that R2 did not improve when the
interaction was added, meaning the interaction of temperature and weekday explains essentially no
additional variability and is not making a significant difference in this model. Therefore, while power
consumption increases with temperature, the relationship does not seem to differ between weekends
and weekdays.

Problem 2
Questions:
We want to determine how much rent drops off with an extra 0.1 mile of walking distance from the
tower, and whether that estimate differs between an adjusted and an unadjusted model. We also want
to find how much rent changes with an additional 100 square feet, again comparing an adjusted and an
unadjusted model.
Approach:
I first built a linear regression model of rent adjusted for all other variables. I tried bootstrapping to
find the confidence intervals but encountered an error, so I used the function confint to find these
intervals instead. I then fit the unadjusted linear models, first modeling rent on distance alone and
then rent on square footage alone, and found their confidence intervals using the same method. I then
had to account for the units used in the question, converting distance into tenths of a mile and square
footage into units of 100 square feet by multiplying and dividing my coefficients and confidence
intervals accordingly. Lastly, I created scatter plots for the two unadjusted models to visualize distance
(x) vs rent (y) and square footage (x) vs rent (y).
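The unit conversion mentioned above amounts to multiplying a slope and both ends of its confidence interval by the same factor. The sketch below assumes the fitted models were in miles and square feet; the helper `rescale_estimate` and the input numbers are hypothetical and chosen only to show the arithmetic.

```python
def rescale_estimate(estimate, ci, factor):
    """Rescale a slope and its confidence interval to a new unit of x.

    factor is the size of the new unit measured in the old unit,
    e.g. 0.1 to go from "per mile" to "per 0.1 mile".
    """
    # Sorting keeps (lower, upper) in order even if factor were negative
    lo, hi = sorted(bound * factor for bound in ci)
    return estimate * factor, (lo, hi)

# Hypothetical per-mile and per-square-foot slopes, for illustration only
per_tenth_mile = rescale_estimate(-400.0, (-700.0, -100.0), 0.1)
per_100_sqft = rescale_estimate(0.40, (0.05, 0.75), 100)

print(per_tenth_mile)
print(per_100_sqft)
```

Rescaling changes the numbers but not the conclusion: an interval that excludes zero before the conversion still excludes zero afterwards.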
Results:
First question: Controlling for all other variables, we can conclude with 95% confidence that rent
decreases on average by $37.92 (coefficient -37.92, confidence interval (-68.07, -7.77)) for every tenth
of a mile further you live from the tower. When we do not adjust, we are 95% confident that rent
decreases by $63.15 per tenth of a mile from the tower (coefficient -63.15, confidence interval
(-92.17, -34.13)). (All values are per 0.1 mile of distance.)
Second question: Controlling for all other variables, we can conclude with 95% confidence that rent
increases on average by $37.55, with a confidence interval of (3.179, 71.97), for every additional 100
square feet. When we do not adjust, we are 95% confident that there is a $50.96 increase in rent per
additional 100 square feet, with a confidence interval of (11.04, 90.87). (All values are per 100 square
feet.)

Rent vs. Distance from the Tower

The scatter plot above shows the relationship between rent (y) and distance from the tower (x).
Rent vs. Square Ft

The scatter plot above shows the relationship between rent (y) and square footage (x).

Conclusion:
With 95% confidence, the drop-off in rent for an extra 0.1 mile away from the tower, adjusting for all
other variables, is $37.92 (coefficient -37.92), with a confidence interval of (-68.07, -7.77). This differs
from the unadjusted analysis, which regresses rent on distance alone and gives a drop-off of $63.15
(coefficient -63.15) with a 95% confidence interval of (-92.17, -34.13). As we can see, unadjusted models
provide a good reference but do not tell the full story. Neither of these confidence intervals contains
zero, meaning distance has a significant effect on rent. As for square footage, adjusting for all other
variables, we can say with 95% confidence that rent increases by $37.55 for an additional 100 square
feet, with a confidence interval of (3.179, 71.97). Without adjusting, rent increases by $50.96 for an
additional 100 square feet, with a 95% confidence interval of (11.04, 90.87). Neither of these confidence
intervals contains zero, meaning square footage has a significant effect on rent. The adjusted estimates
are noticeably smaller in magnitude than the unadjusted ones, which suggests that the other variables
have a recognizable effect on rent prices as well.
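One way to see why adjusted and unadjusted slopes can differ is to simulate data where a second predictor is correlated with distance. The sketch below uses entirely synthetic numbers; the variable names, the assumed link between distance and square footage, and the coefficients are hypothetical and are not estimates from the rent data.

```python
import numpy as np

def slopes(X, y):
    """Least-squares coefficients of y on X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 2000
distance = rng.uniform(0, 2, size=n)                      # miles (synthetic)
sqft = 900 - 200 * distance + rng.normal(0, 100, size=n)  # assumed: smaller units farther out
rent = 2000 - 400 * distance + 0.4 * sqft + rng.normal(0, 100, size=n)

ones = np.ones(n)
# Unadjusted: rent on distance only, so distance also absorbs the sqft effect
unadjusted_slope = slopes(np.column_stack([ones, distance]), rent)[1]
# Adjusted: rent on distance and sqft together
adjusted_slope = slopes(np.column_stack([ones, distance, sqft]), rent)[1]

print(f"unadjusted distance slope: {unadjusted_slope:.1f}")
print(f"adjusted distance slope:   {adjusted_slope:.1f}")
```

By construction the unadjusted slope should come out more negative than the adjusted one (roughly -480 versus -400), because distance is standing in for the smaller apartments that come with it.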

Problem 3
Question:
Is there an association between FAIR policies and the variables contributing to the composition of
various ZIP codes?
Approach:
I fit a linear model of policies on all of the other variables: minority, fire, age, and income. I then
bootstrapped the model with 10000 simulations to find confidence intervals for its coefficients. Finally,
I created a scatter plot to visualize the data.
Results:
I found that two variables have confidence intervals that contain the null value (0) and therefore are
not significant predictors in our model. These are age, with a 95% confidence interval of (-0.0004,
0.012), and income, with a confidence interval of (-0.081, 0.567). The intervals for minority and fire do
not contain zero, meaning they are significant when it comes to the number of FAIR policies in a
particular ZIP. We expect the number of FAIR policies to increase by 0.008, with a 95% confidence
interval of (0.003, 0.0136), for a one-unit increase in minority (the percentage of minorities in the ZIP).
Furthermore, we expect the number of FAIR policies to increase by 0.0227, with a 95% confidence
interval of (0.006, 0.0391), for a one-unit increase in fires per 100 housing units.
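The "does the interval contain zero" check used above can be written out explicitly. The snippet below simply encodes the four reported 95% intervals; the helper name `excludes_zero` is made up for the illustration.

```python
# 95% confidence intervals as reported in the results
intervals = {
    "minority": (0.003, 0.0136),
    "fire": (0.006, 0.0391),
    "age": (-0.0004, 0.012),
    "income": (-0.081, 0.567),
}

def excludes_zero(ci):
    """True when both ends of the interval lie on the same side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0

significant = [name for name, ci in intervals.items() if excludes_zero(ci)]
print(significant)  # ['minority', 'fire']
```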

Below is a table of the coefficient estimates from the model that accounts for all other variables:
(Intercept)    minority      fire          age           income
-0.170456545   0.008330393   0.022709223   0.005707462   -0.012083928

The scatter plot shows the relationship between the percentage of minorities present in a ZIP
and the number of FAIR policies present.
Conclusion:
We can conclude from the results that there is an association between the number of FAIR policies and
the racial/ethnic composition of a ZIP, as well as the number of fires in that area. This suggests that the
greater the minority presence or the number of fires in a ZIP, the less access it has to the private market.
We can see this because we expect the number of FAIR policies to increase by 0.008, with a 95%
confidence interval of (0.003, 0.0136), for a one-unit increase in minority, and by 0.0227, with a 95%
confidence interval of (0.006, 0.0391), for a one-unit increase in fires per 100 housing units. The minority
result is evidence of discrimination: a larger minority presence is associated with less access to the
private market and more FAIR policies in place. As we can see in the scatter plot, there is a clear upward
trend between the two variables, FAIR policies and minorities. Because the association holds even when
adjusting for all other variables, there is clear evidence suggesting discriminatory redlining.
