Packages
For the following exercises, you will need the tidyverse and AmesHousing packages. First, make sure
both are installed, then load them:
install.packages("tidyverse")
install.packages("AmesHousing")
library(tidyverse)
library(AmesHousing)
For the following exercises¹, use the data set “ames_raw” from the AmesHousing package. This data
set describes the sale of individual residential properties in Ames, Iowa, from 2006 to 2010. It
contains 2930 observations and a large number of explanatory variables (23 nominal, 23 ordinal, 14
discrete, and 20 continuous) involved in assessing home values.
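Before starting, it can help to inspect the data. The sketch below assumes the column names `Gr Liv Area`, `Central Air`, `Foundation` and `SalePrice` as they appear in ames_raw (confirm with names(ames_raw)); because several names contain spaces, they have to be wrapped in backticks.

```r
library(AmesHousing)  # provides the ames_raw tibble

dim(ames_raw)  # number of rows (observations) and columns

# Many column names contain spaces, so wrap them in backticks
summary(ames_raw$`Gr Liv Area`)  # above-ground living area, in square feet
table(ames_raw$`Central Air`)    # central air conditioning: "N" / "Y"
table(ames_raw$Foundation)       # foundation type
summary(ames_raw$SalePrice)      # sale price, in US dollars
```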
NOTE: All exercises about t-tests should be performed at a 95% confidence level (α = 0.05) unless
otherwise stated.
¹ De Cock, D. (2011). Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester
Regression Project. Journal of Statistics Education, 19(3).
Bachelor of Liberal Arts and Sciences
Exercise 1 (1 point)
Load the “ames_raw” data set and convert it to a data frame called population.dat. Select a simple
random sample of size 70 from the population and name it sample.dat. Use the function sample_n().
Example code: sample.dat <- sample_n(population.dat, 70)
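A minimal sketch of Exercise 1. The set.seed() call is an addition, not part of the exercise: it only makes your own sample reproducible, and the seed value 2024 is arbitrary.

```r
library(tidyverse)    # sample_n() comes from dplyr
library(AmesHousing)  # provides ames_raw

# Convert the ames_raw tibble to a plain data frame
population.dat <- as.data.frame(ames_raw)

# Optional: fix the random seed so your sample can be reproduced later.
# The value 2024 is an arbitrary choice, not part of the exercise.
set.seed(2024)

# Simple random sample of 70 rows, drawn without replacement (the default)
sample.dat <- sample_n(population.dat, 70)

nrow(sample.dat)  # 70
```

In current versions of dplyr, sample_n() is superseded by slice_sample(n = 70), which draws the same kind of sample.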
Exercise 2 (3 points)
a) Describe the distribution of above-ground living area (Gr Liv Area) in your sample using a histogram.
b) Will your distribution of Gr Liv Area be exactly the same as those of other students, given that
each of you draws a different random sample from the population?
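One way to draw the histogram for part a) is with ggplot2 (loaded by tidyverse). This is a sketch: it recreates the Exercise 1 sample with an arbitrary seed, and the choice of 15 bins is also an assumption you may want to vary.

```r
library(tidyverse)
library(AmesHousing)

# Recreate the Exercise 1 objects (seed 2024 is an arbitrary assumption)
population.dat <- as.data.frame(ames_raw)
set.seed(2024)
sample.dat <- sample_n(population.dat, 70)

# Histogram of above-ground living area; 15 bins is an arbitrary choice
p <- ggplot(sample.dat, aes(x = `Gr Liv Area`)) +
  geom_histogram(bins = 15, colour = "white") +
  labs(
    title = "Gr Liv Area in a random sample of 70 houses",
    x = "Above-ground living area (square feet)",
    y = "Number of houses"
  )
p
```

Rerunning the sampling step with a different seed (or with none) generally produces a different sample and therefore a somewhat different histogram, which is worth keeping in mind for part b).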
Exercise 3
a) Using your sample, calculate the 95% confidence interval for the mean of Gr Liv Area.
b) Using your sample, calculate the 99% confidence interval for the mean of Gr Liv Area.
c) Do your confidence intervals (from a and b) capture the population mean size of houses (i.e.,
the mean Gr Liv Area) in Ames?
d) Perform a two-tailed one-sample t-test comparing your sample mean size of houses with the
population mean size of houses in Ames (i.e., use the population mean of Gr Liv Area as the
hypothesized value). Interpret your findings. Note: use the function t.test().
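The intervals in a) and b) follow the t-based formula mean ± t(1 − α/2, n − 1) · s/√n. The sketch below computes them by hand, checks coverage of the population mean for c), and runs the t-test for d). It recreates the Exercise 1 sample with an arbitrary seed, and the helper t_ci() is a hypothetical name introduced here.

```r
library(tidyverse)
library(AmesHousing)

# Recreate the Exercise 1 objects (seed 2024 is an arbitrary assumption)
population.dat <- as.data.frame(ames_raw)
set.seed(2024)
sample.dat <- sample_n(population.dat, 70)

x  <- sample.dat$`Gr Liv Area`
n  <- length(x)
se <- sd(x) / sqrt(n)  # standard error of the sample mean

# Hypothetical helper: t-based interval mean(x) +/- t(1 - alpha/2, n - 1) * se
t_ci <- function(conf) {
  mean(x) + c(-1, 1) * qt(1 - (1 - conf) / 2, df = n - 1) * se
}
ci95 <- t_ci(0.95)  # (a)
ci99 <- t_ci(0.99)  # (b)

# (c) Does each interval contain the population mean?
mu_pop <- mean(population.dat$`Gr Liv Area`)
covers <- c(
  ci95 = ci95[1] <= mu_pop && mu_pop <= ci95[2],
  ci99 = ci99[1] <= mu_pop && mu_pop <= ci99[2]
)
covers

# (d) Two-tailed one-sample t-test with the population mean as mu
tt <- t.test(x, mu = mu_pop, alternative = "two.sided", conf.level = 0.95)
tt
```

The 95% interval reported by t.test() matches ci95. Because the sample is drawn from this very population, the null hypothesis in d) is true by construction, so a p-value below 0.05 should turn up in only roughly 5% of random samples.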
Exercise 4
a) Suppose that the national average house price is $45,000. Is the average price of houses in Ames
greater than the national average? Use the population data set. Note: this is a one-tailed
(upper-tail) test.
b) Do prices of houses with central air conditioning differ from those of houses without central air
conditioning? Perform an independent two-sample t-test on the population data set and interpret
your findings.
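A sketch of both tests with t.test(), assuming the price column is SalePrice and the air-conditioning column is `Central Air` with levels "N" and "Y". Note that R's two-sample test defaults to Welch's version (unequal variances); whether to use that or the pooled test is a choice the exercise leaves open.

```r
library(AmesHousing)

population.dat <- as.data.frame(ames_raw)

# (a) Upper-tail one-sample t-test
#     H0: mean price = 45000   vs   H1: mean price > 45000
t_upper <- t.test(population.dat$SalePrice, mu = 45000,
                  alternative = "greater", conf.level = 0.95)
t_upper

# (b) Independent two-sample t-test of SalePrice by `Central Air` ("N"/"Y").
#     R's default is Welch's test (var.equal = FALSE); set var.equal = TRUE
#     for the pooled-variance Student's t-test instead.
t_air <- t.test(SalePrice ~ `Central Air`, data = population.dat)
t_air
```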
Exercise 5 (6 points)
What are the 90% confidence intervals for the mean house price by foundation type, using all the
data (i.e., the population data set)? Hint: use the group_by() and summarise() functions together
with the confidence-interval formula.
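Following the hint, one possible sketch groups by Foundation and applies the t-based formula inside summarise(). For a 90% interval, α = 0.10, so the critical value is qt(0.95, n − 1). The column names SalePrice and Foundation are assumed from ames_raw.

```r
library(tidyverse)
library(AmesHousing)

population.dat <- as.data.frame(ames_raw)

# 90% t-based interval per foundation type:
# mean +/- t(0.95, n - 1) * sd / sqrt(n)
ci90 <- population.dat %>%
  group_by(Foundation) %>%
  summarise(
    n          = n(),
    mean_price = mean(SalePrice),
    se         = sd(SalePrice) / sqrt(n),
    lower      = mean_price - qt(0.95, df = n - 1) * se,
    upper      = mean_price + qt(0.95, df = n - 1) * se
  )
ci90
```

For any single group, the same interval can be cross-checked with t.test(..., conf.level = 0.90).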