
Bachelor of Liberal Arts and Sciences

Dealing with Numerical Information


Exercise Sheet Week 6 – Confidence intervals & t-tests
(31 points)

Packages

For the following exercises, you will need the tidyverse and AmesHousing packages. First, make sure you
have both installed:

 install.packages("tidyverse")

 install.packages("AmesHousing")

Then load the packages:

 library(tidyverse)

 library(AmesHousing)

For the following exercises [1], use the dataset “ames_raw” from the AmesHousing package. This data
set describes the sale of individual residential properties in Ames, Iowa from 2006 to 2010. It
contains 2930 observations and a large number of explanatory variables (23 nominal, 23 ordinal, 14
discrete, and 20 continuous) involved in assessing home values.

NOTE: All t-tests should be performed at the 95% confidence level (i.e. a 5% significance level)
unless otherwise stated.

Some variables considered in this exercise include:

1. Gr Liv Area: Above-grade (ground) living area, in square feet.

2. SalePrice: Price at which property was sold.

3. Central Air: Central air conditioning (coded as: Y = Yes, N = No)

[1] De Cock, D. (2011). Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester
Regression Project. Journal of Statistics Education, Volume 19, Number 3.

Exercise 1 (1 point)

Load the “ames_raw” data set and convert it to a data frame called population.dat. Use the function
sample_n() to select a simple random sample of size 70 from the population, and name it sample.dat.
Example code: sample.dat <- sample_n(population.dat, 70)
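A minimal sketch of this step, assuming both packages are installed. The call to set.seed() and the seed value 123 are not part of the exercise: they are an optional, arbitrary addition that makes the same 70 rows come out every time the script is rerun.

```r
library(tidyverse)
library(AmesHousing)

# ames_raw ships with AmesHousing as a tibble; convert it to a plain data frame
population.dat <- as.data.frame(ames_raw)

# Optional: fix the random seed (123 is an arbitrary choice)
set.seed(123)

# Simple random sample of 70 rows, drawn without replacement
sample.dat <- sample_n(population.dat, 70)

dim(sample.dat)
```

In recent versions of dplyr, sample_n() is superseded by slice_sample(population.dat, n = 70); both draw rows without replacement by default.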

Exercise 2 (3 points)

a) Describe the distribution of ground living area (Gr Liv Area) using a histogram plot.

b) Will your distribution of Gr Liv Area be exactly the same as that of other students, given
that each sample is drawn at random from the population?
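One possible way to draw such a histogram with ggplot2 is sketched below. To keep the example self-contained it uses simulated numbers rather than the Ames data: toy.dat, the log-normal parameters, and the choice of 15 bins are all hypothetical. The point it illustrates is that a column name containing spaces, such as Gr Liv Area, has to be wrapped in backticks.

```r
library(ggplot2)

# Hypothetical stand-in for sample.dat: 70 simulated living areas (sq ft)
set.seed(1)
toy.dat <- data.frame(
  `Gr Liv Area` = round(rlnorm(70, meanlog = 7.2, sdlog = 0.3)),
  check.names = FALSE   # keep the spaces in the column name, as in ames_raw
)

# Backticks are required because the column name is not a syntactic R name
p <- ggplot(toy.dat, aes(x = `Gr Liv Area`)) +
  geom_histogram(bins = 15, colour = "white") +
  labs(x = "Above-grade living area (sq ft)", y = "Number of houses")
p
```

With the real sample, toy.dat would be replaced by sample.dat; the number of bins is a matter of judgment and is worth varying.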

Exercise 3 (11 points)

a) Calculate the 95% confidence interval for the mean of the variable Gr Liv Area, based on your sample.

b) Calculate the 99% confidence interval for the mean of the variable Gr Liv Area, based on your sample.

c) Do your confidence intervals (in a and b) capture the population mean size of houses (i.e. Gr
Liv Area) in Ames?

d) Perform a one-sample t-test to compare the population mean size of houses to the sample
mean size of houses in Ames (i.e. two-tailed test). Interpret your findings. Note: Use the
function t.test().
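The sketch below shows one way a t-based confidence interval can be computed by hand and cross-checked against t.test(). It runs on simulated numbers, not on the Ames data: the vector x, the helper ci_manual(), and the reference value mu0 are all hypothetical names and values introduced for illustration.

```r
# Hypothetical data: 70 simulated areas standing in for one sample column
set.seed(2)
x <- rnorm(70, mean = 1500, sd = 500)

# Manual interval: mean +/- t* x standard error, with n - 1 degrees of freedom
ci_manual <- function(x, level) {
  n <- length(x)
  t_star <- qt(1 - (1 - level) / 2, df = n - 1)
  mean(x) + c(-1, 1) * t_star * sd(x) / sqrt(n)
}
ci95 <- ci_manual(x, 0.95)
ci99 <- ci_manual(x, 0.99)

# One-sample t-test of H0: mu = mu0; two-sided is the default alternative
mu0 <- 1500   # hypothetical reference value, NOT the Ames population mean
res <- t.test(x, mu = mu0, conf.level = 0.95)

ci95
res$conf.int   # should agree with the manual 95% interval
ci99           # wider than the 95% interval
res$p.value
```

With the real data, x would be the Gr Liv Area column of sample.dat and mu0 the mean of the same column in population.dat. Raising the confidence level from 95% to 99% increases the t multiplier, so the interval widens.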

Exercise 4 (10 points)

a) Suppose that the national average house price is $45,000. Is the average price of houses in
Ames greater than the national average (i.e. use the population data set)? Note: this is a
one-tailed (upper-tail) test.

b) Do prices of houses with central air conditioning differ from those with no central air
conditioning (i.e. perform an independent two sample t-test and interpret your findings)? Use
the population data set.
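Both kinds of test can be run with t.test(), as the following sketch on simulated data suggests. The data frame toy.dat, its columns price and air, the group sizes, and the reference value 150000 are hypothetical; only the Y/N coding mirrors the Central Air variable.

```r
set.seed(3)
# Hypothetical prices for 60 houses coded "Y" and 20 houses coded "N"
toy.dat <- data.frame(
  price = c(rnorm(60, mean = 180000, sd = 40000),
            rnorm(20, mean = 120000, sd = 30000)),
  air   = rep(c("Y", "N"), times = c(60, 20))
)

# Upper-tail one-sample test: H0: mu = 150000 against H1: mu > 150000
upper <- t.test(toy.dat$price, mu = 150000, alternative = "greater")

# Independent two-sample test using the formula interface.
# By default t.test() does not assume equal variances (Welch's test).
two <- t.test(price ~ air, data = toy.dat)

upper$p.value
two$estimate   # the two group means
two$p.value
```

With the real data, a column name containing a space must be wrapped in backticks inside the formula, for example SalePrice ~ `Central Air`. Passing var.equal = TRUE would request the pooled-variance test instead of Welch's.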

Exercise 5 (6 points)

What are the 90% confidence intervals for house prices for each foundation type (using all the data,
i.e. the population)? Hint: use the group_by() and summarise() functions together with the
confidence interval formula.
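A per-group interval can be sketched with group_by() and summarise() as follows. The example uses a made-up tibble rather than the Ames data: toy.dat, its columns group and price, and the three group labels are hypothetical stand-ins for the foundation types and sale prices.

```r
library(dplyr)

set.seed(4)
# Hypothetical data: three made-up groups of unequal size
toy.dat <- tibble(
  group = rep(c("A", "B", "C"), times = c(30, 25, 20)),
  price = rnorm(75, mean = 170000, sd = 35000)
)

level <- 0.90
ci.dat <- toy.dat %>%
  group_by(group) %>%
  summarise(
    n_obs  = n(),
    avg    = mean(price),
    se     = sd(price) / sqrt(n_obs),               # standard error per group
    t_star = qt(1 - (1 - level) / 2, df = n_obs - 1),
    lower  = avg - t_star * se,
    upper  = avg + t_star * se
  )

ci.dat
```

Each row of ci.dat holds one group's mean and its 90% interval. A group with a single observation has no standard deviation, so its bounds would come out as NA.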
