You are on page 1of 3

R Assignment 4

Dave Riegert

3/18/23

Assignment Description

Due Date: Friday, March 31 by 11:59pm


Objective: Use ANOVA amd linear regression with parametric and non-parametric methods for
inference.
Requirement: Answer the attached questions.

• Hypothesis tests should follow the steps outlined in class.


– Unless indicated, use the parametric approach if conditions allow, otherwise use a non-
parametric approach.
• Answers should include the R code used and also be presented fully.
– You should do your best to present your answers clearly and concisely.
– If you have any questions at all, please post them to Teams or the discussion forum on
Blackboard.

Marks: 36 marks + 2 Bonus

• Q1: 7 points (5 + 2)
• Q2: 15 points (5 + 3 + 1 + 4+ 2) + 2 BONUS
• Q3: 11 points (4 + 5 + 2)
• Formatting: 3 points

1
Question 1 - Red “Die” No. 40?

Required Data: red_dye40.csv


The safety of Red No. 40 (a dye) was tested using a mouse study. Groups of mice were provided
with none of the dye (control group), a low, medium, or high amount and the time to death was
recorded. This data is contained in the red_dye40.csv file.

a) Is there any difference between the time to death? Conduct a hypothesis test.
b) If there is a difference, which groups are different?

Ÿ Warning

This data is in a “wide” format and contains NA values. You will need to do some “prepro-
cessing” on the data before performing the analysis as a result.

Question 2 - Ground Water

Required Data: groundwater.csv


Water treatment plants add bicarbonate to water in order to keep microorganisms in the system
happy and healthy.
The data in groundwater.csv is a sample of pH is measured on a logarithmic scale from 0 to 14
and bicarbonate levels are measured in parts per million (ppm)

Parametric

a) Use a linear model and parametric method to determine if there is a relationship between
bicarbonate levels and pH in the water. (Still check and comment on assumptions, but use
a parametric method regardless!)
b) Plot the data and regression line. Include appropriate labels and title. Bonus for including
confidence and prediction intervals on this plot1 [2 marks]. where is the code
c) Provide a 95% parametric confidence interval for the true slope parameter in the linear
model.

Non-Parametric

d) Use a linear model and non-parametric method to determine if there is a relationship


between bicarbonate levels and pH in the water.
e) Provide a 95% non-parametric confidence interval for the true slope parameter in the linear
model.

1
We’re going to talk about these on March 25th.

2
Attention

For parts a) and c), use diagnostic plots as discussed in lecture to check assumptions. Com-
ment on these plots and indicate whether a parametric approach is appropriate. Use the
parametric approach even if it seems that conditions are violated.
You only need to check assumptions once, in part a), but should refer back to them in part
c) as part of the process.

Quesiton 3 - Fawning Over Models

Required Data: antelope.csv


The data provided in the antelope.csv file contains:

• the spring fawn count (divided by 100)


• the size of the adult antelope population (divided by 100)
• annual precipitation (in inches)
• winter severity index (1=mild, 5=severe)

a) Fit the full model (i.e., fawn ∼ adult + precip + severity);

• Provide diagnostic plots, the summary, model formula, and brief discussion of the model fit.
2
b) Fit the quadratic polynomial model: fawn ∼ adult + adult

• Provide diagnostic plots, the summary, model formula, and brief discussion of the model fit.
• Also include the model plotted on top of the original data used in this model.

c) Which model is better? Why?

You might also like