
STAT 1000 - Assignment 3

Devesh arora 7927685

2023-04-10

Instructions

To properly view the assignment questions, knit this file to .PDF and view the output.
To enter your answers, add code as needed into the R code chunks given below, and, where applicable,
replace the “Delete me; . . . ” and add in your own text response. Be sure when adding in text responses
to never copy-paste symbols from outside of the document. Only use the symbols on your keyboard. Do
not delete the question text, or modify any other part of the code except for the “author” in Line 3. All
numerical and graphical answers must be done using R, unless stated otherwise.
You will have a link in your email that takes you to the Crowdmark submission page. Once you have
completed the worksheet, knit it to .PDF and upload your output to Crowdmark. Also, upload your .Rmd
file to Crowdmark where prompted. To see where your .Rmd file is saved, click File > Save As in the top-left
of your screen. Make sure you set your Name and Student Number in the Author section of this document
(Line 3). Do not alter the title or the date. Please note that if you do not submit a knit .PDF file, you will
be given a grade of zero.
After you knit your assignment to PDF, check your code chunks. If your code at any point runs off the page,
find the nearest comma, click to the right of it, and press Enter (or Return if you are on a Mac). This will
force a break in the code so that it goes onto the next line. All of your code must be readable in the final
submission.
All calculations and output must be visible in the final document, and all text responses should be in complete
English sentences. Your work should be done using the same formatting, functions, and packages as in your
labs and course notes, unless otherwise specified. You may speak to your classmates about ideas and what
functions/optional arguments you may need to use but you may not directly show your code/output to your
classmates.
Your full submission is due by 11:59 p.m. on Monday, April 10. Crowdmark may allow you to submit late,
but you will be given an automatic grade of zero if you do. If you have an issue that you can’t resolve
without someone looking at your work (e.g., you get an error when knitting your document), please see the
Help Centre in 311 Machray Hall.

Setup [1 mark]

Before you begin, set your name and student number in Line 3.

0. Load in the BSDA package, and import the Marathon100 dataset. Make sure that the object is named
Marathon100. Make sure you have “Heading” set to “Yes” when you import the data. [1 mark]

Marathon100 <- read.csv("~/Downloads/Marathon100.csv")

The block of code below will isolate a sample of 80 of these observations, which is the dataset that you will
be working with.
After importing the data, replace 1111111 with your seven-digit student ID number in the code chunk below,
and click the green arrow at the top-right hand side of the code chunk. This part is not worth marks, but you
will receive a five-mark deduction on your assignment if it is not completed correctly.

seed.number = 7927685
set.seed(seed.number)
if(!exists("Marathon100")) Marathon100 = data.frame(x=rep(0, 80))
Marathon100 = Marathon100[sample(1:NROW(Marathon100), 80), ]

Make sure you complete this setup stage before beginning your assignment!

Questions [24 marks]

Now that you have completed the setup, the dataset Marathon100 should contain only 80 observations.
Check your environment in the top-right and verify that you see the dataset “Marathon100” and that it says
“80 obs. of 2 variables”.
This dataset contains the ages and run times (in minutes) for 80 runners in the 2017 Boston Marathon.
Your goal is to test whether the mean time of this sample is greater than 231 minutes, at the α = 0.1 level
of significance. The standard deviation of run times is assumed to be 42 minutes.

1. Use z.test to produce a p-value for the appropriate test of significance. [4 marks]

library(BSDA)

## Loading required package: lattice

##
## Attaching package: ’BSDA’

## The following object is masked from ’package:datasets’:
##
## Orange

# Test H0: mu = 231 against Ha: mu > 231, with known sigma = 42.
# mu must be the hypothesized value (231), not the sample mean.
z_test = z.test(Marathon100$Minutes, alternative = "greater",
                mu = 231, sigma.x = 42)

2. Based on the output of this test, give a fully worded conclusion. [2 marks]

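We reject the null hypothesis. Testing a null mean of 231 minutes against the alternative that the mean is
greater than 231 minutes gives a test statistic of about z = 3.26 and a p-value of about 0.00055. Since the
p-value is less than alpha (0.1), there is sufficient evidence that the true mean run time is greater than
231 minutes.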

3. Give an interpretation of the p-value for this test, in the context of this question. [2 marks]

If the true mean run time were 231 minutes, the probability of observing a sample mean at least as high as
246.325 minutes (the mean of this sample of 80 runners) would be about 0.00055.
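The test statistic and p-value for Question 1 can be cross-checked by hand. The sketch below is not the z.test call the assignment asks for; it assumes the sample mean of 246.325 reported in the Question 4 output and uses only base R. The variable names are chosen here for illustration.

```r
# Hand computation of the one-sided z test (Ha: mu > 231).
xbar  <- 246.325  # sample mean, taken from the z.test output in Question 4
mu0   <- 231      # hypothesized mean run time
sigma <- 42       # assumed population standard deviation
n     <- 80       # sample size

z_stat  <- (xbar - mu0) / (sigma / sqrt(n))  # roughly 3.26
p_value <- 1 - pnorm(z_stat)                 # upper-tail probability

z_stat
p_value
```

A p-value this far below 0.1 leads to rejecting the null hypothesis at the stated significance level.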

4. Use z.test to produce a 90% confidence interval for the true mean run time, based on the provided
sample. Also, type the interval below to two decimal places. [3 marks]

z.test(Marathon100$Minutes, alternative = "two.sided", sigma.x = 42 , conf.level = 0.9)

##
## One-sample z-Test
##
## data: Marathon100$Minutes
## z = 52.457, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 238.6012 254.0488
## sample estimates:
## mean of x
## 246.325

lower bound = 238.60 and upper bound = 254.05.

5. Provide an interpretation of the confidence interval above, in the context of this question. [2 marks]

We are 90% confident that the true mean run time of all runners in the 2017 Boston Marathon is between
238.60 and 254.05 minutes. The interval lies entirely above 231 minutes, the null hypothesis mean from the
earlier hypothesis test, so it is consistent with the true mean run time being greater than 231 minutes.
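The interval reported by z.test can be reproduced from the formula xbar plus or minus z-star times sigma over the square root of n. This is a minimal sketch that assumes the sample mean of 246.325 shown in the output above; the variable names are illustrative.

```r
# 90% confidence interval computed by hand.
xbar  <- 246.325  # sample mean from the z.test output
sigma <- 42       # assumed population standard deviation
n     <- 80       # sample size

z_star <- qnorm(0.95)                  # leaves 5% in each tail
moe    <- z_star * sigma / sqrt(n)     # margin of error
ci     <- c(lower = xbar - moe, upper = xbar + moe)

round(ci, 2)  # 238.60 and 254.05, matching the z.test interval
```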

6. Could the interval from Question 4 have been used to conduct the hypothesis test from Question 1?
If so, what would the conclusion be, and why? If not, why not? [2 marks]

The null hypothesis mean of 231 minutes falls outside (below) the 90% confidence interval of 238.60 to 254.05
minutes. Strictly, a two-sided 90% interval matches a two-sided test at the 0.1 level, while the test in
Question 1 is one-sided. However, the lower bound of a two-sided 90% interval is the cutoff for a one-sided
test at the 0.05 level, and 231 lies below it. We would therefore reject the null hypothesis that the mean
run time is 231 minutes, and conclude that there is sufficient evidence that the true mean run time is
greater than 231 minutes at the 0.1 level of significance.

7. Use the nsize function to determine the sample size required to calculate a 90% confidence interval for
the true mean run time, with a margin of error of under 5 minutes. [3 marks]

nsize(5, sigma = 42, conf.level = 0.90)

##
## The required sample size (n) to estimate the population
## mean with a 0.9 confidence interval so that the margin
## of error is no more than 5 is 191 .

8. Using only the output from Question 7, what would the minimum sample size be to produce a 90%
confidence interval for the true mean run time, with a margin of error of under 3 minutes? [1 mark]

Hint: Just use R as a calculator here.

nsize(3, sigma = 42, conf.level = 0.90)

##
## The required sample size (n) to estimate the population
## mean with a 0.9 confidence interval so that the margin
## of error is no more than 3 is 531 .
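Because the required sample size is proportional to 1 over the margin of error squared, the Question 7 result can be rescaled with plain arithmetic, which is what the hint suggests. This is a sketch of that shortcut; it starts from the rounded value 191, so in general it could differ slightly from a fresh nsize call, although here the two agree.

```r
# Rescale the Question 7 sample size from a margin of 5 to a margin of 3.
n_for_5 <- 191                            # output from Question 7
n_for_3 <- ceiling(n_for_5 * (5 / 3)^2)   # n scales with (old margin / new margin)^2
n_for_3  # 531
```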

Suppose that a random sample of 50 runners is taken, and a mean age of 45.89 is obtained (knit this file to
PDF to see the mean age). Suppose that your goal is to test whether the mean age of 2017 Boston Marathon
runners differs from 44 years, at the α = 0.01 level of significance. Assume that the standard deviation of
ages is 11 years.

9. Using R as a basic calculator, determine the test statistic for the appropriate test of significance. [1
mark]

xbar <- 45.89 # sample mean age, as stated above
mu <- 44 # hypothesized population mean age
sigma <- 11 # assumed population standard deviation of ages
n <- 50 # sample size

# z test statistic (the standard deviation is assumed known)
z <- (xbar - mu) / (sigma / sqrt(n))
z

## [1] 1.214938

10. Use pnorm to determine the p-value of the appropriate test of significance.

# Two-sided p-value, since the test asks whether the mean differs from 44
round(2 * (1 - pnorm(abs(z))), 4)

## [1] 0.2244

Suppose you wish to construct a 99% confidence interval for the true mean age of 2017 Boston Marathon
runners.

11. Use qnorm to determine the critical value (i.e., the z-star value) for the 99% confidence interval, to 6
decimal places.

# Upper critical value: 0.5% in each tail for 99% confidence
z_star <- qnorm(0.995)
z_star

## [1] 2.575829

12. Use R as a calculator to determine the margin of error for the 99% confidence interval. [1 mark]

n <- 50 # sample size
s <- 11 # assumed standard deviation of ages

margin_of_error <- z_star * (s / sqrt(n))
margin_of_error

## [1] 4.00705

13. Could the resulting confidence interval be used to conduct the hypothesis test from Questions 9 – 10?
If so, what would your conclusion be, and why? If not, why not? [2 marks]

Yes, the resulting confidence interval could be used to conduct the hypothesis test from Questions 9-10,
because the test is two-sided and its 0.01 level of significance matches the 99% confidence level. If the
hypothesized mean of 44 years falls within the 99% confidence interval for the true mean age, we fail to
reject the null hypothesis; if it falls outside the interval, we reject the null hypothesis. Here the interval
is 45.89 plus or minus 4.01, or about 41.88 to 49.90 years, which contains 44. We therefore fail to reject the
null hypothesis and conclude that there is not enough evidence to suggest that the true mean age of 2017
Boston Marathon runners differs from 44 years at the 0.01 level of significance.
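The decision rule described above can be checked numerically. This is a minimal sketch using the values stated in the question (sample mean 45.89, standard deviation 11, n = 50); the variable names are chosen here for illustration.

```r
# 99% confidence interval for the true mean age, and the matching
# two-sided test decision at alpha = 0.01.
xbar  <- 45.89  # sample mean age stated in the question
sigma <- 11     # assumed population standard deviation
n     <- 50     # sample size
mu0   <- 44     # hypothesized mean age

z_star <- qnorm(0.995)                 # 0.5% in each tail
moe    <- z_star * sigma / sqrt(n)     # margin of error
ci_age <- c(lower = xbar - moe, upper = xbar + moe)

# TRUE means mu0 lies inside the interval, so H0 is not rejected.
mu0_inside <- mu0 >= ci_age[["lower"]] && mu0 <= ci_age[["upper"]]

round(ci_age, 2)
mu0_inside
```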
