
Economics 570

ANSWER 1

Answer a

Here is an example of how to compute the sample correlation between "miles" and "price"
variables in R using the "pickup.csv" data set:

# Load the data set

pickup <- read.csv("pickup.csv")

# Extract the "miles" and "price" variables

x <- pickup$miles

y <- pickup$price

# Compute the sample correlation

r_hat <- cor(x, y)

# Print the result

r_hat

Running this code prints the sample correlation between "miles" and "price" in the
"pickup.csv" data set. The exact value depends on the data, so consider a concrete example.

Let's suppose that the "pickup.csv" data set contains the following values for "miles" and "price":

miles = c(15, 20, 25, 30, 35)

price = c(10, 20, 30, 40, 50)

We can then run the code provided to compute the sample correlation:

# Load the data set

pickup <- data.frame(miles, price)

# Extract the "miles" and "price" variables

x <- pickup$miles

y <- pickup$price

# Compute the sample correlation

r_hat <- cor(x, y)

# Print the result

r_hat

The result of running the code is:

[1] 1

This makes sense: in this example "price" is an exact linear function of "miles"
(price = 2·miles − 20), so the correlation is exactly 1, a perfect positive linear
relationship. In general, the closer the correlation coefficient is to 1, the stronger the
positive linear relationship between the two variables, and the closer it is to -1, the
stronger the negative linear relationship.

Answer b

Here is a step-by-step procedure for performing a bootstrap for the population correlation
between variables "miles" and "price" in the data set "pickup.csv":

1. Load the "pickup.csv" data set into R using the read.csv() function.

2. Store the "miles" and "price" variables in separate vectors, say x and y, respectively.

3. Compute the sample correlation between x and y using cor(x, y). Store this value as
"r_hat".

4. Specify the number of bootstrap samples you want to generate, say B.

5. Repeat the following steps B times:

a. Generate a bootstrap sample of the same size as the original data set by randomly sampling the
indices of the observations, with replacement.

b. Extract the corresponding "miles" and "price" values from x and y, respectively, to form the
bootstrapped "miles" and "price" vectors.

c. Compute the correlation between the bootstrapped "miles" and "price" vectors using cor().
Store this value.

6. Use the B bootstrapped correlations to form a 95% CI by computing the 2.5th and 97.5th
percentiles of the bootstrapped correlation values.

The 95% CI gives a range of plausible values for the population correlation: an interval
constructed this way would contain the true correlation in roughly 95% of repeated samples.
The point estimate of the population correlation is r_hat.
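In R, the procedure can be sketched as follows (a sketch assuming the "pickup.csv" file is
available as in part (a); the choice B = 2000 and the seed are arbitrary):

# Load the data and compute the sample correlation (as in part a)
pickup <- read.csv("pickup.csv")
x <- pickup$miles
y <- pickup$price
r_hat <- cor(x, y)

# Bootstrap the correlation
set.seed(1)  # for reproducibility
B <- 2000
n_obs <- length(x)
r_boot <- numeric(B)
for (b in 1:B) {
  # Resample observation indices with replacement
  idx <- sample(1:n_obs, size = n_obs, replace = TRUE)
  r_boot[b] <- cor(x[idx], y[idx])
}

# 95% percentile CI: 2.5th and 97.5th percentiles of the bootstrapped correlations
quantile(r_boot, c(0.025, 0.975))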

ANSWER 2

To form a 95% CI for the quantity using the bootstrap, you would need to perform the following
steps:

1. Draw a bootstrap sample (with replacement) from the original sample and compute the
estimate (exp(β̂₄) − 1) · 100 on this bootstrapped sample.

2. Repeat the above step many times to generate a large number of bootstrapped estimates.

3. Sort the bootstrapped estimates in ascending order.

4. Select the 2.5th and 97.5th percentile values of the sorted estimates as the lower and
upper bounds of the 95% CI, respectively.

The resulting interval will provide an estimate of the uncertainty around the original estimate,
with 95% confidence that the true value lies within the interval.
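These steps can be sketched in R. This is only an illustration: it assumes the original
estimate comes from a linear model fit with lm() on a data frame df, and that the coefficient
of interest is named "x4" in that model; both names are placeholders for whatever the actual
model uses.

# Sketch: case-resampling bootstrap for (exp(beta4_hat) - 1) * 100
set.seed(1)
B <- 2000
n_obs <- nrow(df)
est_boot <- numeric(B)
for (b in 1:B) {
  idx <- sample(1:n_obs, size = n_obs, replace = TRUE)
  fit_b <- lm(formula(fit), data = df[idx, ])  # refit the model on the bootstrap sample
  est_boot[b] <- (exp(coef(fit_b)["x4"]) - 1) * 100
}

# 95% percentile CI
quantile(est_boot, c(0.025, 0.975))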

ANSWER 3

For each sample size, you will perform the following steps B times:

1. Draw a sample of size n from the Poisson distribution with λ = 2 using the "rpois"
function in R.

2. Compute the sample mean and standard error from the sample.

3. Compute the endpoints of the 95% confidence interval as the sample mean plus and minus
two times the standard error.

After performing these steps B times, you will be able to see how often the true parameter (λ = 2)
falls within the 95% confidence interval for each sample size. The goal of the simulation is to see
if the confidence intervals actually contain the true parameter about 95% of the time, as the
theory predicts.
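For a single draw, the interval construction looks like this (using the sample standard
deviation to estimate the standard error):

# One simulated 95% CI for the mean of a Poisson(2) sample
set.seed(1)
n <- 45
sample <- rpois(n, lambda = 2)
sample_mean <- mean(sample)
standard_error <- sd(sample) / sqrt(n)
c(lower = sample_mean - 2 * standard_error,
  upper = sample_mean + 2 * standard_error)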

Answer a

Generate the sample means for each sample size by running the simulation B times, as described
in the previous answer.

Store the sample means in a data structure, such as a vector or data frame.

Use the "hist" function in R to generate a histogram of the sample means, with one histogram for
each sample size. You can specify the number of bins and add a title and labels to the histogram
to make it easier to interpret.

# Set sample sizes and number of simulations

n <- c(15, 45, 300)

B <- 2000

# Set lambda

lambda <- 2

# Create a list to store the sample means

sample_means <- list()

# Loop over sample sizes

for (i in 1:length(n)) {

# Generate B samples of size n[i] from the Poisson distribution
# (replicate returns an n[i] x B matrix, one sample per column)

samples <- replicate(B, rpois(n[i], lambda = lambda))

# Compute the B sample means, one per column

sample_means[[i]] <- colMeans(samples)

}

# Plot histograms of the sample means

par(mfrow = c(1, 3))

for (i in 1:length(n)) {

hist(sample_means[[i]], main = paste("n =", n[i]), xlab = "Sample Mean",
     ylab = "Frequency", col = "gray")

}

[Figure: histograms of the sample means for each sample size (panels "n = 15", "n = 45",
"n = 300"; x-axis "Sample Mean", y-axis "Frequency"). All three distributions are centered
near 2.]

Answer b

n <- c(15, 45, 300)

B <- 2000

lambda <- 2

lower_endpoints <- upper_endpoints <- matrix(NA, nrow = B, ncol = length(n))

for (i in 1:length(n)) {

for (j in 1:B) {

sample <- rpois(n[i], lambda)

sample_mean <- mean(sample)

standard_error <- sqrt(var(sample) / n[i])

lower_endpoints[j, i] <- sample_mean - 2 * standard_error

upper_endpoints[j, i] <- sample_mean + 2 * standard_error

}

}

# Plot histograms of the lower and upper endpoints for each sample size

par(mfrow = c(3, 2))

for (i in 1:length(n)) {

hist(lower_endpoints[, i], main = paste0("n = ", n[i], " - Lower endpoint of 95% CI"))

hist(upper_endpoints[, i], main = paste0("n = ", n[i], " - Upper endpoint of 95% CI"))

}

[Figure: histograms of the lower and upper endpoints of the 95% CIs for n = 15, 45, and 300.
The lower endpoints sit below 2 and the upper endpoints above 2, and both concentrate more
tightly around 2 as n grows.]

Answer c

n <- c(15, 45, 300)

B <- 2000

lambda <- 2

pop_mean <- lambda

prop_CI_contain_mean <- numeric(length(n))

for (i in 1:length(n)) {

sample_means <- numeric(B)

lower_endpoints <- numeric(B)

upper_endpoints <- numeric(B)

for (j in 1:B) {

sample <- rpois(n[i], lambda)

sample_mean <- mean(sample)

sample_means[j] <- sample_mean

# Standard error using the known variance (for a Poisson, the variance equals lambda)

standard_error <- sqrt(lambda / n[i])

lower_endpoints[j] <- sample_mean - 2 * standard_error

upper_endpoints[j] <- sample_mean + 2 * standard_error

}

prop_CI_contain_mean[i] <- mean(lower_endpoints <= pop_mean & pop_mean <= upper_endpoints)

}

prop_CI_contain_mean

The prop_CI_contain_mean vector will contain the proportion of the CIs that contain the
population mean for each sample size n. The proportion should be close to 95%.
