
PUB HLT 200B: Winter 2024

Biostatistics Assignment 4

Supporting Files
Data: PATH_W4W5.csv
R Script: PUB HLT 200B – Biostatistics Assignment 4 Script

In today’s assignment we will once again be using data collected through the Population Assessment of
Tobacco and Health (PATH) study. Specific information on the PATH_W4W5.csv file and the variables it
contains can be found at the end of this document. Please refer to that information while completing
the present assignment.

1. When COVID-19 vaccines were first being evaluated among populations of children ages 5-11,
the primary endpoint was a measure of immunogenicity (the ability of the body to invoke an
immune response). For each participating child, SARS-CoV-2 neutralizing geometric mean titers
(GMTs) were measured in serum samples obtained at 7 days after receiving the second dose of
either the vaccine or the placebo. Assume the results of one such study are as follows:

                                GMTs
Sample     No. Participants     Mean       Standard Deviation
Vaccine    263                  1197.6     995.2
Placebo    254                  1016.5     981.3

a. Test whether the variance of GMT levels differs significantly across the vaccine and
placebo populations. Provide a statement interpreting the result of your test.

b. Test whether mean GMT levels differ significantly across the two populations (use a
two-sided test with α = 0.05).
i. What is the name of the test that you used?
ii. What is the value of the test statistic?
iii. Do you reject the null hypothesis?
iv. Provide a statement interpreting the result of your test.
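
Both tests in Exercise 1 can be computed from summary statistics alone (sample size, mean, and standard deviation). The Python sketch below is only an illustration of those formulas: the function names are hypothetical, the inputs are made-up numbers rather than the values in the table above, and it returns test statistics and degrees of freedom only, so p-values or critical values still have to come from R or a table.

```python
import math

def variance_ratio_f(sd1, n1, sd2, n2):
    """F statistic for equality of two variances, larger variance in the numerator.

    Returns (F, numerator df, denominator df).
    """
    v1, v2 = sd1 ** 2, sd2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic with Satterthwaite degrees of freedom."""
    a, b = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics, NOT the values from the assignment table.
f_stat, df_num, df_den = variance_ratio_f(sd1=4.0, n1=25, sd2=2.0, n2=16)
t_pooled, df_pooled = pooled_t(10.0, 4.0, 25, 8.0, 2.0, 16)
t_welch, df_welch = welch_t(10.0, 4.0, 25, 8.0, 2.0, 16)

print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) df")
print(f"pooled t = {t_pooled:.3f} on {df_pooled} df")
print(f"Welch t = {t_welch:.3f} on {df_welch:.1f} df")
```

Which t statistic is appropriate depends on the outcome of the variance test in part (a); the sketch computes both so the two can be compared.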

2. Use the R code provided to answer each of the following questions, which relate to Exercise 3 of
Assignment 2. Let ‘number of days’ refer to the response provided to the item ‘In past 30 days,
number of days used an electronic nicotine product.’ Conduct the appropriate hypothesis test
using α = 0.05. For each item (a)-(d) below, you only need to provide answers to (i) and (ii).
a. Is the mean number of days higher among older youth relative to younger youth during
Wave 5?
i. Provide the test statistic (obtained from R).
ii. Do you reject or fail to reject the null hypothesis?
b. Does the mean number of days differ between male and female youth during Wave 5?
i. Provide the test statistic (obtained from R).
ii. Do you reject or fail to reject the stated null hypothesis?
c. Does the mean number of days differ between male and female youth during Wave 4?
i. Provide the test statistic (obtained from R).
ii. Do you reject or fail to reject the stated null hypothesis?
d. Does the mean number of days differ between youth who used tobacco flavored
products and those who did not during Wave 4?
i. Provide the test statistic (obtained from R).
ii. Do you reject or fail to reject the stated null hypothesis?
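
Item (a) asks whether one mean is higher (a one-sided alternative), while items (b)-(d) ask whether the means differ (a two-sided alternative). The Python sketch below shows how the same test statistic can lead to different decisions under the two alternatives. It is an illustration only: the function name and the test statistic are hypothetical, and it uses a large-sample normal approximation rather than the t distribution that R's t.test uses, so its p-values will differ slightly from R output. The alternative labels mirror the `alternative` argument of R's t.test.

```python
from statistics import NormalDist

def approx_p_value(stat, alternative="two.sided"):
    """Large-sample (standard normal) approximation to the p-value of a t statistic."""
    z = NormalDist()  # standard normal, mean 0 and sd 1
    if alternative == "greater":
        return 1 - z.cdf(stat)
    if alternative == "less":
        return z.cdf(stat)
    if alternative == "two.sided":
        return 2 * (1 - z.cdf(abs(stat)))
    raise ValueError(f"unknown alternative: {alternative}")

ALPHA = 0.05
stat = 1.80  # hypothetical test statistic, not taken from the PATH data

decisions = {}
for alt in ("greater", "two.sided"):
    p = approx_p_value(stat, alt)
    decisions[alt] = "reject" if p < ALPHA else "fail to reject"
    print(f"{alt}: p = {p:.4f} -> {decisions[alt]} the null hypothesis")
```

The direction of the one-sided alternative depends on which group is listed first in the test, so the group ordering should be checked before reading a one-sided p-value.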

3. Using the PATH data, you will now ask and answer your own question similar to 2.d. First, fill in
the three blanks:

Is the mean number of days ___[BLANK1]___ among youth who used ___[BLANK2]___ flavored
products relative to youth who did not during ___[BLANK3]___?

BLANK1: Circle/highlight one of the following:

Higher     Lower     Different

BLANK2: Circle/highlight one of the following:

Menthol or mint     Clove or spice     Fruit     Chocolate
An alcoholic drink     A non-alcoholic drink     Candy, desserts or other sweets

BLANK3: Circle/highlight one of the following:

Wave 4     Wave 5

Now, answer the following items (a.-c.) for the options you selected above.
a. Is this a one-sided or two-sided test?
b. Did you assume equal variances?
c. Do you reject or fail to reject the null hypothesis?

4. Use the R code provided to complete a small simulation study comparing power across two
two-sample tests. Specifically, we compare power between the paired t-test and an independent
samples t-test. We will do this while assuming different levels of positive correlation between
samples. Generally speaking, positive correlation indicates the propensity for two variables to
change together in the same direction. It is reasonable to assume, for instance, that two
observations taken on the same individual at different points in time will be positively
correlated.

A brief description of the simulation study:


- Simulate 100 observations from a normal distribution, N(0, 9)
- Simulate 100 observations from a normal distribution, N(1, 9)
- Simulate the samples above in such a way that the data are actually paired (100 pairs of
  observations) with different relative levels of correlation:
  - No correlation
  - Modest positive correlation
  - High positive correlation
- For each of the three scenarios, we replicate the above steps 1,000 times and, for each, we
conduct both a paired samples t-test and an independent samples t-test. We count the total
number of instances (out of 1,000) in which the tests successfully identified a statistically
significant difference in means. This reflects the anticipated power of each test.
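
The steps above can be sketched in Python as follows. This is not the provided R script, only a rough illustration of the same design under stated assumptions: N(0, 9) is read as mean 0 and variance 9 (standard deviation 3), the correlations 0.5 and 0.9 are hypothetical stand-ins for "modest" and "high", the independent samples test uses the pooled (equal-variance) form, and the critical values are approximate two-sided 5% points of the t distribution. All function names are illustrative.

```python
import math
import random

T_CRIT_PAIRED = 1.984  # approx. two-sided 5% critical value, t with 99 df
T_CRIT_INDEP = 1.972   # approx. two-sided 5% critical value, t with 198 df

def mean_and_var(values):
    """Sample mean and sample variance (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def simulate_pairs(n, rho, sd, shift, rng):
    """Draw n pairs (x, y) with x ~ N(0, sd^2), y ~ N(shift, sd^2), corr(x, y) = rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = sd * z1
        y = shift + sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

def paired_t(pairs):
    """Paired t statistic: mean of the within-pair differences over its standard error."""
    diffs = [y - x for x, y in pairs]
    m, v = mean_and_var(diffs)
    return m / math.sqrt(v / len(diffs))

def independent_t(pairs):
    """Pooled two-sample t statistic that ignores the pairing (equal group sizes)."""
    n = len(pairs)
    mx, vx = mean_and_var([x for x, _ in pairs])
    my, vy = mean_and_var([y for _, y in pairs])
    sp2 = (vx + vy) / 2  # pooled variance when both groups have n observations
    return (my - mx) / math.sqrt(sp2 * 2 / n)

def estimate_power(rho, reps=1000, n=100, sd=3.0, shift=1.0, seed=2024):
    """Proportion of replicates in which each test rejects at the 5% level."""
    rng = random.Random(seed)
    hits_paired = hits_indep = 0
    for _ in range(reps):
        pairs = simulate_pairs(n, rho, sd, shift, rng)
        hits_paired += int(abs(paired_t(pairs)) > T_CRIT_PAIRED)
        hits_indep += int(abs(independent_t(pairs)) > T_CRIT_INDEP)
    return hits_paired / reps, hits_indep / reps

results = {}
for rho in (0.0, 0.5, 0.9):  # hypothetical correlation levels
    results[rho] = estimate_power(rho)
    print(f"rho = {rho}: paired = {results[rho][0]:.3f}, independent = {results[rho][1]:.3f}")
```

Both tests are applied to the same simulated data in each replicate, so any difference in the rejection counts reflects the tests themselves rather than different random draws.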

Based on your simulation study results, complete the table below:

                                 Anticipated Power
                                 No             Modest Positive    High Positive
                                 Correlation    Correlation        Correlation
Paired Samples t-Test            ________       ________           ________
Independent Samples t-Test       ________       ________           ________

a. When the paired samples are not correlated, is power higher, lower, or about the same
when using the independent samples t-test relative to the paired t-test?
b. When the paired samples are positively correlated, is power higher, lower, or about the
same when using the independent samples t-test relative to the paired t-test?

The Population Assessment of Tobacco and Health (PATH) Study is a large, long-term study of
tobacco use and health in the US. More detailed information can be found here:
https://pathstudyinfo.nih.gov/landing. To date, five PATH data collection waves have been
completed, and we will be working with a very limited subset of the publicly available data
consisting of a select set of variables appearing on the Wave 4 (Dec. 2016–Jan. 2018) and Wave
5 (Dec. 2018–Nov. 2019) questionnaires administered to youth and their parents. Attempts
were made to obtain responses from the same individuals during both waves. Below is a
description of the variables contained in the PATH_W4W5.csv file.

Variable Name/Suffix*   Description
PERSONID                ID number uniquely identifying a youth participant
WAVE4                   Indicator of whether the youth completed the Wave 4 questionnaire
                        (1 = Yes, 0 = No)
WAVE5                   Indicator of whether the youth completed the Wave 5 questionnaire
                        (1 = Yes, 0 = No)
_YV1022                 In past 30 days, number of days used an electronic nicotine product
_YV1026                 Average number of times you pick up your electronic nicotine product
                        to use it for one or more puffs on days that you use
_YV1027                 Number of puffs you take each time you pick up your electronic
                        nicotine product to use it
                        In past 30 days, used [electronic nicotine products/electronic
                        nicotine cartridges/e-liquid] flavored to taste like:
_YV1131_01                Tobacco-flavor
_YV1131_02                Menthol or mint
_YV1131_03                Clove or spice
_YV1131_04                Fruit
_YV1131_05                Chocolate
_YV1131_06                An alcoholic drink
_YV1131_07                A non-alcoholic drink
_YV1131_08                Candy, desserts or other sweets
_YV1131_09                Some other flavor
R_Y_AGECAT2             Age range when interviewed (2 levels: 12 to 14 years, 15 to 17 years)
R_Y_SEX                 Youth gender
_YOUTHTYPE              Youth type classification (continuing youth, aged-up youth, new
                        cohort youth)
*Prefix “R04” is added to variables from the Wave 4 questionnaire and “R05” is added to variables
from the Wave 5 questionnaire.
Special Values: -5 = Improbable response removed; -7 = Refused; -8 = Don’t know; -9 = Missing – Not
ascertained
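
The special values above are codes rather than real measurements, so they should be treated as missing before any means are computed. The Python sketch below illustrates one way to do this, together with the prefix rule from the footnote. It rests on several assumptions: the rows and PERSONID values are invented, the helper names are hypothetical, the wave-specific column name is formed by joining the prefix directly to the suffix (for example, R04 plus _YV1022), and blank or NA cells are also treated as missing. The real file may encode missing cells differently.

```python
import csv
import io

SPECIAL_VALUES = {-5, -7, -8, -9}  # codes listed under "Special Values"

def wave_variable(wave, suffix):
    """Build a wave-specific column name, e.g. wave 4 and '_YV1022' -> 'R04_YV1022'."""
    return f"R{wave:02d}{suffix}"

def clean_value(raw):
    """Return the numeric value, or None for blanks, NA, and special-value codes."""
    if raw is None or raw.strip() in ("", "NA"):
        return None
    value = float(raw)
    return None if value in SPECIAL_VALUES else value

# Hypothetical rows laid out like PATH_W4W5.csv; the real data are not reproduced here.
sample_csv = """PERSONID,WAVE4,WAVE5,R04_YV1022,R05_YV1022
P001,1,1,3,10
P002,1,1,-9,0
P003,1,0,-7,
P004,1,1,5,-8
"""

col = wave_variable(4, "_YV1022")
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Keep only youth who completed Wave 4, then drop special values before averaging.
days_w4 = [clean_value(row[col]) for row in rows if row["WAVE4"] == "1"]
valid = [d for d in days_w4 if d is not None]
mean_days = sum(valid) / len(valid)
print(col, days_w4, mean_days)
```

Leaving the codes in place would pull every mean toward negative values, which is why the recoding is done before any summary or test is computed.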
