PE and Me

Impact of Physical Education on Physical
Health and Weight Awareness of Adolescents

Erica Wong, Cindy Kang, Abby Vogel

Final Project
Stat 152
May 11, 2017
Contents

1 Introduction
1.1 Background
1.2 Research Question
1.3 Analysis Summary

2 Survey Design
2.1 NHANES Overview
2.1.1 Public Release Data
2.1.2 Design Element Definitions
2.2 Design Elements of Survey
2.3 Exploration of Design Elements

3 Methodology
3.1 Data Description
3.2 Data Merging and Cleaning
3.3 Fixing Missing Data Values

4 Results
4.1 Effect of Participation in PE on Physical Health
4.2 Effect of Participation in PE on Weight Awareness
4.3 Effect of Frequency of PE on Physical Health
4.4 Effect of Frequency of PE on Weight Awareness
4.5 Effect of Enjoyment of PE on Physical Health
4.6 Effect of Enjoyment of PE on Weight Awareness

5 Conclusions
5.1 Caveats and Limitations
5.2 Next Steps

6 Appendix

Introduction

In recent decades, the United States has earned a reputation as one of the ten most obese
countries in the world. According to the 2013-2014 National Health and Nutrition Examination
Survey (NHANES), the obesity rate among American adults is a staggering 38%, compared to a
global average of 13% [10]. Childhood obesity rates (ages 2-19) in particular have nearly
tripled since the early 1980s and have hovered around 17% for the past ten years. Rates have
declined among 2- to 5-year-olds, held stable among 6- to 11-year-olds, and increased among
12- to 19-year-olds [9]. The Centers for Disease Control and Prevention (CDC) attributes these
trends largely to unhealthy eating habits, excessive sedentary activity, and lack of regular
physical activity [5].
In response to these alarmingly high rates of adult and childhood obesity, the United States
Department of Agriculture (USDA) has taken steps to promote portion control and healthy food
choices among the American public. The MyPyramid dietary guidelines were released in 2005 and
replaced by MyPlate in 2011 [3]. Both were part of a larger communication initiative encouraging
a shift toward healthier eating habits. However, in November 2015, results of the 2013-2014 CDC
survey showed no significant decrease in obesity rates, which remained at 17% for youth and 36%
for adults [3].
MyPyramid and MyPlate were federal attempts to improve the eating habits of the American
public, but many have asked why no federal regulations have been imposed to encourage regular
physical activity and exercise, particularly among youth. Whether physical education classes
should be mandatory in schools has sparked widespread national debate, with proponents arguing
that such classes not only foster healthy and active lifestyles but also greatly improve the
physical and mental health of adolescents. In this study, we examine data from the 2013-2014
NHANES to determine whether there is significant evidence that physical education classes are
associated with better physical health and weight awareness among American adolescents aged
12-15.

1.1 Background
Body Mass Index (BMI) is a value derived from weight and height measurements that attempts to
quantify the amount of tissue mass (muscle, fat, and bone) in an individual and to place that
individual in one of four weight categories: underweight, healthy weight, overweight, and
obese [4].

The formula for calculating an individual’s BMI is given by:

$$\text{BMI} = \frac{\text{Weight (lb)}}{\left(\text{Height (in)}\right)^{2}} \times 703$$

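The standard weight categories defined by BMI values are given in Table 1.1 below. These fixed
cutoffs are defined for adults; for children and teens, the CDC interprets BMI using sex- and
age-specific percentiles.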

Weight Category     Range of BMI Values
Underweight         Below 18.5
Normal Weight       18.5-24.9
Overweight          25.0-29.9
Obese               30.0 and above

Table 1.1: Weight Categories by BMI
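As a quick check on the formula and Table 1.1, the R sketch below computes BMI from pounds and
inches and maps it to the four categories using the adult cutoffs in the table. The functions
bmi_imperial() and bmi_category() are hypothetical helpers for illustration only; the analysis
itself uses the BMI category already coded in the NHANES data (bmdbmic, Table 3.1).

```r
# Hypothetical helpers: BMI from imperial units, then the adult cutoffs of Table 1.1.
bmi_imperial <- function(weight_lb, height_in) {
  # BMI = weight (lb) / height (in)^2 * 703
  weight_lb / height_in^2 * 703
}

bmi_category <- function(bmi) {
  # right = FALSE gives left-closed intervals:
  # [-Inf, 18.5) underweight, [18.5, 25) normal, [25, 30) overweight, [30, Inf) obese
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("underweight", "normal", "overweight", "obese"),
      right = FALSE)
}

round(bmi_imperial(weight_lb = 180, height_in = 70), 1)  # 25.8
bmi_category(c(17, 22.4, 25, 31))
```

Left-closed intervals make a BMI of exactly 25.0 fall into "overweight", matching the 25.0-29.9
row of the table.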

1.2 Research Question
The purpose of this study is to examine the relationship between physical education classes
and the physical health and weight awareness of American adolescents aged 12-15. Given the
variables available in the 2013-2014 NHANES dataset, we address two primary questions:

1. Does the likelihood of an adolescent being of healthy weight (as defined by BMI) vary
by participation in a physical education class?

2. Does an adolescent’s weight awareness vary by participation in a physical education
class? That is, does the proportion of adolescents who categorize themselves in the
correct weight class (as defined by BMI) vary by participation in a physical education
class?

We also consider two other variables that may affect our results: the weekly frequency of
physical education class and enjoyment of physical education class. The corresponding
sub-questions are stated below:

1. Does the likelihood of an adolescent being of healthy weight (as defined by BMI) vary
by frequency of physical education class?

Does an adolescent’s weight awareness vary by frequency of physical education class?

2. Does the likelihood of an adolescent being of healthy weight (as defined by BMI) vary
by enjoyment of physical education class?

Does an adolescent’s weight awareness vary by enjoyment of physical education class?

1.3 Analysis Summary
Based on initial graphical analysis of our data and the results of hypothesis tests, including
F-tests and chi-squared goodness-of-fit tests, we found that the distribution of adolescents
across the four BMI weight categories did not differ significantly by participation in or
frequency of physical education classes, but was significantly associated with enjoyment of
PE. Weight awareness did not differ significantly by participation in, frequency of, or
enjoyment of physical education classes. These findings also bear on some common public
assumptions about weight and PE. First, we found no significant association between
participation in PE classes and healthier weight among adolescents, even though PE classes
provide a required amount of weekly physical activity. Second, our data do not support the
idea that PE is enjoyed mainly by athletic students (typically of normal weight): a large
proportion of adolescents who are overweight or obese also reported enjoying PE.
Survey Design

2.1 NHANES Overview
The National Center for Health Statistics (NCHS) has conducted the National Health and
Nutrition Examination Survey (NHANES) since 1971, and annually since 1999. NHANES collects and
tracks data on the prevalence of diseases and their risk factors using a complex, multistage
national sample. The target population is the entire civilian resident population of the
United States, with notable exclusions such as institutionalized individuals and active-duty
military personnel.
Each year, approximately 5,000 individuals are examined as part of NHANES. Once selected into
the sample, each subject is screened, interviewed, and given a physical and dental examination.
During the interview, subjects are asked about their demographic characteristics as well as
their health, socioeconomic status, and nutritional habits and behaviors. Physical and dental
examinations are performed at Mobile Examination Centers (MECs), where height, weight, and
basic vital signs are recorded. In addition, blood and urine samples are collected for further
analysis. Note that not all sampled individuals agree to take part in the physical
examination, even when their demographic and interview data are collected.

2.1.1 Public Release Data
Since its beginnings in the late 1900’s, NHANES has become a leading source of public health
data, with a variety of different surveys administered each year. Data is available to the
public, and is widely studied and analyzed by governmental and non-governmental health and
policy experts. To protect the privacy of sampled individuals due to the sensitive information
collected, NHANES takes extensive measures to ensure confidentiality of respondents. Public
data is made available on a 2-year basis to reduce the likelihood of sensitive survey data being
connected to an individual respondent. In addition, pseudo-PSUs, which are created and
assigned to groups of SSU’s, are included in released data sets in place of original PSU,
SURVEY DESIGN 7

census block, and household information. One indirect consequence of this is reduction
in the variance of estimates, as more PSU’s are included in each release. Because of this
modification in variances, NHANES also releases ”Masked Variance Units” (MVUs) in their
public data set [8]. The 2013-2014 NHANES data used in this study consists of 15 masked-
variance strata and 30 masked-variance primary sampling units [2]. Rather than treating
the survey design as a four-stage sample, the MVUs’ design allows us to treat the survey as
a two stage sample and was designed to ”closely approximate the variances that would have
been estimated using the ’true’ design variables” [7].
NHANES also provides separate interview and examination weights because data are collected in
two different ways. In our project, for example, some information was collected via
examination (BMI) and some via interview (responses about PE). Because not everyone who was
approached for an interview or examination responded, the weights must be adjusted for
non-response. The CDC instructs that "the examination weights should be used exclusively for
analyses of data from the examination, or in conjunction with the interview data" [8].

2.1.2 Design Element Definitions
• PSU: Primary Sampling Unit, the sampling unit in the first stage of a cluster sample. The
PSUs in this design are counties and groups of counties. A subset of PSUs is sampled.

• SSU: Secondary Sampling Unit, the sampling unit in the second stage of a cluster sample. The
SSUs in this design are groups of census blocks. A subset of SSUs is sampled.

• Stratum: A grouping of units with similar characteristics, used in sampling. Strata ideally
have low within-group variability and high between-group variability. In this sample,
stratification was based on geographic location and urban and rural population. All strata
are sampled.

• Census Block: The finest grouping used by the U.S. Census Bureau in the Decennial Census. In
densely populated areas, blocks are typically city blocks, but in rural areas they can cover
much larger geographic areas.

• Sample Weight: The reciprocal of the inclusion probability of a sample unit. The
sample weight is the number of people that a unit in the sample represents in the
population.

• Sample Unit: An individual unit that can be sampled at each stage of a sample. In the final
stage of this design, the sample unit is an individual person.

• Observation Unit: The unit on which data are recorded. The observation unit of this design is
an individual who has been selected.
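The sample weight definition can be made concrete with a small R sketch. In a multistage
design like the one described in Section 2.2, a person's overall inclusion probability is the
product of the conditional selection probabilities at each stage, and the base weight is its
reciprocal. The stage probabilities below are invented for illustration and are not NHANES
values; NHANES further adjusts base weights for non-response and post-stratification.

```r
# Hypothetical conditional selection probabilities at each of four stages
stage_probs <- c(county = 0.02, segment = 0.10, dwelling = 0.25, person = 0.50)

# Overall inclusion probability = product over stages
inclusion_prob <- prod(stage_probs)   # 0.00025

# Base sample weight = reciprocal of the inclusion probability:
# this sampled person stands in for about 4,000 people
base_weight <- 1 / inclusion_prob
base_weight

# Summing base weights over a sample estimates the population size
sample_probs <- c(0.00025, 0.0005, 0.001)
sum(1 / sample_probs)                 # 4000 + 2000 + 1000 = 7000
```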

2.2 Design Elements of Survey
The NHANES survey uses a four-stage, unequal-probability sampling design with stratification.
The design also aims to produce reliable estimates for a specified set of domains, including
age, sex, and race/Hispanic origin. Table 2.1 summarizes the four stages of sampling, each of
which is discussed in more detail below.

Stage   Unit                  Stratification               Method
1       Counties              Health & Urban/Rural Comp.   PPS
2       Census Blocks         Geographic Region            PPS
3       Dwelling Units (DU)   -                            Equal Prob.
4       Individual            -                            Reaching Target Rates

Table 2.1: Summary of the Four Stages of Sampling

The first stage samples PSUs from all counties in the United States (contiguous counties are
aggregated when a county is not sufficiently large). All PSUs were stratified into groups based
on the health and the urban and rural composition of the county. States were aggregated into 5
groups based on "derived health factors," geography, and population size to create homogeneous
strata. Table 2.2 shows how the states were divided.

State Group State
1 CT, HI, IA, MA, MN, ND, NH, NJ, NY, RI, UT, VT, WA
2 CA
3 AK, AZ, CO, FL, ID, IL, KS, ME, NE, NM, OR, SD, VA, WI, WY
4 DE, IN, MD, MI, OH, PA, TX
5 AL, AR, DC, GA, KY, LA, MO, MS, NC, NV, OK, SC, TN, WV

Table 2.2: State Groupings

Next, counties within each group were stratified by urban and rural composition [7]. From the
13 major strata (comprising 52 minor strata) in the two-year data, two PSUs per major stratum,
each from a different minor stratum, were selected. These PSUs were sampled with probability
proportional to size (PPS), with a few PSUs included with certainty (probability 1) because of
their size. Selection probabilities for non-certainty PSUs were also determined by PSU size,
with a correction to reduce the number of PSUs selected for both the 2007-2010 and 2011-2014
NHANES [7].
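To illustrate PPS selection with certainty units, the R sketch below computes
without-replacement PPS inclusion probabilities: each unit's probability is proportional to its
size, and any unit whose probability would reach 1 is taken with certainty while the rest of
the sample is reallocated among the remaining units. This is a generic textbook procedure under
assumed county sizes, not NHANES's exact algorithm (which also includes the cross-cycle
correction mentioned above); pps_inclusion_probs() is a hypothetical helper.

```r
# Hypothetical helper: PPS inclusion probabilities with certainty selections.
pps_inclusion_probs <- function(size, n) {
  stopifnot(n <= length(size), all(size > 0))
  certain <- rep(FALSE, length(size))
  repeat {
    # Spread the non-certainty part of the sample over the remaining units
    n_left <- n - sum(certain)
    prob <- n_left * size / sum(size[!certain])
    prob[certain] <- 1
    newly_certain <- !certain & prob >= 1
    if (!any(newly_certain)) break
    certain <- certain | newly_certain
  }
  prob
}

county_pop <- c(5000, 1200, 900, 600, 300)   # hypothetical PSU sizes
probs <- pps_inclusion_probs(county_pop, n = 2)
round(probs, 3)   # 1.0 0.4 0.3 0.2 0.1
sum(probs)        # equals n = 2
```

With these sizes, the largest county would get a PPS probability of 1.25, so it is taken with
certainty, and the one remaining selection is spread over the other four counties in proportion
to size.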
The second stage sampled geographic area segments consisting of census blocks or combinations
of census blocks. Segments were selected with PPS to create approximately equal sample sizes
per PSU and SSU. The population size of each segment was based on estimates from the 2000
Census, updated with estimates from the American Community Survey (ACS).
In the third stage, a sample of households, or dwelling units (DUs), was drawn within the
selected segments. Selection probabilities at this stage were largely determined by the domains
NHANES intended to oversample because of their low frequency in the population. In 2013-2014,
the oversampled population consisted of "Hispanic persons, non-Hispanic black persons,
non-Hispanic non-black Asian persons, low-income, non-Hispanic, non-black, non-Asian, white, and
other persons (at or below 130% of the federal poverty line)" [7].
The final stage of sampling selected individuals within households. Individuals were selected
in order to "maximize the average number of sampled participants per sample household" while
trying to meet the target sample sizes by sex, age, race and Hispanic origin, and income [7].
Table 2.3 lists the expected annual sample sizes for several of these units.
Sampled Unit                 Number
Study locations              15
Segments                     360
DUs to be screened           13,529
Households to be screened    11,500
Sampled persons              6,888
Examined persons             5,000

Table 2.3: Expected Annual Sample Sizes

2.3 Exploration of Design Elements
Sampling rates were chosen to meet the target sample size of each subdomain, with projected
subdomain sizes based on Census and ACS data so that the sample would be representative. Some
demographic groups were purposefully oversampled, and this is reflected in the sampling rates.
Sample weights reflect the number of people that each person in the sample represents. They are
calculated as the reciprocal of the inclusion probability and then adjusted by NHANES to
account for non-response and for post-stratification to demographic rates. Interview and
examination weights differ because a participant could respond to the interview but never
visit an MEC. As mentioned previously, NHANES specifies that exam weights be used for any
analysis that includes MEC data.
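The non-response adjustment can be sketched with a standard weighting-class method: within each
adjustment cell, respondents' base weights are inflated so that together they carry the total
weight of everyone sampled in that cell, and non-respondents receive weight 0. This illustrates
the general technique under assumed cells and weights; it is not the exact NHANES procedure,
and adjust_nonresponse() is a hypothetical helper.

```r
# Hypothetical helper: weighting-class non-response adjustment.
adjust_nonresponse <- function(base_wt, responded, cell) {
  cell <- as.character(cell)
  total_wt <- tapply(base_wt, cell, sum)              # all sampled units per cell
  resp_wt  <- tapply(base_wt * responded, cell, sum)  # respondents only
  adj <- total_wt / resp_wt   # Inf/NaN if a cell has no respondents
  # Respondents absorb the weight of the non-respondents in their cell
  ifelse(responded, base_wt * adj[cell], 0)
}

sample_df <- data.frame(
  base_wt   = c(100, 100, 100, 200, 200),
  responded = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  cell      = c("A", "A", "A", "B", "B")
)
sample_df$adj_wt <- with(sample_df, adjust_nonresponse(base_wt, responded, cell))
sample_df$adj_wt   # 150 150 0 400 0
```

The total weight in each cell (300 in A, 400 in B) is preserved, so weighted population totals
are unchanged by the adjustment.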

Figure 2.1: Exam Weight Distribution

For smaller weights, the distribution looks approximately normal. However, there are many
extreme values and a long right tail. The mean exam weight was 24,067.91 and the median was
15,869.53. The largest exam weight belonged to an 11-year-old female in PSU 2 and stratum 111.

Figure 2.2: Exam Weight by Pseudo-Stratum

The median exam weight varies considerably across pseudo-strata. This is expected, since
stratification reduces variance when strata are homogeneous within but very different from
each other.

Figure 2.3: Exam Weight by BMI

Median exam weights are similar across the BMI classifications. The underweight classification
has the lowest density of high exam weights.

Figure 2.4: Exam Weight by PE Frequency

Comparing the sum of the weights with the distribution of the weights gives an interesting
perspective on the survey design. Subjects with PE five times per week account for the largest
share of the summed weights. This subset has approximately the same median weight as the other
groups but far more high outliers in exam weight. Adolescents with PE five times per week thus
make up both the largest proportion of the sample and the greatest number of people represented
in the population. We also notice a column of NA values; how we correct for it is addressed in
the Methodology section.
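The comparison of summed weights in Figures 2.4 and 2.5 amounts to computing, for each PE
frequency, the total exam weight (an estimate of how many adolescents in the population that
group represents) and its share of the overall total. A minimal R sketch on made-up data, using
the column names from Table 3.1:

```r
# Toy data, not NHANES values: weekly PE frequency and exam weight per respondent
toy <- data.frame(
  freq_pe = c(0, 5, 5, 3, 5, 1),
  exam_wt = c(12000, 30000, 18000, 9000, 25000, 6000)
)

# Sum of weights per group = estimated number of people the group represents
wt_total <- tapply(toy$exam_wt, toy$freq_pe, sum)

# Weighted share of the population represented by each group
wt_share <- wt_total / sum(toy$exam_wt)
round(wt_share, 3)   # 0: 0.12, 1: 0.06, 3: 0.09, 5: 0.73
```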

Figure 2.5: Exam Weight by PE Frequency
Methodology

3.1 Data Description
Data for this study were obtained from the Centers for Disease Control and Prevention (CDC)
website. Specifically, we used four data sets from the 2013-2014 National Health and Nutrition
Examination Survey (NHANES):

1. Demographic Variables (DEMO_H.XPT)

2. Body Measures (BMX_H.XPT)

3. Physical Activity (PAQ_H.XPT)

4. Weight History Youth (WHQMEC_H.XPT)

For replication purposes, these datasets can be found at
https://wwwn.cdc.gov/Nchs/Nhanes/continuousnhanes/default.aspx?BeginYear=2013.

3.2 Data Merging and Cleaning
Each of the four datasets listed above was loaded into R using the sasxport.get function from
the Hmisc package. Using the plyr and dplyr packages, the four datasets were merged by the seqn
variable, a unique index number for each observation unit in the survey.
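A base-R sketch of the merge step follows. Small data frames stand in for the four files (in
the study each was read with Hmisc's sasxport.get and merged with plyr/dplyr). The left join
onto the demographics file is an assumption; the text does not say which join type was used.

```r
# Toy stand-ins for DEMO_H, BMX_H, PAQ_H and WHQMEC_H, keyed by seqn
demo <- data.frame(seqn = c(1, 2, 3), ridageyr = c(12, 14, 40))
bmx  <- data.frame(seqn = c(1, 2, 3), bmdbmic  = c(2, 4, NA))
paq  <- data.frame(seqn = c(1, 2),    paq744   = c(1, 2))
whq  <- data.frame(seqn = c(1, 2),    whq030m  = c(3, 1))

# Left-join everything onto the demographics file so that respondents
# missing from a later file are kept (with NA) rather than dropped
merged <- Reduce(function(x, y) merge(x, y, by = "seqn", all.x = TRUE),
                 list(demo, bmx, paq, whq))
merged
```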
From here, data cleaning was completed in a series of three steps outlined below:

1. Selecting and renaming relevant variables

Using the select function from the dplyr package, we subset the merged data frame to the
relevant columns. We then renamed these variables with the rename function from the same
package, giving them more descriptive names. Original variable names, modified variable names,
and descriptions of the selected columns are shown in Table 3.1.

Original Var. Name   Modified Var. Name   Variable Description
seqn                 id                   Respondent ID
wtint2yr             int_wt               Interview Sample Weights
wtmec2yr             exam_wt              Exam Sample Weights
sdmvpsu              psu                  Masked Variance Pseudo PSU
sdmvstra             stratum              Masked Variance Pseudo Strata
riagendr             gender               Gender
ridageyr             age                  Age at Time of Survey
bmdbmic              bmi                  Weight Class (as defined by BMI)
whq030m              opinion_wt           Self-Categorization of Weight Class
paq744               pe_yn                Do you have PE class?
paq746               freq_pe              How often do you have PE?
paq750               enjoy_pe             Do you enjoy PE?

Table 3.1: Variables Selected

2. Recoding Factor Levels

Factor variables in the original data set were coded as integer levels that have no contextual
meaning without the codebook. For convenience, we recoded these factor levels into more
descriptive labels using the mutate function in the dplyr package. The recoded variables,
original factor levels, and modified labels are shown in Table 3.2.

3. Filtering for desired rows

Our main explanatory variables of interest were the interview questions about the existence,
frequency, and enjoyment of physical education classes (pe_yn, freq_pe, enjoy_pe). Because
NHANES reports these variables only for adolescents aged 12 to 15, we filtered the data frame
to respondents in that age range. This step, done with the filter function from the dplyr
package, left us with 737 observation units.
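The three cleaning steps above can be sketched in base R as follows; the study itself used
dplyr's select, rename, mutate, and filter. The toy rows and the reduced set of variables are
made up, and the renaming and recoding follow Tables 3.1 and 3.2.

```r
# Toy rows with a few of the original NHANES column names
raw <- data.frame(
  seqn = 1:4, wtmec2yr = c(15000, 0, 22000, 9000),
  ridageyr = c(13, 15, 40, 12), bmdbmic = c(2, 4, NA, 1),
  paq744 = c(1, 2, NA, 1), extra_col = NA
)

# 1. Keep relevant columns and give them readable names (Table 3.1)
name_map <- c(seqn = "id", wtmec2yr = "exam_wt", ridageyr = "age",
              bmdbmic = "bmi", paq744 = "pe_yn")
clean <- raw[, names(name_map)]
names(clean) <- unname(name_map)

# 2. Recode integer codes into labelled factors (Table 3.2)
clean$bmi <- factor(clean$bmi, levels = 1:4,
                    labels = c("underweight", "normal", "overweight", "obese"))
clean$pe_yn <- factor(clean$pe_yn, levels = c(1, 2, 7, 9),
                      labels = c("yes", "no", "refused", "unsure"))

# 3. Keep respondents aged 12-15, the ages asked the PE questions
clean <- clean[clean$age >= 12 & clean$age <= 15, ]
clean
```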

3.3 Fixing Missing Data Values
Of the remaining 737 rows of our data frame, we noted two main problems:

1. Exam weights of zero

2. Item non-response

On closer inspection, exam weights equalled zero when a respondent completed the interview
portion of the survey but had no data for the exam portion. We treated these cases as unit
non-response for the examination portion of the survey. Without examination data to compare
with the interview data, these responses cannot tell us anything about the impact of physical
education on physical health and weight awareness. Imputing exam data for these respondents
from their interview data also made little sense, because the relationship between actual and
perceived weight categories is a primary focus of our study. Therefore, assuming that these
instances of unit non-response were missing completely at random (MCAR), we removed all rows
for respondents with exam weights of zero. This step left us with 713 rows in our data frame.
Variable     Original Level   Modified Label
bmi          1                underweight
             2                normal
             3                overweight
             4                obese
opinion_wt   1                overweight
             2                underweight
             3                normal
             7                refused
             9                unsure
pe_yn        1                yes
             2                no
             7                refused
             9                unsure
enjoy_pe     1                strongly agree
             2                agree
             3                neither agree nor disagree
             4                disagree
             5                strongly disagree
             7                refused
             9                unsure

Table 3.2: Recoded Factor Levels

Moving on to item non-response, we first noted that one variable with a high frequency of
missing data was freq_pe. On closer examination, freq_pe was set to NA in every case where
pe_yn equalled no. That is, for any respondent who did not participate in physical education
classes, the frequency of physical education class was recorded as NA. To fix this, for all
respondents who answered no to pe_yn, we replaced the NA values with 0, indicating that these
respondents participated in physical education 0 times per week. This substitution eliminated
all item non-response for the freq_pe variable.
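Both fixes described above (dropping exam unit non-respondents and recoding the structurally
missing PE frequencies) take a few lines of base R. A minimal sketch on made-up rows, using the
column names from Table 3.1:

```r
# Toy rows: one exam non-respondent (exam_wt = 0) and two respondents without PE
pe_data <- data.frame(
  exam_wt = c(15000, 0, 22000, 9000),
  pe_yn   = c("yes", "yes", "no", "no"),
  freq_pe = c(5, 3, NA, NA)
)

# Unit non-response for the exam: remove rows with zero exam weight
pe_data <- pe_data[pe_data$exam_wt > 0, ]

# freq_pe is missing by design when pe_yn is "no": code it as 0 times per week
no_pe <- pe_data$pe_yn == "no" & is.na(pe_data$freq_pe)
pe_data$freq_pe[no_pe] <- 0
pe_data
```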
The remaining item non-response consisted of respondents with a nonzero exam weight but
missing values for some, but not all, demographic variables and exam data. In total, there
were 46 instances of item non-response. Assuming that these instances were missing at random
given covariates (MAR), we used random hot deck imputation by row to fill in the missing
values. For each row with one or more missing values, a random row was selected from the subset
of complete observations of the same age and sex, and the missing values were replaced with
that observation's values.
We expect within-group variability in these subsets to be far lower than the overall
variability in the data. Each observation of a variable for a given respondent is highly
correlated with that respondent's observations on other variables; for example, it would be
highly unlikely for a BMI-defined obese person to categorize themselves as underweight.
Therefore, for any row with a missing value, we chose a donor row at random whose observed
variable values matched those of the incomplete row, and replaced the missing values with the
donor's values.
Random hot deck imputation by row let us retain some variability in our data while preserving
the correlations between observed variables. However, the imputed values are not real
observations and could be biased, and duplicating donor rows causes us to understate the
variance of our estimated proportions.
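Random hot deck imputation by row can be sketched as below: for each incomplete row, a donor is
drawn at random from the complete rows in the same age-by-gender cell, and the donor's values
are copied into the missing cells. This is a simplified sketch under stated assumptions: donors
are matched on age and gender only (not on the other observed variables), rows whose cell has
no complete donor are left as NA, and hot_deck_by_row() is a hypothetical helper rather than
the authors' code.

```r
# Hypothetical helper: random hot deck imputation by row within cells.
hot_deck_by_row <- function(data, cell_vars) {
  complete <- complete.cases(data)             # donors are original complete rows
  cell_key <- interaction(data[cell_vars], drop = TRUE)   # e.g. "14.female"
  for (i in which(!complete)) {
    donors <- which(complete & cell_key == cell_key[i])
    if (length(donors) == 0) next              # no donor in this cell: leave NA
    donor <- donors[sample.int(length(donors), 1)]
    # Copy the donor's values into every column that is missing in row i
    miss <- names(data)[vapply(data[i, , drop = FALSE], is.na, logical(1))]
    data[i, miss] <- data[donor, miss]
  }
  data
}

set.seed(152)
toy <- data.frame(
  age        = c(14, 14, 14, 13, 13),
  gender     = c("female", "female", "female", "male", "male"),
  bmi        = c("normal", "obese", NA, "normal", "normal"),
  opinion_wt = c("normal", "overweight", NA, NA, "normal"),
  stringsAsFactors = FALSE
)
imputed <- hot_deck_by_row(toy, c("age", "gender"))
imputed
```

Because row 3 takes both missing values from a single donor, it receives either (normal, normal)
or (obese, overweight), never an implausible mix, which is the correlation-preserving property
described above.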
Figure 3.1: Frequency of Non-response by Gender and Age

Figure 3.1 plots the frequency of unit non-response (for the exam portion of the survey) by age
and gender, alongside the frequency of item non-response by age and gender. The plot of unit
non-response on the left shows no apparent trends or patterns, suggesting that unit
non-response is random across age and gender. This is consistent with our decision to discard
these observations under the assumption that unit non-response is missing completely at random
(MCAR). However, the plot of item non-response on the right shows that while item non-response
among boys was similar across ages, item non-response among girls was particularly high for
14-year-olds. This might indicate that 14-year-old girls are more self-conscious and
intentionally left certain questions blank, but, again, for the purposes of this study we
assume that all item non-response is missing at random given covariates (MAR). We keep this
caveat in mind as we move on to the analysis of our data, since random hot deck imputation by
row may have introduced biases into the data or masked significant trends.
With our merged, cleaned, and imputed data set of 713 observations across 7 variables, we set
up our survey design with the survey package in R using the following code:

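library(survey)

# Two-stage design from the masked variance units: pseudo-PSUs (psu) nested
# within pseudo-strata (stratum), individuals (id) within PSUs, exam weights
nhanes_design <- svydesign(ids = ~psu + id, strata = ~stratum,
                           nest = TRUE, weights = ~exam_wt,
                           data = nhanes)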

The psu variable tells us which pseudo-PSU the individual belongs to. The original survey
grouped the states into five groups based on how healthy the state is, with group 1 the most
healthy and group 5 the least healthy. However, to maintain the anonymity of respondents, NHANES
distributes only masked variance units in the form of pseudo-PSUs, which are labeled PSU 1 and
PSU 2 within each pseudo-stratum. The CDC instructs that the use of MVUs "closely approximates
the variances that would have been estimated using the 'true' design variables" [7]. Simplified
pseudo-strata are also provided in this data, so we can analyze this complex survey as if it
were a two-stage design. The id variable identifies each unique individual, and stratum tells
us which pseudo-stratum the individual belongs to. Finally, we used the exam weights: the CDC
instructs that "the examination weights should be used exclusively for analyses of data from
the examination, or in conjunction with the interview data" [8], and our analysis uses both
examination and household interview data. In our call to svydesign(), we did not include a
finite population correction because we do not have information regarding the population
sizes.
Results

4.1 Effect of Participation in PE on Physical Health

Figure 4.1: Proportion in Each Weight Category by Existence of PE
Question: Does the likelihood of an adolescent being in a particular weight class (as defined
by BMI) vary by participation in a physical education class?
H0 : Proportions in each weight class are consistent across participation/no participation in
a physical education class.
H1 : Proportions in each weight class vary by participation/no participation in a physical
education class.
Figure 4.1 shows that, visually, the proportions of adolescents in each weight class are
extremely similar across both groups, with no apparent differences in pattern. In both the PE
and no-PE groups, the normal weight class contained the largest proportion of adolescents,
followed by obese, overweight, and then underweight.

We test these initial observations formally with a two-sample F-test for a difference in
proportions between the two groups, which yielded a p-value of 0.2827 > 0.05. Therefore, at the
5% significance level, we fail to reject the null hypothesis that the distribution of weight
categories is the same for adolescents who do and do not participate in PE. We have
insufficient evidence to conclude that participation in a PE class is associated with the
likelihood of being of healthy weight.
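The proportions plotted in Figure 4.1 are design-weighted shares of each weight category within
each PE group. The R sketch below computes such point estimates on toy data with xtabs (which
sums the weights in each cell) and prop.table. It ignores strata and PSUs, so it gives point
estimates only; standard errors and design-based tests such as the F-test above require the
full survey design, for example the nhanes_design object built with the survey package.

```r
# Toy data, not NHANES values
toy <- data.frame(
  pe_yn   = c("yes", "yes", "yes", "no", "no", "no"),
  bmi     = c("normal", "obese", "normal", "normal", "overweight", "obese"),
  exam_wt = c(20000, 10000, 10000, 5000, 3000, 2000)
)

# Weighted counts: each cell holds the sum of exam weights for that PE group and BMI class
tab <- xtabs(exam_wt ~ pe_yn + bmi, data = toy)

# Normalise each row (PE group) so its weight-category shares sum to 1
props <- prop.table(tab, margin = 1)
round(props, 2)
```

Here the "yes" row is 75% normal and 25% obese, while the "no" row is 50% normal, 30%
overweight, and 20% obese.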

4.2 Effect of Participation in PE on Weight Awareness

Figure 4.2: Proportion of Correct Perception by Existence of PE

Question: Does an adolescent’s weight awareness vary by participation in a physical edu-
cation class?
H0 : Weight awareness is consistent across participation/no participation in a physical edu-
cation class.
H1 : Weight awareness varies across participation/no participation in a physical education
class.
Figure 4.2 shows that the distribution of weight awareness is similar between the two groups.
In both the PE and non-PE groups, slightly over 75% of adolescents categorize themselves in the
correct weight class (as defined by BMI), and the remaining roughly 25% categorize themselves
incorrectly. Accuracy is actually slightly lower among those who participate in PE than among
those who do not, but the difference appears to be very small.
Again, we test our initial observations formally with a two-sample F-test of proportions and
obtain a p-value of 0.4077 > 0.05. Therefore, at the 5% significance level, we fail to reject
the null hypothesis that weight awareness is consistent across participation/no participation
in a physical education class.
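Weight awareness requires a rule for deciding when a self-reported weight class is "correct."
The text does not spell this rule out, so the R sketch below makes an explicit assumption:
because the self-report in Table 3.2 has only three substantive levels (underweight, normal,
overweight), a BMI category of either overweight or obese is counted as matching a self-report
of overweight, and refused/unsure answers are treated as missing. correct_perception() is a
hypothetical helper.

```r
# Hypothetical helper; the overweight/obese collapsing rule is an assumption.
correct_perception <- function(bmi, opinion_wt) {
  bmi <- as.character(bmi)
  opinion_wt <- as.character(opinion_wt)
  # Collapse BMI "obese" into "overweight" to match the 3-level self-report
  bmi_3 <- ifelse(bmi %in% c("overweight", "obese"), "overweight", bmi)
  # Only substantive self-reports with a known BMI category can be scored
  scorable <- !is.na(bmi) &
    opinion_wt %in% c("underweight", "normal", "overweight")
  ifelse(scorable, bmi_3 == opinion_wt, NA)
}

bmi        <- c("normal", "obese", "underweight", "overweight", "normal")
opinion_wt <- c("normal", "overweight", "normal", "unsure", "overweight")
correct_perception(bmi, opinion_wt)   # TRUE TRUE FALSE NA FALSE
```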

4.3 Effect of Frequency of PE on Physical Health
Question: Does the likelihood of an adolescent being in a particular weight class (as defined
by BMI) vary across frequencies of physical education class?
H0 : Proportions in each weight class are consistent across frequencies of physical education
class.
H1 : Proportions in each weight class vary across frequencies of physical education class.
Figure 4.4 shows that, regardless of PE frequency, the general pattern of proportions across
weight categories remains the same. For all frequencies of PE, the majority of each group falls
into the normal weight category, and the smallest proportion always falls into the underweight
category. Apart from the group with PE once a week, which has a slightly larger proportion of
adolescents in the normal weight category, the groups show very similar proportions of
adolescents in each weight category.
We test these initial observations formally with a chi-squared goodness-of-fit test and obtain
a p-value of 0.2827 > 0.05. Therefore, at the 5% significance level, we have insufficient
evidence to conclude that proportions in each weight class vary across frequencies of physical
education class.

Figure 4.3: Proportion in Each Weight Category by Frequency of PE

Figure 4.4: Proportion in Each Weight Category by Frequency of PE

4.4 Effect of Frequency of PE on Weight Awareness
Question: Does an adolescent’s weight awareness vary across frequencies of physical edu-
cation class?
H0 : Weight awareness is consistent across frequencies of physical education class.

H1 : Weight awareness varies across frequencies of physical education class.

Figure 4.5: Proportion of Correct Perceptions by Frequency of PE

Figure 4.6: Proportion of Correct Perceptions by Frequency of PE

From Figure 4.6, we note similar patterns of correct and incorrect weight self-categorization across all frequencies of PE. In every group, close to 75% of adolescents placed themselves in the correct weight class (as defined by BMI), and the remaining 25% had an incorrect perception of their own weight category. The groups with PE 4 or 5 times a week deviate slightly from the others, but these differences are small.
To confirm this initial observation, we conduct a design-adjusted chi-squared test of independence and obtain a p-value of 0.4465 > 0.05. Therefore, at the 5% significance level, we have insufficient evidence to conclude that weight awareness varies across frequencies of physical education class.
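Weight awareness in these sections is a yes/no indicator: whether an adolescent’s self-perceived weight matches the BMI category. Because the perception question has no "obese" answer, the Appendix analysis code folds obese into overweight before comparing. A minimal base-R sketch of that rule; the function name correct_perception() and the sample responses are hypothetical, and comparing as character strings avoids R’s error when comparing factors with different level sets.

```r
# Hypothetical helper (not from the original analysis code):
# TRUE when self-perceived weight matches the BMI category.
# bmi: "underweight", "normal", "overweight", or "obese"
# opinion: self-perception, e.g. "underweight", "normal", "overweight"
correct_perception <- function(bmi, opinion) {
  bmi <- as.character(bmi)
  # The perception question has no "obese" answer, so obese counts as overweight
  bmi[bmi == "obese"] <- "overweight"
  # Compare as character to avoid errors from factors with different levels
  bmi == as.character(opinion)
}

# Usage with made-up responses
bmi <- factor(c("obese", "normal", "underweight", "overweight", "normal"))
opinion <- factor(c("overweight", "normal", "normal", "overweight", "unsure"))
aware <- correct_perception(bmi, opinion)
mean(aware)  # unweighted share with a correct perception: 0.6
```

In the survey analysis itself, the share would be a weighted estimate (for example from svymean()) rather than this unweighted mean.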

4.5 Effect of Enjoyment of PE on Physical Health

Figure 4.7: Proportion of Enjoyment by Weight Category

Figure 4.8: Proportion in Each Weight Category by Enjoyment of PE

Question: Does the likelihood of an adolescent being in a particular weight class (as defined
by BMI) vary across enjoyment of physical education class?
H0 : Proportions in each weight class are consistent across enjoyment of physical education
class.
H1 : Proportions in each weight class vary across enjoyment of physical education class.
From Figure 4.7, we note that different BMI categories tended to respond differently. For adolescents in the normal, overweight, or obese categories, responses to the statement "I enjoy participating in PE or gym class" tended toward the more moderate options: most responses were "agree," "disagree," or "neither agree nor disagree," and in each of these groups the majority of adolescents agreed with the statement. In contrast, adolescents categorized as underweight responded differently: the underweight group had a much higher proportion of "unsure" responses and no negative responses at all.
Because the responses of the underweight group differed so much from those of the other groups, we examined the relationship between enjoyment of PE and BMI category more closely. In Figure 4.8, we plotted, for each level of PE enjoyment, the proportion of respondents in each weight category. Earlier analyses showed that the normal BMI category was the most prevalent among the sampled adolescents, but this bar plot adds information about the responses from each weight category. In particular, a higher proportion of those who responded "disagree" or "strongly disagree" came from the overweight category than from other weight classes. Additionally, most of the respondents who answered "unsure" were underweight. These observations suggested an association between BMI category and enjoyment of PE.
A design-adjusted chi-squared test of independence yields a p-value of 0.0000516 < 0.05, which supports our initial observation that the proportion of adolescents in each weight category depends on enjoyment of PE. Therefore, at the 5% significance level, we reject our null hypothesis that proportions in each weight category are consistent across enjoyment of physical education class.
Figure 4.9 displays the table of proportions in each weight category by enjoyment of PE.
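The proportions in a table like Figure 4.9 are conditional on enjoyment level, so each row sums to one. The Appendix builds them by dividing joint proportions by their row totals; base R’s prop.table() with margin = 1 gives the same result directly. A sketch with a small weighted table whose values are invented for illustration (not NHANES estimates):

```r
# Hypothetical weighted totals (NOT NHANES estimates):
# rows = enjoyment of PE, columns = BMI weight category
tbl <- as.table(matrix(c(10, 300, 60, 80,
                          5, 250, 90, 70,
                         20,  40, 10,  5),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(enjoy_pe = c("agree", "disagree", "unsure"),
                                       bmi = c("underweight", "normal",
                                               "overweight", "obese"))))

# Approach used in the Appendix: joint proportions, then divide by row totals
joint <- prop.table(tbl)
by_hand <- joint / rowSums(joint)

# Equivalent one-liner: condition on rows (enjoyment level)
conditional <- prop.table(tbl, margin = 1)
round(conditional, 3)
```

Dividing a matrix by a vector of row totals works here because R recycles the vector down each column, pairing row i with total i.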

Figure 4.9: Proportion in Each Weight Category by Enjoyment of PE

4.6 Effect of Enjoyment of PE on Weight Awareness
Question: Does an adolescent’s weight awareness vary across enjoyment of physical education class?
H0 : Weight awareness is consistent across enjoyment of physical education class.
H1 : Weight awareness varies across enjoyment of physical education class.
From Figure 4.11, we note that in every enjoyment group except those who answered "unsure," the majority (close to 70%) had a correct perception of their BMI category.
To test whether the deviation of the "unsure" group was significant, we conducted a design-adjusted chi-squared test of independence, which yielded a p-value of 0.3803 > 0.05. Therefore, at the 5% significance level, we fail to reject our null hypothesis that weight awareness is consistent across enjoyment of physical education class. That is, despite the association between enjoyment of PE and the proportion of adolescents in each weight class, we have insufficient evidence to conclude that weight awareness depends on enjoyment of PE.

Figure 4.10: Proportion of Correct Perceptions by Enjoyment of PE

Figure 4.11: Proportion of Correct Perceptions by Enjoyment of PE
Conclusions

The original intent of this study was to determine whether the distribution of adolescents across the four BMI weight categories (underweight, normal, overweight, obese) and adolescent weight awareness (placing oneself in the correct weight category based on BMI) differed significantly across existence, frequency, or enjoyment of physical education classes. Based on design-adjusted chi-squared tests of independence (Rao-Scott F-tests), we found that weight awareness did not differ significantly across existence, frequency, or enjoyment of physical education classes. That is to say, the proportion of adolescents who placed themselves in the correct weight category (as defined by their measured BMI) was not significantly associated with the existence, frequency, or enjoyment of PE classes.
Additionally, the same tests showed no significant difference in the distribution of adolescents across the four BMI weight categories by existence or frequency of physical education classes. However, the distribution across weight categories did differ significantly by enjoyment of physical education classes. Specifically, overweight adolescents showed the greatest tendency to dislike PE classes, and underweight adolescents showed the greatest tendency to be "unsure" about whether they enjoy PE class.
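The F-tests mentioned above come from the survey package (svychisq(), and summary() of a svytable, both with statistic = "F" in the Appendix code). By default these apply the Rao-Scott second-order correction to the Pearson statistic, so that the clustering and unequal weights in NHANES do not overstate significance. The full correction is more involved. As a simplified illustration of the first-order idea only, the sketch below divides a Pearson statistic by an average generalized design effect; the function name and the values 14, 9, and 1.8 are hypothetical, not estimates from NHANES.

```r
# Simplified first-order Rao-Scott idea: divide the Pearson statistic by the
# average generalized design effect, then use the usual chi-squared reference.
rao_scott_first_order <- function(x2, dof, mean_deff) {
  x2_adj <- x2 / mean_deff
  c(statistic = x2_adj,
    p_value = pchisq(x2_adj, df = dof, lower.tail = FALSE))
}

# Hypothetical inputs: Pearson X2 = 14 on 9 df, assumed mean design effect 1.8
naive_p <- pchisq(14, df = 9, lower.tail = FALSE)
adjusted <- rao_scott_first_order(14, dof = 9, mean_deff = 1.8)
```

With a design effect above 1, the adjusted p-value is larger than the naive one. This is why analyzing NHANES as if it were a simple random sample would tend to overstate significance.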
Before this study, we hypothesized that there would be a clear association between the existence or frequency of physical education classes and an adolescent’s physical health, but our data did not support this hypothesis. Although we found no significant association between those variables, the analysis produced other results that bear on common assumptions about this topic. First, we found a significant association between an adolescent’s enjoyment of PE and that adolescent’s weight category (as defined by BMI). Second, contrary to the popular belief that PE is liked and dominated only by athletic students (typically of normal weight), we found that a large proportion of adolescents who are overweight or obese also appear to enjoy PE.
It has always seemed logical that a child who has PE more frequently would be more likely to be of healthy (normal) weight because of the additional hours of physical activity. However, our results do not show that the mere existence of a class promoting physical activity and healthy lifestyles translates into healthier students. Instead, a student’s enjoyment of physical activity and PE classes is more closely associated with that student’s physical health (as measured by BMI).
These findings have substantial implications for the ongoing national debate over whether physical education classes should be mandatory in schools. The primary argument for mandating physical education is the assumption that such classes not only foster healthy and active lifestyles in youth and adolescents, but can also directly improve students’ physical and mental health. Using BMI weight categories as our measure of physical health in adolescents, our results indicate that neither the existence nor the frequency of physical education classes is significantly associated with students’ physical health: the distribution of adolescents across weight categories did not vary significantly with either. Surprisingly, what was associated with students’ BMI weight category was their enjoyment of physical education class. Therefore, our results suggest that the perceived improvement in the physical health of adolescents who take physical education is linked less to those classes, or to how often they meet, than to an intrinsic motivation to live an active and healthy lifestyle.

5.1 Caveats and Limitations
As with any analysis of survey data, ours has caveats and limitations. One of the primary limitations was the small sample size resulting from limited data collection by NHANES. Because the physical education variables used in our study were collected only for respondents between the ages of 12 and 15, we had to exclude all respondents outside that age range. A small sample increases the standard errors of our estimates, which reduces the power of our tests and makes it more difficult to reject the null hypotheses we tested.
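To make the sample-size point concrete: for a proportion p estimated from n respondents, the standard error under simple random sampling is sqrt(p(1 - p) / n), and a complex design such as NHANES inflates it by roughly the square root of the design effect. A minimal sketch; the function name se_prop() and the design effect of 2 are assumptions for illustration, not NHANES estimates.

```r
# Approximate standard error of an estimated proportion p from n respondents,
# inflated by a design effect (deff = 1 corresponds to simple random sampling)
se_prop <- function(p, n, deff = 1) {
  sqrt(deff * p * (1 - p) / n)
}

# Assumed design effect of 2 (illustrative, not estimated from NHANES)
se_small <- se_prop(0.5, n = 400, deff = 2)
se_large <- se_prop(0.5, n = 1600, deff = 2)
se_small / se_large  # 2: quadrupling n halves the standard error
```

Because the standard error shrinks only with the square root of n, a much larger sample is needed to detect modest differences between groups.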
Another caveat is the validity of BMI as an indicator of adolescents’ physical health. As mentioned previously, BMI attempts to summarize one’s tissue mass (muscle, fat, and bone) using only one’s weight and height. However, recent reporting has highlighted that BMI is not always a reliable indicator of physical health. For example, New Scientist’s article "Overweight Olympians: Guess the BMI of top athletes" shows that some athletes who competed in the 2012 London Olympics had BMIs placing them in the overweight or obese categories [1], yet these athletes can hardly be considered unhealthy. FiveThirtyEight’s article "BMI Is A Terrible Measure Of Health" notes that while weight is positively correlated with body fat, weight also includes bone mass, muscle mass, body fluids, and so on [6]. Consequently, BMI cannot distinguish how much of one’s weight comes from muscle and how much comes from fat. For these athletes, muscle accounts for a large share of their body weight, which BMI cannot tell apart from excess fat.
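For reference, BMI is weight in kilograms divided by height in meters squared, so two people with the same height and weight receive the same BMI no matter how much of that weight is muscle. For adolescents, the four weight categories are not fixed BMI cutoffs but BMI-for-age percentiles from the CDC growth charts: underweight below the 5th percentile, normal from the 5th to below the 85th, overweight from the 85th to below the 95th, and obese at or above the 95th. The sketch below assumes the percentile has already been computed, which requires the CDC reference tables not reproduced here. Both function names are hypothetical.

```r
# BMI = weight (kg) / height (m)^2
compute_bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Hypothetical helper: map a BMI-for-age percentile (0-100) to the four
# weight categories; intervals are closed on the left, e.g. [85, 95)
bmi_category_youth <- function(percentile) {
  cut(percentile,
      breaks = c(-Inf, 5, 85, 95, Inf),
      labels = c("underweight", "normal", "overweight", "obese"),
      right = FALSE)
}

compute_bmi(70, 1.75)                  # about 22.9
bmi_category_youth(c(3, 50, 85, 97))
```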
Finally, we note again the caveat of using imputation to handle missing data values. By using random hot-deck imputation by row, we assumed that people of the same gender, age, and BMI had similar experiences and feelings toward physical education classes. However, that may not always be the case. Imputation may have introduced bias into our data set or masked important trends or patterns that we did not catch during initial exploratory data analysis. For example, as noted previously, item non-response tended to be higher among 14-year-old girls. We imputed their responses, but there may have been a reason that this group chose not to answer some of the questions (e.g., higher levels of self-consciousness among 14-year-old girls). Because imputed values are then treated as if they had been observed, imputation by row tends to understate the variance of our estimates and likely increases their bias.
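The following is a minimal sketch of random hot-deck imputation by row within cells. It follows the description above, in which donors share the recipient’s gender, age, and BMI category. It is written in base R for illustration and is not the authors’ code in the Appendix. The function name, the column names, and the rule of drawing donors only from fully observed rows are assumptions. Each incomplete row borrows all of its missing values from one randomly chosen donor in the same cell.

```r
# Random hot-deck imputation "by row": each incomplete row copies all of its
# missing values from one randomly chosen complete row in the same cell.
# Assumes the cell variables themselves contain no NAs.
hot_deck_by_row <- function(df, cell_vars) {
  complete <- complete.cases(df)
  donors <- which(complete)
  cell <- interaction(df[cell_vars], drop = TRUE)
  for (r in which(!complete)) {
    pool <- donors[cell[donors] == cell[r]]
    if (length(pool) == 0) next  # no donor in this cell: leave values missing
    # Draw one donor; indexing avoids sample()'s behavior when given one number
    d <- pool[sample.int(length(pool), 1)]
    # Copy every missing item in row r from the donor's row
    miss <- vapply(df, function(col) is.na(col[r]), logical(1))
    df[r, miss] <- df[d, miss]
  }
  df
}

# Usage with made-up respondents
set.seed(42)
toy <- data.frame(gender = c("M", "M", "F", "F", "F"),
                  age = c(12, 12, 14, 14, 14),
                  bmi = c("normal", "normal", "normal", "normal", "obese"),
                  enjoy = c("agree", NA, "unsure", NA, "agree"),
                  stringsAsFactors = FALSE)
out <- hot_deck_by_row(toy, c("gender", "age", "bmi"))
```

Each imputed value is a real response from a similar respondent. However, the imputed values are then analyzed as if they had been observed, which is the source of the understated variance noted above.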

5.2 Next Steps
While our data and calculations from NHANES 2013-2014 indicate that the association between physical education (PE) and health is not statistically significant, we believe it would be worthwhile for the CDC to continue administering this survey on a much larger scale. With more data on adolescents in this age group, or with an expanded age range, and provided that non-response does not increase, our tests would have more power and we could draw firmer conclusions about our null hypotheses. Additionally, since BMI is not always the best indicator of one’s health, the CDC may want to collect other measures of physical health, such as blood pressure or cholesterol, alongside BMI. Further, for this research question specifically, it may be worthwhile to conduct the survey as a longitudinal rather than a cross-sectional study. Although we did not find a significant association between physical health and the weekly frequency of PE classes, long-term participation in PE may have a significant impact on adolescents’ physical health as defined by BMI.

Choice and personal desires drive people in different directions and may also have an enormous impact on physical health, as the association between enjoyment of PE and the distribution of weight categories suggests. The debate over whether PE should be mandatory in schools is only the tip of the iceberg. Beyond PE classes, the discussion can be expanded to participation in school sports, which lets students choose the type of physical activity they would like to pursue. In light of our findings, balancing what adolescents want against what is considered "best" for them is probably the trickiest part of the current debate.
Appendix

#####Loading and Cleaning Data#####

# Loading the Data
library(Hmisc)   # sasxport.get() reads SAS transport (.XPT) files
library(plyr)    # join_all(), revalue()
library(dplyr)   # loaded after plyr so that dplyr's verbs take precedence

demographics <- sasxport.get("DEMO_H.XPT")
bmi <- sasxport.get("BMX_H.XPT")
activity <- sasxport.get("PAQ_H.XPT")
weights <- sasxport.get("WHQMEC_H.XPT")

# Creating a New Data Frame: merge the four files on the respondent ID (seqn),
# keep respondents aged 12-15, and give the variables readable names
list_dfs <- list(demographics, bmi, activity, weights)

full_data <- join_all(list_dfs, by = "seqn") %>%
  select(seqn, wtint2yr, wtmec2yr, sdmvpsu, sdmvstra, riagendr, ridageyr,
         bmdbmic, whq030m, whq500, paq744, paq746, paq750) %>%
  filter(ridageyr %in% 12:15) %>%
  dplyr::rename(id = seqn, int_wt = wtint2yr, exam_wt = wtmec2yr,
                psu = sdmvpsu, stratum = sdmvstra, gender = riagendr,
                age = ridageyr, bmi = bmdbmic, opinion_wt = whq030m,
                action_wt = whq500, pe_yn = paq744,
                freq_pe = paq746, enjoy_pe = paq750) %>%
  mutate(gender = factor(gender, levels = 1:2, labels = c("M", "F")),
         bmi = factor(bmi, levels = 1:4,
                      labels = c("underweight", "normal", "overweight", "obese")),
         opinion_wt = factor(opinion_wt, levels = c(1, 2, 3, 7, 9),
                             labels = c("overweight", "underweight", "normal",
                                        "refused", "unsure")),
         action_wt = factor(action_wt, levels = c(1, 2, 3, 4, 7, 9),
                            labels = c("lose", "gain", "maintain",
                                       "nothing", "refused", "unsure")),
         pe_yn = factor(pe_yn, levels = c(1, 2, 7, 9),
                        labels = c("yes", "no", "refused", "unsure")),
         enjoy_pe = factor(enjoy_pe, levels = c(1, 2, 3, 4, 5, 7, 9),
                           labels = c("strongly agree", "agree",
                                      "neither agree nor disagree", "disagree",
                                      "strongly disagree", "refused", "unsure"))) %>%
  # Adolescents with no PE class have 0 PE days per week
  mutate(freq_pe = ifelse(pe_yn == "no", 0, freq_pe))

##########
#####Hot Deck Imputation#####

# data3 is an intermediate cleaned data frame whose construction is not shown
# in this listing.
# Unit non-response: an exam weight of 0 marks respondents who were
# interviewed but not examined, so drop them
data4 <- data3[!(data3$exam_wt %in% 0), ]

# Random Hot Deck Imputation

# Rows with at least one NA
nas <- which(is.na(data4), arr.ind = TRUE)
need_buddy <- unique(nas[, 1])

# Donor pool; assumed here to be the rows of data4 with no missing values
# (data5 is not defined elsewhere in this listing)
data5 <- data4[complete.cases(data4), ]

# Randomly pick a donor row of data5 with the same gender and age as row x of data4
find_buddy <- function(x) {
  all_possible <- which(data5$gender == data4$gender[x] &
                          data5$age == data4$age[x])
  # Index into all_possible: sample(v, 1) on a single number n draws from 1:n
  all_possible[sample.int(length(all_possible), 1)]
}

buddies <- data.frame(need_buddy,
                      buddy = sapply(need_buddy, find_buddy))

# Fill each missing value from the donor's row (buddy indexes data5, not data4)
for (j in seq_len(nrow(buddies))) {
  r <- buddies$need_buddy[j]
  d <- buddies$buddy[j]
  for (i in seq_along(data4)) {
    if (is.na(data4[r, i])) data4[r, i] <- data5[d, i]
  }
}

##########
#####EDA and Survey Design Code#####

nrow(data)
head(data, 5)
summary(data)

library(ggplot2)

# Plot: BMI and PE Enjoyment
er <- table(data$bmi, data$enjoy_pe)
mosaicplot(er, las = 1, xlab = "BMI", ylab = "Enjoy PE",
           main = "BMI and PE Enjoyment")

# Plot: Age and PE Enjoyment
age <- table(data$age, data$enjoy_pe)
mosaicplot(age, las = 1, xlab = "Age", ylab = "Enjoy PE",
           main = "Age and PE Enjoyment")

# Plot: BMI and Weight Perception
enj <- table(data$bmi, data$opinion_wt)
mosaicplot(enj, las = 1, xlab = "BMI", ylab = "Perception of Weight",
           main = "BMI and Weight Perception")

# Plot: Exam Weight by Pseudo-Stratum and Pseudo-PSU
# Shorten two BMI labels so they fit on the plot axes
data$bmi <- plyr::revalue(data$bmi, c("overweight" = "over",
                                      "underweight" = "under"))

boxplot(exam_wt ~ stratum, data = data,
        main = "Exam Weight by Pseudo-Stratum")
boxplot(exam_wt ~ psu, data = data,
        main = "Exam Weight by Pseudo-PSU")

# hist() has no formula method, so compare exam weights by BMI class with a boxplot
boxplot(exam_wt ~ bmi, data = data, main = "Exam Weight by BMI Class")

# Plot: Total Exam Weight By BMI Class (using the shortened labels from above)
ugh <- tapply(data$exam_wt, data$bmi, sum)[c("obese", "over", "normal", "under")]

barplot(ugh, names.arg = c("Obese", "Overweight", "Normal", "Underweight"),
        main = "Total Exam Weight in Sample by BMI Class")

# Plot: Total Exam Weight By PE Enjoyment
enjoy_levels <- c("strongly agree", "agree", "neither agree nor disagree",
                  "disagree", "strongly disagree", "unsure")
ugh2 <- sapply(enjoy_levels,
               function(lvl) sum(data$exam_wt[which(data$enjoy_pe == lvl)]))
x.labs <- c("Strongly Agree", "Agree", "Neither", "Disagree",
            "Strongly Disagree", "Unsure")
barplot(ugh2, names.arg = x.labs, cex.names = 0.5,
        main = "Total Exam Weights by PE Enjoyment")

# Plot: Exam Weight Distribution by PE Frequency
plot(exam_wt ~ freq_pe, data = data, pch = 19,
     xlab = "Number of PE Classes per Week")

# Plot: Total Exam Weight By PE Presence
ugh4 <- c(sum(data$exam_wt[which(data$pe_yn == "yes")]),
          sum(data$exam_wt[which(data$pe_yn == "no")]))
barplot(ugh4, names.arg = c("Present", "Absent"), cex.names = 1.5,
        main = "Total Exam Weights by PE Presence")

# Plot: Total Exam Weight By PE Frequency
data$freq_pe <- as.factor(data$freq_pe)
ugh3 <- sapply(as.character(0:5),
               function(k) sum(data$exam_wt[which(data$freq_pe == k)]))
barplot(ugh3, names.arg = c("NA", "1", "2", "3", "4", "5"),
        main = "Total Exam Weights by PE Frequency", xlab = "Days per Week")

##########
#####Analysis Code#####

library(survey)   # svydesign(), svymean(), svytable(), svychisq()
library(dplyr)
library(ggplot2)

# Read the merged, imputed data and drop the first two columns
nhanes = read.csv(
  "~/Desktop/Stat152/Final Project/full_data_v4.csv")[, -c(1, 2)] %>%
  # The perception question has no "obese" answer, so obese is grouped with
  # overweight before comparing BMI category with self-perceived weight
  mutate(bmi1 = as.factor(ifelse(bmi == "obese",
                                 "overweight", as.character(bmi)))) %>%
  # Compare as character: == fails on factors with different level sets
  mutate(correct_perception = as.character(bmi1) == as.character(opinion_wt))

# Design: pseudo-PSUs nested within pseudo-strata, weighted by exam weights
nhanes_design = svydesign(ids = ~psu + id, strata = ~stratum,
                          nest = TRUE, weights = ~exam_wt, data = nhanes)

#Proportion in Each Weight Category by Existence of PE
svymean(~interaction(pe_yn, bmi), design = nhanes_design)
bmi_tbl = svytable(~pe_yn + bmi, nhanes_design)
bmi_prop_tbl = round(prop.table(bmi_tbl), 5)
summary(bmi_tbl, statistic = "F")
svychisq(~pe_yn + bmi, nhanes_design, statistic = "F")

#Barplot: rescale joint proportions so bars sum to 1 within each PE group
bmi_prop_df = as.data.frame(bmi_prop_tbl)
prop_pe_yn = tapply(bmi_prop_df$Freq, bmi_prop_df$pe_yn, sum)
bmi_prop_df$prop_pe_yn = prop_pe_yn[bmi_prop_df$pe_yn]
plot_df = bmi_prop_df %>%
  mutate(std_prop = Freq / prop_pe_yn)

ggplot(plot_df, aes(pe_yn, std_prop)) +
  geom_bar(aes(fill = bmi), position = "dodge", stat = "identity") +
  labs(x = "Existence of PE",
       y = "Proportion in Each Weight Category",
       fill = "Weight Category",
       title = "Proportion of Adolescents in Each Weight Category by Existence of PE") +
  theme(plot.title = element_text(hjust = 0.5))

######################################

#Proportion of Correct Perceptions by Existence of PE
svymean(~interaction(pe_yn, correct_perception), design = nhanes_design)
bmi_tbl = svytable(~pe_yn + correct_perception, nhanes_design)
bmi_prop_tbl = round(prop.table(bmi_tbl), 5)
summary(bmi_tbl, statistic = "F")
svychisq(~pe_yn + correct_perception, nhanes_design, statistic = "F")

#Barplot
bmi_prop_df = as.data.frame(bmi_prop_tbl)
prop_pe_yn = tapply(bmi_prop_df$Freq, bmi_prop_df$pe_yn, sum)
bmi_prop_df$prop_pe_yn = prop_pe_yn[bmi_prop_df$pe_yn]
plot_df = bmi_prop_df %>%
  mutate(std_prop = Freq / prop_pe_yn)

ggplot(plot_df, aes(pe_yn, std_prop)) +
  geom_bar(aes(fill = correct_perception), position = "dodge",
           stat = "identity") +
  labs(x = "Existence of PE",
       y = "Proportion",
       fill = "Correct Perception",
       title = "Proportion of Correct Perceptions by Existence of PE") +
  theme(plot.title = element_text(hjust = 0.5))

######################################
#Proportion in Each Weight Category by Frequency of PE
svymean(~interaction(freq_pe, bmi), design = nhanes_design)
bmi_tbl = svytable(~freq_pe + bmi, nhanes_design)
bmi_prop_tbl = round(prop.table(bmi_tbl), 5)
summary(bmi_tbl, statistic = "F")
svychisq(~freq_pe + bmi, nhanes_design, statistic = "F")

#Barplot
bmi_prop_df = as.data.frame(bmi_prop_tbl)
prop_freq_pe = tapply(bmi_prop_df$Freq, bmi_prop_df$freq_pe, sum)
bmi_prop_df$prop_freq_pe = prop_freq_pe[bmi_prop_df$freq_pe]
plot_df = bmi_prop_df %>%
  mutate(std_prop = Freq / prop_freq_pe)

ggplot(plot_df, aes(freq_pe, std_prop)) +
  geom_bar(aes(fill = bmi), position = "dodge", stat = "identity") +
  labs(x = "Frequency of PE",
       y = "Proportion in Each Weight Category",
       fill = "Weight Category",
       title = "Proportion of Adolescents in Each Weight Category by Frequency of PE") +
  theme(plot.title = element_text(hjust = 0.5))

######################################

#Perceived and Actual Weight Category by Frequency of PE
svymean(~interaction(freq_pe, bmi, opinion_wt), design = nhanes_design)
perception_tbl = svytable(~freq_pe + correct_perception, nhanes_design)
perception_prop_tbl = round(prop.table(perception_tbl), 5)
summary(perception_tbl, statistic = "F")
svychisq(~freq_pe + correct_perception, nhanes_design, statistic = "F")

#Barplot
perception_prop_df = as.data.frame(perception_prop_tbl)
prop_freq_pe = tapply(perception_prop_df$Freq,
                      perception_prop_df$freq_pe, sum)
perception_prop_df$prop_freq_pe = prop_freq_pe[perception_prop_df$freq_pe]
plot_df = perception_prop_df %>%
  mutate(std_prop = Freq / prop_freq_pe)

ggplot(plot_df, aes(freq_pe, std_prop)) +
  geom_bar(aes(fill = correct_perception),
           position = "dodge", stat = "identity") +
  labs(x = "Frequency of PE",
       y = "Proportion",
       fill = "Correct Perception",
       title = "Proportion of Adolescents Whose Weight Perception Matched BMI Category\nby Frequency of PE") +
  theme(plot.title = element_text(hjust = 0.5))

######################################
#Proportion in Each Weight Category by Enjoyment of PE
svymean(~interaction(enjoy_pe, bmi), design = nhanes_design)
bmi_tbl = svytable(~enjoy_pe + bmi, nhanes_design)
bmi_prop_tbl = round(prop.table(bmi_tbl), 5)
summary(bmi_tbl, statistic = "F")
svychisq(~enjoy_pe + bmi, nhanes_design, statistic = "F")

#Barplot
bmi_prop_df = as.data.frame(bmi_prop_tbl)
prop_enjoy_pe = tapply(bmi_prop_df$Freq, bmi_prop_df$enjoy_pe, sum)
bmi_prop_df$prop_enjoy_pe = prop_enjoy_pe[bmi_prop_df$enjoy_pe]
plot_df = bmi_prop_df %>%
  mutate(std_prop = Freq / prop_enjoy_pe)

ggplot(plot_df, aes(enjoy_pe, std_prop)) +
  geom_bar(aes(fill = bmi), position = "dodge", stat = "identity") +
  labs(x = "Enjoyment of PE",
       y = "Proportion in Each Weight Category",
       fill = "Weight Category",
       title = "Proportion of Adolescents in Each Weight Category by Enjoyment of PE") +
  theme(plot.title = element_text(hjust = 0.5))

######################################
#Proportion Who Enjoy PE by Weight Class
svymean(~interaction(enjoy_pe, bmi), design = nhanes_design)
bmi_tbl = svytable(~enjoy_pe + bmi, nhanes_design)
bmi_prop_tbl = round(prop.table(bmi_tbl), 5)
summary(bmi_tbl, statistic = "F")
svychisq(~enjoy_pe + bmi, nhanes_design, statistic = "F")

#Barplot: order the enjoyment levels, then rescale within each weight class
bmi_prop_df = as.data.frame(bmi_prop_tbl) %>%
  mutate(enjoy_pe = factor(enjoy_pe, levels = c("strongly agree",
                                                "agree",
                                                "neither agree nor disagree",
                                                "disagree",
                                                "strongly disagree",
                                                "unsure")))
prop_weight = tapply(bmi_prop_df$Freq, bmi_prop_df$bmi, sum)
bmi_prop_df$prop_weight = prop_weight[bmi_prop_df$bmi]
plot_df = bmi_prop_df %>%
  mutate(std_prop = Freq / prop_weight)

ggplot(plot_df, aes(bmi, std_prop)) +
  geom_bar(aes(fill = enjoy_pe), position = "dodge", stat = "identity") +
  labs(x = "Weight Category by BMI",
       y = "Proportion",
       fill = "Enjoyment of PE",
       title = "Proportion of Adolescents Who Enjoy PE by Weight Category") +
  theme(plot.title = element_text(hjust = 0.5))

######################################
#Perceived and Actual Weight Category by Enjoyment of PE
svymean(~interaction(enjoy_pe, bmi, opinion_wt), design = nhanes_design)
perception_tbl = svytable(~enjoy_pe + correct_perception,
                          nhanes_design)
perception_prop_tbl = round(prop.table(perception_tbl), 5)
summary(perception_tbl, statistic = "F")
svychisq(~enjoy_pe + correct_perception, nhanes_design, statistic = "F")

#Barplot
perception_prop_df = as.data.frame(perception_prop_tbl)
prop_enjoy_pe = tapply(perception_prop_df$Freq,
                       perception_prop_df$enjoy_pe, sum)
perception_prop_df$prop_enjoy_pe =
  prop_enjoy_pe[perception_prop_df$enjoy_pe]
plot_df = perception_prop_df %>%
  mutate(std_prop = Freq / prop_enjoy_pe)

ggplot(plot_df, aes(enjoy_pe, std_prop)) +
  geom_bar(aes(fill = correct_perception), position = "dodge",
           stat = "identity") +
  labs(x = "Enjoyment of PE",
       y = "Proportion",
       fill = "Correct Perception",
       title = "Proportion of Adolescents Whose Weight Perception Matched BMI Category\nby Enjoyment of PE") +
  theme(plot.title = element_text(hjust = 0.5))

##########
References

[1] Sally Adee. Overweight Olympians: Guess the BMI of top athletes. May 2014. url: https://www.newscientist.com/gallery/obese-olympians/.
[2] Centers for Disease Control and Prevention (CDC). NHANES Data Documentation 2013-2014, Demographic Data. Oct. 2015. url: https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.htm#SDMVPSU.
[3] Cable News Network (CNN). Obesity in the U.S. Fast Facts. July 2016. url: http://www.cnn.com/2013/09/02/health/obesity-in-u-s-fast-facts/.
[4] Centers for Disease Control and Prevention (CDC). About Adult BMI. May 2015. url: https://www.cdc.gov/healthyweight/assessing/bmi/adult_bmi/.
[5] Centers for Disease Control and Prevention (CDC). Childhood Obesity Causes & Consequences. Dec. 2016. url: https://www.cdc.gov/obesity/childhood/causes.html.
[6] Katherine Hobson. BMI Is A Terrible Measure Of Health. Feb. 2016. url: https://fivethirtyeight.com/features/bmi-is-a-terrible-measure-of-health/.
[7] Clifford L. Johnson. National Health and Nutrition Examination Survey: Sample Design, 2011–2014. Mar. 2014. url: https://wwwn.cdc.gov/nchs/data/series/sr02_162.pdf.
[8] Lisa B. Mirel. National Health and Nutrition Examination Survey: Estimation Procedures, 2007–2010. Aug. 2013. url: https://wwwn.cdc.gov/nchs/data/series/sr02_159.pdf.
[9] The State of Obesity. Obesity Rates & Trends Overview. Sept. 2016. url: http://stateofobesity.org/obesity-rates-trends-overview/.
[10] World Health Organization (WHO). Obesity and Overweight: Fact Sheet. June 2016. url: http://www.who.int/mediacentre/factsheets/fs311/en/.