Professor Birnir
Introduction
Throughout the semester many areas of inquiry have piqued my interest. From
looking into how party identification impacts views on environmental protection, to the
relationship between marijuana legalization support and church attendance, being able to
use R to uncover connections between variables is a skill I now value. For my final
survey write up I wanted to pick a dependent variable that was relevant and salient in
relation to the current political climate. Whether prisoners should be allowed to vote
while incarcerated fits this description (I will refer to this variable as “vote in prison”
from now on). It has been a topic of much debate for many years, and one that I have had
a changing opinion on. I was intrigued to see how our survey respondents felt about the
issue. Proponents of incarcerated individuals voting while in prison argue that America is a democracy that
includes all people, and point out that giving them a voice in politics could drastically
improve prison conditions. Opponents, on the other hand, claim that only moral and
This is a charged topic and calls into question whether basic rights like voting
really apply to ALL people in the United States. Personally, I was very curious to figure
out what factors influence how a typical person thinks about this question. Our survey
Merrill 2
asks respondents to rate their support of this idea on a scale from 0 to 100 (with 0 being
strongly disagree and 100 being strongly agree), and I wanted to see how these results
compared with other demographic features. In this paper I will first explore how age (the
independent variable) affects responses to “vote in prison.” Next I will run a second
model using the control variable, gender, to look into other potential factors that could
impact the dependent variable. Finally, I will conclude with a discussion of results and an
examination of differences in my two models to find meaning. This analysis will either
predict my findings. I certainly thought that age, my independent variable, would have a
strong effect on my dependent variable, “vote in prison.” My theory here was that
younger people would be more likely to support prisoners being able to vote while
incarcerated because they tend to be more liberal and inclusive than the older generation.
When formulating this theory I also took into account the fact that the University of
Maryland is located in a fairly liberal area, meaning many of the respondents, particularly
those under 25, would probably lean left. My hypothesis was that when looking at the
data we will see a downward trend in support for “vote in prison” when moving along the
I also decided to look into the effect that my control variable, gender, would have
alone on the original question, and then again when paired with the age variable. My
theory here was that women would be more likely to support incarcerated individuals
voting while in prison, and men would be less likely, because women usually have higher
empathy scores. This characteristic suggests that they
will be able to sympathize with prisoners and be more willing to give them a second
chance, in the form of being able to vote. There is also an element of political affiliation
that comes into play for the question of gender impact. Women are known to identify
more often with a liberal ideology, and being supportive of the “vote in prison” measure
confounding factor. We had more women respond to the survey than men, and it’s
possible that high rates of support for “vote in prison” among young people were only
because many of these young people were women. I believe that this is somewhat
unlikely, but still necessary to test. My hypothesis is that the effect of age will remain
statistically significant even after controlling for gender. In other words, both factors will
students in the course GPVT201, through social media and other friends/family. Overall
the survey was taken by 1,681 people, about ⅔ of whom were women. Of those who gave
their age, 1,106 were below 25 and 563 were above 25. This reflects that the survey was
My regression models center around the survey question “On a scale from 0 to
100 (with 0 being strongly disagree and 100 being strongly agree), to what extent do you
measure of support for incarcerated individuals voting because it uses a scale and allows
respondents to express a wide variety of opinions. This variable in our survey has a mean
score of 56.17, a median of 60, and a mode of 100. After understanding the descriptive
statistics, we can begin to recode. The first step in adjusting this variable in R was
“voteinprison.yes” in which all responses greater than 50 would be coded as 1s, and those
under 50 as 0s. I did this to make it easier to divide the data into those who support “vote
The second variable that will be used to explore my question is age. Age is an
interval-level measure, and in this case serves as the independent variable. Our survey
respondents were asked “What is your age.” This question came back with a mean age of
29.55, a median of 20, and a mode of 19. Clearly, this is a younger population, as
and understand the regression model, I decided to recode the age variable, making one
category respondents under age 25, and the other those over 25. To do this I first created
a new variable called cutpoints. Then I used the cut2 function to transform survey$age
into survey$age.4. Following this step I labeled the two new categories in age.4 as
“Below 25” and “Above 25” using the levels function. I chose to recode in this way to
make a clearer distinction between the older and younger people who took the survey. It will
also be easier to see how each group interacts with our dependent variable by
The final variable is the control variable. As explained above, I have chosen the
categorical variable gender because I believe being a woman has the potential to change
the effect that age has on the dependent variable, and is therefore essential to examining
our initial question. The survey asks “What is your gender” and gives a few options
including female, male, non-binary, and other please specify. Our survey population was
made up of 1062 women, 592 men, 21 non-binary individuals, and six people who chose
other options, so the modal category was clearly women. For a clearer view of this
distribution see Figure 4 in the appendix. This information was initially coded as
binary variable, using the as.numeric function and coding all values in the original gender
data that were labeled “Female” as 1s, and the rest as 0s. This new variable will later be
used in Model 2. The purpose of this recoding was to allow us to see the effect that
specifically being a woman has on responses to “vote in prison” when age is also
explained interact with each other, we will run multiple regression analyses. The first is
called “Model 1” and does not include controls. The table for this model can be found
under the label Table 1 and the graphical representation under Figure 1. The linear model
(lm) function outputs an intercept of 74.50 and a coefficient for age of -0.62. What
exactly does this mean? Because the coefficient for age is negative, there is an inverse
relationship between the independent and dependent variables. For every additional year of
age, predicted support of “vote in prison” decreases by 0.62 points on the 0-100 scale. Over a
40-year age difference that adds up to roughly 25 points, which is a
large amount. This result is statistically significant with a p-value of 2 × 10⁻¹⁶. Because
this number is clearly smaller than .05, the null hypothesis (that age has no impact on support of
“vote in prison”) can safely be rejected. The intercept in the model also has meaning. It
is the predicted score on the 0-100 scale for a respondent of age 0, 74.50. No respondent is
that young, so it is best read as the starting point from which the predicted
score goes down with age. The last number of interest in our model is the
R-squared value of .083. This suggests that around 8.3 percent of the variation in “vote in
Model 2 is similar to Model 1, but incorporates the control variable gender which
has been transformed into “survey$female.” The outputs for this model differ slightly
from the first, and can be seen more exactly in Table 2 and Figure 2. The new age
coefficient, -0.65, increases in magnitude by a small amount, but it is still negative and
strong. The p-value for this coefficient remains the same as in Model 1. So, even when
controlling for being a woman, age is still a statistically significant factor and actually has
a slightly greater effect. Additionally, the coefficient for “female,” 7.32, is a source of
interest. It means that, holding age constant, being a woman is associated with a predicted
score about seven points higher on the 0-100 scale. Like age, it is
also statistically significant with a p-value of 5.68 × 10⁻⁵, enough to reject the null that
being a woman has no impact on response to “vote in prison.” When this coefficient is
paired with the age coefficient and our new intercept/constant (70.50) we can create a
woman will increase the predicted score on the scale by around seven points, and each additional
year of age will decrease the score by 0.65 points. This is supported by Figure 2, which
shows that being a woman makes scores much higher among young people, but has less
of an impact among older individuals. The trend lines for those who are women
and those who are not eventually cross,
exemplifying this change in effect over age groups. The adjusted
R-Squared tells us that age and being a woman, together account for 9.2 percent of the
Comparing Model 1 and Model 2 can provide new insight. Controlling for
gender in the form of the variable “female” increased the magnitude of the age coefficient
by 0.03, from -0.62 to -0.65. Therefore, the relationship between the control and independent variable
is additive, even if only to a very small degree. Overall, the results didn’t change much
between models. Age had the same p-value in both and remained statistically significant.
The intercept decreased by around four points in Model 2. While Model 1 indicated that a
base score for all people, regardless of gender, was 74.50, Model 2 provided a base score
for men (70.50) and a base score for women (77.82). If we look at it this way, it makes
sense that when gender isn’t controlled for, the base score for people in general lies
somewhere between these two numbers. A final logical change that can be observed
between the two models is the increase in the R-squared value. Beginning at 8.3 percent,
the value increases to 9.2 percent when gender is included. This is what should be
expected, as the addition of more factors typically means less unaccounted variation in
All of my hypotheses proved correct. The first, that gender would have some
effect on the responses, was supported. Once I controlled for being a woman, the impact
of age increased. Additional predictions such as younger people and women being more
likely to score higher on “vote in prison” were also confirmed in the regression analyses.
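To make the coefficients concrete, the two fitted equations can be evaluated by hand. The short R sketch below is only an illustration: it hard-codes the rounded estimates reported above (intercepts of 74.50 and 70.50, age slopes of -0.62 and -0.65, and 7.32 for being a woman) instead of refitting the models, and the function names and the example ages of 20 and 60 are hypothetical choices made for this example.

```r
# Rounded coefficients as reported in the text (not refit from the survey data).
predict_model1 <- function(age) {
  74.50 - 0.62 * age
}

predict_model2 <- function(age, female) {
  # female is 1 for women and 0 for everyone else, as in survey$female
  70.50 - 0.65 * age + 7.32 * female
}

# Hypothetical respondents, chosen only for illustration.
examples <- data.frame(age = c(20, 20, 60, 60), female = c(1, 0, 1, 0))
examples$model1 <- round(predict_model1(examples$age), 2)
examples$model2 <- round(predict_model2(examples$age, examples$female), 2)
print(examples)
```

With these rounded estimates, Model 2 predicts about 64.8 for a 20-year-old woman and 31.5 for a 60-year-old man. Because Model 2 as specified in the appendix has no interaction term, the predicted gap between women and everyone else is the same 7.32 points at every age.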
Conclusion
These findings are important because they provide new insight into the political
be classified as a liberal idea, and our findings could potentially indicate that women and
younger people are more likely to identify with this political leaning. Of course, more
research would be required to confirm this, and other confounding factors, such as the
location from which the survey was distributed, could have affected these
results. Another interesting finding was the R-squared value. If I had predicted how
large it would be when both gender and age were measured, I probably would’ve
expected a higher number than 9.2 percent. This shows that there are many other
factors impacting responses to “vote in prison.” A possible extension would be to see the
Appendix
Additional Graphs
[Table 1]
Table 1. This table reports the relationship between the DV, “vote in prison,” and the IV, age.
[Figure 1]
Figure 1. This figure shows the relationship between the DV, “vote in prison,” and the IV, age. The x-axis
shows age, from youngest to oldest. The y-axis shows scores from 0-100, from least
supportive to most supportive for the DV. There is a clear negative relationship between the two variables.
[Table 2]
Table 2. This table reports the relationship between the DV, “vote in prison,” the IV, age, and the CV, gender.
[Figure 2]
Figure 2. This figure shows the relationship between the DV, “vote in prison,” the IV, age, and the CV,
gender. The x-axis shows age, from youngest to oldest. The y-axis shows scores from
0-100, from least supportive to most supportive of the DV. The solid line shows the trend for women and
the dashed line shows the trend for everyone who is not a woman.
[Figure 3]
Figure 3 provides a graphical representation of our independent interval-level variable, age. From this
graph, which plots age on the x-axis and frequency on the y-axis, it is clear that the majority of survey
respondents were younger.
[Figure 4]
Figure 4 provides a graphical representation of our categorical control variable, gender. From this graph,
which plots gender on the x-axis and frequency on the y-axis, it is clear that about ⅔ of survey respondents
were female, and ⅓ were male.
- Transforming age (cut2 comes from the Hmisc package):
- library(Hmisc)
- cutpoints <- 25
- survey$age.4 <- cut2(survey$age, cutpoints)
- levels(survey$age.4) <- c("Below 25", "Above 25")
- Transforming gender (1 = "Female", 0 = all other responses):
- survey$gender <- survey$`What is your gender? - Selected Choice`
- survey$female <- as.numeric(survey$gender=="Female")
- Creating Model 1
- model1 <- lm(voteinprison ~ age, data=survey)
- summary(model1)
- Creating Figure 1
- plot(survey$voteinprison~survey$age, ylab="Opinion on Vote in Prison",
xlab="Age", main="Opinion on Voting While Incarcerated by Age")
- abline(model1)
- Creating Model 2
- model2 <- lm(voteinprison ~ age + female, data=survey)
- summary(model2)
- Creating Figure 2
- plot(survey$voteinprison ~ survey$age, ylab="Opinion on Vote in
Prison", xlab="Age", main="Opinion on Voting While Incarcerated by
Age")
- abline(lm(voteinprison ~ age, data=survey, subset=female==1))
- abline(lm(voteinprison ~ age, data=survey, subset=female==0), lty=2)
- legend("right", legend=c("Female", "Others"),lty=c(1,2), inset=.02,
cex=.6)
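
The code above does not include the recode that produces “voteinprison.yes,” which is described in the variables section. One way it could be written is sketched below; this is an assumption, not the code used for this paper. The example runs on a small made-up vector in place of the survey data, and because the text does not say how a response of exactly 50 was treated, the sketch arbitrarily codes it as 0.

```r
# Made-up responses on the 0-100 scale, for illustration only.
voteinprison <- c(0, 35, 50, 51, 100, NA)

# 1 = response above 50 (supports), 0 = otherwise; missing responses stay NA.
# Coding exactly 50 as 0 is an assumption; the paper does not specify it.
voteinprison.yes <- as.numeric(voteinprison > 50)

# Share of non-missing respondents coded as supporters.
share_yes <- mean(voteinprison.yes, na.rm = TRUE)
print(voteinprison.yes)
print(share_yes)
```

This mirrors the as.numeric() pattern already used for survey$female, so the two binary variables are built the same way.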