Maddy Merrill

Professor Birnir

GVPT 201, 0204

May 14th, 2020

Final Survey Writeup

Introduction

Throughout the semester many areas of inquiry have piqued my interest. From

looking into how party identification impacts views on environmental protection, to the

relationship between marijuana legalization support and church attendance, being able to

use R to uncover connections between variables is a skill I now value. For my final

survey write up I wanted to pick a dependent variable that was relevant and salient in

relation to the current political climate. Whether prisoners should be allowed to vote

while incarcerated fits this description (I will refer to this variable as “vote in prison”

from now on). It has been a topic of much debate for many years, and one that I have had

a changing opinion on. I was intrigued to see how our survey respondents felt about the

issue, especially when linked with their demographic backgrounds. Proponents of

incarcerated individuals voting while in prison argue that America is a democracy that

includes all people, and point out that giving them a voice in politics could drastically

improve prison conditions. Opponents, on the other hand, claim that only moral and

responsible people should be able to take part in this sacred action.

This is a charged topic and calls into question whether basic rights like voting

really apply to ALL people in the United States. Personally, I was very curious to figure

out what factors influence how a typical person thinks about this question. Our survey

asks respondents to rate their support of this idea on a scale from 0 to 100 (with 0 being

strongly disagree and 100 being strongly agree), and I wanted to see how these results

compared with other demographic features. In this paper I will first explore how age (the

independent variable) affects responses to “vote in prison.” Next I will run a second

model using the control variable, gender, to look into other potential factors that could

impact the dependent variable. Finally, I will conclude with a discussion of results and an

examination of differences in my two models to find meaning. This analysis will either

confirm or refute the hypotheses that I outline below.

Theory and Hypothesis

Before running my tests in R I wanted to create a few theories and hypotheses to

predict my findings. I certainly thought that age, my independent variable, would have a

strong effect on my dependent variable, “vote in prison.” My theory here was that

younger people would be more likely to support prisoners being able to vote while

incarcerated because they tend to be more liberal and inclusive than the older generation.

When formulating this theory I also took into account the fact that the University of

Maryland is located in a fairly liberal area, meaning many of the respondents, particularly

those under 25, would probably lean left. My hypothesis was that when looking at the

data we will see a downward trend in support for “vote in prison” when moving along the

age axis, from youngest respondents to oldest.

I also decided to look into the effect that my control variable, gender, would have

alone on the original question, and then again when paired with the age variable. My

theory here was that women would be more likely to support incarcerated individuals

voting while in prison, and men would be less likely, because women usually have higher

empathy scores. This characteristic suggests that they

will be able to sympathize with prisoners and be more willing to give them a second

chance, in the form of being able to vote. There is also an element of political affiliation

that comes into play for the question of gender impact. Women are known to identify

more often with a liberal ideology, and being supportive of the “vote in prison” measure

also aligns with a liberal way of thinking.

It is important to control for gender because it has the potential to be a

confounding factor. We had more women respond to the survey than men, and it’s

possible that high rates of support for “vote in prison” among young people were only

because many of these young people were women. I believe that this is somewhat

unlikely, but still necessary to test. My hypothesis is that the effect of age will remain

statistically significant even after controlling for gender. In other words, both factors will

be statistically significant when model two regression (which includes control) is

performed. I expect to observe an additive relationship.

Variables and Design

Our survey asked a variety of politically related questions. It was distributed by

students in the course GVPT 201, through social media and to friends and family. Overall

the survey was taken by 1,681 people, about ⅔ of whom were women. Of those who gave

their age, 1,106 were below 25 and 563 were above 25. This reflects that the survey was

probably taken mostly by college students, either on campus or elsewhere.



My regression models center around the survey question “On a scale from 1 to

100 (with 0 being strongly disagree and 100 being strongly agree), to what extent do you

agree or disagree with the following statement: ‘Incarcerated individuals should be

allowed to vote while in prison’ - 0=strongly disagree, 100=strongly agree.” This

question serves as my interval-level dependent variable and is a good operational

measure of support for incarcerated individuals voting because it uses a scale and allows

respondents to express a wide variety of opinions. This variable in our survey has a mean

score of 56.17, a median of 60, and a mode of 100. After understanding the descriptive

statistics, we can begin to recode. The first step in adjusting this variable in R was

renaming it “voteinprison.” Next, I created a numeric variable called

“voteinprison.yes” in which all responses greater than 50 are coded as 1s, and those

at or below 50 as 0s. I did this to make it easier to divide the data into those who support “vote

in prison” (1) and those who do not (0).
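
As a minimal sketch of this recode, the lines below apply the same rule to a handful of made-up scores (the vector `scores` is hypothetical and not taken from the survey data). It shows where a response of exactly 50, or a missing response, ends up.

```r
# Hypothetical responses on the 0-100 scale; NA stands for a skipped question
scores <- c(0, 25, 50, 51, 100, NA)

# Same rule as voteinprison.yes: the comparison gives TRUE/FALSE,
# and as.numeric() turns that into 1/0
supports <- as.numeric(scores > 50)

# Show each score next to its recoded value
print(data.frame(score = scores, supports = supports))
```

A score of exactly 50 is coded 0 because the comparison is strictly greater than 50, and a missing response stays missing rather than being counted as opposition.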

The second variable that will be used to explore my question is age. Age is an

interval-level measure, and in this case serves as the independent variable. Our survey

respondents were asked “What is your age.” This question came back with a mean age of

29.55, a median of 20, and a mode of 19. Clearly, this is a younger population, as

outlined in Figure 3 which shows the distribution of responses. To achieve my purposes

and understand the regression model, I decided to recode the age variable, making one

category respondents under age 25, and the other those 25 and over. To do this I first created

a new variable called cutpoints. Then I used the cut2 function to transform survey$age

into survey$age.4. Following this step I labeled the two new categories in age.4 as

“Below 25” and “Above 25” using the levels function. I chose to recode in this way to

make a clearer distinction between old and young people who took the survey. It will

also be easier to see how each group interacts with our dependent variable by

categorizing in this way.
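
The two-group split can be sketched with base R's `cut()` instead of `cut2()` from the Hmisc package, so that it runs without extra packages. The ages below are made up for illustration. I set `right = FALSE` on the assumption that `cut2()` likewise treats its cut point as the lower edge of the upper group.

```r
# Hypothetical ages, chosen to include the boundary value 25
ages <- c(18, 19, 24, 25, 26, 60)

# Two left-closed intervals: [-Inf, 25) and [25, Inf)
age_group <- cut(ages,
                 breaks = c(-Inf, 25, Inf),
                 right = FALSE,
                 labels = c("Below 25", "25 and over"))

# Count how many respondents fall in each group
print(table(age_group))
```

Under this rule a respondent who is exactly 25 lands in the older group, which is worth keeping in mind when reading the group labels.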

The final variable is the control variable. As explained above, I have chosen the

categorical variable gender because I believe being a woman has the potential to change

the effect that age has on the dependent variable, and is therefore essential to examining

our initial question. The survey asks “What is your gender” and gives a few options

including female, male, non-binary, and other please specify. Our survey population was

made up of 1,062 women, 592 men, 21 non-binary individuals, and six people who chose

other options, so the mode was clearly being a woman. For a clearer view of this

distribution see Figure 4 in the appendix. This information was initially coded as

“survey$gender” but I later recoded it as “survey$female.” This was done to create a

binary variable, using the as.numeric function and coding all values in the original gender

data that were labeled “Female” as 1s, and the rest as 0s. This new variable will later be

used in Model 2. The purpose of this recoding was to allow us to see the effect that

specifically being a woman has on responses to “vote in prison” when age is also

included. We will expect this coefficient to be positive and fairly large.

Analysis and Discussion

To gain an understanding of how the selected variables that we have now

explained interact with each other, we will run multiple regression analyses. The first is

called “Model 1” and does not include controls. The table for this model can be found

under the label Table 1 and the graphical representation under Figure 1. The linear model

(lm) function outputs an intercept of 74.50 and a coefficient for age of -0.62. What

exactly does this mean? Because the coefficient for age is negative, there is an inverse

relationship between the independent and dependent variable. For every additional year of

age, predicted support for “vote in prison” decreases by 0.62 points on the 0-100 scale, so a

40-year difference in age corresponds to roughly 25 points. This result is statistically

significant with a p-value of 2 × 10⁻¹⁶. Because this number is clearly smaller than .05, the

null, that age has no impact on support of “vote in prison,” can safely be rejected. The

intercept in the model also has meaning. It is the predicted score on the 0-100 scale for a

hypothetical respondent of age zero, 74.50, and the predicted score goes down from there

with each additional year of age. The last number of interest in our model is the

R-squared value of .083. This suggests that around 8.3 percent of the variation in “vote in

prison” scores can be explained by our independent variable, age.
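
To make the Model 1 coefficients concrete, the sketch below plugs two example ages into the fitted line using the intercept and slope reported in Table 1. The helper name `predict_model1` and the example ages are my own choices for illustration.

```r
# Coefficients as reported for Model 1
intercept <- 74.50
b_age <- -0.62

# Hypothetical helper: predicted score on the 0-100 scale at a given age
predict_model1 <- function(age) {
  intercept + b_age * age
}

# Predicted support for a 20-year-old and a 60-year-old
print(predict_model1(c(20, 60)))
```

The two predictions are 62.1 and 37.3, so the 40-year gap accounts for 24.8 points, which is simply 40 times the age coefficient.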

Model 2 is similar to Model 1, but incorporates the control variable gender which

has been transformed into “survey$female.” The outputs for this model differ slightly

from the first, and can be seen more exactly in Table 2 and Figure 2. The new age

coefficient, -0.65, increases in magnitude by a small amount, but it is still negative and

strong. The p-value for this coefficient remains the same as in Model 1. So, even when

controlling for being a woman, age is still a statistically significant factor and actually has

a slightly greater effect. Additionally the coefficient for “female,” 7.32, is a source of

interest. The coefficient is substantial: a shift of more than seven points on the 0-100 scale. Like age, it is

also statistically significant with a p-value of 5.68 × 10⁻⁵, enough to reject the null that

being a woman has no impact on response to “vote in prison.” When this coefficient is

paired with the age coefficient and our new intercept/constant (70.50) we can create a

formula: Support for “vote in prison” = 70.50 - 0.65*(Age) + 7.32*(Female). Being a

woman will increase the predicted score on the scale by around seven points, and each

additional year of age will decrease the score by 0.65 points. Figure 2, which fits a separate

trend line for women and for everyone else, shows that being a woman makes scores much

higher among young people, but has less of an impact among older individuals. The two

trend lines eventually cross, exemplifying this change in effect over age groups. The adjusted

R-Squared tells us that age and being a woman together account for 9.2 percent of the

variation in the dependent variable.
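
The Model 2 formula can be evaluated directly. The sketch below uses the reported intercept and coefficients; the function name `predict_model2` and the example ages are hypothetical, chosen only to illustrate the arithmetic.

```r
# Model 2 formula as reported: 70.50 - 0.65*(Age) + 7.32*(Female)
predict_model2 <- function(age, female) {
  70.50 - 0.65 * age + 7.32 * female
}

ages <- c(20, 40, 60)
women <- predict_model2(ages, female = 1)
others <- predict_model2(ages, female = 0)

# Predicted scores side by side, with the gap between the two groups
print(data.frame(age = ages, women = women, others = others,
                 gap = women - others))
```

Under this formula the gap between women and everyone else is 7.32 points at every age, since the "female" term simply shifts the line up. The trend lines in Figure 2 are drawn from separate regressions for each group, which is why they are able to have different slopes.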

Looking between Model 1 and Model 2 can provide new insight. Controlling for

gender in the form of the variable “female” increased the effect of age on the dependent

variable by .03. Therefore, the relationship between the control and independent variable

is additive, even if only to a very small degree. Overall, the results didn’t change much

between models. Age had the same p-value in both and remained statistically significant.

The intercept decreased by around four points in Model 2. While Model 1 indicated that a

base score for all people, regardless of gender, was 74.50, Model 2 provided a base score

for men (70.50) and a base score for women (77.82). If we look at it this way, it makes

sense that when gender isn’t controlled for, the base score for people in general lies

somewhere between these two numbers. A final logical change that can be observed

between the two models is the increase of the R-Squared value. Beginning at 8.3 percent,

the value increases to 9.2 percent when gender is included. This is what should be

expected, as the addition of more factors typically means less unaccounted variation in

the dependent variable.
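
The behavior of R-squared when a predictor is added can be illustrated on simulated data. Everything in the sketch below is made up for illustration (the sample size, the coefficients, the noise level, and the seed) and it does not reproduce the survey results.

```r
# Simulated data only: none of these numbers come from the survey
set.seed(201)
n <- 500
age <- sample(18:80, n, replace = TRUE)
female <- rbinom(n, 1, 0.63)
score <- 70 - 0.6 * age + 7 * female + rnorm(n, sd = 30)
sim <- data.frame(score, age, female)

# Nested models: the second adds one predictor to the first
m1 <- lm(score ~ age, data = sim)
m2 <- lm(score ~ age + female, data = sim)

# Collect plain and adjusted R-squared for both models
r2 <- c(model1 = summary(m1)$r.squared,
        model2 = summary(m2)$r.squared)
adj_r2 <- c(model1 = summary(m1)$adj.r.squared,
            model2 = summary(m2)$adj.r.squared)
print(round(rbind(r2, adj_r2), 3))
```

For least-squares models where one is nested in the other, plain R-squared cannot go down when a predictor is added. Adjusted R-squared applies a penalty for each extra predictor, so it is the fairer number to compare across models.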

All of my hypotheses proved correct. The first, that gender would have some

effect on the responses, was supported. Once I controlled for being a woman, the impact

of age increased. Additional predictions such as younger people and women being more

likely to score higher on “vote in prison” were also confirmed in the regression analyses.

Conclusion

These findings are important because they provide new insight into the political

leanings of certain demographic groups. Allowing incarcerated individuals to vote would

be classified as a liberal idea, and our findings could potentially indicate that women and

younger people are more likely to identify with this political leaning. Of course, more

research would be required to confirm this, and other confounding factors, such as the

original location where the survey was dispatched from, could have impacted these

results. Another interesting finding was the R-squared value. If I had predicted how

large it would be when both gender and age were measured, I probably would’ve

expected a higher number than 9.2 percent. This shows that there are many other

factors impacting responses to “vote in prison.” A possible extension would be to see the

effect of political affiliation and race on the dependent variable.



Appendix

Additional Graphs

[Table 1]

Table 1. This table reports the relationship between the IV, age, and the DV, “vote in prison.”

[Figure 1]

Figure 1. This figure shows the relationship between the IV, age, and the DV, “vote in prison.” The x-axis
shows age, from youngest to oldest. The y-axis shows scores from 0-100, from least
supportive to most supportive of the DV. There is a clear negative relationship between the two variables.

[Table 2]

Table 2. This table reports the relationship between the IV, age, the CV, gender, and the DV, “vote in prison.”

[Figure 2]

Figure 2. This figure shows the relationship between the IV, age, the CV, gender, and the DV, “vote in
prison.” The x-axis shows age, from youngest to oldest. The y-axis shows scores from
0-100, from least supportive to most supportive of the DV. The solid line shows the trend for women and
the dashed line shows the trend for everyone who is not a woman.

[Figure 3]

Figure 3 provides a graphical representation of our independent interval-level variable, age. From this
graph, which plots age on the x-axis and frequency on the y-axis, it is clear that the majority of survey
respondents were younger.

[Figure 4]

Figure 4 provides a graphical representation of our categorical control variable, gender. From this graph,
which plots gender on the x-axis and frequency on the y-axis, it is clear that about ⅔ of survey respondents
were female, and ⅓ were male.

Code for Altering Variables

- Transforming “vote in prison”:


- survey$voteinprison <- survey$`On a scale from 1 to 100 (with 0 being strongly disagree and 100 being strongly agree), to what extent do you agree or disagree with the following statement: “Incarcerated individuals should be allowed to vote while in prison” - 0=strongly disagree, 100=strongly agree`
- survey$voteinprison.yes <- as.numeric(survey$voteinprison > 50)  # 1 = score above 50, 0 = 50 or below

- Transforming age:
- library(Hmisc)  # provides cut2
- cutpoints <- 25
- survey$age.4 <- cut2(survey$age, cutpoints)
- levels(survey$age.4) <- c("Below 25", "Above 25")

- Transforming gender
- survey$gender <- survey$`What is your gender? - Selected Choice`
- survey$female <- as.numeric(survey$gender == "Female")  # 1 = Female, 0 = all other responses

- Creating Model 1
- model1 <- lm(voteinprison ~ age, data=survey)
- summary(model1)

- Creating Figure 1
- plot(survey$voteinprison ~ survey$age, ylab="Opinion on Vote in Prison", xlab="Age", main="Opinion on Voting While Incarcerated by Age")
- abline(model1)

- Creating Model 2
- model2 <- lm(voteinprison ~ age + female, data=survey)
- summary(model2)

- Creating Figure 2
- plot(survey$voteinprison ~ survey$age, ylab="Opinion on Vote in Prison", xlab="Age", main="Opinion on Voting While Incarcerated by Age")
- abline(lm(voteinprison ~ age, data=survey, subset=female==1))  # solid line: women
- abline(lm(voteinprison ~ age, data=survey, subset=female==0), lty=2)  # dashed line: everyone else
- legend("right", legend=c("Female", "Others"), lty=c(1,2), inset=.02, cex=.6)
