Teaching Note
Turking Statistics: Student-generated Surveys Increase Student Engagement and Performance
Teaching Sociology
2018, Vol. 46(1) 44–53
© American Sociological Association 2017
DOI: 10.1177/0092055X17721952
https://doi.org/10.1177/0092055X17721952
ts.sagepub.com
Cameron T. Whitley1 and Thomas Dietz1
Abstract
Thirty years ago, Hubert M. Blalock Jr. published an article in Teaching Sociology about the importance of
teaching statistics. We honor Blalock’s legacy by assessing how using Amazon Mechanical Turk (MTurk)
in statistics classes can enhance student learning and increase statistical literacy among social science
graduate students. In addition, we assess whether using MTurk has an impact on student ability to make
professional progress. We find that, compared to traditional teaching methods, using MTurk increased
student performance, perceptions, and outcomes. In addition, using MTurk resulted in a measurable
increase in statistical literacy. We recommend that instructors teaching statistics consider how MTurk or
similar technologies can be used in their classrooms.
Keywords
teaching statistics, Mechanical Turk, learning statistics, statistical literacy
1 Michigan State University, East Lansing, MI, USA
Corresponding Author: Thomas Dietz, Department of Sociology, Michigan State University, 316 Berkey Hall, East Lansing, MI 48828, USA.
Email: tdietz@msu.edu
Thirty years ago, Hubert M. Blalock Jr. (1987) brought our attention to the art of teaching statistics in an article published in Teaching Sociology. He articulated five goals that all statistics courses should aspire to: (1) overcoming fears, resistances, and tendencies to overmemorize; (2) the importance of intellectual honesty and integrity; (3) understanding the relationship between deductive and inductive inferences; (4) learning to play the role of reasonable critic; and (5) learning to handle complexities in a systematic fashion. Three decades later, these five goals continue to highlight and extend what we traditionally think of as statistical literacy, which “is the ability to understand and critically evaluate statistical results that permeate our daily lives-coupled with the ability to appreciate the contributions that statistical thinking can make in public and private, professional and personal decisions” (Wallman 1993: 1).
Toward these ends, most graduate programs in sociology require at least one statistics course. Not only should these classes teach graduate students how to use statistics in their research, but they should be foundational courses in critical thinking and in understanding and assessing data. For graduate students who will go on to careers outside the professorate, statistics courses can help develop tangible and transferable skills (Williams, Payne, and Sloan 2015).
Background
Nearly all graduate statistics courses make some use of exercises employing statistical software. While we know of no data on the prevalence of term papers in graduate statistics courses, anecdotal evidence suggests that many instructors feel it is useful to require a term paper based on data analysis in addition to or instead of exercises and exams. The term paper has several advantages. Writing a research paper based on statistical analysis clarifies the relationship between theory and method. It gives students “tacit knowledge” of how to operationalize theoretical concepts, clean and manage data, run diagnostic tests, and make operational decisions about missing data, recoding, and many other practical matters. In writing up results, students can gain a deeper understanding of what the data do and do not reveal about the theoretical arguments being tested. And if the paper is of sufficient quality, it may lead to presentation at a meeting or publication or otherwise advance a student professionally.
However, if students are going to write research papers in the statistics course, they must have access to data. There are three typical ways to deliver this access. One is to provide all students with access to a single secondary general purpose data set, such as the General Social Survey (GSS). Another is to ask each student to find her or his own data from archives or other sources. Sometimes such data can be provided by the research advisor. A third approach is to collect data for the class that allow each student to utilize a set of questions specific to his or her interests. In this article, we report the results of an experiment comparing the first approach, a secondary data set, with the third, a data set developed for the class. We argue, as has been shown in other teaching applications, that giving students a less passive experience may evoke greater learning (see, e.g., Whitley 2013). In our discussion, we return to a general comparison of all three approaches. Note that we are focusing on survey data sets using individual respondents as the unit of analysis. Of course, one might use data sets on aggregate units, such as counties, states, cities, or nations, but since this seems to be much less common, we will not discuss it here (but see Dietz and Kalof 2009 for a discussion and examples of macrolevel data).
The use of secondary data has several advantages. These data sets are usually readily available for free, they reduce the time needed to do data collection and cleaning, and they are seemingly easier for students to manage. It is possible to select a secondary data set, such as the GSS, that is widely accepted as of high quality, making it easier for students to publish the results from their class projects when that is appropriate. But for many students, existing data sets have not incorporated measures that allow work on topics that are central to their interests, so students may be less motivated to develop statistical and data analysis expertise.
Using Amazon’s Mechanical Turk (MTurk) for Data Collection
An alternative to giving students secondary data for analysis is to allow students to collect their own data. However, collecting enough data to be useful for students in a graduate statistics course can be expensive and time consuming. Amazon’s MTurk provides a quick and reasonably inexpensive means of collecting data. MTurk is a crowdsourcing Internet application that allows individuals to use human intelligence to perform specified tasks, including completing online surveys. Using MTurk to collect survey data has become very popular in the social sciences, always with the caveat that such data are not a representative sample. Once a survey is designed and uploaded to the platform, the researcher can get several hundred responses within hours. In our experience, as long as standard protocols are in place to preserve anonymity and make clear that participation is voluntary, and are described in the introduction to the survey, human subjects approval has been straightforward for MTurk surveys because subjects are volunteers who are compensated for their time.
MTurk samples are more socioeconomically and ethnically diverse than social media postings, like Twitter, Facebook, and Reddit (Casler, Bickel, and Hackett 2013), and more representative of the U.S. population than college student respondents (Buhrmester, Kwang, and Gosling 2011; Ross et al. 2010; Stewart et al. 2015; Weinberg, Freese, and McElhattan 2014). MTurk data sets are relatively representative of the population of U.S. Internet users (Ipeirotis 2010; Ross et al. 2010) but tend to be more female, be younger (average age of 36 with a typical range 18 to 80+), have more education, and have lower income than the general U.S. population (Levay, Freese, and Druckman 2016; Paolacci and Chandler 2014; Paolacci, Chandler, and Ipeirotis 2010). Overall, scholars in a variety of fields consider MTurk to be a useful data collection tool that can produce reasonable and consistent results (Buhrmester et al. 2011; Goodman, Cryder, and Cheema 2013). To date, MTurk has been used to produce a wide variety of research studies in sociology and kindred disciplines, but it has not been discussed as a tool to teach statistics.
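Data quality is a practical concern with any MTurk sample. The Methods section mentions attention checks and screening for respondents who give the same answer to every question in a block. A minimal Python sketch of such a screen follows. The field names (`q1`, `q2`, `q3`, `check`), the expected attention-check answer, and the example rows are all hypothetical; they are not taken from the surveys described in this article.

```python
def screen_responses(rows, block_items, check_item, check_answer):
    """Split survey rows into (kept, flagged).

    A row is flagged if it fails the attention check or gives the same
    answer to every item in the block (straight-lining).
    """
    kept, flagged = [], []
    for row in rows:
        answers = [row[item] for item in block_items]
        straight_lined = len(set(answers)) == 1
        failed_check = row[check_item] != check_answer
        if straight_lined or failed_check:
            flagged.append(row)
        else:
            kept.append(row)
    return kept, flagged


# Hypothetical responses: three Likert-type items plus one attention check
# whose instructed answer is 3.
rows = [
    {"id": 1, "q1": 4, "q2": 2, "q3": 5, "check": 3},  # passes both screens
    {"id": 2, "q1": 3, "q2": 3, "q3": 3, "check": 3},  # same answer throughout
    {"id": 3, "q1": 1, "q2": 4, "q3": 2, "check": 5},  # fails attention check
]

kept, flagged = screen_responses(rows, ["q1", "q2", "q3"], "check", 3)
print("kept:", [r["id"] for r in kept])
print("flagged:", [r["id"] for r in flagged])
```

Flagged rows are candidates for review rather than automatic deletion, since a uniform answer pattern can occasionally be a sincere response.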
Methods
Introductory graduate-level statistics courses were taught in the fall of 2012 (“GSS class” hereafter) and fall of 2013 (“MTurk class” hereafter). Although both classes were taught by a sociology department, enrollment was open to nonsociologists. Both classes were taught using the same syllabus, same book (Dietz and Kalof 2009), and same lecture materials; had the same four homework assignments; and required a term paper and final presentation. Both classes met once a week in the same room at the same time for 220 minutes per week, divided between lab and lecture/recitation. The only major difference between the courses was in the data used for the term paper. Of course, students were not assigned at random to the two course offerings, so it is possible that some selection bias could influence our results.
In the GSS class, the instructors provided data from the year 2010 GSS (NORC 2016). Students were given a lecture about survey data collection but did not specifically engage in the process of designing, collecting, or cleaning raw data. The advantage to this method was that data were readily available and comprised a representative sample of the United States. However, the disadvantage was that students had limited survey items to work from, so they were not always able to address research questions that matched their interests or contributed to developing expertise in their fields of specialization. For instance, one student was interested in public support for biodiversity conservation efforts. The closest she could come to this topic was the 2010 GSS item “Hunting is more likely than climate change to make polar bears become extinct. Is that true or false?” Another student interested in food deserts wanted to assess who was more likely to overcome restricted food access. The only question mentioning food in the 2010 GSS is a general question about global warming that mentions food production as being impacted. Overall, a great deal of effort was required by the students, professor, and teaching assistant (TA) in finding 2010 GSS question items that even loosely aligned with student research interests.
In the MTurk class, each student designed survey questions that matched his or her interests. Data were collected via MTurk. Before beginning data collection, all students had to complete institutional review board training, which gives instructors an opportunity to discuss and promote intellectual honesty and integrity, thus meeting Blalock’s (1987) second goal. The custom-designed surveys were able to accommodate diverse research interests. Among the questions addressed by students were the following: What personal and social characteristics predict support for anti–sexual assault campaigns? What aspects of employment influence job satisfaction? What factors drive public support for protecting natural space? How do values motivate the choice of locally grown foods, and what factors mitigate (or overshadow) these values? What motivates people to complete MTurk tasks? What drives views about how animal–vehicle collisions and roadkill should be managed?
Students were taught how to identify a research question, construct survey questions, and collect data. Specifically, students were first asked to select a research question they were interested in that related to their research trajectory. Next, students were asked to find in the literature or develop 5 to 10 survey items, based on theory, that would help them answer their research question. This set of tasks engaged the class and instructors in what was essentially a workshop in survey instrument design and involved discussions of the detailed mechanics of survey instruments, including options for response scales, issues of question ordering within the instrument, appropriate handling of missing data categories, redirects and skips, and other design topics. The instrument was designed in Survey Monkey so students could easily implement different approaches to designing items and blocks of items, discuss them, and change them. We also discussed the strengths and weaknesses of MTurk samples, including the need for attention checks, screening data for respondents who give the same answer to every question in a block, and related issues. These tasks addressed Blalock’s (1987) third and fourth goals. We then divided the class into three groups to run three surveys to avoid an overly long instrument. Each survey consisted of three to five questions from each student, amounting to 30 to 50 questions plus standard demographic and social structural items. Many of the questions were presented in Likert-type scale question blocks, so the overall number of questions was much smaller. The surveys were launched at the end of the first month of class. A moderate sample size (~300) was obtained within 24 hours for each of the three samples. Since the surveys took less than 10 minutes to complete, we offered 25 cents compensation to each respondent, for a total cost of $225. This was the standard practice at the time. Now, it is recommended that respondents be paid the equivalent of minimum wage. In hindsight, we would now offer $1.20 for survey completion. Once the data were available, the class discussed how to organize and clean their data sets, which connected to Blalock’s (1987) fifth goal.
The advantage of the MTurk process was that the students learned firsthand about survey design, they developed questions that interested them, and they had a hands-on experience. The disadvantage was that students struggled with the process, the sample was not representative of the United States, and it cost a few hundred dollars to collect. Students’ final project in both classes was to use data (either GSS or MTurk) and the methods they learned in the course, primarily multiple regression, to answer an original research question. In both classes, students had to produce a journal-length research paper and present the results to the class.
We emphasize that, because students were not randomly assigned to the year they took the course, differences between the classes might be due to selection bias rather than differences between using MTurk and the GSS. It is also possible that subtle biases on the part of the instructors influenced student experiences. For the measures where we have pre- and postclass data, we examine differences between the two classes in the change in those measures. This is equivalent to using fixed-effects panel regression and controls for characteristics of the individual students that were constant over time but not for those that change over time. Thus, any causal interpretation of our results must be seen as suggestive rather than definitive. We offer p values based on ordinary least squares regression two-tailed t tests using class as a dummy variable to predict scores or differences in scores. Identical substantive results and nearly identical p values are obtained with t tests using the assumption of unequal variances and the Welch correction for degrees of freedom. Again, given the small sample size and the lack of random assignment, p values should be seen as approximate.
Data
We assess the impacts of using MTurk with pre- and postclass measures of students’ knowledge of and familiarity with statistical concepts, course evaluations, and qualitative follow-up interviews. Our pre- and postclass measures are based on students completing three tasks on the first and on the final day of the class. All completed tasks were anonymous and no points toward grades were associated with their completion. First, students were given a diagnostic exam that consisted of 30 questions, which ranged from defining basic concepts to interpreting a regression table. (Content of each of these batteries of items is available from the authors.) Second, students were presented with 31 single-word items related to statistics and asked to rank how familiar they were with each item on a scale of 1 to 5 ranging from not familiar at all to very familiar. Items included statistical concepts, such as “confidence interval,” “F test,” and “sampling distribution.” Finally, in the end-of-class assessment, students were asked to evaluate and rank their professor, TA (10 Likert-like items each), and the class (15 Likert-like items). In addition, we conducted follow-up qualitative interviews with students in 2015 and 2016, three years after they completed either the GSS or the MTurk course. These interviews were used to determine whether the statistics course assisted them in getting through any of their graduate school milestones (comprehensive exams, dissertation proposal, and dissertation) and whether they presented or published their final paper projects.
A total of 10 students took the GSS class; 4 (40 percent) were sociology students, and 6 (60 percent) were women. There were 13 students in the MTurk class; 8 (61 percent) were sociology students, and 9 (70 percent) were women. The same professor and TA taught both courses. The professor has taught graduate statistics courses in sociology with a similar, albeit evolving, syllabus for over 30 years.
Results
Student Perceptions
Increasing student familiarity with statistical concepts is intended to reduce fear of quantitative methods, Blalock’s (1987) first goal. Thus, students’ perceptions of their understanding of statistical concepts are important not just per se but also in assessing general comfort with statistics more broadly. On average, students in both classes reported an increase in familiarity with concepts between the first and final days of the course. At the beginning of the course, students in the MTurk and GSS classes scored 2.2 out of 5 and 2.1 out of 5, respectively, for concept familiarity (see Table 1). These scores indicate that students had “heard of the concepts, but [were] not familiar with them.” The difference between the two classes on day 1 was not significant (see Table 2).
Table 1. Comparison of Familiarity, Diagnostics, and Overall Rating between the Two Courses.
Variable                                GSS class   MTurk class   Scale
n                                       10          13
Precourse familiarity with concepts     2.3         2.1           Out of 5
Postcourse familiarity with concepts    3.9         4.6           Out of 5
Precourse diagnostic exam score         19.7        19.6          Out of 100
Postcourse diagnostic exam score        78.4        88.4          Out of 100
Rating of course by students            4.6         4.9           Out of 5
Note: GSS = General Social Survey; MTurk = Mechanical Turk.
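As a check on how the diagnostic exam rows of Table 1 relate to the change scores reported in Table 2, the arithmetic can be sketched in a few lines of Python. This uses only the rounded class means printed in Table 1, so it reproduces the published figures only to one decimal place; the variable names are ours.

```python
# Mean diagnostic exam scores (percentage correct) copied from Table 1.
pre = {"GSS": 19.7, "MTurk": 19.6}
post = {"GSS": 78.4, "MTurk": 88.4}

# Change over the course within each class, rounded to one decimal place.
change = {name: round(post[name] - pre[name], 1) for name in pre}

# Difference in differences: how much more the MTurk class gained.
did = round(change["MTurk"] - change["GSS"], 1)

print("change by class:", change)
print("difference in differences:", did)
```

The within-class changes (58.7 and 68.8 points) and their difference (10.1 points) match the corresponding rows of Table 2.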
Table 2. Comparison of GSS and MTurk Classes.
Comparison                                                                             Difference                            Test Statistic   p
Difference in familiarity with statistical concepts at start of class                  Difference in means = −0.116          t = −0.54        p = .597
Difference in change over the course in familiarity with statistical concepts          Difference in differences = 0.737     t = 3.75**       p = .01
Difference in diagnostic test at start of class (mean percentage correct)              Difference in means = −0.026          t = −0.01        p = .996
Change over the course in diagnostic test score for GSS class                          Difference in means = 58.7            t = 12.23**      p < .001
Change over the course in diagnostic test score for MTurk class                        Difference in means = 68.8            t = 27.85**      p < .001
Difference in changes in diagnostic test between GSS and MTurk class                   Difference in differences = 10.1      t = 2.01+        p = .057
Note: GSS = General Social Survey; MTurk = Mechanical Turk.
+p < .10. **p < .01.
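The between-class comparisons in Table 2 are described in the Methods section as ordinary least squares t tests with class as a dummy variable, checked against Welch t tests. The Python sketch below illustrates both calculations using only the standard library. The change scores are invented for illustration and are not the study's data, and the function names are ours. Converting t to a p value requires the t distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, variance


def ols_dummy_t(group0, group1):
    """OLS of score on a 0/1 class dummy.

    The slope equals the difference in group means, and its t statistic
    uses the pooled residual variance with n0 + n1 - 2 degrees of freedom.
    """
    n0, n1 = len(group0), len(group1)
    diff = mean(group1) - mean(group0)
    rss = (n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)
    df = n0 + n1 - 2
    se = sqrt(rss / df * (1 / n0 + 1 / n1))
    return diff, diff / se, df


def welch_t(group0, group1):
    """Unequal-variance t test with Welch-corrected degrees of freedom."""
    n0, n1 = len(group0), len(group1)
    v0, v1 = variance(group0) / n0, variance(group1) / n1
    diff = mean(group1) - mean(group0)
    t = diff / sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return diff, t, df


# Hypothetical pre-to-post change scores (percentage points), one per student.
gss_change = [52, 61, 58, 47, 66, 70, 55, 49, 63, 66]
mturk_change = [70, 65, 72, 68, 74, 61, 69, 71, 66, 73, 67, 70, 68]

ols_diff, ols_t, ols_df = ols_dummy_t(gss_change, mturk_change)
w_diff, w_t, w_df = welch_t(gss_change, mturk_change)
print(f"OLS dummy: diff={ols_diff:.2f}, t={ols_t:.2f}, df={ols_df}")
print(f"Welch:     diff={w_diff:.2f}, t={w_t:.2f}, df={w_df:.1f}")
```

Both approaches estimate the same difference in differences; they differ only in the standard error and the degrees of freedom, which is why the two sets of p values can be close without being identical.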
Analysis of the change from the start of class to the end of class in each course reveals that familiarity increased more in the class using MTurk (see Table 2). On the last day of the course, students in the GSS class reported an average score of 3.9 out of 5 for familiarity and comfort with statistical concepts, indicating they had “a basic command of the concept.” In contrast, students in the MTurk class reported an average of 4.6 out of 5, indicating they were somewhere between having “a basic command of the concept” and having “a strong command of the concept.”
In addition to quantitative data, we also obtained open-ended statements from the students about their familiarity and comfort with statistical concepts in the course. When asked if they felt more comfortable with statistics after taking the course, students in the GSS course generally responded that they enjoyed the course but still felt uneasy with statistics and unsure about using a software package. For instance, one GSS course student responded, “I feel I gained a good amount of knowledge on how to analyze social data, but I still have trouble understanding some parts,” and another GSS course student noted, “I really wanted to learn statistics, but I feel like I just learned how to write code for STATA.” In contrast, students in the MTurk course seemed to feel more confident in their skills and their ability to apply what they had learned to their research. For instance, one MTurk course student wrote, “I think I have learned the necessary skills to be successful.” Another MTurk course student asserted, “I am comfortable with everything we learned in the course,” and another stated, “I now know what I need to.” Student perceptions are important, but additionally we needed to know if perceptions translated into actual performance differences between the two classes.
Student Performance
Students’ self-reported familiarity with statistical concepts may not match actual understanding.
Students were given a diagnostic quiz on the first and the final day of the course. This diagnostic tool allowed us to assess if students could distinguish between inductive and deductive reasoning, be a critical consumer of statistics, and work with statistical complexity: the third, fourth, and fifth goals outlined by Blalock (1987). (The diagnostic quiz and familiarity were correlated 0.27 on the first day of class and 0.56 on the last day.) There were no initial differences in the diagnostic test between the two classes (see Table 2). We were happy to see that students in both classes demonstrated a significant increase in learning from the first to the final day of the course. Students in the GSS class increased their average diagnostic score by 58.7 percentage points, from 19.7 percent to 78.3 percent (see Table 2). Students in the MTurk class increased their average score by 68.8 percentage points, from 19.6 percent to 88.5 percent (see Table 2). These findings indicate that students in the MTurk class increased their knowledge by an average of 10 percentage points more than did students in the GSS class. The difference in learning (the difference in differences) approaches but does not quite achieve statistical significance at the .05 level but is significant at the .10 level (see Table 2). This is consistent with the argument that students learned more in the course using MTurk.
Student Achievement after Class
We wanted to determine if taking either of these courses had an impact on the completion of graduate school milestones. We conducted interviews with students in both classes three years after they completed the course. We were able to conduct follow-up interviews with 8 of the 10 students from the GSS class and with 10 of the 13 students in the MTurk class. In both cases, the missing students left their graduate programs without graduating and could not be reached.
We focused our interviews on two open-ended questions. First, we asked if participating in the statistics course helped them achieve graduate school milestones and, if so, how. Second, we asked students what came of their final course papers. Specifically, we asked if they were able to present, publish, or defend papers that were initially started in their statistics course. All students interviewed from both classes felt that the statistics course contributed to their achieving graduate school milestones. In addition to the usual things mentioned, like comprehensive exams, proposals, and dissertations, two students from the MTurk course also mentioned that taking the course changed their minds about statistics and they have both gone on to take more advanced courses. Nearly every student from the MTurk class mentioned that the process helped them in forming and thinking about their proposal or dissertation, even if the topic for the statistics paper was not their dissertation topic. For instance, one MTurk class student said, “I didn’t publish it . . . but this process helped me create a plan . . . like I knew what to expect and do and that was helpful.” In contrast, a student from the GSS course said, “I think I learned about statistics, but I won’t really know what I learned until I apply it to my own research. . . . I mean, I’m sure I learned stuff, but it is hard to pinpoint what I learned.”
At the time we submitted this article for publication, one student from the GSS course published his paper, a second student reported that she presented the paper at a conference, and a third student had a revise-and-resubmit (R&R) but elected to not resubmit the article to focus on other things. The student with the R&R informed us,
    I didn’t end up publishing that paper or using it towards my dissertation. I submitted it and it got an R&R but I decided to move on to other projects and not pursue publication. . . . I picked the topic because of limited options with the data. It was something to do, but it doesn’t reflect what I am interested in.
So, the student who got an R&R decided not to resubmit because she was not interested in working on the project and did not see it contributing to her research trajectory. This story is perhaps the most representative of the issues inherent in secondary data: students cannot always pursue their specific research interests, and this lack of interest can mean that they abandon projects prematurely. Overall, the students interviewed from the GSS class said that learning the course concepts contributed to the completion of their milestones, but only one student built on his paper in his dissertation work and got a publication based on work in the class.
In interviewing students from the MTurk class, we found that two students had published their final papers, three students had done conference presentations, and an additional four students had submitted their papers to journals for review. One student had won a graduate student paper award after submitting his final paper to a competition.
An MTurk class student who has not published her paper, but had submitted it for publication after additional work, noted, “I haven’t published it, but it was really useful to my dissertation. . . . I mean I used the same methods . . . and the learning process was good. I still hope it gets published, but it was helpful and I learned how to conduct a real survey.” However, not everyone was excited about the idea of using MTurk data. As we mentioned before, a leading concern is that data generated from MTurk, although more diverse than a sample of college students, are not representative of the U.S. population. For example, the advisor of one MTurk class student was skeptical about her being able to publish data collected from MTurk, so he recommended against trying it. (The advisor was not a sociologist.) Regarding trying to publish her final paper, she noted, “I didn’t try to publish my paper. I had considered it, but my advisor was very skeptical of the data collection procedure and didn’t think it would pass the peer review process and told me not to try.”
Another student suggested that although her final paper has not been published, the process was helpful in thinking about and piloting her dissertation research:
    It allowed me to pilot survey questions that were later used in telephone interviews and mail-back surveys. It was nice to see what the data from the questions actually looked like, how I would be able to analyze it, and make the appropriate changes needed for my actual data collection. It was really helpful because I knew the data set, was invested in it, and it directly related to what I am doing for my research. It was also a manageable size for learning statistics, something my actual data sets are not.
Discussion
Since Blalock (1987) published his article on teaching statistics 30 years ago, the use of statistics in the social sciences has grown substantially. Many graduate statistics courses place substantial emphasis on completing a paper based on data analysis. The advantages to requiring a paper in addition to exercises are that (1) it gives students an opportunity to understand the many subtle and complex decisions involved in going from a research question and raw data to drawing conclusions from an analysis; (2) it gives students practical experience in conducting an analysis, including skills in data management and analysis (Long 2009); (3) it trains students to do exactly what is required in quantitative research; and (4) it gives the students an opportunity to conduct research that can contribute to their publication record and perhaps to their dissertation.
However, a paper assignment confronts the instructor with a decision about what data will be used in those papers. There are three basic choices: (1) all students can be required to use a general purpose data set, such as the GSS; (2) each student can be required to find data for the paper; and (3) an MTurk survey or other method can be used to generate data for questions proposed by the students to match their research interests. In our analyses, we compare classes that used approaches 1 and 3, but in this discussion it will be useful to consider all three options.
Table 3 provides a rough summary of the advantages and disadvantages of each method. Quality of sample refers to the degree to which the data are from a representative sample and were collected with methodological care. Here a preexisting data source, such as the GSS, performs best. MTurk data are a convenience sample and thus viewed with skepticism by some subfields. Student-provided data vary in quality. Sometimes such data come from an ongoing project of the student’s advisor or from a high-quality data set in a data archive, but sometimes students find data that, while matching their interests, are of limited quality.
By hidden problems, we mean discovering unexpectedly large amounts of missing data, finding that questions of interest were asked of different subsamples and so cannot be used in the same analysis, learning that the data are not as documented, and other issues that arise only after the student has invested considerable time and effort in moving toward analyzing the data. While one might expect this is not a problem with high-quality secondary data sets, that is not our experience. For example, the GSS has a complex set of subsamples and the documentation is difficult to interpret, so we have seen students discover serious problems for their analysis only after investing considerable effort. Since the class is designing the MTurk study, there are no such problems. The degree to which this issue arises with student-provided data varies.
The effort to create a data set ready for analysis also varies across approaches. For preexisting data, instructors can do much of this work in advance, ensuring a usable working sample, creating recodes
Table 3. Issues in Using Secondary, MTurk, and Student-provided Data.

| Issue | Secondary Data Set | MTurk Data | Data Provided by Students |
| --- | --- | --- | --- |
| Quality of sample | Very high, representative sample | Diverse convenience sample | Varies but often high quality |
| Chance of hidden problems | Moderate to low | Low | Moderate to high |
| Effort required to create working data set | Moderate to low | Low | Moderate to high |
| Match to student interest | Moderate to low | High | Moderate to high |
| Chances of contribution to dissertation or other future research | Moderate to low | High | Moderate to high |
| Monetary costs | None | Moderate | None |

Note: MTurk = Mechanical Turk.
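As a rough illustration of the "Moderate" monetary cost listed for MTurk in Table 3, the arithmetic behind the figures given in the discussion of costs (an eight-minute survey, a $10 hourly wage, 300 respondents) can be sketched as follows. The 40 percent fee rate is an assumption, chosen only because it roughly reproduces the "about $2 per completed response" reported in the text. The function and parameter names are hypothetical, and current platform fees should be checked before budgeting.

```python
def cost_per_response(minutes, hourly_wage, fee_rate):
    """Estimated cost of one completed survey response, in dollars.

    fee_rate is the platform commission as a fraction of worker pay
    (an assumed value here, not a figure reported in the article).
    """
    worker_pay = hourly_wage * minutes / 60.0  # pay for the time spent
    return worker_pay * (1.0 + fee_rate)       # add the platform fee


def survey_budget(n_respondents, minutes, hourly_wage, fee_rate):
    """Total estimated cost for n_respondents completed responses."""
    return n_respondents * cost_per_response(minutes, hourly_wage, fee_rate)


if __name__ == "__main__":
    # Scenario from the text: 8-minute survey, $10 per hour, 300 respondents.
    # ASSUMPTION: a 40% fee rate (illustrative only).
    per_response = cost_per_response(8, 10.0, 0.40)
    total = survey_budget(300, 8, 10.0, 0.40)
    print(f"per response: ${per_response:.2f}")
    print(f"total for 300 respondents: ${total:.2f}")
```

Under these assumptions the sketch gives a little under $2 per response and a little under $600 overall, in line with the rounded figures in the text. Changing `minutes`, `hourly_wage`, or `fee_rate` shows how sensitive a course budget is to survey length and to the fee structure.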
of commonly used variables, and so on. This leaves only recoding and related tasks for the data specific to each student’s project. But as noted, in the case of some secondary data sets, there can be subtle problems that are hard to detect until analysis begins. Since the MTurk data sets are generated for the course, there should be no problems with documentation, and the process of creating a data set for analysis is straightforward. One of the major obstacles to using data provided by students is that each student will be doing data preparation with a different data set, and in some cases, the amount of work required is very substantial, effectively making the professor and TA data analysis consultants to each member of the class.

Our experience is that secondary data sets do not provide a good match to many students’ interests. MTurk data will be a good match since each student can develop his or her own questions to match his or her interests. Whether the data analysis will advance the students’ careers beyond honing their statistical skills is contingent on the other issues. If students can find questions that match their research interests in a high-quality secondary data set, then they often can produce results of publishable quality. But often the match between student interest and what is available is poor. MTurk data will be a good match to student interests since the students develop the questions. In some subareas of the social sciences, convenience samples are acceptable for publication of exploratory work; in others, they are not. Even when they are not, the MTurk data give a student an opportunity for pilot testing, which can then be used in a proposal or for a research grant. For data the student provides, we have found that usually students can find data of reasonable quality that are a good match to their interests. However, in some cases, those data turn out not to be as documented or they have serious flaws not obvious until analysis is under way. One challenge for this approach arises because most departments prefer students to take the first statistics course at or near the start of the graduate program. First or second semester students are less likely to have learned about data sets suitable for this approach.

For universities and colleges that are members of data archives consortia, such as the Inter-University Consortium for Political and Social Research, access to many secondary data sources for use in class is free. Data provided by the individual student are also typically free for the purposes of the course. Collecting data via MTurk does require a source of funds. The cost structure of MTurk has changed somewhat since we used it in class. The prevailing norm is that MTurk workers should be paid for their time at something like the minimum wage rate. So, for a survey taking eight minutes to complete at a $10 hourly wage rate, the cost with Amazon’s current fees for an MTurk task is about $2 per completed response, or about $600 for 300 respondents. We appreciate that for courses without a monetary budget, this could be a substantial obstacle. However, the online environment for survey research is changing rapidly, and there may be lower-cost options. We also note that some services, like Qualtrics, offer quota samples that approximate a nationally representative sample for slightly more than $5 per completed response for a 10-minute instrument. This might allow an approximately representative sample albeit for somewhat higher cost. We also note that in many disciplines, courses involving “hands-on” work in collecting and
analyzing data have a lab budget. Sociology departments might consider moving in this direction.

It should be obvious from Table 3 and our discussion of it that there is no single “best” approach. Of the three methods we have tried, we have been least satisfied with the use of secondary data sets and most satisfied with MTurk. Our analysis of our experience indicates why we prefer MTurk to use of a general-purpose secondary data set. The disadvantage of MTurk is that there is an extra financial cost and that, in a one-semester course, students and instructors must move quickly to design and implement the survey early in the semester. Further, MTurk is a source of publishable data in some fields but not others. We have not done a comparable experiment on the approach of each student finding his or her own data, although one author has used that approach several times. But we suspect this approach produces more variation within the class than the other two approaches. Some students use a data set provided by their advisor and on which they are already working. But others, and often those with the most anxiety about the course, are asked to find a data set at a point in the academic career when they have very little familiarity with quantitative data analysis. This can produce a situation where the students who are already most experienced at quantitative methods gain most, and those who are least experienced gain least, a parallel to the “Matthew effect” (Merton 1968).

Traditionally, exercises in a graduate-level statistics course use secondary data on which students perform analyses. This allows use of a high-quality data set that instructors can prepare so that students can focus on the goals of the exercise. However, when students are assigned a term paper, there are drawbacks to secondary data. The greatest of these, in our view, is that the data may not relate to the students’ area of interest or research trajectory. As a result, their interest level in analyzing the data may be minimal. For students who already lack confidence or who do not see statistics as salient to their interests, this may be an impediment to achieving Blalock’s (1987) goals for the statistics course. We note that data from MTurk surveys or other sources tailored to student interests could also be used for exercises.

Considering why MTurk led to better outcomes, we submit that the approach adhered to the key goals that Blalock (1987) outlined. By collecting, cleaning, and analyzing data made possible by using Amazon’s MTurk technology, students were able to overcome fears, discuss intellectual honesty and integrity, explore the relationship between deductive and inductive reasoning, be a reasonable critic, and learn to work with complexity and uncertainty. While these goals can be achieved with other forms of data, the engagement students feel with data that reflect their research interests is a major help in moving students forward.

Our study has a number of limitations. First, since we did not have random assignment to our two classes, it is possible that selection bias in who enrolled in which class, the instructors being especially enthusiastic about the use of MTurk, or some other difference between the two classes besides the use of the GSS versus MTurk generated the differences we observed. Using individual differences lends credibility to the claim that our results were not due to selection bias. We also examined the differences in student evaluations of the professor and TA and found no significant differences (for professor, p = .613; for TA, p = .733), which also supports the argument that the differences we observed were the result of using the GSS versus MTurk. But that claim must be viewed as tentative. Second, we are limited by the size of our classes. Although we did find a statistical difference in performance and perceptions between the two courses, we had only 10 students in the GSS course and 13 students in the MTurk course, so our ability to detect significant differences is limited.

Instructors interested in using MTurk data in statistics classes will of course have to develop some familiarity with the system. We find the easiest way to implement the survey instrument is via a survey web tool, such as Survey Monkey or Qualtrics. These are relatively easy to use and allow students to preview alternative ways of structuring responses and organizing a survey instrument. Using MTurk requires thinking through some issues that are much the same as in any survey, such as making sure questions are clear and meaningful. Other issues are specific to any survey mechanism where respondents are paid to participate, including respondents who “straightline” by giving the same response to every question and respondents who do not read questions. Attention-check questions and checks on time to complete can help filter such poor-quality respondents.

Finally, we note that we do not believe the benefits we obtained in our MTurk class were specific to the use of MTurk. Rather, we believe the results came from providing a mechanism by which students were able to work on a research project that matched their own professional interests. The assignment of a term paper requiring original data analysis is of course a form of inquiry-based learning, an approach that others have noted is beneficial in teaching social science logic and methods (McCright 2012). In the graduate statistics course, many students have well-formed
research interests and others have ideas they wish to explore. The enthusiasm that can come from pursuing these interests can help counterbalance statistical anxiety and demonstrate to students how statistical methodology and, in particular, the logic of statistical reasoning can support their research interests. So, any mechanism that would allow students to pursue their interests could provide those benefits; at this stage of the evolution of research methods, surveys via MTurk provide an effective vehicle for this approach.

Editor’s Note

Reviewers for this manuscript were, in alphabetical order, Susan Caufield and Pamela Paxton.

Authors’ Note

We thank the students in our graduate statistics course for their engagement in the class and afterward as well as the Teaching Sociology reviewers for helpful comments.

References

Blalock, Hubert M., Jr. 1987. “Some General Goals in Teaching Statistics.” Teaching Sociology 15:164–72.
Buhrmester, M., T. Kwang, and S. D. Gosling. 2011. “Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-quality, Data?” Perspectives on Psychological Science 6(1):3–5.
Casler, Krista, Lydia Bickel, and Elizabeth Hackett. 2013. “Separate but Equal? A Comparison of Participants and Data Gathered via Amazon’s MTurk, Social Media, and Face-to-face Behavioral Testing.” Computers in Human Behavior 29(6):2156–60.
Dietz, Thomas, and Linda Kalof. 2009. Introduction to Social Statistics: The Logic of Statistical Reasoning. New York: Wiley-Blackwell.
Goodman, Joseph K., Cynthia E. Cryder, and Amar Cheema. 2013. “Data Collection in a Flat World: The Strengths and Weaknesses of Mechanical Turk Samples.” Journal of Behavioral Decision Making 26(3):213–24.
Ipeirotis, Panos. 2010. “The New Demographics of Mechanical Turk.” Retrieved May 17, 2017 (http://www.behind-the-enemy-lines.com/2010/03/new-demographics-of-mechanical-turk.html).
Levay, Kevin E., Jeremy Freese, and James N. Druckman. 2016. “The Demographic and Political Composition of Mechanical Turk Samples.” SAGE Open 6(1):1–17.
Long, J. Scott. 2009. The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press.
McCright, Aaron M. 2012. “Enhancing Students’ Scientific and Quantitative Literacies through an Inquiry-Based Learning Project on Climate Change.” Journal of the Scholarship of Teaching and Learning 12(4):86–101.
Merton, Robert K. 1968. “The Matthew Effect in Science.” Science 159(3810):56–63.
NORC. 2016. General Social Survey. Retrieved May 17, 2017 (http://gss.norc.org/).
Paolacci, Gabriele, and Jesse Chandler. 2014. “Inside the Turk: Understanding Mechanical Turk as a Participant Pool.” Current Directions in Psychological Science 23(3):184–88.
Paolacci, Gabriele, Jesse Chandler, and Panagiotis Ipeirotis. 2010. “Running Experiments on Amazon Mechanical Turk.” Judgment and Decision Making 5(5):411–19.
Ross, Joel, Lilly Irani, M. Silberman, Andrew Zaldivar, and Bill Tomlinson. 2010. “Who Are the Crowdworkers? Shifting Demographics in Mechanical Turk.” Pp. 2863–72 in CHI’10 Extended Abstracts on Human Factors in Computing Systems. New York: ACM.
Stewart, Neil, Christoph Ungemach, Adam J. L. Harris, Daniel M. Bartels, Ben R. Newell, Gabriele Paolacci, and Jesse Chandler. 2015. “The Average Laboratory Samples a Population of 7,300 Amazon Mechanical Turk Workers.” Judgment and Decision Making 10(5):479–91.
Wallman, Katherine K. 1993. “Enhancing Statistical Literacy: Enriching Our Society.” Journal of the American Statistical Association 88(421):1–8.
Weinberg, Jill D., Jeremy Freese, and David McElhattan. 2014. “Comparing Data Characteristics and Results of an Online Factorial Survey between a Population-based and a Crowdsource-recruited Sample.” Sociological Science 1:292–310.
Whitley, Cameron T. 2013. “A Picture Is Worth a Thousand Words: Applying Image-based Learning to Course Design.” Teaching Sociology 41(2):188–98.
Williams, Malcolm, Geoff Payne, and Luke Sloan. 2015. “Making Sociology Count: Some Evidence and Context in the Teaching of Quantitative Methods in the UK.” Pp. 171–86 in An End to the Crisis of Empirical Sociology? Trends and Challenges in Social Research, edited by L. McKie and L. Ryan. London: Routledge.

Author Biographies

Cameron T. Whitley is a 2017 sociology PhD graduate from the Department of Sociology at Michigan State University. His research addresses altruism and the perception of others, with a focus on environmental decision making and, in particular, energy developments. His interest in teaching as research has led to publications in Teaching Sociology; Education, Citizenship and Social Justice; Environmental Education Research; and the International Journal of Sustainability in Higher Education.

Thomas Dietz is professor of sociology and environmental science and policy at Michigan State University. He has taught undergraduate and graduate statistics for over 30 years and has written texts in both social statistics and social research methodology. His research focuses on altruism in environmental decision making, on the human driving forces of environmental change, and on the interplay of science and democracy.
