You are on page 1of 13

Original Research Article

Big Data & Society


January–June: 1–13
“We called that a behavior”: The making ! The Author(s) 2020
Article reuse guidelines:
of institutional data sagepub.com/journals-permissions
DOI: 10.1177/2053951720932200
journals.sagepub.com/home/bds

Madisson Whitman

Abstract
Predictive uses of data are becoming widespread in institutional settings as actors seek to anticipate people and their
activities. Predictive modeling is increasingly the subject of scholarly and public criticism. Less common, however, is
scrutiny directed at the data that inform predictive models beyond concerns about homogenous training data or general
epistemological critiques of data. In this paper, I draw from a qualitative case study set in higher education in the United
States to investigate the making of data. Data analytics projects at universities have become more pervasive and intensive
to better understand and anticipate undergraduate student bodies. Drawing from 12 months of ethnographic research at
a large public university, I analyze the ways data personnel at the institution—data scientists, administrators, and
programmers—sort student data into “attributes” and “behaviors,” where “attributes” are demographic data that
students “can’t change.” “Behaviors,” in contrast, are data defined as reflective of what students can choose: attending
and paying attention in class, studying on campus, among other data which personnel categorize as what students have
control over. This discursive split enables the institution nudge students to make responsible choices according to
behavior data that correlate with success in the predictive model. In discussing how personnel type, sort, stabilize, and
nudge on behavior data, this paper examines the contingencies of data making processes and implications for the
application of student data.

Keywords
Big Data, institutions, predictive modeling, nudge, higher education, student data

Introduction
explained to me and university stakeholders that the
predictive model only drew from “behavior” data—
Be more white. Be more male. Be wealthier. Those are what he described as “things students can change”—
the biggest correlations with success. It’s terrible, but and not demographic markers like race, gender, and
it’s the truth. socioeconomic status. Don, in telling me “the truth,”
suggested these demographic markers were accepted by
—Excerpt from interview with Don,1 a university stakeholders as not only immutable but also indicative
administrator of a student’s prospects for success.3
But despite the collectively understood disparities in
“The truth” was taken for granted among data sci- success, a focus only on the “things students can
entists, administrators, and programmers in their work change” serves to make demographic differences less
on predictive modeling using institutional data to obvious and integral to the university’s model of suc-
anticipate whether or not a student would graduate cess. Through the sociotechnical obscuring of demo-
within a four-year period. This observation came graphic data in predictive modeling and illumination
from Don, a long-time administrator at the university of data that instead highlight behaviors that correlate
who is engaged in the application of nudges2 derived
from the predictive model’s outputs. Sitting across a
table from me in the university’s student union, alert Department of Anthropology, Purdue University, West Lafayette, USA
for eavesdroppers, he explained how he thought of
Corresponding author:
such inequalities as common but mostly unspoken Madisson Whitman, Purdue University, 700 W. State St., West Lafayette,
knowledge among faculty and administrators at the IN 47907-2050, USA.
university. Over the course of my fieldwork, he Email: mwhitma@purdue.edu
Creative Commons NonCommercial-NoDerivs CC BY-NC-ND: This article is distributed under the terms of the Creative Commons
Attribution-NonCommercial-NoDerivs 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/) which permits non-commercial
use, reproduction and distribution of the work as published without adaptation or alteration, without further permission provided the original work is
attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
2 Big Data & Society

with success, the likelihood of a student graduating in predictive analytics in higher education in relation to a
four years appears more contingent on those behaviors larger landscape, the literature that underpins my anal-
than demographic markers. ysis, and my methodological approach. I then address
In this article, I explore an institutional shift from the typing of data into behaviors and attributes, the
reliance on demographic data to what administrators, sorting of data into behaviors, maintenance of behav-
data scientists, and programmers at the university have iors as accurate proxies, and nudging of students on
constructed as behaviors. Universities have long had behaviors as processes that assist personnel in solidify-
involvement in reshaping demographic categories ing behavior data.
(Hanel and Vione, 2016), not only through admissions
processes that shape student bodies but also through Research site
research conducted in association with institutions that
This qualitative case study draws from ethnographic
influence culturally held ideals about meritocracy
fieldwork I conducted at a large public university in
(Warikoo, 2016). As universities take a data-driven
the United States across a 12-month period. Data per-
approach to such ideals, they increasingly seek predic-
sonnel characterize the institution, which hosts roughly
tive power, more context for enrollees and applicants,
30,000 undergraduate students, as at the forefront of
inventive ways to produce knowledge about students,
applying institutional data to success initiatives. As an
and methods of sidestepping contentious issues around
administrator told me, “no one’s ever done the data
inequality (Selwyn, 2015). In a departure from demo-
mining like we’re doing the data mining,” alluding to
graphic data, which are solicited through self-reporting
the university’s computational approach institutional
and traditional markers of a student’s category mem-
research (IR), expansive data infrastructures, and
bership, personnel tout behaviors as a better, more
vision for how data could revolutionize higher educa-
neutral alternative. Automated data collection alleges
tion. At the time of its foray into predictive modeling,
more consistency, standard instruments for gathering,
the university was indeed unique for its repurposing of
an expansive sample, and direct proxies. However,
wireless network usage data as a proxy for students’
I argue that in practice the making of new data—
whereabouts. Moreover, it has digitized much of its
behaviors—is no less fraught.
institutional data. When I interviewed Henry, a data
scientist, he got up to go to his bookshelf and returned
Making data with a massive binder, at least four inches thick. The
How do data become behaviors, and how do those binder held a printout of deidentified, aggregated stu-
behaviors come to take the place of demographic dent data, the precursor to what is now an online, inter-
data in how the institution manages its student body? active visualization.
In this essay, I build on critical data studies scholarship
(see Iliadis and Russo, 2016) by empirically demon- M: Oh my god. Volume one of how many?
strating not solely that data are made but also how H: Uh, I want to say less than ten but more than two.
they are made. In doing so, I illustrate how actors at M: That’s a lot. That’s a lot of paper.
the university discursively render data into types such H: And they would do this every year. It would take
that demographic markers can seem removed from stu- them from the time the data was available until, like,
dent success. December, just to be able to produce a book of this.
Drawing from qualitative research I conducted at a Now, when the data’s available, that [interactive visu-
large public university in the United States, I argue that alization is] updated the next day. So, like, that was the
by revising predictive modeling and nudging to focus world that they were living in, so, like, when you’re
even more intensively on what they deem behaviors, living in that world, you don’t have time to do the
data personnel—data scientists, administrators, and advanced stuff.
programmers—position students as subjects of nudges
responsible for their own success. The institutional To be able to “do the advanced stuff,” the university’s
reframing of success in terms of “what students can multiple information technology (IT) and IR offices
change” enables the institution to transfer the burden worked together on collecting and sorting data, espe-
of success away from itself and keep the tacitly held cially computationally. Solicited by university admis-
knowledge of inequality out of the university’s visions sions through the application process, demographic
for predictive modeling. data, which include data such as high school ZIP
To demonstrate how data personnel produce behav- code, parental income, race, ethnicity, and gender,
ior data in a way that enables the institution to mini- are readily available for IR to incorporate into its stud-
mize the impact of demographic markers on success, I ies of the institution and student body. While demo-
first provide a brief overview of my research site and graphic data have long factored into many aspects of
Whitman 3

IR, the university has more recently sought other sour- Data that correspond with people are ample and are
ces of data for quantifying and understanding its stu- mobilized by institutions through predictive mecha-
dent body. As it moves ahead in its foray into academic nisms. “Data doubles,” or data that stand in for
predictive analytics, wherein it uses Big Data to predict people in systems and analytics, frequently outperform
outcomes, from a student’s likelihood at graduation them (Haggerty and Ericson, 2000; Raley, 2013). Gavin
within a four-year period to a student’s prospects at JD Smith (2016), David Lyon (2003), and Cathy O’Neil
getting an A or B in an individual class, the university (2016) have all pointed out the potential destructive-
has moved away from demographic data into what ness of proxies and data doubles when treated as
data personnel refer to as behavior data. the people they represent: denied loans, extended sen-
tences, increased insurance rates, and lost job opportu-
Predictive analytics: From higher education to a nities, to name a few. Wendy Hui Kyong Chun
broader landscape (2018) writes that these data “absolve one of
responsibility . . . by creating new dependencies and
The deployment of student data in higher education in
relations” in standing in for what is unknown or inac-
the U.S. is widespread and under varying degrees
cessible. It matters which data become doubles, and
of scrutiny, from the College Board’s short-lived
how those data become data in the first place. This
“Adversity Score”4 to growing concerns about student
problem is evident in research and investigative report-
privacy and dataveillance on campuses (Selwyn, 2015).
ing on predictive policing and risk assessment, in which
The datafication of universities and surge of predictive
police departments take past events as predictive of
analytics projects prompt questions about the future of
future activity (Brayne, 2017; Selbst, 2017), or where
higher education and what roles universities play in
algorithmically calculated scores are meant to indicate
society.
the likelihood of recidivism (Angwin et al., 2016;
Higher education is one area in which predictive
Benjamin, 2019). The translation of data into predic-
uses of Big Data are gaining momentum. While per-
tions is not solely algorithmic; it is also wrapped up in
sonnel who worked most closely with student thought
structural inequalities and notions about what society
their work was exceptional among what they regarded
is and ought to be (Eubanks, 2018). And so while the
as controversial applications of data, such as predictive
work data personnel in universities do is not the same
policing, predictive analytics in higher education is
as predictive policing, it occupies a similar imaginary6
nonetheless situated within a broader landscape. For
in which what people will do is both possible to antic-
example, during my fieldwork I was chatting with the
ipate and open to intervention.
director of IT at the university in an elevator. After
hearing more about my research, he immediately
asked me about Black Mirror, the speculative
Relevant literature
Channel 4/Netflix series exploring techno-dystopic States and institutions have long sought to account for
futures, and specifically referred to an episode about and predict people and activities within them via data
social credit systems.5 When we met more formally collection (Scott, 1998). Data enable auditing practices
for an interview, he directly compared data at the uni- (Power, 1997; Strathern, 2000), the quantification of
versity to intensive, artificial intelligence-driven data people (Bouk, 2015; Desrosieres, 1998), and the calcu-
collection in healthcare and precision medicine, lation of risk (Hacking, 1990; Harcourt, 2007).
noting what he deemed “the promise and the threat, Quantification, especially commensuration, is constitu-
that massive quantities of data will “reveal things to tive of people: quantification, as a social act, makes
you that you don’t know about yourself,” both to a what it purports to represent (Espeland and Stevens,
person’s benefit and detriment. 1998, 2008). As scholars in science and technology
While medicine was his immediate reference, the studies (STS) have demonstrated, measuring and pre-
university’s development of nudges was at its height dicting are fraught social processes requiring invest-
during revelations about the analytics firm Cambridge ment and validation through institutional politics and
Analytica using data problematically mined from infrastructures (Porter, 1995; Star, 1999; Star and
Facebook users to target prospective voters. Beyond Ruhleder, 1996). Processes of making data and their
the manipulation of politics, John Cheney-Lippold infrastructures are frequently subject to what Susan
(2011, 2017) has identified the reordering of people Leigh Star (1991) has called “deletion,” or the invisibi-
according to their social media data as a “new algorith- lizing of labor in scientific work. Such deletions have
mic identity,” in which data enable new ways to orga- been at the fore of research on scientific and techno-
nize people, based on an affinity of clicks as opposed to logical practice and related institutions in STS, though
traditional social categories. are less present in inquiries into higher education.
4 Big Data & Society

Scholars addressing Big Data in educational con- Some of the literature about learning analytics offers
texts have largely explored its possibilities, testing out strategies for how to more productively nudge stu-
in-classroom technologies, courses scaled up to enroll dents, in which students are framed not just as consum-
unlimited students (such as massive open online ers but also active partners at universities who should
courses, or MOOCs) (see Jones et al., 2014), and pre- be accountable for their own success (Fritz, 2017; see
dictive uses of data collected from learning manage- also Pascarella and Terenzini, 2005). The notion of
ment systems (e.g. Blackboard, Canvas). As George choice architecture in learning analytics rests on con-
Siemens reports in mapping the field, learning analytics ceptualizations of agency where students have unre-
is the use of data to improve learning (2013: 1382). stricted access to a full range of choices. This take on
While learning can refer to any variety of educational agency is in contrast to social theorizing on agency, in
settings, learning analytics has expanded rapidly in which actors work within and against constraints (see
higher education, where such technologies are used by Bourdieu, 1980; Ortner, 2006).
universities to manage risk and understand student Some education scholars have commented on the
bodies (Wagner and Longanecker, 2016). contradictions of deploying nudges in relation to
Such projects, which typically draw from third-party more liberal views of the purpose of education (see
consultants or are developed in-house, have received Clayton and Halliday, 2017; Hartman-Caverly, 2019).
scrutiny as critics express concerns about universities Jeremy Knox et al. explore a growing trend of educa-
surveilling students (Harwell, 2019; see also Hope, tional institutions integrating datafied behavioral eco-
2016). While much of the learning analytics literature nomics approaches. They remark on the implications
explores the significance and effectiveness of specific of “[shaping] students’ choices and decisions based on
learning analytics initiatives, more recently scholars constant tracking and predicting of their behaviors,
such as Neil Selwyn have argued that “learning analyt- emotions and actions,” noting the potential for dispa-
ics needs to be critiqued as much as possible,” given the rate impacts (2020: 39).
potential to disparately impact students (2019: 11). Some of the appeal of Big Data, and why, perhaps,
And learning analytics is critiqued. Scholars are it links up so well with the surge of behavioral econom-
addressing effects on student data privacy (see ics in education, is rooted in pervasive and influential
Ifenthaler and Schumacher, 2016; Rubel and Jones, “mythologies” of data as truthful and omniscient,
2016; Slade and Prinsloo, 2014; Sun et al., 2019). which critical data studies scholars have challenged,
Juliane Jarke and Andreas Breiter (2019) discuss how recognizing data as partial and always already political
education is changing with datafication, and Ben (Boyd and Crawford, 2012; Dalton and Thatcher,
Williamson (2017, 2018, 2019) has written extensively 2014). The promise of data is evident in institutional
about the implications of large-scale data collection on data mining projects that endeavor to take the place of
students, both in and outside of higher education. self-reporting: data personnel understand data as more
Other education scholars have interrogated the ethics direct proxies, comprehensive and accurate, or as Rob
of learning analytics (Johnson, 2014; Slade and Kitchin and Tracey P Lauriault put it, “exhaustive in
Prinsloo, 2013) and prospects for just approaches scope” and “fine-grained in resolution” (2014: 2). The
(Shahar and Harel, 2017). presumed neutrality of data enables them to seem prior
Data analytics projects like that I discuss deploy to interpretation, an incredible, “raw” resource that
nudges in tandem with predictive outputs to suggest can reveal insights about humanity (Boellstorff, 2013).
to students how they can improve their graduation out- But data must be made. They do not exist as prior to
comes and grade point averages (GPA). The nudging processing. Lisa Gitelman and Virginia Jackson (2013:
that personnel use is aligned with Richard H Thaler 3) write that “data need to be imagined as data to exist
and Cass R Sunstein’s outline of nudges and “choice and function as such.” As I discuss herein, the discur-
architecture,” in which architects structure the “context sive work involved in creating data is ongoing and lay-
in which people make decisions” to “nudge” them ered; it relies on a great deal of labor and
toward particular choices (2009: 3). Thaler and transformation. Nonetheless, data are treated by per-
Sunstein frame nudging as “libertarian paternalism,” sonnel as a stable, bounded entity, not unlike how the
where people are ultimately capable of making their engineers in Diana Forsythe’s (2001) work regarded
own choices—a nudge is a mild intervention. knowledge in programming expert systems. The ways
However, Karen Yeung argues that in contexts of Big that personnel imagine behaviors and attributes mate-
Data, the array of data and analytics dynamically rialize as data, and in turn those data shape how per-
available to choice architects means that nudging is sonnel produce and use those categories. Technologies,
“subtle, unobtrusive, yet extraordinarily powerful” as materialized discourses that reflect broader social
thanks to the magnitude and networks of data (2017: epistemologies, naturalize and crystalize concepts
119). (Suchman, 2007). In the case of data collection,
Whitman 5

technologies create the categories of people and activ- entanglements operate as a check on what they under-
ities they purport to measure, making them manage- stand as choices available to them.
able (Foucault, 1972, 1977). Although this article is informed by participant
Societal discourses of data draw upon mythologies observation, I mostly utilize interviews in my analysis
of data and so seem like a neutral means of revealing because they function as a central space for actors to
order intrinsic to society, although social theorists have map out a sociotechnical imaginary of data technolo-
demonstrated that ordering processes are a means gies at the university. In interviews, actors articulate
through which actors make society (Bowker and Star, their work and their visions for predictive projects so
1999; Jasanoff, 2004; Latour, 1990). In data technolo- that modeling is integrated into such an imaginary, in
gies, ordering processes make the subjects of ordering what Sheila Jasanoff has described as “collectively
ready to be taken up in a system, scaled, standardized, held, institutionally stabilized, and publicly performed
predicted, and nudged (Cheney-Lippold, 2011; Raley, visions of desirable futures” (2015: 4; see also Jasanoff
2013; Stark, 2018). I take the conditions of ordering in and Kim, 2009). The top-down discursive organizing of
the form of discourse as a fruitful focal point to look at data that occurs before, during, and after modeling,
how data personnel as actors give shape to data: how especially in the context of interviewing in which per-
they make sense of the institution, their social contexts, sonnel are asked to provide an account of modeling
and their ideas about data are part of the data technol- and nudging, is critical to the formation of an imagi-
ogies they design and implement. nary of predictive technologies. As Nick Seaver (2017:
8) has observed in his ethnographic approach to algo-
rithms, “interviews do not extract people from the flow
Methods
of everyday life, but are rather part of it.” Interviews
In this qualitative case study, I used a combination of enable personnel to imagine the concepts on which
interviewing and participant observation in university their projects hinge.
IT and IR offices in which personnel render students I transcribed and coded interviews and field notes in
into data. I interviewed 30 data personnel using semi- NVivo, a qualitative analysis environment. As
structured techniques in interviews lasting 60– Foucault (1972, 1977, 1978) has elucidated, discursive
90 minutes, and I conducted follow-up interviews with practices make the categories they describe, rendering
five key interlocutors who worked most closely with them measurable, governable, and here, nudgeable.
deploying the model and constructing nudges I identified where personnel defined data, explained
(Bernard, 2011). These personnel primarily included to me or to each other what a data type meant to
data scientists, developers, and IT administrators, but them, or decided which data could function as proxies
also network architects and stakeholders involved in for students. I focused on moments in interviews in
developing predictive outputs for students. which personnel speak, define, and sort behavior data
Much of the participant observation of my field- into a fixed category (see Wood and Kroger, 2000).
work took place in meetings. Meetings covered a This analysis illuminates personnel’s implicit and
range of topics, from monthly development updates explicit delineations about what data are and
to explanations of technical details of the predictive what they represent, along with how personnel
model to workshopping nudges to debates about thought data ought to be classified (Strauss, 2005).
what data mean. Meetings were places where multiple Interviewing, transcribing, and coding all helped me
teams came together, data scientists painstakingly to make sense of the conceptual work involved in han-
explained the mechanics of modeling or qualified dling institutional data. Because I began my fieldwork
results, programmers explained why they arrived at a well after modeling began, interviews helped me to
particular form of nudging, and administrators nixed reconstruct narratives of decision-making about data
nudges and passed along institutional memories of data sources, modeling, and nudging.
sources. In these spaces, personnel discursively chal- I have structured my findings to reflect a chronology
lenge and solidify not only the technical dimensions of data processing. However, because some of this
of modeling but also the data that inform it (see work occurs simultaneously, I also conceptually order
Brown et al., 2017; Sandler and Thedvall, 2017). The findings, layering them on top of a foundational con-
constraints and limits personnel face become evident in cern with demographic data and an imperative to
such spaces, where their ideas are curbed by the top- nudge.
down vision of the current university administration or I begin with the problem of removing demographic
where they must execute a stage of development over data from modeling, which prompted personnel to
which they are not in total agreement owing to rapidly think about data in terms of types (i.e. attributes and
approaching deadlines and desire to receive behaviors). I then explore the work of sorting the avail-
the approval of stakeholders. Their institutional able data at the institution into a category of behaviors
6 Big Data & Society

and assigning proxies. By maintaining data as accurate recommendations, and they were quick to assure me
proxies, personnel help behavior data begin to hold that they would never do such a thing.
together. Finally, the solidification of a category of By drawing boundaries around attributes, data per-
behaviors enables personnel to nudge students. I con- sonnel attempt to seal them off and open up other types
clude by discussing the implications of making institu- of data for usage. The discursive and computational
tional data. effects of relegating some data as attributes are that
those data, and the students who provide them,
Typing data into “attributes” and “behaviors” become stable entities. That is, by treating demograph-
ic data as attributes that are frozen—everything a stu-
The typing of data was the result of conscious attempts dent “can’t change”—personnel remove those data
from data personnel to nudge students not only effec- from an ongoing conversation about what they can
tively but also fairly. In one of my first interviews with use in the model. The differing experiences students
Don, I sat in his office, across from him again over his have on campus that interlock with their race and
cluttered desk, and asked him to recount some of the socioeconomic status, for example, are no longer part
early decision-making in model development. He had of data projects because personnel define them as fixed.
been involved in the original design of the model and Computationally, when some data become attributes,
determining which data should be incorporated into data scientists no longer include them in the predictive
it. Don summed up one of the key decisions model: demographic data, characterized as attributes,
regarding data: do not factor into calculating the likelihood of gradu-
ation in four years.
And what we found initially was that all the standard The effect of framing some data as attributes that
things that you would guess correlate with student suc- are off limits for nudging is not that they are perma-
cess that students can’t change were the big drivers: nent, but instead that personnel cannot nudge students
race, gender, ethnicity, socioeconomic status, what to change them. Will, an administrator who helped to
high school they came from, certain kinds of grades, develop the model, explained to me that the sidelining
whatever. Well, students can’t do anything about any of attributes prompted data personnel to look for other
of that. So, the idea was to take a look and see, well, is factors involved in success at the university:
there other stuff that seems to correlate.
So, since we’re pulling in all this data at the same time
The notion of “behaviors” emerged from efforts by and dropping it into the algorithm, obviously there are
personnel to avoid utilizing traditional indicators of a number of things that are highly predictive of student
success and instead look at other data that could be success on campus. Their academic preparation before
predictive of graduation within a four-year period. they come into campus. Their GPA while they’re
When data personnel explain what goes into the pre- at [the university] obviously is highly predictive.
dictive model, they divide the data neatly into two Socioeconomic status things. Demographic markers.
But they’re all things that either because it’s too late
major categories: “attributes” and “behaviors.” They
in the game, we can’t tell a student, “Boy, it would have
define attributes as fixed categories, the “standard
been great if you would have studied harder in high
things” that students “can’t change.” These categories
school.” And we certainly can’t tell a student on a
are made up of demographic data, where data on
demographic or socioeconomic thing, we can’t say,
parental income and high school ZIP code are indica-
“Hey, it’d be good if you weren’t so poor.” There’s
tors of socioeconomic status. Universities collect data
nothing a student can do with that. Even though it
on race, ethnicity, and gender in standardized forms, does put ‘em in a higher risk category. So we took
whether through college applications or through those things that were malleable by the students.
reporting mechanisms in university systems. Data per- Things like, how much time they were spending on
sonnel treat attributes as outside of the model’s pur- campus. Whether they were a proxy for whether we
view because while they correlate with graduating in believed they were paying attention in class by how
four years, they are not actionable. For example, per- much data they were downloading in a class.
sonnel cited that a student cannot retroactively attend a
different high school. Moreover, while a student could Instead of nudging on attributes, data personnel have
transition while in college and change gender markers constructed nudges about “things that were malleable”
in university systems or might experience socioeconom- and that the university collects and monitors. For
ic mobility, data personnel would not construct nudges example, while a student’s past grades do factor into
to instruct them to do so. Therefore, data personnel predicting how a student will perform in an individual
regard attributes as off limits in making course in terms of calculating the likelihood of getting
Whitman 7

an A or B, those “things” are not malleable because it is to nudging. By regarding behaviors as always dynamic,
in the past. In nudging, personnel instead focus on personnel minimize—even unintentionally—the con-
correlations between what is malleable and GPA or straints that students face through assuming that stu-
getting an A or B in a course. In the model, such var- dents have an enormous amount of agency in
iables include engagement with learning management structuring everyday life.
systems and class attendance (derived from network The focus on behaviors defines students both as rad-
activity), where more posting on discussion boards ical agents and as nudgeable subjects who would ben-
and downloading of course materials correlate with efit from behavioral recommendations based on their
higher GPAs. data, which include spending more time on campus,
Malleability, however, is not a given: it has to be attending class, attending supplemental instruction ses-
translated into a behavior. Will refers to data down- sions, and registering for courses earlier. Predictive out-
loaded in class as a potential proxy for paying atten- puts are meant to be engaged with, not just observed.
tion, where the data on downloading are available for The implication that students can act on predictions
data personnel to match up with a behavior. The ques- and improve their prospects for success is contested
tion is not if downloading data is a proxy but rather if among data personnel at the university. In general, per-
it is a reasonable proxy for paying attention. The data sonnel, particularly those who worked closely with stu-
available for modeling predate the model itself. Data dents, wanted nudges to have an encouraging vibe that
on students’ downloading habits in class were original- motivated students to act on nudges and incorporate
ly collected for the maintenance of network infrastruc- behavioral changes into their daily lives. However, data
tures, but personnel have repurposed them as a proxy scientists in team meetings cautioned against giving
for attention. Data on downloading were not always students false hope. They argued that the likelihoods
behavior data. they modeled are accurate enough that spending more
Massive amounts of data are available to data per- time on campus would not impact the predicted out-
sonnel, and it is not self-evident what is a proxy for come enough to make a substantial difference in the
what, nor was it apparent to me if a hard line between space of a semester. While personnel are not in agree-
attributes and behaviors existed for personnel. I asked ment about the potential effects of nudging and some
Don how data personnel went about distinguishing are torn about its utility, the notion remains that stu-
between attributes and behaviors. He first depicted dents are responsible for their outcomes.
behaviors as what was left after attributes were removed:
Sorting data: Assigning data to categories
We tried to be blind to all of those, and only look at
behaviors. Only look at different numbers we had that Data do not automatically fall into categories of attrib-
were indicators of behavior. Behavior could be grades utes and behaviors; rather, they are assigned and are
you made in your prior classes, here at [the university]. products of discursive moves. As I discovered, the types
It could be how many credit hours you’re taking, it of data that comprise a student body are multiple, as
could be where you’re living, it could be, any of these are their uses and sources. They serve several institu-
things that you have control over, we’ve just clumped tional offices simultaneously and outlive the original
them all into the behaviors bin. intentions behind them.
As a way to demonstrate the array and possibilities
When I asked how he and other personnel decided for sorting, I arrange types of data in loose sets in
what students had control over, he explained that the Table 1. The data I include in the table are general
distinction came down to choice: categories that I have derived from interviews, docu-
mentation from an external review, and administrators’
I guess that we assume that what [students] did in the conference presentations about the model. The table
course of the day, they had control over. Right, so they lists data that the model does not incorporate, such
chose whether they were gonna eat or not . . . they chose as demographic data, but I add such data to show
the gym or not, being on campus or not . . . They chose how the kinds of data that data scientists have de-
living where they chose to live. I think they have some siloed are put in conversation with other data sources.
say in that . . . So it seemed to me that any time that The assignment of meaning to data in the model,
they had an opportunity to make a decision about what while not arbitrary, is not strictly linked to data sour-
they were going to be doing, we called that a behavior. ces. That is, the data could align with other interpreta-
tions and proxies, and personnel indeed mobilize them
While it may seem obvious that “behavior” indicates for purposes other than their initial use. In my table,
what a student does or does not do, the invocation of I have created three columns and labels to reframe data
choice and what a student “has control over” is central in terms of how the institution makes and collects them
8 Big Data & Society

Table 1. Arrangements of data incorporated into initial data mining and predictive modeling.

Self-reported Infrastructural Accumulated

 Gender  Network logs (device geolocation,  GPA


 “Race/ethnicity” timestamps, duration)  Enrollment (registration status,
 Parental income  Network activity courses, grades)
 Standardized test scores  Card swipes (dorm access, dining halls, gym)  Degree plan
 High school ZIP code  Learning management systems
 High school GPA  Historic grade distribution
ZIP: zone improvement plan; GPA: grade point average.

rather than what personnel either offer up as attributes but they could not get it because not all faculty take
and behaviors or large, unsorted lists of variables attendance in their classes or in consistent formats. In
decontextualized from sources. I use “self-reported” the absence of data collected specifically to track atten-
to describe data that students provide to the dance, network logs seemed to the development team
institution, typically through the college application like a reasonable substitute in that logs can indicate
process or in campus systems. The data I organize in where a student’s device is. Network logs became a
“infrastructural” data are data created through the proxy for a student’s physical presence on campus
everyday operations of the university. Finally, I use because personnel appropriated data the institution
“accumulated” as a group for data that students gen- was already collecting to serve as more than indicators
erate as they move through the university in terms of of network traffic and reliability.
enrollment, grades, and coursework.7 The table is not Will formulated the leap from attendance to net-
exhaustive; rather, I aim to depict that data at the uni- works differently, recalling his early involvement in
versity are multiple and extensive. institutional data collection. He talked about moving
The primary data that personnel position as indica- from surveys to Big Data. To him surveys were a prob-
tive of behavior are network logs, which personnel use lematic proxy for campus engagement.
because they describe them as the best available proxy
for behavior. For this, data personnel have repurposed And you give [a survey] to the students at the end of the
data originally collected by IT to monitor the health and year and. . .it would measure how integrated you were
usage of campus WiFi networks. Network logs contain to the campus and what your commitment was to it. Or
data about time, date, and duration of a student’s use of we’d use the NSSE survey, which is the National
the WiFi network, along with which routers they con- Survey of Student Engagement, where you’d say, like,
nect to and some general information about browsing “Over the last semester, on average, how many hours a
activity. Because students must log in to the WiFi net- week did you study? How many hours a week did you
work using unique accounts administered by the univer- meet with professors outside of class? How many hours
sity, they are associated with their WiFi use. a week did you meet with your peers outside of class?”
In an office similar to Don’s but in the IR office, Those kinds of things. And these were relying on self-
I asked Jenny, an administrator involved with data gov- report surveys, often after the fact, to measure that
ernance at the university, how she and personnel decided level of engagement and integration. And what we pro-
to use network logs as a proxy for attendance. She vided, in the [model], was nope, here’s an actual behav-
explained how personnel came to use network data: ioral marker where we can truly see how much time a
student spent on campus.
And how we ended up on network logs, you know, it’s
just having the right people that are thinking, you In Will’s description, network logs are behavioral
know, probably someone picked up their phone and markers: they are not only where a student is but also
was like, “Hey, I just connected to the WiFi, right.” what a student does. That rendering of network logs as a
It’s like, oh, yeah! The WiFi, right. If we want to make proxy for behavior makes the logs a proxy for the stu-
the model better, what kind of data, when you think dent. Will makes a discursive move that prioritizes auto-
about behaviors, would you want to include, and then mated data collection over self-reported surveys.
you just start thinking, how might you get that data. Notably, in his description, surveys are “after the fact,”
while automated data collection is live and thus does not
In this example, Jenny maps out the steps for me to pull from unreliable memories or potentially amended
describe how the development team decided to use net- self-reporting responses. Moreover, network logs are
work logs. The team wanted data on class attendance, “an actual behavioral marker where we can truly see.”
Whitman 9

The allure of Big Data as a replacement for surveys an impact on the prediction. That may or may not be
is that the interpretation is invisible, so smoothly true, but it’s an assumption that we have to make
deployed that data are the behaviors. Even as person- because we don’t have a lot of choice.
nel talk about proxies and readily acknowledge that
network logs unevenly record data owing to outages, In Henry’s explanation, some proxies are maintained
network overcrowding, and poor connections, the dis- out of necessity. Discrepancies have a technical solu-
cursive framing and subsequent treatment of data as tion that can maintain the integrity of the overall proj-
“actual” enable data to start to hold together as behav- ect. The alternative is that if the proxy falls apart, the
iors. Students become data and their data possess more model becomes compromised. While missing data sug-
veracity than the students themselves. gest that institutional Big Data are not consistently
complete or comprehensive, data scientists suggested
Maintaining data as accurate proxies that those data do not matter. Nick, another data sci-
entist, responded similarly when I asked him about
The assignment of behaviors to data, and vice versa, missing data, he said that as long as the sample is pre-
requires investment. While I conducted my fieldwork, I sent, the missing data have “minimal effect” on the
had access to the campus WiFi network and through predictive outputs. Technically, the missing data may
an interface with the predictive model, could see a visu- not influence modeling outputs. However, I include
alization of my network logs. I kept a personal account missing data because of Henry’s explanation that “it’s
of my campus whereabouts and compared it against an assumption” crucial to modeling. As becomes clear
the network logs. I consistently found chunks of miss- in the way that personnel act on proxies, the entire
ing time, incorrect geolocations, and overall an inaccu- intervention the institution tries to make with the
rate picture of my time on campus. model rests on consistent support for data as proxies
I brought the disparity in data up to data personnel, for behavior.
who were either intrigued or unsurprised depending on
their proximity to working with the data. Some even
Nudging on “behaviors” and promoting self-
joked with me about how their own network logs made
it look like they were never at work. Personnel know regulation among students
that network logs are not a neat substitute for the time The discursive separation of attributes and behaviors
a student spends on campus. Network outages aside, allows for personnel to treat behaviors as actionable,
competition for network access and brevity of connec- and it enables a more data-driven form of institutional
tion might prevent a student’s device from registering. management of students. By making something called
Moreover, connecting in the first place requires a stu- behavior and making it legible through data proxies
dent to have a WiFi-enabled device. Missing data that minimize gaps between data and what they pur-
abound, for many reasons. port to measure, the institution can track students and
Nonetheless, personnel at the university maintain monitor them. Ultimately, the university uses these
network logs as an inventive and actual proxy for data to formulate a narrative of success that maps
behavior. I asked Henry, a data scientist working on onto behavior. Following such moves, the institution
the predictive model, about absent data, citing my net- can relocate where the possibilities for success reside.
work logs, and he explained that personnel have to Through the assignment of behaviors to data, the
move forward without those data: category of behaviors holds together. The formation
of a category of behaviors allows students to become
If stuff’s missing, I mean, there’s nothing you can do subjects of modeling and nudging. By reconceptualiz-
about it. You just have to hope, and in most cases this ing network logs and activity as behavior, personnel
is the case, that there is a uniformity to it. So either the effectively produce behavior that they can nudge,
whole day is missing, that’ll sometimes happen. That’s whereas before the deployment of the predictive
fine. Because there’s enough of the data to pick up the model, the university could not nudge students based
slack there . . . Most of the variables that I’ve made deal on what were primarily demographic data, now it can
with that elegantly . . . If I just don’t have a certain via behaviors.
amount of the data, I don’t say that there was a class In this case, the institution mobilizes behaviors as a
session at all. I just say there wasn’t one. So, for a means to encourage students to self-regulate and make
percentage of absences, like, it’s not going to affect it responsible choices that move them toward graduation.
at all. Otherwise, for missing data, the hope is that it’s Owing to the discursive maneuvering enacted by per-
sufficiently random that for any machine learning pur- sonnel, data classified as behaviors become directly tied
pose, it will not matter that it is missing. Because any- to students and their activities, as illustrated by Will’s
thing that is sufficiently small and random won’t have description of data as “an actual behavioral marker.”
10 Big Data & Society

Akin to how Wendy Nelson Espeland and Mitchell remaining data are then sorted by personnel into
Stevens (1998) have discussed quantification as similar behaviors based on what is available and fits within a
to a speech act, data as a metric for behavior become mold of what personnel understand students to “have
student behaviors. control over.” Personnel stabilize data as accurate
Correlations, even as personnel insist they are just proxies by explaining and accounting for inconsisten-
that, seem causal in nudges because of the implication cies. Data further solidify as personnel make them act
that students could improve their likelihoods of success through nudges that direct students to match behaviors
by adhering to particular behaviors. The model of a correlated with success. The discursive separation of
student presented in nudging is one who attends all behaviors and attributes allows the university to pin
classes, spends time on campus outside of class, success on behaviors, or everything that a student
engages with student organizations, does not browse appears to have a choice in. At a time when the use
the internet in class, visits office hours, and so on. If of demographic data in higher education is ever
students want to succeed, they should match the model fraught, behavior data seem full of promise to univer-
and attune their choices to the behaviors that correlate sities as a step toward the meritocratic ideals of higher
with success. In an automated manner, the model and education. However, as I have shown, behavior data
related nudging reflect student data onto students, sig- are not a neutral alternative, and what students “have
naling that they are continuously documented and control over” is not self-evident.
brought into a system of recommendations. As I sat in his noisy, shared office, Nick, one of the
Because attributes are removed from the model and data scientists, responded to my question about where
nudging, the reliance on behaviors suggests that stu- he thought predictive modeling in higher education was
dents’ choices are at the heart of their success at the headed with an answer about what he felt was more
institution. Because demographic data are not incorpo- pressing. He thought that nudging had a long way to
rated into the predictive model at all, success is linked go, and that the information in nudging needed to be
with behaviors and students’ choices. The purposeful more useful to students.
presentation of data to students encourages students to
internalize those data and act on them. As such, My theoretical framework is a very traditional eco-
responsibility now rests on the students to take hold nomical perspective which could be way wrong . . . but
of their success. The visualizations of certain kinds of it doesn’t mean that it’s not useful. The theory is that
data—namely data students ought to use to inform every person chooses optimum behavior for him or
their everyday decision-making—and obscuring of herself, based on the constraints, his ability, his resour-
demographic data place the burden of responsibility ces, his information. He cannot do anything about the
and success on students. By minimizing the role that first and second, but he maybe can do something about
race, class, and gender play on graduation outcomes, the third.
the institution, through the model, can present behav-
iors as major factors in the likelihood of a student grad- The purpose of modeling and nudging is to give stu-
uating within four years. If students do not attend dents that information so they can “choose optimum
class, a low GPA is a consequence of that decision. behavior.” But when both the information and the
Thus, the constraints around choices become invis- behaviors are more contingent and rooted in the imag-
ible. The university and its existing inequalities start to inative discourse of personnel than in inherent differ-
vanish because success is placed in the hands of stu- ences in data, how should students relate to their data?
dents. Social climate problems, structural barriers, The use of behavior data to nudge students and
issues of belongingness, and resource shortages disap- inspire a regime of self-regulation prompts questions
pear. A student cannot cite external factors in this about behavior data not only as a more accurate sub-
model of success dominated by behaviors. The result stitute for demographic data but also as a source of
is a shift in a locus of responsibility, wherein nudging is knowledge about students. As this article demon-
meant to give students tools to manage themselves and strates, making behavior data is a contingent process.
regulate their own behavior based on insights they Data proxies and data doubles do not acquire form
ought to draw from their data. naturally. They must be made, and they are made by
actors under institutional constraints and imaginaries
alike.
Conclusion The other aim of this article is to identify processes
One of the aims of this article is to demonstrate how of making and sorting data as problem spaces. If data,
data become behaviors, based on a collective decision students, and behaviors are imprecisely matched and
to avoid attributes in modeling and nudging, those subject to institutional pressures and sociotechnical
fixed demographic markers that “can’t change.” The limitations, how should those data be understood,
Whitman 11

especially in the context of nudging where actors put 3. “Success” is consistently defined by university administra-
them into play? To refer back to the emergence of what tors and staff as graduation within four consecutive years.
Cheney-Lippold (2011, 2017) has called the “new algo- 4. The “Adversity Score” drew from 31 measures of socio-
rithmic identity,” in which what people do is more rep- economic status to indicate the level of adversity an indi-
resentative of who they are than more traditional vidual high school student faced (see Belkin, 2019;
Washington and Hemel, 2019). This score, a tool for col-
concepts of social categories, what happens when
lege admissions, was meant to contextualize an applicant
behavior data are just as fraught as demographic data?
as a way to account for inequality in access to resources.
As predictive analytics become more widespread in
The Adversity Score was eventually dropped, though the
areas beyond education, such as policing and sentenc- College Board maintains a dashboard.
ing, finance, healthcare and medicine, and social media, 5. Black Mirror, series 3, episode 1: “Nosedive.”
it is necessary to illuminate the data that underpin pre- 6. I use “imaginary” in alignment with how Jasanoff and
dictions, not just technically but sociotechnically. The Kim (2009: 120) have described sociotechnical imaginaries
everyday, frequently mundane processes that make as “‘collectively imagined forms of social life and social
data, proxies, and data doubles hold together are sub- order reflected in the design and fulfillment of nation-
ject to deletion, which allows for the linkages between specific scientific and/or technological projects.”
data and people to appear seamless. They are not. 7. These “accumulated” data contain a great deal of activity.
Understanding how those data come together and For example, enrollment data includes the time between
how they stabilize is a continuing task for critical the opening of registration and when a student actually
data studies, and one that I explore in the context of registered, along with the time between the start of a
higher education. semester and when a student registered for the class.
Learning management system data consist of how a stu-
dent is using that system: when a student engaged with
Acknowledgements
course materials relative to other students, and the
Sincere thanks to Zoe Nyssa, Evelyn Blackwood, Kendall amount of times a student posted to discussion boards.
Roark, and Geneva Smith for reading previous drafts and
offering insightful feedback. Earlier iterations of the manu-
script benefitted tremendously from comments from Sheila ORCID iD
Jasanoff and engagement with Fellows in the Program on Madisson Whitman https://orcid.org/0000-0003-0936-
Science, Technology and Society at Harvard University. I 1358
am grateful for fellowship support from Purdue University.
I also wish to thank the three anonymous reviewers for their References
generous and constructive comments.
Angwin J, Larson J, Mattu S, et al. (2016) Machine bias.
ProPublica, 23 May. Available at: www.propublica.org/
Declaration of conflicting interests
article/machine-bias-risk-assessments-in-criminal-sentenc
The author(s) declared no potential conflicts of interest with ing/ (accessed 1 October 2019).
respect to the research, authorship, and/or publication of this Belkin D (2019) SAT to give students ‘adversity score’ to
article. capture social and economic background. The Wall
Street Journal, 16 May. Available at: www.wsj.com/
Funding articles/sat-to-give-students-adversity-score-to-capture-
social-and-economic-background-11557999000/ (accessed
The author(s) disclosed receipt of the following financial sup-
1 October 2019).
port for the research, authorship, and/or publication of this Benjamin R (2019) Race after Technology: Abolitionist Tools
article: Purdue Research Foundation research grant. for the New Jim Code. Cambridge: Polity Press.
Bernard HR (2011) Research Methods in Anthropology:
Qualitative and Quantitative Approaches. Lanham:
Notes AltaMira Press.
1. All names in this text are pseudonyms. Boellstorff T (2013) Making big data, in theory. First Monday
2. See Thaler and Sunstein (2009). Nudges are arrangements 18(10). Available at: https://firstmonday.org/ojs/index.ph
of suggestions meant to prompt people to make particular p/fm/article/view/4869/3750/ (accessed 31 May 2020).
choices. In the case of predictive modeling in higher edu- Bouk D (2015) How Our Days Became Numbered: Risk and
cation as discussed in this essay, nudges are recommenda- the Rise of the Statistical Individual. Chicago: The
tions developed out of the modeling results intended to University of Chicago Press.
influence students to make certain choices. For example, Bourdieu P (1980) 1992. The Logic of Practice. Stanford, CA:
if the model indicated a student was unlikely to get an A or Stanford University Press.
B in a particular course, the student would receive mes- Bowker GC and Star SL (1999) Sorting Things out:
saging prompting them to go to class or manage their time Classification and Its Consequences. Cambridge: The
more effectively. MIT Press.
12 Big Data & Society

Boyd D and Crawford K (2012) Critical questions for big Harcourt BE (2007) Against Prediction: Profiling, Policing,
data. Information, Communication & Society 15(5): and Punishing in an Actuarial Age. Chicago: The
662–679. University of Chicago Press.
Brayne S (2017) Big data surveillance: The case of policing. Hartman-Caverly S (2019) Human nature is not a machine:
American Sociological Review 82(5): 977–1008. On liberty, attention engineering, and learning analytics.
Brown H, Reed A and Yarrow T (2017) Introduction: Library Trends 68(1): 24–53. DOI: 10.1353/lib.2019.0029.
Towards an ethnography of meeting. Journal of the Harwell D (2019) Colleges are turning students’ phones into
Royal Anthropological Institute 23(S1): 10–26. surveillance machines, tracking the locations of hundreds
Cheney-Lippold J (2011) A new algorithmic identity: Soft of thousands. The Washington Post, 24 December
biopolitics and the modulation of control. Theory, Available at: www.washingtonpost.com/technology/2019/
Culture & Society 28(6): 164–181. 12/24/colleges-are-turning-students-phones-into-surveil
Cheney-Lippold J (2017) We Are Data: Algorithms and the lance-machines-tracking-locations-hundreds-thousands/
Making of Our Digital Selves. New York: New York (accessed 30 January 2020).
University Press. Hope A (2016) Biopower and school surveillance technolo-
Chun WHK (2018) On patterns and proxies, or the perils of gies 2.0. British Journal of Sociology of Education 37(7):
reconstructing the unknown. E-flux. Available at: www.e- 885–904.
flux.com/architecture/accumulation/212275/on-patterns- Ifenthaler D and Schumacher C (2016) Student perceptions
and-proxies/ (accessed 1 October 2019). of privacy principles for learning analytics. Educational
Clayton M and Halliday D (2017) Big data and the liberal Technology Research and Development 64(5): 923–938.
conception of education. Theory and Research in Iliadis A and Russo F (2016) Critical data studies: An intro-
Education 15(3): 290–305. duction. Big Data & Society 3(2): 1–7.
Dalton C and Thatcher J (2014) What does a critical data Jarke J and Breiter A (2019) Editorial: The datafication of
studies look like, and why do we care? Seven points for a education. Learning, Media and Technology 44(1): 1–6.
critical approach to ‘big data’. Society & Space. Available Jasanoff S (2004) Ordering knowledge, ordering society. In:
at: https://www.societyandspace.org/articles/what-does-a- Jasanoff S (ed.) States of Knowledge: The Co-Production of
critical-data-studies-look-like-and-why-do-we-care/ Science and Social Order. London: Routledge, pp.13–45.
(accessed 31 May 2020). Jasanoff S (2015) Future imperfect: Science, technology, and
Desrosieres A (1998) The Politics of Large Numbers: A the imaginations of modernity. In: Jasanoff S and Kim
History of Statistical Reasoning. Cambridge: Harvard S-H (eds) Dreamscapes of Modernity: Sociotechnical
University Press. Imaginaries and the Fabrication of Power. Chicago: The
Espeland WN and Stevens ML (1998) Commensuration as a University of Chicago Press, pp.1–33.
social process. Annual Review of Sociology 24(1): 313–343. Jasanoff S and Kim S-H (2009) Containing the
Espeland WN and Stevens ML (2008) A sociology of quan- atom: Sociotechnical imaginaries and nuclear power in
tification. European Journal of Sociology 49(3): 401–436. the United States and South Korea. Minerva 47(2):
Eubanks (2018) Automated Inequality: How High-Tech Tools 119–146.
Profile, Police, and Punish the Poor. New York: Johnson JA (2014) The ethics of big data in higher education.
St. Martin’s Press. International Review of Information Ethics 7: 3–10.
Forsythe DE (2001) Studying Those Who Study Us. Stanford: Jones GM, Flamenbaum R, Buyandelger M, et al. (2014)
Stanford University Press. Anthropology in and of MOOCs. American
Foucault M (1972) The Archaeology of Knowledge. London: Anthropologist 116(4): 829–838.
Routledge. Kitchin R and Lauriault TP (2014) Towards critical data
Foucault M (1977) Discipline and Punish: The Birth of the studies: Charting and unpacking data assemblages and
Prison. New York: Vintage Books. their work. The Programmable City Working Paper 2;
Foucault M (1978) The History of Sexuality, Vol. 1. 1–19. Available at: https://papers.ssrn.com/sol3/papers.
New York: Vintage Books. cfm?abstract_id=2474112/ (accessed 31 May 2020).
Fritz J (2017) Using analytics to nudge student responsibility Knox J, Williamson B and Bayne S (2020) Machine behav-
for learning. New Directions for Higher Education 179: iourism: Future visions of ‘learnification’ and ‘datafica-
65–75. tion’ across humans and digital technologies. Learning,
Gitelman L and Jackson V (2013) Introduction. In: Gitelman Media and Technology 45(1): 31–45.
L (ed.) ‘Raw Data’ is an Oxymoron. Cambridge: The MIT Latour B (1990) Drawing things together. In: Lynch M and
Press, pp.1–14. Woolgar S (eds) Representation in Scientific Practice.
Hacking I (1990) The Taming of Chance. Cambridge: Cambridge: The MIT Press, pp. 19–68.
Cambridge University Press. Lyon D (2003) Introduction. In: Lyon D (ed.) Surveillance as
Haggerty KD and Ericson RV (2000) The surveillant assem- Social Sorting: Privacy, Risk, and Digital Discrimination.
blage. The British Journal of Sociology 51(4): 605–622. London: Routledge, pp.1–9.
Hanel PHP and Vione KC (2016) Do student samples pro- O’Neil C (2016) Weapons of Math Destruction: How Big Data
vide an accurate estimate of the general public? PloS One Increases Inequality and Threatens Democracy. New York:
11(12): 1–10. Crown.
Whitman 13

Ortner SB (2006) Anthropology and Social Theory: Culture, Star SL (1999) The ethnography of infrastructure. American
Power, and the Acting Subject. Durham: Duke University Behavioral Scientist 43(3): 377–391.
Press. Star SL and Ruhleder K (1996) Steps toward an ecology of
Pascarella ET and Terenzini PT (2005) How College Affects infrastructure: Design and access for large information
Students: A Third Decade of Research. San Francisco: spaces. Information Systems Research 7(1): 111–134.
Jossey-Bass. Stark L (2018) Algorithmic psychometrics and the scalable
Porter TM (1995) Trust in Numbers: The Pursuit of subject. Social Studies of Science 48(2): 204–231.
Objectivity in Science and Public Life. Princeton: Strathern M (2000) New accountabilities: Anthropological
Princeton University Press. studies in audit, ethics and the academy. In: Strathern M
Power M (1997) The Audit Society: Rituals of Verification. (ed.) Audit Cultures: Anthropological Studies in
Oxford: Oxford University Press. Accountability, Ethics and the Academy. London:
Raley R (2013) Dataveillance and countervailance. In: Routledge, pp.1–18.
Gitelman L (ed.) ‘Raw Data’ is an Oxymoron. Strauss C (2005) Analyzing discourse for cultural complexity.
Cambridge: The MIT Press, pp.121–145. In: Quinn N (ed.) Finding Culture in Talk: A Collection of
Rubel A and Jones KML (2016) Student privacy in learning Methods. New York: Palgrave MacMillan, pp.203–242.
analytics: An information ethics perspective. The Suchman L (2007) Human-Machine Reconfigurations: Plans
Information Society 32(2): 143–159. and Situated Actions. Cambridge: Cambridge University
Sandler J and Thedvall R (2017) Exploring the boring: An Press.
introduction to meeting ethnography. In: Sandler J and Sun K, Mhaidli AH, Watel S, et al. (2019) It’s my data!
Thedvall R (eds) Meeting Ethnography. London: Tensions among stakeholders of a learning analytics dash-
Routledge, pp.1–23. board. In: Proceedings of the 2019 CHI Conference on
Scott JC (1998) Seeing like a State: How Certain Schemes to
Human Factors in Computing Systems. ACM, pp.1–14.
Improve the Human Condition Have Failed. New Haven:
Thaler RH and Sunstein CR (2009) Nudge: Improving
Yale University Press.
Decisions about Health, Wealth, and Happiness.
Seaver N (2017) Algorithms as culture: Some tactics for the
New York: Penguin Books.
ethnography of algorithmic systems. Big Data & Society
Wagner E and Longanecker D (2016) Scaling student success
4(2): 1–12.
with predictive analytics: Reflections after four years in
Selbst AD (2017) Disparate impact in big data policing.
the data trenches. Change: The Magazine of Higher
Georgia Law Review 52(1): 109–196.
Learning 48(1): 52–59.
Selwyn N (2015) Data entry: Towards the critical study of
Warikoo NK (2016) The Diversity Bargain: And Other
digital data and education. Learning, Media and
Dilemmas of Race, Admissions, and Meritocracy at Elite
Technology 40(1): 64–82.
Selwyn N (2019) What’s the problem with learning analytics? Universities. Chicago: The University of Chicago Press.
Journal of Learning Analytics 6(3): 11–19. Washington AJ and Hemel D (2019) By omitting race, the
Shahar B and Harel T (2017) Educational justice and big SAT’s new adversity score misrepresents reality. TIME,
data. Theory and Research in Education 15(3): 306–320. 21 May. Available at: https://time.com/5592661/sat-test-
Siemens G (2013) Learning analytics: The emergence of a dis- adversity-score-race/ (accessed 1 October 2019).
cipline. American Behavioral Scientist 57(10): 1380–1400. Williamson B (2017) Big Data in Education: The Future of
Slade S and Prinsloo P (2013) Learning analytics: Ethical Learning, Policy and Practice. Thousand Oaks: SAGE.
issues and dilemmas. American Behavioral Scientist Williamson B (2018) The hidden architecture of higher edu-
57(10): 1510–1529. cation: Building a big data infrastructure for the ‘smarter
Slade S and Prinsloo P (2014) Student perspectives on the use University’. International Journal of Educational
of their data: Between intrusion, surveillance and care. In: Technology in Higher Education 15(1): 1–26.
Challenges for Research into Open & Distance Learning: Williamson B (2019) Policy networks, performance metrics
Doing Things Better – Doing Better Things. Oxford, UK: and platform markets: Charting the expanding data infra-
European Distance and E-Learning Network, pp.291–300. structure of higher education. British Journal of
Smith GJD (2016) Surveillance, data and embodiment: On Educational Technology 50(6): 2794–2809.
the work of being watched. Body & Society 22(2): 108–139. Wood LA and Kroger RO (2000) Doing Discourse Analysis:
Star SL (1991) The sociology of the invisible: The primacy of Methods for Studying Action in Talk and Text. Thousand
work in the writings of Anselm Strauss. In: Maines D (ed.) Oaks, CA: SAGE Publications.
Social Organization and Social Process: Essays in Yeung K (2017) ‘Hypernudge’: Big data as a mode of regu-
Honor of Anselm Strauss. New York: Aldine de lation by design. Information, Communication & Society
Gruyter, pp.265–283. 20(1): 118–136.

You might also like