
Critical Educational Questions for Big Data

Monday, September 05, 2016

Big data has arrived in education. Educational data science, learning analytics, computer
adaptive testing, assessment analytics, educational data mining, adaptive learning
platforms, new cognitive systems for learning and even educational applications based on
artificial intelligence are fast inhabiting the educational landscape, in schools, colleges and
universities, as well as in the networked spaces of online learning.
I was recently asked what I thought were some of the most critical questions about big data in
education today. This reminded me of the highly influential paper “Critical questions for big
data” by danah boyd and Kate Crawford, in which they “ask critical questions about what all
this data means, who gets access to what data, how data analysis is deployed, and to what
ends.” With that in mind, here are some preliminary critical questions to ask about big data
in education — a second set of questions will follow next time.

How is “big data” being conceptualized in relation to education?
Large-scale data collection has been at the centre of the statistical measurement,
comparison and evaluation of the performance of education systems, policies, institutions,
staff and students since the mid-1800s. Does big data constitute a novel way of
enumerating education, and how do we specifically think about big data in relation to
education? The sociologist David Beer has suggested we need to think about the ways in
which big data as both a concept and a material phenomenon has appeared as part of a
history of statistical thinking and in relation to the rise of the data analytics industry. He
argues social science still needs to understand “the concept itself, where it came from, how
it is used, what it is used for, how it lends authority, validates, justifies, and makes
promises.”
Within education specifically, how is big data being conceptualized, thought about, and
used to animate specific kinds of projects and technical developments? Where did it come
from (data science, computer science?) and who are its promoters and sponsors in
education? What promises are attached to the concept of big data as it is discussed within
the domain of education? It’s notable that the dominant discourse of big data in education is
that of “personalization,” precisely the same discourse that catalyzes the social media
industry, with experts in personalization from companies like Google now becoming
influential educational entrepreneurs. We might wish to think about a “big data imaginary” in
education — a certain way of thinking about, envisaging and visioning the future of
education through the conceptual lens of big data — that is now animating specific ed-tech
projects, becoming embedded in the material reality of educational spaces, and being
enacted in pedagogic practice.
Is big data changing how we think and learn?
Media theorist N. Katherine Hayles claims we have always thought “through, with, and
alongside media.” Historically the ways people think have been formed by techniques of
print production. Today, Hayles claims, the more we work with digital technologies the more
we appreciate the capacity of networked and programmed machines to carry out
sophisticated cognitive tasks. As a consequence, computers are increasingly seen as
extensions of thought and cognition. Are big data technologies changing how we think and
learn? With new kinds of machine learning and cognitive computing systems, we might see
ourselves as being extended into vast networks of automated learning, predictive cognition
and encyclopaedic knowledge-making potential. Again, as Hayles notes, digital media are
pushing us in the direction of faster communication, more intense information streams, and
more integration of humans with “nonconscious” cognitive systems, all of which are exerting
considerable effects on how people think, perhaps even on how their brains function. The
potential capacity of big data-based forms of machine learning and cognitive systems to
alter neurology and cognition clearly raises significant questions for education, not least
about whether existing theories of learning are adequate to explain human-machine
cognitive learning processes.

What theories of learning underpin big data-driven educational technologies?
Big data-driven platforms such as learning analytics are often claimed to “optimize learning,”
but it is not always clear what is meant by “learning” by the organizations and actors that
build, promote and evaluate them. Much of the emerging field of “educational data science”
— which encompasses educational data mining, learning analytics and adaptive learning
software R&D — is informed by conceptualizations of learning that are rooted in cognitive
science and cognitive neuroscience. These disciplines tend to focus on learning as an
“information-processing” event, treating learning as something that can be monitored and
optimized like a computer program, and pay less attention to the social, cultural, political
and economic factors that structure education and individuals’ experiences of learning.
Aspects of behaviourist theories of learning also persist in behaviour management
technologies that are used to collect data on students’ observed behaviours and distribute
rewards to reinforce desirable conduct.
Many actors involved in educational big data analyses are also deeply informed by the
disciplinary practices and assumptions of psychometrics and its techniques of psychological
measurement of knowledge, skills, personality and so on. This reflects the combination of
big data analytics with psychometrics that has been termed “psycho-informatics.” Are we
seeing the birth of psycho-informatics as a new field of methodological invention and theory
development in education via learning analytics, educational data mining and adaptive
learning platform providers? Are the strongly psychological, neuroscientific and
computational methods and concepts that dominate big data development in education
adequate to the task of theorizing the messy, embodied, socially situated and socioculturally
embedded complexity of learning?
How are machine learning systems used in education being “trained” and “taught”?
The machine learning algorithms that underpin much educational data mining, learning
analytics and adaptive learning platforms need to be trained, and constantly tweaked,
adjusted and optimized to ensure accuracy of results — such as predictions about future
events. This requires “training data,” a corpus of historical data that the algorithms can be
“taught” with to then use to find patterns in data “in the wild.” Who selects the training data?
How do we know if it is appropriate, reliable and accurate? Is there a “coded gaze” at work
in the training of such systems, meaning “the embedded views that are propagated by those who
have the power to code systems”? What if the historical data is in some ways biased,
incomplete or inaccurate? Educational research has long asked questions about the
selection of the knowledge in school curricula that is taught to students, and the ways it
reflects and serves to reproduce the economic, social and cultural advantage of powerful
groups. We may now need to ask about the selection of the data for inclusion in the training
corpus of machine learning platforms — about the data themselves, the experts that select
them, their assumptions about purposes of education and processes of learning, and the
goals that animate them — as these data and the coded gaze of the systems designed to
process them could be consequential for learners’ subsequent educational experience.
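To make the worry about biased historical data concrete, here is a minimal sketch in Python. It is not drawn from any real platform: the records, the group labels, and the `train` and `predict` functions are all hypothetical and invented for illustration. It “trains” a toy predictor on a handful of made-up historical outcomes and shows that the predictor simply hands the historical pattern back as a forecast.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (school_type, passed).
# Invented for illustration only; not real data.
HISTORY = [
    ("well_resourced", True),
    ("well_resourced", True),
    ("well_resourced", True),
    ("well_resourced", False),
    ("under_resourced", False),
    ("under_resourced", False),
    ("under_resourced", True),
]


def train(records):
    """'Learn' the most common past outcome for each group."""
    outcomes = defaultdict(Counter)
    for group, passed in records:
        outcomes[group][passed] += 1
    # Keep only the majority outcome per group as the "model".
    return {group: counts.most_common(1)[0][0] for group, counts in outcomes.items()}


def predict(model, group):
    """Predict a new student's outcome purely from their group's history."""
    return model[group]


model = train(HISTORY)
print(predict(model, "well_resourced"))
print(predict(model, "under_resourced"))
```

In this toy case, `predict(model, "under_resourced")` returns `False` only because most of the invented under-resourced records were failures; nothing about the individual student is consulted. Real systems are far more sophisticated, but the question of who chose the records in `HISTORY`, and what they left out, carries over.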
Moreover, we need to ask questions about the nature of the “learning” being experienced by
machine learning algorithms. Enthusiastic advocates in places like IBM are beginning to
propose that advanced machine learning is becoming more “natural,” with “human qualities”
of learning, based on computational models of aspects of human brain functioning and
cognition. To what extent do such claims appear to conflate understandings of the biological
neural networks of the human brain that are mapped by neuroscientists with the artificial
neural networks designed by computer scientists? Does this reinforce computational
information-processing conceptualizations of learning, and risk addressing young human
minds and the “learning brain” as computable devices that can be debugged and rewired?
These questions are of course not exhaustive, and another set will be coming next month
focusing on big data ownership, divides, algorithmic accountability, issues about student
voice and data literacy, and finally ethical implications and challenges of big data in
education.
Banner image credit: Dave Herholz
PART 2

I started a list of critical questions for big data in education earlier this week. This is a big
topic, raising many serious questions and problems for further debate and
discussion. Here, I focus on questions about big data ownership, divides, algorithmic
accountability, issues about voice and literacy, and, finally, ethical implications and
challenges of big data in education.

Who “owns” educational big data?


The sociologist Evelyn Ruppert has asked, “who owns big data?” noting that numerous
people, technologies, practices and actions are involved in how data is shaped, made and
captured. The technical systems for conducting educational big data collection, analysis and
knowledge production are expensive to build. Specialist technical staff are required to
program and maintain them, to design their algorithms, to produce their interfaces.
Commercial organizations see educational data as a potentially lucrative market, and ‘own’
the systems that are now being used to see, know and make sense of education and
learning processes. Many of their systems are proprietary, and are wrapped in intellectual
property and patents, which makes it impossible for other parties to understand how they are
collecting data, what analyses they are conducting, or how robust their big data samples
are. Despite claims to exhaustivity, big data can still only ever be a sample based on users
of a platform or a system, not a true census of total populations, especially in education
where access to the technologies required for big data collection purposes is highly uneven.
Specific commercial and political ambitions may also be animating the development of
educational data analytics platforms, particularly those associated with Silicon Valley where
ed-tech funding for data-driven applications is soaring and tech entrepreneurs are rapidly
developing data-driven educational software and even new institutions. In this sense, we
need to ask critical questions about how educational big data are made, analysed and
circulated within specific social, disciplinary and institutional contexts that often involve
powerful actors with significant economic capital and extensive social networks of support
and influence.

Is a new “big data divide” emerging in education?


Not all schools, colleges or universities can necessarily afford to purchase a learning
analytics or adaptive software platform — or to partner with platform providers. This risks
certain wealthy institutions being able to benefit from real-time insights into learning
practices and processes that such analytics afford, while other institutions will remain
restricted to the more bureaucratic analysis of temporally discrete assessment events. In
other words, a new educational data divide may be emerging where certain institutions will
be able to gain a competitive advantage by having access to the insights available from
educational data analytics services and platforms. This reflects the wider emergence of a
“big data divide” that Mark Andrejevic has described as a separation between the “hands of
the few who use it to sort, manage, and manipulate,” and those “without access to the
database who are left with the ‘poor person’s’ strategies for cutting through the clutter: gut
instinct, affective response, and ‘thin-slicing’ (making a snap decision based on a tiny
fraction of the evidence).” To what extent might a new big data divide in education reinforce
and reproduce existing forms of advantage and disadvantage, and exacerbate existing
regimes of comparison, competition, and public ranking of institutions?

Can educational big data provide a real-time alternative to bureaucratic policymaking?
Policy makers in recent years have depended on large-scale assessment data to help
inform decision-making and drive reform. Educational data mining and analytics can provide
a real-time stream of data about learners’ progress, as well as automated real-time
personalization of learning content appropriate to each individual learner. To some extent
this changes the speed and scale of educational change by removing the need for
cumbersome assessment and country comparison techniques that have tended to underpin
policy intervention in recent years. But it potentially places commercial organizations such
as the global education business Pearson in a powerful new role in education, with the
capacity to predict outcomes and shape educational practices at timescales that
government intervention cannot match. Though standardized testing and country
comparison have become widely critiqued as a mode of governance in education, the
emerging alternative of real-time analytics raises questions about for-profit influence and the
privatization of public education by tight networks of corporate education reformers.

Is there algorithmic accountability to educational analytics?
Learning analytics is focused on the optimization of learning and one of its main claims is
the early identification of students at risk of failure. What happens if, despite being enrolled
on a learning analytics system that has personalized the learning experience for the
individual, that individual still fails? Will the teacher and institution be accountable, or can
the machine learning algorithms (and the platform organizations that designed them) be
held accountable for their failure? Simon Buckingham Shum has written about the need to
address algorithmic accountability in the learning analytics field, and noted that “making the
algorithms underpinning analytics intelligible” is one way of at least making them more
transparent and less opaque. Significant questions remain, however, about fairness and
equal treatment in relation to big data-based education, and particularly about where
accountability lies when algorithmic data-processing systems narrow an individual’s
opportunities in the name of “personalized” learning.

Is student data replacing student voice?


Data are sometimes said to “speak for themselves,” but education has a long history of
encouraging learners to speak for themselves too. Is the history of student voice initiatives
being overwritten by the potential of student data, which proposes a more reliable, accurate,
objective and impartial view of the individual’s learning process unencumbered by personal
bias? Or can student data become the basis for a data-dialogic form of student voice, one in
which teachers and their students are able to develop meaningful and
caring relationships through mutual understanding and discussion of student data?

Do teachers need “data literacy”?


Many teachers and school leaders possess little detailed understanding of the data systems
that they are using, or required to use. As glossy educational technologies
like ClassDojo are taken up enthusiastically by millions of teachers worldwide, might it be
useful to ensure that teachers can ask important questions about data ethics, data privacy,
data protection, and be able to engage with educational data in an informed way? Despite
calls in the US to ensure that data literacy become the focus for teachers’ pre-service
training, there appears little sign that the provision of data literacy education for educational
practitioners is being developed in the UK.

What ethical frameworks are required for educational big data analysis and data science studies?
The Council for Big Data, Ethics and Society recently published a white paper detailing
many of the ethical implications of big data. It raised important issues and recommendations
about the need for informed consent when collecting data from users of platforms, and
called in particular for new “social, structural, and technical mechanisms to assess the
ethical implications of a system throughout the entire development and analysis lifecycle.”
The UK government recently published an ethical framework for policymakers for use
when planning data science projects. Similar ethical frameworks to guide the design of
educational big data platforms and education data science projects are necessary. New
kinds of privacy frameworks and considerations of rights in relation to educational big data
also need to be considered and developed, drawing not least on existing considerations of
the potential privacy harms associated with data collection, processing, and dissemination.
This list of questions is of course not exhaustive, but it helps, I think, to identify some of the key
issues emerging as big data, analytics, algorithms and machine learning processes
integrate into educational institutions and practices.
Banner image credit: Torkild Retvedt
