
History and Theory 61, no. 4 (December 2022), 119–133 © 2022 The Authors.
History and Theory published by Wiley Periodicals LLC on behalf of Wesleyan University. ISSN: 0018-2656
DOI: 10.1111/hith.12282

ARTICLE

DIGITAL DOPING FOR HISTORIANS: CAN HISTORY, MEMORY, AND HISTORICAL THEORY BE RENDERED ARTIFICIALLY INTELLIGENT?

WULF KANSTEINER

ABSTRACT

Artificial intelligence is making history, literally. Machine learning tools are playing a key
role in crafting images and stories about the past in popular culture. AI has probably also
already invaded the history classroom. Large language models such as GPT-3 are able to
generate compelling, non-plagiarized texts in response to simple natural language inputs,
thus providing students with an opportunity to produce high-quality written assignments
with minimum effort. In a similar vein, tools like GPT-3 are likely to revolutionize his-
torical studies, enabling historians and other professionals who deal in texts to rely on
AI-generated intermediate work products, such as accurate translations, summaries, and
chronologies. But present-day large language models fail at key tasks that historians hold
in high regard. They are structurally incapable of telling the truth and tracking pieces of
information through layers of texts. What’s more, they lack ethical self-reflexivity. There-
fore, for the time being, the writing of academic history will require human agency. But
for historical theorists, large language models might offer an opportunity to test basic hy-
potheses about the nature of historical writing. Historical theorists can, for instance, have
customized large language models write a series of descriptive, narrative, and assertive his-
tories about the same events, thereby enabling them to explore the precise relation between
description, narration, and argumentation in historical writing. In short, with specifically
designed large language models, historical theorists can run the kinds of large-scale writing
experiments that they could never put into practice with real historians.

Keywords: artificial intelligence (AI), GPT-3, historical theory, collective memory, histor-
ical writing, large language models, description, narration, argumentation, OpenAI, ma-
chine learning

Historians write texts. And as public historians, they help shape twenty-first-
century memory cultures in museums, memorials, and other popular venues, such
as history television and video gaming.1 But when it comes to donning the man-
tle of the professional historian, they remain wedded, or chained, to the book.
The monographs doctoral students file at the end of their creative struggles with
sources, including digitized, digital, and visual sources, assume a format that has
not changed much since the days of Leopold von Ranke, as a quick look at one

1. See The Oxford Handbook of Public History, ed. James B. Gardner and Paula Hamilton (Oxford:
Oxford University Press, 2017).

This is an open access article under the terms of the Creative Commons Attribution-NonCommer-
cial-NoDerivs License, which permits use and distribution in any medium, provided the original work
is properly cited, the use is non-commercial and no modifications or adaptations are made.

celebrated detail of Ranke’s work illustrates. A digital copy of History of the Ref-
ormation in Germany reveals that Ranke, like his twenty-first-century disciples,
initially uses footnotes to discuss terminology before settling into a rhythm of
including one or two footnotes containing references to primary sources on each
page.2 The textual appendix of the footnote was already familiar to Ranke’s con-
temporaries, although none of Ranke’s scholarly and literary predecessors had
managed to infuse the appendix with so much professional glamour and establish
it as a decisive benchmark of the burgeoning historical profession.3 The footnote
rendered attractive by Ranke came to certify the epistemological integrity of the
written text in both symbolic and practical terms. Today, the footnote and what it
represents might prove a welcome stumbling block as historians enter the world
of artificial intelligence (AI).
Since historians are as text-focused as they are, the latest breakthrough in text-
generating AI will open up new opportunities for historical research and the teach-
ing of history, and it will likely shape the future of the discipline.4 Important
problems remain, however. The AI-driven, machine learning technologies that
currently exist are able to create grammatically flawless and semantically com-
pelling texts that even expert readers cannot distinguish from human writing.5
Moreover, the machine learning tools can be set into motion by simple, short com-
mands delivered in natural languages. But these large language models (LLMs),
as they are called, are designed with commercial objectives in mind and, as far
as we know, are unable to distinguish truth from falsehood.6 They can write in
the style of Ranke and, as part of that stylistic exercise, even generate plausible-
looking footnotes at the bottom of the page. But they, unlike Ranke, cannot pro-
duce truthful footnotes. The most advanced publicly accessible LLM is GPT-3; it
was developed by the San Francisco-based (initially philanthropic, then commer-
cial) OpenAI initiative.7 GPT-3 is structurally unable to attribute the statements it
generates to a specific textual origin, let alone assess the factual reliability of any
of its textual inputs or outputs. LLMs cannot yet operate in the categories of refer-
ential truth that constitute a basic prerequisite for scholarship in the discipline of
history. They lack the vertical memory required to generate accurate references.
Instead, they follow a simple yet powerful horizontal logic, asking themselves
time and again what word or sign is likely to follow a given string of words or

2. Leopold von Ranke, History of the Reformation in Germany, ed. Robert A. Johnson, transl.
Sarah Austin (London: Routledge and Sons, 1905), vii, 3, https://books.google.com/books?id=
BXsfAAAAMAAJ.
3. Anthony Grafton, The Footnote: A Curious History (Cambridge, MA: Harvard University Press,
1997), 223.
4. For a brief history of AI, see Yuchen Jiang et al., “Quo Vadis Artificial Intelligence?” Discover
Artificial Intelligence 2 (2022), https://link.springer.com/article/10.1007/s44163-022-00022-8.
5. Robert Dale, “GPT-3: What’s It Good For?” Natural Language Engineering 27, no. 1 (2021),
113–18.
6. Adam Sobieszek and Tadeusz Price, “Playing Games with AIs: The Limits of GPT-3 and Similar
Large Language Models,” Minds and Machines 32, no. 2 (2022), 341–64.
7. Ben Dickson, “The Untold Story of GPT-3 Is the Transformation of OpenAI,” TechTalks (blog),
17 August 2020, https://bdtechtalks.com/2020/08/17/openai-gpt-3-commercial-ai/.
signs and taking the huge number of texts they have absorbed during their train-
ing as a statistical benchmark for making that decision one word or sign at a time
and at a rapid speed.8 This is perhaps not rocket science, but LLMs are an ex-
plosive technology for all professional writers, including, although not primarily,
historians.
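
That horizontal logic can be made concrete with a toy sketch. The following Python fragment (the vocabulary and probability table are invented for illustration and bear no resemblance to GPT-3’s actual parameters) extends a prompt one token at a time by sampling from a conditional distribution, with a temperature setting that trades predictability for “creativity”:

```python
import random

# Invented conditional next-token probabilities; a real LLM derives these from
# billions of learned parameters rather than a hand-written table.
NEXT_TOKEN_PROBS = {
    "the": {"reformation": 0.4, "historian": 0.35, "footnote": 0.25},
    "reformation": {"in": 0.6, "began": 0.4},
    "in": {"germany": 0.7, "1517": 0.3},
    "historian": {"writes": 1.0},
    "footnote": {"certifies": 1.0},
}

def sample_next(token: str, temperature: float = 1.0) -> str:
    """Sample the next token given the preceding one.

    Lower temperatures sharpen the distribution (more repetitive output),
    higher temperatures flatten it (more surprising output).
    """
    probs = NEXT_TOKEN_PROBS.get(token)
    if not probs:
        return "<end>"
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]

def generate(prompt: str, max_tokens: int = 6, temperature: float = 1.0) -> str:
    """Extend the prompt one token at a time, as described above."""
    tokens = prompt.lower().split()
    for _ in range(max_tokens):
        nxt = sample_next(tokens[-1], temperature)
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the", temperature=1.0))
```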
The enduring attachment to texts as the professional output of choice for his-
torians sustains beautiful visions of steadily accumulating historical knowledge.
Twenty-first-century historians of the Reformation have shifted analytical focus,
moving from political history to social and cultural history and into cutting-edge
fields, such as the history of emotions.9 Pursuing new questions and wielding new
concepts, many historians acknowledge Ranke only in passing, although they still
celebrate him for having shaped scholarship for more than a century as a result
of “the sheer range of his analytical vision.”10 A lot has happened in the inter-
vening years, including two world wars, several genocides, decolonization, and
a disillusionment with nationalisms of all kinds, but Ranke’s insights are appar-
ently still valid. His texts remain citable with their truth value intact. But the
beautiful fantasy of a large homogeneous body of research constitutes a signifi-
cant ethical risk factor for all AI-generated texts, including AI-generated histo-
ries. Sooner rather than later, any structural prejudices embedded in the textual
corpus that was used to train a given LLM will find their way into that LLM’s
textual output. To put it bluntly, if an LLM has been trained with racist and sexist
material, it is likely to produce racist and sexist texts. And the use of prejudicial
material to train LLMs is difficult to avoid because LLMs only attain their stun-
ning human-like writing skills as a result of having mechanically absorbed a huge
number of human-generated texts that tend to contain a lot of problematic opin-
ions and prejudices. Historians are very aware of, but also often fall victim to, the
very same risk of replicating their sources’ skewed moral worlds.11 Unlike his-
torians, however, LLMs are structurally unable to write against the grain of their
own training feed. In this regard, the smooth text surfaces produced by LLMs
hide more moral failures than the professionally sculpted surfaces of historical
scholarship do.

8. Steve Tingiris, Exploring GPT-3: An Unofficial First Look at the General-Purpose Language
Processing API from OpenAI (Birmingham: Packt, 2021), 5.
9. Susan C. Karant-Nunn, The Reformation of Feeling: Shaping the Religious Emotions in Early
Modern Germany (Oxford: Oxford University Press, 2010). For Karant-Nunn, Ranke remains a col-
league to be reckoned with (9).
10. C. Scott Dixon, Contesting the Reformation (Chichester: Wiley-Blackwell, 2012), 20. See also
Thomas A. Brady Jr., German Histories in the Age of Reformations, 1400–1650 (Cambridge: Cam-
bridge University Press, 2009), 3.
11. Consider, for example, the discussions about an editorial written by the president of
the American Historical Association who bemoaned the presentism in a lot of public history
interventions but failed to recognize the partisanship for the political agenda of past elites
that is structurally embedded in many sources used by historians and therefore often inad-
vertently replicated by said historians. See James H. Sweet, “Is History History? Identity
Politics and Teleologies of the Present,” Perspectives on History, 17 August 2022, https://
www.historians.org/publications-and-directories/perspectives-on-history/september-2022/
is-history-history-identity-politics-and-teleologies-of-the-present.

SOCIAL MEMORY

There exist important differences in the social construction of historical truth
within the historical profession, on the one hand, and the social construction of
historical authenticity in the vast range of visual and immersive media that sup-
port twenty-first-century memory cultures, on the other hand. In principle, there
is no reason why historians should not produce digital film, games, or VR envi-
ronments with an infrastructure of “footnotes” that supply references and explain
conceptual and aesthetic choices. However, text-focused as they are, historians
are convinced that they have developed protocols for creating prose texts based
on empirical sources that, to the best of their knowledge, reflect historical real-
ity. Lacking experience in manipulating other media, members of the historical
profession do not possess a similar conviction regarding film, television, or gam-
ing.12 Historians would have a tough time deciding which visual renditions of past
events—especially renditions that were not crafted from documentary footage—
deserve to be called historically accurate. Given the many aesthetic parameters
of film and gaming, that reticence makes perfect sense, although many historical
theorists would argue that even prose texts, ancient information technology that
they are, convey meaning by way of aesthetic and other structural parameters that
are detached from the texts’ referential function.13
In visual memory culture, unlike in professional historical writing, AI already
plays a decisive and rapidly expanding role, shaping people’s perception of how
the past must have looked and felt and how it should be remembered. AI is, for
example, well advanced in reading and composing fixed images.14 Think of the
wonderfully entertaining output of machine learning tools such as DALL-E 2—
for example, the “paintings” of sumo wrestlers that were rendered in the style
of Giorgio de Chirico.15 And AI excels at providing visual backgrounds, spe-
cial effects, and transitional scenes in video games and film.16 As a result, an

12. Robert A. Rosenstone made this point in the 1990s in Visions of the Past: The Challenge of Film
to Our Idea of History (Cambridge, MA: Harvard University Press, 1995); Adam Chapman confirmed
it in Digital Games as History: How Videogames Represent the Past and Offer Access to Historical
Practice (New York: Routledge, 2016).
13. See, for instance, Kalle Pihlainen, The Work of History: Constructivism and a Politics of the
Past (New York: Routledge, 2017) and Jörn Rüsen, Evidence and Meaning: A Theory of Historical
Studies (New York: Berghahn Books, 2017).
14. That applies, for example, to AI’s ability to read faces of humans, and even fetuses, for positive
as well as very problematic purposes. For more on this, compare Lotte Houwing, “Reclaim Your Face
and the Streets: Why Facial Recognition, and Other Biometric Surveillance Technology in Public
Spaces, Should Be Banned,” in (Dis)Obedience in Digital Societies: Perspectives on the Power of
Algorithms and Data, ed. Sven Quadflieg, Klaus Neuburg, and Simon Nestler (Bielefeld: Transcript,
2022), 318–41, and Yasunari Miyagi et al., “Recognition of Facial Expression of Fetuses by Artificial
Intelligence (AI),” Journal of Perinatal Medicine 49, no. 5 (2021), 596–603.
15. Clemens J. Setz, “Digitale Kunst: Noch sind sie unschuldig,” Süddeutsche Zeitung,
14 June 2022, https://www.sueddeutsche.de/kultur/clemens-setz-dall-e-mini-kuenstliche-intelligenz
-jorge-luis-borges-1.5602238. For an assessment of DALL-E 2, see Gary Marcus, Ernest Davis, and
Scott Aaronson, “A Very Preliminary Analysis of DALL-E 2,” ArXiv, last modified 2 May 2022,
https://arxiv.org/abs/2204.13807v2.
16. Pei-Sze Chow, “Ghost in the (Hollywood) Machine: Emergent Applications of Artificial Intelli-
gence in the Film Industry,” NECSUS 9, no. 1 (2020), 193–214; Xiaoyu Li et al., “Deep Sketch-Guided
ever-expanding range of social memories comes into existence through human
interaction with images that were generated by machine learning technology, with
as yet underexplored and undertheorized consequences for the evolution of those
social memories.17 The machines have ingested a staggering number of human-
made images (along with all the biases and lacunae that these images entail) with-
out yet having recourse to any set of ethical guidelines that would enable them to
overcome these biases and lacunae.
There exist additional wrinkles in the triumphant advance of AI in the pro-
duction and curation of visual culture. Machine learning tools apparently have
a tough time dealing with the type of visual record that is particularly relevant
for historians and documentary filmmakers. Visual computer analytics are still
often foiled by moving images, especially lower-resolution documentary footage.
The applications have problems keeping track of the many variables involved
in recognizing a given moving object or person from different angles and in
different settings.18 Therefore, at the moment, copyright issues and commercial
viability aside, it is still impossible to assemble a complete digital inventory of
moving images that could revolutionize our engagement with twentieth- and
twenty-first-century visual history. Imagine a fully searchable digital archive of
all film material about a popular subject that AI could use to execute a straight-
forward prompt based on an exhaustive review and selective assemblage of all
existing footage about the topic. AI could then produce customized, non-fiction
television fare on the spur of the moment—for example, a documentary about
Adolf Hitler in the style of Ken Burns using the most spectacular footage in
existence. Until more powerful computing technologies—for instance, freely
programmable quantum computers—are generally available, such visions of total
visual recall remain elusive.
Let’s dream for a moment anyway, and let’s try to dream responsibly. Once
computers recognize Hitler and his ilk from all angles, audiences will have a
field day. Tapping into a large body of original footage and postwar productions,
AI technology could come up with highly competent results. Viewers could
essentially have AI create their own private video fare, assembling existing
film material, one shot at a time, to reflect their specific aesthetic and narrative
predilections. It should be immediately added, however, that this is an analog
dream befitting the intellectual needs of the historical profession, not a machine
learning dream reflecting the priorities of commercial AI providers. An AI
machine consisting of an AI archive that can analyze and categorize each shot
with regard to its subject matter, point of view, lighting, mood, technical quality,

Cartoon Video Inbetweening,” IEEE Transactions on Visualization and Computer Graphics 28, no. 8
(2022), 2938–52.
17. The rise of AI-supported memory culture highlights the need for critical digital memory stud-
ies. For more on this, see Digital Memory Studies: Media Pasts in Transition, ed. Andrew Hoskins
(New York: Routledge, 2018) and Anna Reading, Gender and Memory in the Globital Age (London:
Palgrave, 2016).
18. Tobias Ebbrecht-Hartmann, Lital Henig, and Noga Stiassny, “Report on Digital Curation
of Popular Culture Content,” Visual History of the Holocaust, last modified 31 December 2020,
https://www.vhh-project.eu/deliverables/d2-5-report-on-digital-curation-of-popular-culture-content/.

et cetera, on the one hand, and AI technology that can splice these shots together
into a coherent whole in response to viewer input, on the other hand, would
retain a fully functional vertical memory of all material analyzed. If so desired by
its users, this wonderful, energy-craving machine would leave each shot intact,
free from digital manipulation, and could also reveal the precise origin of each
shot that it has integrated into, say, an on-demand documentary about Hitler’s
life. In essence, the machine, relying on a vast system of indexes, could provide
referential footnotes.19 In this regard, the machine would be the antithesis of
the AI technology that is currently all the rage. DALL-E 2 does not assemble
authentic visual material; it invents new (sometimes authentic-looking) material,
and its acts of apparent creativity arise from the structure of the corpus that
was used to train it, not from any specific images in that corpus. In that sense,
DALL-E 2’s output is categorically untraceable.
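
What the vertical memory of such a machine might look like can be sketched in a few lines. In the Python fragment below, every shot carries provenance metadata from which a referential footnote can be generated on demand; the field names and the query function are hypothetical placeholders, not an existing system:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Shot:
    """One archival shot plus the descriptors an AI archive might attach to it.

    All field names and the sample query below are illustrative assumptions.
    """
    shot_id: str
    archive: str              # holding institution
    reel: str                 # original reel or file identifier
    timecode_in: str
    timecode_out: str
    subjects: List[str]       # persons, places, and events depicted
    camera_angle: str
    lighting: str
    mood: str
    technical_quality: float  # e.g. 0.0 (unusable) to 1.0 (pristine)

    def footnote(self) -> str:
        # Render the shot's provenance as a citable reference.
        return f"{self.archive}, reel {self.reel}, {self.timecode_in}-{self.timecode_out}."

def find_shots(index: List[Shot], subject: str, min_quality: float = 0.5) -> List[Shot]:
    """Query the index for usable shots of a given subject."""
    return [s for s in index if subject in s.subjects and s.technical_quality >= min_quality]

# Any documentary assembled from find_shots(...) results could print
# shot.footnote() for every cut, keeping the montage traceable to its sources,
# unlike the untraceable inventions of generative tools such as DALL-E 2.
```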
On first sight, our imagined visual memory machine should make historians
and archivists happy, although the complete inventory of twentieth- and twenty-
first-century documentary film and television material would only be able to pro-
vide truthful output if the material, vast as it is, has been subjected to human
quality control and consistently screened by AI filters that have been trained to
detect digital manipulation of the material. Otherwise, our visual database would
be quickly infected by other AI accomplishments, such as deepfakes, which reg-
ularly show up in the media stream and are difficult to eliminate from the rapidly
churning digital platforms that drive our memory culture.20 And deepfakes are
only the tip of the iceberg. Our imagined corpus of twentieth- and twenty-first-
century film and television footage will contain plenty of technically pristine, un-
manipulated historical propaganda material, and deep learning technologies will
give image to whatever biases the material contains. Nazi propaganda films, for
example, are structurally more compatible with neo-Nazi emplotments of Hitler’s
life than with anti-fascist ones. All the more reason for human quality control of
inputs and outputs, which, with regard to film and television, is an ambitious—
but perhaps not altogether unrealistic—proposition. If we think that the stories
and images we consume influence our memories, identities, and future behavior,
we should be very wary about letting AI craft our future entertainment on the
basis of our morally and politically deeply flawed cultural heritage. Good mem-
ory politics in the digital age appear to require sensible and pervasive censorship
efforts.21
Our utopian/dystopian thought experiment, which, for the time being, is un-
likely to become a reality, highlights a serious dilemma underlying all AI

19. A precursor of this machine exists in the form of the algorithm that enables users to access the
vast collection of visual material assembled in the USC Shoah Foundation’s Visual History Archive.
For more on this, see Todd Presner, “The Ethics of the Algorithm: Close and Distant Listening to the
Shoah Foundation Visual History Archive,” in Probing the Ethics of Holocaust Culture, ed. Claudio
Fogu, Wulf Kansteiner, and Todd Presner (Cambridge, MA: Harvard University Press, 2016), 175–
202.
20. Noah Giansiracusa, How Algorithms Create and Prevent Fake News: Exploring the Impacts of
Social Media, Deepfakes, GPT-3, and More (Acton: Apress, 2021).
21. Wulf Kansteiner, “Censorship and Memory: Thinking Outside the Box with Facebook,
Goebbels, and Xi Jinping,” Journal of Perpetrator Research 4, no. 1 (2021), 35–58.
technology. On the one hand, the technology only becomes so fabulously efficient
at the task of self-generating media products that are compelling to its human con-
sumers as a result of the vast, indiscriminate intake of similar media products. The
machine can decide what scene should plausibly follow another scene because it
can calculate what scenes are likely to follow one another based on the structure
of the vast corpus it has absorbed. It reliably creates an output that, for good rea-
sons, reminds media consumers of other productions about the same topic. On the
other hand, and this is important, the machine operates on the principle of sequen-
tial plausibility, not truthfulness. Its output is only as truthful and unbiased as the
archives the machine has gobbled up are.22

HISTORY

At this point in our counterfactual scenario, we are likely to thank historians who,
methodologically and aesthetically conservative as they are, have stuck to their
time-tested textual format for delivering historical knowledge. They are unlikely
to be deeply implicated in the visually and haptically immersive AI-generated
memory culture of the future. But there is no reason to relax. All the above-
described dilemmas regarding a future digitally enabled visual archive already ap-
ply to the world of texts. Large language models, unlike large film models, already
exist. They deliver human-like writing, cannot differentiate between truthful and
untruthful statements, and rapidly disseminate the structural prejudices contained
in the material that was used to train them. Precisely due to their text fetishism,
historians, following in the footsteps of other professional writers, are—whether
or not they like it—about to be catapulted from the backbenches to the forefront
of the digital revolution. And it indeed seems that LLMs can provide valuable,
time-saving assistance for writing academic texts, a fact that is probably not lost
on the more adventurous members of the historical profession, their supervisors,
and their students. After many years of hype and disappointment, AI has finally
delivered on the high expectations and is taking over the routine production of
texts for human consumption: “People whose jobs still consist in writing will be
supported, increasingly, by tools such as GPT-3.”23 As the most advanced, pub-
licly accessible large language model, GPT-3 produces over 4.5 billion words
each day, and with that output, it probably generates a significant share of all
texts intended for human consumption. As Tobias Rees puts it laconically: “We
are exposed to a flood of non-human words.”24
The large amount of writing produced by the historical profession could be a
dream come true for AI neural networks, and perhaps also for historians, although

22. DALL-E 2, for example, appears to largely follow OpenAI’s guidelines for ethical use.
For more on this, see Matthias Bastian, “OpenAI’s DALL-E 2 Is Pretty Compliant—But Who
Is Responsible Anyway?” The Decoder, 20 May 2022, https://the-decoder.com/openais-dall-e
-2-is-pretty-compliant-but-who-is-responsible-anyway/.
23. Luciano Floridi and Massimo Chiriatti, “GPT-3: Its Nature, Scope, Limits, and Consequences,”
Minds and Machines 30, no. 4 (2020), 691.
24. Tobias Rees, “Non-Human Words: On GPT-3 as a Philosophical Laboratory,” Daedalus 151,
no. 2 (2022), 180.

“large” and “dream” are relative terms here. The combined output of all histori-
ans produced since the days of Ranke constitutes just a little snack for universal
language models like GPT-3.25 Moreover, the models probably cannot dream. Ac-
cording to most—but by no means all—experts, the models lack consciousness
and, therefore, the ability to plan ahead, to be truly creative, and to dream.26
Then again, how creative are historians really? Hasn’t the vision of history as
a scientific endeavor always included the idea that two professional historians—
having enjoyed comparable levels of graduate training, sharing similar ethical
values, and being supplied with the same data, the same research questions, and
the same conceptual schemes—will arrive at largely compatible, intersubjectively
valid results? The ideal type of a scientific historian resembles an algorithm in the
sense that the historian’s disciplined mind reliably produces similar outputs from
similar inputs. There seems to be a natural affinity between the most advanced
forms of AI and the discipline of history. Both display a particular fondness
for rules, predictability, and the written word. Consequently, historical knowl-
edge might be exactly the kind of method-bound imagination that AI can reliably
mimic. Like GPT-3, historians write about what they have read somewhere else,
but the decisive difference between the two is that historians try to keep track of
the “somewhere else.”
The marriage of AI and the historical profession will only make for a dream
wedding if AI develops consciousness and begins to care about the truthfulness
of its statements or, more likely, if AI tools can be assembled in a possible fu-
ture GPT-History that is built on different construction principles than currently
available LLMs are. For one, a GPT-History would need an archival component
that is a lot more forthcoming about the origins of at least some of its textual out-
puts. Like the GPT-Film imagined above, the GPT-History would need a vertical
memory that would enable it to provide footnotes for the quotes and summaries
of texts that it deems relevant to a given historian’s query. In addition, the text-
creating component of the GPT-History should be trained exclusively with the
help of truthful texts and up-to-date scholarship.
Most likely, machine learning already plays a role in history, as it does in many
text-focused, rule-governed arts and professions, such as literature, journalism,
and the law.27 In all these settings, AI is primarily (and sometimes sensibly) de-
ployed for routine, small-scale writing tasks, and the scale of successful AI writ-
ing is likely to increase rapidly. One implication for history is obvious. The disci-
pline will need to acquire more relaxed and flexible definitions of plagiarism, not
least of all because we will likely face a wave of clever undergraduates handing
in assignments that will have been written by machine learning tools and might

25. Tom B. Brown et al., “Language Models Are Few-Shot Learners,” ArXiv, last modified 22 July
2020, 8, https://arxiv.org/abs/2005.14165.
26. Cade Metz, “A.I. Is Not Sentient. Why Do People Say It Is?” The New York Times, 5 August
2022, https://www.nytimes.com/2022/08/05/technology/ai-sentient-google.html. See also the discus-
sion of GPT-3’s intelligence in Carlos Montemayor, “Language and Intelligence,” Minds and Ma-
chines 31, no. 4 (2021), 471–86.
27. Mike Sharples and Rafael Pérez y Pérez, Story Machines: How Computers Have Become Cre-
ative Writers (New York: Routledge, 2022); Amy Cyphert, “A Human Being Wrote This Law Review
Article: GPT-3 and the Practice of Law,” UC Davis Law Review 55, no. 1 (2022), 401–43.
be difficult to expose as fakes, especially if students use the tools consistently
throughout their college careers.28 According to Mike Sharples, “a student can
now generate an entire essay or assignment in seconds, at a cost of around 1 US
cent.”29 In fact, I think the responsible use of machine writing tools, especially the
skill of crafting effective natural language prompts for LLMs, should become a
key component of all undergraduate curricula. After all, we have also given up on
the idea that correct spelling is the mark of a true scholar. Plus, GPT-3 is an excel-
lent platform for some tasks, such as providing competent summaries of articles
and whole books.30
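
A summary of this kind could, for example, be requested through OpenAI’s completion API as it was offered in 2022. The sketch below is illustrative rather than prescriptive: the model name, prompt wording, parameter values, and input file are assumptions.

```python
import os
import openai  # OpenAI's Python client as available in 2022 (pre-1.0 interface)

openai.api_key = os.environ["OPENAI_API_KEY"]

def summarize(text: str, max_tokens: int = 200) -> str:
    """Request a short summary of the supplied passage from GPT-3."""
    response = openai.Completion.create(
        model="text-davinci-002",  # assumed GPT-3 model; other engines existed
        prompt="Summarize the following passage in three sentences:\n\n" + text,
        max_tokens=max_tokens,
        temperature=0.3,           # low temperature for sober, less inventive prose
    )
    return response["choices"][0]["text"].strip()

if __name__ == "__main__":
    # "chapter_one.txt" is a placeholder for whatever source text is at hand.
    with open("chapter_one.txt", encoding="utf-8") as handle:
        print(summarize(handle.read()))
```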
What works for students should, in principle, also work for scholars. Let’s con-
sider some innovative benchmarks in the field of Holocaust and genocide history
as counterfactual examples. In 2007, after decades of Holocaust memory, which
set a premium on the perspective of the victims, Saul Friedländer inserted that
perspective into a multiperspectival account of the established history of the Fi-
nal Solution.31 In 2010, the Eastern European history specialist Timothy Snyder
retold the history of mass crimes before and during World War II from the per-
spective of Eastern European collective memory as it evolved after the end of the
Cold War.32 Both projects attest to the relevance of social memories, and both
may have profited from text-mining software. For instance, text mining might
have helped both authors identify useful descriptions of specific historical scenes
or locales from survivors’ memoirs and other sources.33 In a similar vein, both
projects could have profited from text-generating AI technology had machines
like GPT-3 already existed. AI could have, for example, supplied short chrono-
logical summaries of Hitler’s or Stalin’s actions and statements, which the authors
could have processed further and inserted into their manuscripts.
For the time being, “short” is the operative term here. Depending on the
prompts used, GPT-3, for instance, tends to repeat itself after a few pages, and
it has a particular predilection for racist and sexist rants that reflect the domi-
nant themes of GPT-3’s diet of internet texts, social media feeds, digital books,

28. Nassim Dehouche, “Plagiarism in the Age of Massive Generative Pre-trained Transformers
(GPT-3),” Ethics in Science and Environmental Politics 21 (March 2021), 17–23.
29. Mike Sharples, “New AI Tools That Can Write Student Essays Require Educa-
tors to Rethink Teaching and Assessment,” LSE Impact Blog (blog), 17 May 2022, https://
blogs.lse.ac.uk/impactofsocialsciences/2022/05/17/new-ai-tools-that-can-write-student-essays
-require-educators-to-rethink-teaching-and-assessment/.
30. Bharath Chintagunta et al., “Medically Aware GPT-3 as a Data Generator for Medical Dia-
logue Summarization,” in Proceedings of the Second Workshop on Natural Language Processing for
Medical Conversations, ed. Chaitanya Shivade et al. (Stroudsburg: Association for Computational Lin-
guistics, 2021), 66–76. For a discussion of one of the many commercial providers of summarization
services, see Matt Payne, “State of the Art GPT-3 Summarizer for Any Size Document or Format,”
Width.ai (blog), 7 September 2021, https://www.width.ai/post/gpt3-summarizer.
31. Saul Friedländer, The Years of Extermination: Nazi Germany and the Jews, 1939–1945 (New
York: HarperCollins, 2007).
32. Timothy Snyder, Bloodlands: Europe between Hitler and Stalin (New York: Basic Books,
2010).
33. This could have been done in the way that Christopher R. Browning used the digital search tools
of the USC Shoah Foundation’s Visual History Archive while working on Remembering Survival:
Inside a Nazi Slave-Labor Camp (New York: Norton, 2010).

and Wikipedia.34 Some experts have therefore concluded that “GPT-3 is an
extraordinary piece of technology, but as intelligent, conscious, smart, aware, per-
ceptive, insightful, sensitive and sensible (etc.) as an old typewriter.”35 More care-
fully tailored diets are likely to produce different results, although the repetitive
nature of the output cannot be easily circumvented with currently available tech-
nology. There is a randomness factor built into GPT-3 to enhance its “creativity,”
but the problem remains that GPT-3 does not forget the answer to the one question
it has been trained to answer over and over again: What word is most likely to fol-
low a given sentence or phrase? Algorithms can be ordered to omit or disfavor ex-
plicitly racist and sexist terms, but they will still repeat themselves. Asking them
to forget and improvise in a purposeful manner is difficult because the output of
an updated GPT version that has been taught a greater measure of forgetfulness
might lack conceptual and stylistic cohesion. At the moment, machine learning
technology is simply not embedded in the kind of complex neural networks that
are able to perform the innovative, motivated leaps of creative imagination that
are routinely accomplished by creative writers and historians. Nevertheless, on
first sight, the technology excels at crafting intelligent text blocks that are appar-
ently ready to be inserted into the next scholarly article or monograph—that is,
until one stumbles, again, over the little detail of the footnote and the referential,
vertical memory that footnotes represent.
In a few recent experiments, GPT-3 managed to write more or less compelling-
sounding academic articles, including a rather underwhelming article about itself
that was submitted for peer review.36 But in these experiments, GPT-3 simply
invented compelling-looking notes that do not refer to actually existing sources.
For instance, it created fictitious links to convincing-sounding websites and, even
more scary, references to fictitious journal articles. The references contain the
names of actual, prominent scholars publishing in the field and the titles of ac-
tual journals relevant to the field, but GPT-3 invented journal issues that do
not exist.37 Precisely because GPT-3 does not plagiarize—its sentences are not
flagged by anti-plagiarism software and it does not violate copyright as practiced
today—it also cannot cite correctly. GPT-3 has no explicit vertical memory. It
will never be able to tell us where the content of a given sentence originated be-
cause each new word is the result of thousands and thousands of calculations of
probability. GPT-3 can provide many useful semantic services, but “truth-telling
is not amongst them”—let alone truth-telling in a transparent, verifiable fashion.38
Therefore, there exists the very real danger that large language models will cause
a “permanent pollution of our informational ecosystem with massive amounts
of very plausible but often untrue texts.”39 It is difficult to predict for how long

34. Brown et al., “Language Models Are Few-Shot Learners,” 9.


35. Floridi and Chiriatti, “GPT-3: Its Nature, Scope, Limits, and Consequences,” 690.
36. Almira Osmanovic Thunström, “We Asked GPT-3 to Write an Academic Paper about
Itself—Then We Tried to Get It Published,” Scientific American, 30 June 2022, https://
www.scientificamerican.com/article/we-asked-gpt-3-to-write-an-academic-paper-about-itself-then
-we-tried-to-get-it-published/.
37. Sharples, “New AI Tools That Can Write Student Essays.”
38. Sobieszek and Price, “Playing Games with AIs,” 341.
39. Ibid.
this problem will persist. Wu Dao 2.0, the Chinese answer to GPT-3 that was
developed by the Beijing Academy of Artificial Intelligence and presented to the
public in 2021, is apparently much better than GPT-3 at remembering tasks it
has previously mastered, thus bringing AI “closer to human memory and learning
mechanisms,” including, perhaps at some point in the future, an appreciation of
truthful texts.40
In order to excel at creating truthful notes, machine learning tools would have
to collaborate with academic literature exploration tools (like Iris.ai) that might
be able to find appropriate references for automatically generated content.41 In
addition, in order to generate truthful content, a GPT-History should be trained
with texts that have passed quality control—for instance, peer-reviewed academic
writing and quality journalism. Finally, avoiding troublesome structural bias will
prove to be the biggest challenge. In its current form, GPT-3 knows no ethics
and is not a neutral machine. It favors the textual status quo as it appears in its
training feed, which renders critical writing directed against the grain of the
corpus difficult. There exist promising initiatives to improve GPT-3 ethics, and
careful content curation is one of them; this raises interesting questions about
whether publications like Ranke’s should be kept out of the training feed of the
text-generating component of our GPT-History and relegated to its archival com-
ponent and who should make that decision.42
A GPT-History deserving of this designation—that is, deserving to be called
an academic, scholarly machine—seems to require the intelligent, purposeful in-
tegration of two types of AI: a digitally enhanced, very diverse, analog-structured
collection of effectively indexed and, upon demand, competently summarized
sources, on the one hand, and a creative writing LLM component that has been
trained with state-of-the-art scholarship, on the other hand. Integrating these two
components into one text and deciding to what extent the output of the one side
requires adjusting the output of the other side will likely require human agency
for a long time to come.
In the same way that today’s scholarship identifies the software that was used
to generate and manipulate large quantitative data sets wherever applicable, future
scholarship will acknowledge the machine learning technology that was used to
compose intermediate work products, such as chronologies and descriptive sum-
maries of sources, and even the machine tools that were applied to generate, or
put the final stylistic touches on, entire manuscripts. Having machine learning
technology make one’s manuscript reflect the style of one’s academic hero might
constitute the last phase of the dissertation writing process.

40. Alberto Romero, “GPT-3 Scared You? Meet Wu Dao 2.0: A Monster of 1.75
Trillion Parameters,” Towards Data Science, 5 June 2021, https://towardsdatascience.com/
gpt-3-scared-you-meet-wu-dao-2-0-a-monster-of-1-75-trillion-parameters-832cd83db484.
41. Andy Extance, “How AI Technology Can Tame the Scientific Literature,” Nature, 10 Septem-
ber 2018, https://www.nature.com/articles/d41586-018-06617-5.
42. Anastasia Chan, “GPT-3 and InstructGPT: Technological Dystopianism, Utopianism, and
‘Contextual’ Perspectives in AI Ethics and Industry,” AI and Ethics (April 2022), https://doi.org/
10.1007/s43681-022-00148-6.

HISTORICAL THEORY

The characteristics that diminish GPT-3’s value as an author of historical schol-
arship might turn it into an interesting playground for historical theory. A good
digital point of departure in this context is the well-established field of stylometry,
which measures the prevalence of stylistic features of texts on the level of indi-
vidual symbols, words, and sentences and on the level of more complex linguistic
structures—for example, rhetorical figures and larger, overarching textual pat-
terns.43 The method helps to establish literary and legal authorship and to describe
the evolution of writing styles over decades and centuries. Like so many other sta-
tistical analyses, stylometry lends itself to attractive visualizations—for instance,
in the form of heatmaps and dendrograms.44 To my knowledge, the method has
not been deployed to describe the style of individual historians or to track key
features of professional historical writing over the course of the last few hundred
years.
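
A rudimentary stylometric pipeline is nevertheless easy to sketch. The Python fragment below computes a handful of surface features (average sentence length, average word length, function-word rates) for a few sample texts and clusters them hierarchically, which is how the dendrograms mentioned above are typically produced; the file names and the feature set are placeholder assumptions.

```python
import re
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# A handful of function words of the kind stylometry routinely counts.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "was", "but"]

def stylometric_features(text: str) -> np.ndarray:
    """Average sentence length, average word length, and function-word rates."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    rates = [words.count(fw) / max(len(words), 1) for fw in FUNCTION_WORDS]
    return np.array([avg_sentence_len, avg_word_len, *rates])

# Placeholder file names; in practice these would be samples of historical prose.
samples = {
    "ranke_sample": open("ranke_sample.txt", encoding="utf-8").read(),
    "friedlaender_sample": open("friedlaender_sample.txt", encoding="utf-8").read(),
    "snyder_sample": open("snyder_sample.txt", encoding="utf-8").read(),
}

feature_matrix = np.vstack([stylometric_features(t) for t in samples.values()])
# Hierarchical (Ward) clustering over the feature vectors; the resulting tree
# can be drawn as a dendrogram of stylistic proximity between the samples.
tree = linkage(feature_matrix, method="ward")
print(dendrogram(tree, labels=list(samples), no_plot=True)["ivl"])
```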
Stylometry and machine learning technology open opportunities for develop-
ing and testing hypotheses about the micro and macro grammar of professional
historical writing. Conventionally, historical theorists analyze samples of histor-
ical scholarship written by actual historians in order to identify the patterns and
rules that govern the discipline at large. LLMs enable us to turn that process on its
head by converting our assumptions about how history works into a set of instruc-
tions and having an LLM put these instructions into textual practice. Through a
kind of reverse historical theory, we can have machine tools write historical trea-
tises according to linguistic specifications that reflect our hypotheses about how
history works; then, we can see to what extent the results match up with histori-
ography created “in the wild.”
Historical theory is unsettled knowledge. There exist three different (and, in my
view, equally plausible) claims about the nature of history. For some theorists and
most historians, history is an empirical academic discipline that provides readers
with descriptions of carefully delineated segments of past reality.45 The linguis-
tic turn has suggested a different raison d’être for history, identifying narrative
performance as the primary intrinsic value of professional historical writing.46 Fi-
nally, some postnarrativists, such as Jouni-Matti Kuukkanen, have maintained that
historical texts should be appreciated and assessed as argumentative interventions
that were designed to convince readers of the soundness of a set of propositions.47
I think all three claims are correct. There is no reason to assume that history
is a zero-sum game that is exclusively concerned with description, narration, or

43. Jacques Savoy, Machine Learning Methods for Stylometry: Authorship Attribution and Author
Profiling (Cham: Springer, 2020).
44. K. V. Lagutina and A. M. Manakhova, “Automated Search and Analysis of the Stylometric Fea-
tures That Describe the Style of the Prose of 19th–21st Centuries,” Automatic Control and Computer
Sciences 55, no. 7 (2021), 866–76.
45. C. Behan McCullagh, The Truth of History (London: Routledge, 2003).
46. Alun Munslow, Narrative and History, 2nd ed. (London: Red Globe Press, 2019).
47. Jouni-Matti Kuukkanen, Postnarrativist Philosophy of Historiography (New York: Palgrave,
2015).
argumentation. In fact, I would suggest that history is a hybrid intellectual writing
activity that constantly blends description, narration, and argumentation on dif-
ferent levels of the text. At the same time, many pieces of historical writing can
be determined to serve one primary goal. As a result, there might actually exist at
least three different types of history that should be assessed according to different
criteria. There are narrative histories that use descriptions and argumentation to
deliver a good story, assertive histories that deploy narrative and descriptive pas-
sages to convince readers of the soundness of a specific argument, and descriptive
histories that marshal narrative and assertive text elements to capture a historical
scene or setting.48
To put the above hypotheses to the test, one could cast text linguistic defini-
tions of description, argumentation, and narration into algorithmic form in order
to determine the prevalence and distribution of descriptive, assertive, and narra-
tive sentences and passages in a given piece of historical writing.49 One could
then take a close analytical look at passages of intense description, narration,
and argumentation and seek to determine their semantic and possible hierarchi-
cal relationships to one another. In this context, it would be particularly interest-
ing to have machines extract the patterns of textual advancement in specific text
segments (such as introductions, conclusions, and passages that feature discus-
sions of colligatory concepts) while also having them track the different semantic
strands of the text (for instance, by distinguishing the level of historical exposition
from the level of historiographical contextualization).50 In short, one could deter-
mine how a given text functions linguistically, compare its text linguistic profile
to other publications, and even track the evolution of dominant textual construc-
tion principles over time in different subdisciplines and the field as a whole. Some
possible results include the observation that historians primarily used to narrate,
as maintained by narrativists since the linguistic turn of the 1970s, but that they
are now more likely to argue, as claimed by postnarrativists. Furthermore, one
might be able to illustrate empirically that historians favor descriptive styles of
text advancement when they begin working in a new research area, such as digital
history.51
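
The first step of such an analysis, classifying individual sentences by discourse mode, might be approximated as in the following sketch. The cue lists are deliberately crude placeholders rather than faithful renderings of Smith’s text linguistic definitions; a serious implementation would model tense, aspect, and situation entities or train a classifier on annotated passages.

```python
import re
from collections import Counter

# Crude surface cues standing in for proper text linguistic definitions of the
# three discourse modes; placeholders only, not Smith's actual criteria.
NARRATIVE_CUES = re.compile(r"\b(then|afterwards|in 1[0-9]{3}|marched|ordered|arrived|returned)\b", re.I)
ARGUMENT_CUES = re.compile(r"\b(therefore|thus|however|consequently|suggests that|argue[sd]?|because)\b", re.I)
DESCRIPTIVE_CUES = re.compile(r"\b(was a|were|consisted of|was located|was surrounded by|appeared)\b", re.I)

def classify_sentence(sentence: str) -> str:
    """Assign a sentence to the mode whose cues it matches most often."""
    scores = {
        "narrative": len(NARRATIVE_CUES.findall(sentence)),
        "argumentative": len(ARGUMENT_CUES.findall(sentence)),
        "descriptive": len(DESCRIPTIVE_CUES.findall(sentence)),
    }
    mode, score = max(scores.items(), key=lambda item: item[1])
    return mode if score > 0 else "unclassified"

def discourse_profile(text: str) -> Counter:
    """Prevalence of descriptive, narrative, and assertive sentences in a text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return Counter(classify_sentence(s) for s in sentences)

sample = (
    "In 1525 the peasants marched on the city. "
    "The town consisted of narrow lanes around a fortified square. "
    "The revolt therefore cannot be explained by religious grievances alone."
)
print(discourse_profile(sample))
```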

48. Wulf Kansteiner, “History beyond Narration: The Shifting Terrain of Bloodlands,” in Analysing
Historical Narratives: On Academic, Popular and Educational Framings of the Past, ed. Stefan
Berger, Nicola Brauch, and Chris Lorenz (New York: Berghahn Books, 2021), 51–82.
49. Consider, in this context, the concrete text linguistic definitions of description, narration, and
argumentation that were suggested by Carlota S. Smith in Modes of Discourse: The Local Structure of
Texts (Cambridge: Cambridge University Press, 2003) and that have potential for digital operational-
ization. It sounds promising to combine these categories with the method of “middle reading” with the
help of AI; see Jon Chun and Katherine Elkins, “What the Rise of AI Means for Narrative Studies: A
Response to ‘Why Computers Will Never Read (or Write) Literature’ by Angus Fletcher,” Narrative
30, no. 1 (2022), 104–13.
50. Promising in this regard is the successful computer-based analysis of narrative structures in
Ryan L. Boyd, Kate G. Blackburn, and James W. Pennebaker, “The Narrative Arc: Revealing Core
Narrative Structures through Text Analysis,” Science Advances 6, no. 32 (2020), https://doi.org/
10.1126/sciadv.aba2196. See also Joshua Daniel Eisenberg, “Automatic Extraction of Narrative
Structure from Long Form Text” (PhD diss., Florida International University, 2018), https://
digitalcommons.fiu.edu/etd/3912.
51. Stephen Robertson and Lincoln Mullen, “Arguing with Digital History: Patterns of Histori-
cal Interpretation,” Journal of Social History 54, no. 4 (2021), 1005–22. Furthermore, some research

A further step of reverse engineering would entail developing good prompts
to have a successor of GPT-3, preferably a truthful GPT-History, write descrip-
tive, narrative, and assertive histories of specific events or epochs, compare the
results with existing scholarship, and thus test the notion that fundamentally dif-
ferent types of historical scholarship indeed exist.52 By changing the test param-
eters from one test run to the next, we might, for instance, be able to determine
how far statements of fact reach into the narrative and argumentative superstruc-
ture of historical texts. Perhaps there are indeed different types of historical narra-
tion and argumentation that are more or less firmly integrated into and dependent
upon their respective empirical infrastructure. We can play with the linguistic and
semantic grammar of history, including levels of abstraction, degrees of empiri-
cal saturation, intensity of narrative immersion, and attention to logical detail. We
might be able to identify tipping points that mark the transition of an empirical
treatise into a narrative performance or a well-crafted argumentative intervention.
Well-designed LLMs offer experimental opportunities for historical theory that
never existed previously and a chance to develop precise yet flexible criteria of
judgment that do justice to empirical, narrative, and assertive pieces of scholar-
ship and the diversity of the field. LLMs might be able to teach us, in concrete
linguistic terms, that there are more ways to succeed and fail at the complex task
of writing history than we previously assumed.
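
The experimental protocol itself can be outlined even before a truthful GPT-History exists. The sketch below generates one prompt per discourse mode and “creativity” setting; the template wording and temperature grid are assumptions, and the completion backend is left abstract so that any large language model could be plugged in.

```python
from typing import Callable, Dict, Tuple

# Prompt wording and the temperature grid are assumptions made for this sketch;
# the completion backend is left abstract so any LLM (GPT-3 today, a truthful
# GPT-History ideally) can be plugged in.
MODE_TEMPLATES = {
    "descriptive": ("Describe {topic}. Concentrate on settings, actors, and "
                    "conditions; avoid a plot and avoid explicit theses."),
    "narrative": ("Tell the history of {topic} as a continuous story with a "
                  "clear beginning, middle, and end."),
    "assertive": ("Make and defend one central argument about {topic}, "
                  "supporting every claim with evidence."),
}

def run_experiment(
    topic: str,
    complete: Callable[[str, float], str],
    temperatures: Tuple[float, ...] = (0.3, 0.7, 1.0),
) -> Dict[Tuple[str, float], str]:
    """Generate one text per discourse mode and temperature for comparison
    with published historiography."""
    outputs: Dict[Tuple[str, float], str] = {}
    for mode, template in MODE_TEMPLATES.items():
        prompt = template.format(topic=topic)
        for temperature in temperatures:
            outputs[(mode, temperature)] = complete(prompt, temperature)
    return outputs

# Usage: wrap any completion endpoint in a function complete(prompt, temperature)
# and feed the returned texts to the discourse-profile analysis sketched earlier.
# histories = run_experiment("the German Peasants' War of 1524-1525", my_llm_call)
```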

CONCLUSION

What is to be gained from all of this? First and foremost, speed and scale and all
the advantages they entail. These advantages include attractive, engaging sites
of collective memory that reflect their creators’ political and ethical preferences,
a comprehensive command of sources and literatures that can no longer be at-
tained by individual scholars and research teams, new forms of multiperspectival
and multiscalar writing across different languages, the testing of hypotheses
and counterfactual scenarios on the spur of the moment, access to histories of
societies and epochs that have hitherto been neglected by professional historians
(including easy access to multimedia archives53), the rapid, in-depth analysis

indicates that descriptive texts are indeed more concerned with spatial than chronological advance-
ment; see Christine Peters, “Text Mining, Travel Writing, and the Semantics of the Global: An
AntConc Analysis of Alexander von Humboldt’s Reise in die Aequinoktial-Gegenden des Neuen Kon-
tinents,” in Digital Methods in the Humanities: Challenges, Ideas, Perspectives, ed. Silke Schwandt
(Bielefeld: Bielefeld University Press, 2021), 192.
52. For an assessment of the argumentative skills of language representation models, see Mayank
Kejriwal et al., “Designing a Strong Test for Measuring True Common-Sense Reasoning,” Nature
Machine Intelligence 4, no. 4 (2022), 318–22. Even GPT-2 is apparently already quite good at argu-
ing; see Khalid Al-Khatib et al., “Employing Argumentation Knowledge Graphs for Neural Argument
Generation,” in Proceedings of the 59th Annual Meeting of the Association for Computational Lin-
guistics and the 11th International Joint Conference on Natural Language Processing, vol. 1, ed.
Chengqing Zong et al. (Stroudsburg: Association for Computational Linguistics, 2021), 4744–54.
53. Oral history archives that have been rendered remotely accessible and indexed through AI tech-
nology capturing natural language and other social cues (such as technology that can identify facial
expressions and body language as well as analyze breathing) could, for instance, serve historians and
their machine learning writing tools. For more on this, see Francisca Pessanha and Almila Akdag
of linguistic and semantic patterns of historical writing on different scales, and
opportunities for testing basic axioms of historical theory from a vantage point
located within the historical text—that is, from the perspective of the writing
subject. All of this comes at a steep, possibly prohibitive, price. Machine learning
technologies are expensive, have huge carbon footprints, and are designed to
serve the interests of their commercial owners.54 Somebody would have to pay
for academia-compatible machine learning technology. This GPT-History should
be publicly controlled, not least of all to ensure that it adheres to democratically
legitimated rules of censorship. Finally, its output needs to advance the cause
of environmental protection quickly and decisively; otherwise, the required
investment of scarce resources would be difficult to justify. As GPT-3 so aptly
reminds us: “Large language models require a lot of electricity to train. This
electricity consumption results in greenhouse gas emissions, which contributes to
climate change. Additionally, these models are often hosted on servers that use
fossil fuels, further worsening their environmental impact.”55

Aarhus University

Salah, “A Computational Look at Oral History Archives,” Journal on Computing and Cultural Her-
itage 15, no. 1 (2021), https://doi.org/10.1145/3477605.
54. Dieuwertje Luitse and Wiebke Denkena, “The Great Transformer: Examining the Role of
Large Language Models in the Political Economy of AI,” Big Data and Society 8, no. 2 (2021),
https://doi.org/10.1177/20539517211047734.
55. This output was created on OpenAI’s playground (https://beta.openai.com/playground) on 26
September 2022 in response to my prompt, “Write one paragraph about the negative effects of large
language models like GPT-3 on the world’s climate” with a temperature setting of 1 and a maximum
length of 485 tokens.
