
‘We can’t read it all’: Theorizing a hermeneutics for large-scale data in the humanities with a case study in stylometry

Downloaded from https://academic.oup.com/dsh/article/37/4/1157/6459241 by UNAM user on 02 March 2023


Hannah Ringler
Department of English, Carnegie Mellon University, Pittsburgh,
PA 15213, USA

Abstract
Computational methods often produce large amounts of data about texts,
which create theoretical and practical challenges for textual interpretation.
How can we make claims about texts, when we cannot read every text or analyze
every piece of data produced? This article draws on rhetorical and literary
theories of textual interpretation to develop a hermeneutical theory for gaining
insight about texts with large amounts of computational data. It proposes that
computational data about texts can be thought of as analytical lenses that make
certain textual features salient. Analysts can read texts with these lenses, and
argue for interpretations by arguing for how the analyses of many pieces of data
support a particular understanding of text(s). By focusing on validating an
understanding of the corpus rather than explaining every piece of data, we allow
space for close reading by the human reader, focus our contributions on the
humanistic insight we can gain from our corpora, and make it possible to glean
insight in a way that is feasible for the limited human reader while still having
strategies to argue for (or against) certain interpretations. This theory is
demonstrated with an analysis of academic writing using stylometry methods, by
offering a view of knowledge-making processes in the disciplines through a close
analysis of function words.

Correspondence: Hannah Ringler, Carnegie Mellon University, Pittsburgh, PA 15213, USA. E-mail: hringler@andrew.cmu.edu

Digital Scholarship in the Humanities, Vol. 37, No. 4, 2022. © The Author(s) 2021. Published by Oxford University Press on behalf of EADH. All rights reserved. For permissions, please email: journals.permissions@oup.com
https://doi.org/10.1093/llc/fqab100 Advance Access published on 10 December 2021

1 Introduction

As ‘big data’ has found a place in the sciences and humanities alike over the last few decades, the problem of how to tackle interpretation and meaning-making with unassailable amounts of data has gained interest while continuing to be theoretically and practically difficult. On arXiv, an open-access archive for research in some science, technology, engineering, and math (STEM) areas, the search hits for ‘interpretable machine learning’ have nearly doubled every year from 2015 to 2020. In digital humanities spaces, researchers have been grappling with how data from computational analysis can contribute to traditional humanistic inquiry: Rockwell and Sinclair (2016), for example, engaged explicitly with how computational tools can model different views of texts that help analysts to read them in new ways, and Ramsay (2011) theorized an ‘algorithmic criticism’ in which tools enable criticism and interpretation rather than just verify hypotheses. Digital tools in the humanities should ideally ‘assist the
critic in the unfolding of interpretive possibilities’ (Ramsay, 2011, p. 10), but when faced with tables upon tables of numerical data about a corpus or text, the realities of how to actually make that happen are often unclear.

Stylometry is one example of a method that has been especially tricky to tackle in terms of interpretation because of the large amount of data produced. Stylometry methods are most commonly used for authorship attribution, and typically work by calculating distances between texts in a corpus with the expectation that texts with smaller distances between them are more stylistically similar and thus more likely to be written by the same author. The methods are quite effective, and have been used for cases like identifying the authorship of Federalist papers (Mosteller and Wallace, 1964), and suggesting that Robert Galbraith, the author of The Cuckoo’s Calling, was actually a pen name for J.K. Rowling (Juola, 2013).

The distances calculated between texts with stylometry methods are typically based on the distribution of most frequent words, which are often function words like ‘the’, ‘and’, or ‘of’. Thus, Evert et al. (2017) explain that the distances between texts (and determinations of authorship) are based on many variables that are likely weak discriminators on their own, as opposed to a small number of stronger variables. Interpreting the results of a stylometric analysis, or in other words asking ‘why does this work?’ or ‘what do we learn from discriminating authors?’, has been widely recognized as difficult because of the huge amount of data and variables at play (e.g. Mosteller and Wallace, 1964; Craig, 1999; Burrows, 2002; Kestemont, 2014). While some have begun to explain why stylometry methods work from a mathematical perspective (e.g. Argamon, 2008) or begun to analyze what they might tell us about authors or genres (e.g. Craig, 1999; Argamon et al., 2008; Noecker et al., 2013), we still lack a robust interpretive framework for thinking more theoretically about how to make sense of the amounts of data produced by these methods and others like them. Certainly fields like data science or statistics have developed interpretive theories for certain purposes, but as van Zundert (2016) explains, the use of computation in textual interpretation and bringing together of disciplines requires new forms of hermeneutics, given the theoretical and methodological tensions in working with texts in particular.

In this article, I use stylometry methods as a case study for investigating hermeneutic questions in computational humanities work: when faced with large amounts of numerical data, how do we make sense of it in a way that helps us to interpret artifacts and engage in humanistic inquiry, even though we may not be able to read every text? What is our hermeneutical theory? Mailloux (1991) reminds us that a hermeneutic theory does not guarantee an accurate or ‘correct’ interpretation, but rather ‘provide[s] additional argumentative strategies for making one’s case’ (p. 246). When we use computational methods in the humanities to make claims about artifacts, what are our argumentative strategies for making the case for an interpretation of lots of data? I take up these questions in part in this article, drawing on theory on textual interpretation to make an argument for a hermeneutics or interpretive theory for stylometry methods in particular that has as its goal gaining insight about the corpus. In doing so, I hope to offer more traditional humanistic insights into a particular corpus, while proposing a theoretical framework for thinking about how to interpret textual corpora through computational methods that produce large amounts of data.

2 Interpretation of Texts in Computational Humanities

In literary and rhetorical criticism, the human reader has been historically valued as an integral part of hermeneutics. From Richards’ (1929) close reading to Black’s (1978) rhetorical criticism, interpretation of texts has historically involved analysts reading individual texts with different processes so as to better understand them and their contexts. Moretti (2013) most notably challenged this focus on individual texts in his argument for ‘distant reading’, spurring theoretical debates about the place and value of human close reading within and given distant reading approaches (e.g. Hockey, 2000; Ramsay, 2011).

While close and distant reading each can stand on their own to allow new and nuanced readings of texts, considering how they might complement each other is
also a fruitful, if difficult, space. Many scholars have argued for the value of combining the two approaches, arguing, for example, that ‘the macroscale perspective should inform our close readings of the individual texts’ (Jockers, 2013, p. 28), that artificial views of a text ‘encourage a reader to read it differently’ (Rockwell and Sinclair, 2016, p. 189), or that macro approaches can help us model and theorize about the relationships between specific textual and social variables (Underwood, 2016, p. 531). The models that we develop of texts and corpora with computational methods can also be used to gather insight into the texts themselves: the model becomes a springboard for imagining new interpretive possibilities in readings of individual texts, as opposed to only looking at a big picture (this is what Breiman, 2001, p. 199 refers to as the ‘information’ goal in statistical data analysis). In some sense, using computational methods in this way is still distant in that it can provide a look at an entire corpus, but closer in that it highlights features of particular texts to be noticed and analyzed by a reader. These patterns, once noticed, might lead to new insight into a text or corpus as close reading traditionally would provide, but also are insights which distant reading alone cannot interpret and which close reading alone cannot easily provide given the limits of the human serial reader to both notice and trace them across entire corpora.

To fully explore and more deeply understand the patterns revealed by computational methods then, we need to grapple with the role of the human reader and close reading. The relationships and patterns that statistical models reveal can function as a sort of analytical lens, making present textual patterns that may have previously gone unnoticed but which the analyst can now read for with the knowledge that they statistically recur. For example, a stylometry analysis might show that the frequency of ‘the’ helps statistically differentiate one corpus from another. This is a pattern likely to have gone unnoticed by the average serial reader, both because of how common (and thus, frequently unnoticed) ‘the’ is, and how slight (even if regular) the difference between corpora might be. Once pointed out by models though, the analyst can return to close reading texts with a focus on this word to more deeply investigate.

With very large textual corpora though, serial reading for a particular feature quickly becomes an unmanageable task. An analyst might serial read through a handful of texts and develop ideas about why certain textual patterns occur, but how do we know that the reasons scale? As corpora get larger, statistical differences become more stable but the analyst’s ability to read each text closely wanes. When working with big data in the humanities, computational methods offer a unique insight into regularly occurring patterns that can act as lenses to shape our close reading, but we also face a verifiability problem in knowing that our interpretation scales appropriately to the entire corpus. Some initial work has begun to fruitfully theorize how to move between close and distant reading toward interpretation (e.g. Piper, 2015), but there are still many practical and theoretical challenges to developing hermeneutics for the variety of computational methods in the humanities, especially regarding questions of scale.

2.1 Ricoeur’s hermeneutical arc and big data

Paul Ricoeur was a 20th-century philosopher who engaged with text interpretation and hermeneutical theory, and whose work can help us to think critically about text interpretation theoretically, including in a big data context. Ricoeur (1981) described hermeneutics as ‘the theory of the operations of understanding in their relation to the interpretation of texts’ (p. 43), and saw the understanding of texts as at least somewhat separate from the author’s intent and created by readers at different points in time. In particular, Ricoeur (1981) theorized a ‘hermeneutical arc,’ or different phases that readers go through in constructing their understanding of a text. Bell (2011) slightly adapted this arc for discourse analysis into what he called the ‘interpretive arc’. The ideas share a lot of similarities, and I draw my description of the ‘arc’ that follows heavily from both sources, as Bell’s adaptation has a useful focus on textual interpretation as we might do in close reading.

Ricoeur describes the process of interpretation as having several steps that the reader progresses through. First, the reader experiences distanciation, meaning texts are necessarily estranged and separated from the author and their original intent through time
and context (Ricoeur, 1981). Second, the reader experiences pre-understanding, which is the state of mind that the reader encounters a text with (Ricoeur, 1981). For any text, a reader might have some knowledge or preconceived notions of the topic, author, and so forth, which shape their impression of the text before they ever read it. Next, the reader experiences naïve understanding, or their initial, even visceral and impressionistic, guess at what the text means (Ricoeur, 1976). Ricoeur talks about this initial understanding as a guess in that readers do not have access to what authors meant, and the understandings generated here are starting points and not the result of analysis. After developing these initial guesses, the reader moves into more analytic processes, which are the processes we are working towards with computational data in the humanities.

Following naïve understanding, the reader moves into developing explanation (or what Bell (2011) calls ‘analysis’) and then understanding. Bell’s reading of Ricoeur is especially useful here, as Bell envisions explanation/analysis as the goal of discourse analysis with texts. Bell (2011) describes these phases as follows: ‘Understanding is mediated by explanation/analysis, which is fulfilled in understanding. While explanation lays out the parts through analysis, understanding grasps the whole through synthesis’ (p. 534). In other words, analysis of the text helps to explain bits and pieces of what the text means. Combined as a whole, the reader can then gain understanding or ‘the ability to take up again within oneself the work of structuring that is performed by the text’ (Ricoeur, 1991, p. 18). The processes of analysis may validate (or challenge) the reader’s initial guesses about what the text means, but are ‘the second-order operation grafted onto this understanding which consists in bringing to light the codes underlying this work of structuring that is carried through in company with the reader’ (Ricoeur, 1991, p. 18).

Ricoeur was not interested in computational text analysis, but turning back to the problem of interpreting big data in the humanities, we can think about reading computational data similarly to how Ricoeur describes reading texts. We are distanced from it in a sense, in that computational representations of texts are strange, numerical representations of actual written texts. We may even come to these corpora with notions about what they might reveal. After we conduct computational analyses, we are then in a space to start making guesses about what those numbers mean. But at this point, we are only in Ricoeur’s ‘naïve understanding’. It is not until we return to reading individual texts that we can start the process of analysis/explanation and making sense of bits and pieces of data (e.g. making sense of why ‘the’ is one of many weak discriminators in a stylometric analysis).

But how are we to know that the explanations of those bits and pieces scale? In other words, how do we validate the understandings that we are forming and gathering bits of evidence to support? Ricoeur addresses the concern of validating guesses (though not specifically on a corpus scale), falling somewhat in line with Karl Popper’s (1959) falsifiability thesis: there are many possible interpretations, but the best interpretation is the one that is the least capable of being falsified (Ricoeur, 1976, p. 79). Bell (2011) further explains that, ‘as discourse analysts, we know that we are not actually able to prove that our reading of a text is the “right” or even the best one. But we can try to demonstrate that competing readings are less valid or probable than the alternative’ (p. 537). In sum, the more probable interpretation of a text is that which ‘takes account of the greatest number of facts furnished by the text’ (Ricoeur, 1981, p. 175). Heuser and Le-Khac (2011) actually make a related point about computational text analysis, arguing that interpretation can be made ‘more rigorous by having to account for a wide set of related observations’ (p. 85).

When arguing for an interpretation of computational data, we can do so not by objectively proving our understanding is correct but rather by demonstrating how our understanding takes account of as many facts as possible furnished about the corpora through both computational and traditional analyses. The distinction between analysis and understanding becomes useful here, as many different pieces of analysis might support one particular understanding. By showing how several disparate pieces of data analysis all support one understanding of the corpus (even if each analysis results from close readings of only a few texts), we account for more and more facts and strengthen the argument for both the analyses and that particular understanding.

A hermeneutics for big data in the humanities can thus focus on demonstrating how the analyses of
many pieces of data support a particular understanding of a text or corpus. Indeed, we can even conduct additional computational analyses and create more data so as to cross-validate our understanding with new forms of data analysis that we can test our understandings (rather than specific analyses) against. This approach to hermeneutics has resemblances to Piper’s (2015) model of computational hermeneutics wherein a reader oscillates between close and distant reading to converge on an interpretation, but expands upon that model by theorizing how to form and argue for a deeper textual interpretation from a model despite not being able to read every text and explore every piece of data. By focusing on validating an understanding rather than many different explanations of many different pieces of data, we allow space for close reading by the human reader, focus our contributions on the humanistic insight we can gain from our corpora, and make it possible to glean insight in a way that is feasible for the limited human reader while still having strategies to argue for (or against) certain interpretations.

3 A Case Study in Stylometry

Moving from theory to praxis is difficult, and thus this section is intended to both illustrate this hermeneutical theory and detail it more specifically for stylometry methods. Stylometry methods are most commonly used in a classificatory way, meaning that they show similarity between texts that share the same classification (like authorship). Many computational text analysis methods are effective at classification, and to gain insight into corpora from these methods, we can start by investigating what textual features they use to make classifications and then asking what those features tell us about the corpus. Stylometry methods in particular show similarity between texts by summing up small differences in the usage of several words to calculate ‘distances’, and thus the distance between texts as a whole is the accumulation of many small differences. These methods, then, create a useful type of data to demonstrate what this hermeneutical process might look like, simply because of the wealth of data they create and how differences in each word’s frequency might not be especially informative on its own but together accumulate to a larger effect (Evert et al., 2017). In that sense, there is an opportunity for lots of pieces of data to be analyzed that all point to one larger understanding.

In the remainder of this section, I first describe the corpus used for analysis and then detail the stylometry analysis and analysis of separate pieces. I then present this analysis and show how many pieces of data can be analyzed to argue for broader understandings of the corpus.

3.1 Methods

3.1.1 Corpus

A wide variety of corpora could be used to illustrate hermeneutical theory, but this analysis in particular uses academic research texts. Academic texts are often already a source of corpus and detailed discourse analysis in the English for Academic Purposes community, and can speak to more theoretical debates around disciplinarity and epistemology. To focus this analysis, academic research texts are conceived as representing knowledge-making processes as they appear in text across disciplines. While philosophers as far back as Plato in Theaetetus or Gorgias to Descartes to Kuhn (1962) theorized various social and logical influences in how we conceive of knowledge, rhetoricians and language scholars often point to the role of texts for socially constructing and transmitting that knowledge: for example, Bazerman (1992) argued for the importance of focusing on texts when considering knowledge production in academic disciplines, claiming that the text is ‘the medium through which knowledge is transmitted and is the matter in which knowledge is embodied outside the consciousness of any individual’ (p. 31).

Research texts thus textually represent the huge variety of knowledge-making processes that occur within the academy. Knowledge-making processes are plural and many as we consider the wide range of disciplines and types of knowledge in the academy (and indeed, vastly more so when we look outside the academy). In crafting a text, authors work within the processes and standards that a disciplinary community needs for accepting their claim. For example, a biologist who outlines the details of an experiment and a literary historian who recounts the historical context around a novel are both supporting their final claims by narrating particular lines of inquiry and knowledge-producing processes. Because of this,
rather than distinguishing disciplines in terms of the specific content matter that they engage with, writing theorists like Carter (2007) have found it more productive to conceive of disciplines as ‘ways of knowing and doing’ rather than static repositories of knowledge. For example, Carter categorizes disciplines like animal science, accounting, and engineering as ‘problem-solving’ disciplines, and other disciplines like biology, chemistry, and sociology as ‘empirical inquiry’ disciplines. Especially because of the constantly shifting state of knowledge in a discipline over time, thinking of a discipline as a particular way of knowing and doing around an idea allows for a more stable description of the work of a discipline.

These particular ways of knowing and doing are precisely the knowledge-making processes that happen in a discipline, and the ways that knowledge gets constructed in texts. As such, generic features and regularities within texts or disciplinary writings give a window into certain regularities within the social acts of what disciplines do and how. MacDonald (1994) echoes this, claiming that ‘if academic writing is a form of knowledge-making, then differences in knowledge problems or ways of addressing such problems should account for much of the variation among the disciplines’ (p. 21). For example, different metagenres like the scientific article or literary criticism structurally represent the epistemic commitments of the metadisciplines they are tied to (Carter, 2007). This analysis thus investigates what we can learn about knowledge-making and disciplinarity through studying textual features as represented in a stylometric computational model.

The actual corpus consists of the texts of 1,625 academic journal¹ articles, gathered equally from thirteen disciplines. The goal of the corpus construction was to obtain a random sampling of articles published in disciplines across academia that was not heavily influenced by time or specific publication venues. For this reason, each discipline’s subcorpus was gathered equally across five top publication venues in the field,² and across the five-year span of 2013–17. For example, in the education sub-corpus, five articles were randomly selected from all of the research articles published in each year (2013–17) in the following journals: Higher Education, American Educational Research Journal, Early Childhood Research Quarterly, Educational Researcher, and Teaching and Teacher Education (five articles * five years * five journals * thirteen disciplines = 1,625 articles). In total, the corpus contains 13,604,023 words.

Each text was formatted as a .txt file and cleaned by hand to remove footnotes, endnotes, appendices, headers and footers, bibliographies, page numbers, and other unreadable characters. Image captions and section headers were retained because the text of these has some meaningful place in the argument of the text itself, and does not regularly serve as only citation-related or otherwise meta information (while footnotes or endnotes do sometimes provide detail that is pertinent to the text itself, depending on the citation style they sometimes provide purely bibliographic information and thus were not retained to be consistent). The goal was for each text file to contain only the text of the article itself, and to remove other extraneous or meta-related information.

3.1.2 Stylometric analysis

Stylometric analyses typically use the top most frequent words to show similarity between texts, which in most corpora are primarily function words. For this particular analysis, I chose to calculate distances between texts using only a list of 209 function words (rather than the top most frequent words), so as to focus the analysis on words more clearly related to form and style rather than content (though the two are not neatly separable), as well as to demonstrate methods of interpretation that will be more easily transferable to other stylometric analyses. In practice, using only function words or the top 200 most frequent words produced very similar results. The list of 209 function words used in this analysis was derived from the Longman Grammar (Biber et al., 1999), and excluded pronouns given their close connection to subject matter.

Figures 1 and 2 show the corpus visualized using Delta distances (Burrows, 2002) with only function words: as a network per Eder (2017) in Fig. 1 and as a dendrogram in Fig. 2.

Figure 1 demonstrates that, based on the frequencies of function words, disciplinary writing clusters together. In other words, articles within the same disciplines appear to have similar frequencies of function words and therefore small distances and thicker lines between them. Figure 1 suggests two large groupings
of disciplines on the left (sciences) and right (humanities). These groupings indicate similar frequencies of function words between groups of similar disciplines, or metadisciplines (Carter, 2007).

Fig. 1 A visualization of distances between texts in a corpus of 1,625 academic journal articles. Each text is colored according to its discipline. Every point represents one unique article, and thicker lines indicate that the texts are closely related in terms of function word frequencies (for more detail, see Eder, 2017). The distances were calculated using Burrows’ (2002) Delta, and were completed in R with Stylo (Eder et al., 2016). The visualization itself was created with Gephi (Bastian et al., 2009).

The split between the humanities and sciences becomes even more pronounced when the distances are visualized on a traditional dendrogram using cluster analysis, as in Fig. 2. Figure 2 is created using the same Burrows’ Delta measurements used for Fig. 1. The dendrogram in Fig. 2 is created by drawing connections between whichever two texts have the smallest distance between them, and then working up this way to build the whole tree. In this sense, it privileges the first-nearest neighbor when mapping how close texts are. The diagram in Fig. 1 is created using a technique described by Eder (2017), and is especially informative because the connections depicted take into account not only the first-nearest neighbor, but also the second and third. In that sense, Fig. 1 gives a more nuanced view as to the many relationships between texts, while Fig. 2 allows for an easier visualization of groups.

Fig. 2 A dendrogram created using agglomerative hierarchical cluster analysis. The distances between texts were calculated using Burrows’ Delta. Each endpoint represents one text, colored according to discipline.

An analysis could focus on any level of grouping from comparing two subcorpora to comparing every individual text. I choose a high-level, two subcorpora
split (suggested by the groupings in Figs 1 and especially 2) to prioritize a broad view of the corpus and manageability of analysis. To analyze this split, I focus on the words that drive the difference between the humanities and sciences. These words are identified by calculating the distance contributed by each function word to the distance between every possible pair of humanities and science texts, and then summing up the total distances by word. To isolate the distance contributed by each word throughout the whole corpus, I slightly modify the traditional Burrows’ Delta formula to the following:

$$D(i) = \sum_{c=1}^{P} \left| \frac{A_i - \mu_i}{\sigma_i} - \frac{B_i - \mu_i}{\sigma_i} \right|$$

where $D$ = the total distance contributed by a function word $i$ summed over every pair of humanities and science texts; $i$ = the function word being analyzed; $c$ = the combination of humanities and science texts being compared; $P$ = the total number of unique pairs of humanities and sciences texts (for this particular corpus, that is 625,000 possible unique pairs); $A$, $B$ = texts being compared; $A_i$ = frequency of $i$ in text $A$; $B_i$ = frequency of $i$ in text $B$; $\mu_i$ = mean frequency
Table 1. The top eleven function words that contribute most to the distance between the sciences and humanities

Word     D            Subcorpus     Log-likelihood   Effect size (log ratio)   N
the      794,889.6    Sciences         286.41        0.053                     864,090
with     786,419.2    Sciences       1,367.87        0.347                      98,942
for      784,973.8    Sciences       1,599.56        0.319                     137,746
but      778,957.4    Humanities     5,001.60        1.372                      32,883
by       759,898.5    Sciences         385.97        0.205                      81,460
not      756,177.8    Humanities     5,471.20        1.045                      60,362
is       753,195.3    Sciences       1,140.84        0.247                     171,836
who      751,306.2    Humanities     9,514.77        4.188                      15,109
where    749,233.5    Sciences       2,604.24        1.182                      15,724
can      747,434.4    Sciences       1,374.20        0.599                      37,359
to       744,783.4    Humanities     7,502.43        0.491                     304,778

The D column shows the sum of the Delta distances between each science and humanities text for that particular function word. The subcorpus column shows the subcorpus (sciences or humanities) in which the function word is most common, based on which subcorpus has the higher average z-score for that word. The table also shows the log-likelihood measure of the word for the subcorpus it is most common in (using the less-common subcorpus as a reference corpus), the effect size of the word for the subcorpus it is most common in using log-ratio (Hardie, 2014), and the total number of instances in the corpus (N).
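The D and Subcorpus columns of Table 1 can be approximated with a short Python sketch. It assumes two hypothetical frequency matrices (one row per text, one column per function word), z-scores each word against the combined corpus, sums the absolute z-score differences over every humanities/science pair, and labels each word by the subcorpus with the higher mean z-score. The function name `word_contributions` and the toy frequencies are invented for illustration and are not the author's code or data.

```python
import numpy as np


def word_contributions(hum, sci):
    """Per-word distance D(i) summed over all humanities/science text pairs.

    hum, sci: arrays of shape (n_texts, n_words) with relative frequencies
    (hypothetical input layout). Returns D and the subcorpus label per word.
    """
    hum = np.asarray(hum, dtype=float)
    sci = np.asarray(sci, dtype=float)
    corpus = np.vstack([hum, sci])
    mu = corpus.mean(axis=0)      # mean frequency of each word in the corpus
    sigma = corpus.std(axis=0)    # assumes every word varies somewhere
    z_hum = (hum - mu) / sigma
    z_sci = (sci - mu) / sigma

    n_words = corpus.shape[1]
    D = np.empty(n_words)
    for i in range(n_words):
        # |z_A - z_B| for every (humanities, science) pair, one word at a
        # time so that memory stays modest for corpora with many pairs
        pair_diffs = np.abs(z_hum[:, i][:, None] - z_sci[:, i][None, :])
        D[i] = pair_diffs.sum()

    # Subcorpus with the higher average z-score for each word
    higher_in = np.where(z_hum.mean(axis=0) > z_sci.mean(axis=0),
                         "Humanities", "Sciences")
    return D, higher_in


# Toy data: two texts per subcorpus, two function words (invented numbers)
words = ["the", "who"]
hum_freqs = [[0.050, 0.006], [0.052, 0.005]]
sci_freqs = [[0.051, 0.001], [0.051, 0.002]]
D, higher_in = word_contributions(hum_freqs, sci_freqs)
# Words ordered from largest to smallest contribution, as in Table 1
ranking = [words[i] for i in np.argsort(D)[::-1]]
```

Here the number of pairs P is simply the number of humanities texts times the number of science texts, so the inner sum runs over the same 625,000 combinations the article mentions when applied to the full corpus.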

of $i$ in the corpus; $\sigma_i$ = standard deviation of frequencies of $i$ in the corpus.

Table 1 demonstrates which words contribute the most to the total distance between sciences and humanities texts in the corpus. This table forms the basis of the textual analysis, methodologically detailed in Section 3.1.3.

3.1.3 Textual analysis

For the top four function words in each subcorpus, several individual texts were identified that had the most average z-scores for both subcorpora on each word. I serial read through each of these texts, taking note on the first read of the general argument and structure of the texts. Using a concordance table, I then re-read through the texts paying attention to where the function words tended to be used and in what kinds of constructions. While function words are indeed often used in a variety of different ways (for example, to can be used as part of an infinitive and a preposition), automatically tagging for these patterns is difficult to do accurately and other approaches can be used to pull out more interpretively useful patterns. In particular, as patterns of usage emerged and suggested understandings of the corpora, I then tested those understandings in several ways: comparing them to how other function words were being used; conducting rhetorical analyses with Docuscope³ (Ishizaki and Kaufer, 2012), which helped to identify rhetorical strategies that support different understandings of the corpus; and by using the descriptions of function words in grammar resources like the Longman Grammar (Biber et al., 1999) to identify certain rhetorical patterns that the words were used in. While it was not feasible in most cases to verify the precise frequency of usage patterns (outside of massive hand-coding efforts), the various strategies listed here allowed for a refinement and testing of the understandings of the corpora that the function words suggested. Ultimately, the function word analysis allowed for new insights into knowledge-making processes as they occur textually across disciplines and expansions of existing insights that other types of analyses have suggested on their own, by drawing attention to very specific grammatical patterns and rhetorical moves that regularly occurred throughout the texts.

3.2 Analysis: A theory of knowledge production in the humanities and sciences

When we look at the function words in Table 1, then, what do these words reveal about the work of these disciplines? What kinds of theories of knowledge production across the sciences and humanities do they support? Beginning with the sciences, the analysis suggests that much of scientific writing is heavily engaged in description of the physical and technological worlds. On average, at least 5.4% of science papers are made up of descriptive language, as opposed to


only 3.6% in the humanities.⁴ This description ultimately forms the basis of analysis that supports building precise models and theories about these worlds that are more generalizable and useful for future applications. The process of creating knowledge in this way is represented by the prevalence of the, with, and for in the sciences, per Table 1.

In the sciences broadly, the primarily supports naming physical objects, undergirding heavy physical description. This description is primarily of particular objects, as opposed to abstract or general objects: it allows for naming distinct tools, processes, data, and so forth. Of all of the noun phrases in the sciences corpus that start with the, 45.1% of the words after the determiner in these noun phrases are either academic terms or descriptive words (as opposed to 35.4% in the humanities; humanities the-noun phrases have much higher occurrences of character references, narrative terms and public entities). For example, in a physics paper on the photodissociation dynamics of H2O using a VUV-induced pump-probe approach, the authors write the following in the introduction when briefly summarizing their study:

    The interpretation of the delay-dependent dynamics of the water isotopologues is supported by a mixed quantum-classical approach to calculate not only the molecular trajectories in the dissociative Ã ¹B₁ state, but also investigate the influence of different final electronic states, accessible by photoionization with a single probe photon, on the observed delay-dependent yield (Baumann et al., 2017, p. 2).

The first instance refers to the interpretation they do in their own paper, but the remaining uses all specify particulars: the authors are interpreting the specific delay-dependent dynamics, investigating the particular water isotopologues, studying specific molecular trajectories in a particular state, investigating the exact influence of states on a specific yield. Each of these uses of the implies that there is only one possible instance of each. In other words, there are not multiple types of or possible delay-dependent dynamics, nor conceivable other Ã ¹B₁ states, nor possible other observed delay-dependent yields: the text itself makes it impossible. By determining each of these with the, the authors limit the description to their specific study by restricting the possibility for other possible interpretations or meanings of those specified objects, and create only one true, possible object.

This function of the might be seen more clearly when contrasted to article usage in the humanities. While the of course exists in humanities writing, it is slightly less frequent because objects are often framed as plural and many. For example, this sociology paper describes models created to represent how separated parents’ custody situations influence the time pressure experienced by those parents:

    In line with our expectations, sole-resident mothers experience more time pressure than mothers with shared residence in Models 1 and 2 (Table 2), although the difference is slightly smaller when conflict is controlled for. When we alternate the reference category, we observe that sole-resident mothers also experience more time pressure than nonresident mothers in Models 1 and 2 (see the notes to Table 2) (van der Heijden et al., 2016, pp. 476–477).

The instances of the here all refer to the study itself: the difference observed in models, the reference category, and the notes to Table 2. The is notably absent, though, when referring to sole-resident mothers, time pressure, and nonresident mothers. In other words, the authors do not write, ‘In line with our expectations, the sole-resident mothers experience. . .’ and therefore do not localize sole-resident mothers to this particular study. Neither do they write, ‘the sole-resident mother experiences. . .’, creating one model sole-resident mother. Instead, the authors pluralize the subject, recognizing the varied and generalized group of possible sole-resident mothers represented by the statement (possibly even including those outside of the description of their immediate study).

In addition to the, with and for are also frequently used in the sciences. With ultimately supports naming features of particular objects (as in a phrase like, ‘the object with the round structure’), while for allows authors to theorize more abstractly about those objects (as in a phrase like ‘For a value a, we can write the following equation’). This analysis does not specifically detail these usages both for the sake of space and as it only intends to illustrate the analytic process. Ultimately though, the function words showing up as


highly distinctive of the sciences point to a process of knowledge creation itself that focuses on describing the localized, specific physical and technological worlds to build models and theories of them.

The humanities appear to operate quite differently, in that they are fundamentally interested in grappling with and understanding the human experience. They might make theoretical claims about what humans do and how they create meaning and why, ranging from how social groups behave under certain conditions to how art fits into a certain cultural period. In doing so, humanists are necessarily dealing with the incredible contingency and complexity that surrounds the human experience. The claims that are made then are largely made in the service of understanding and interpreting human action, as opposed to prediction or direct future utility. For example, a literary theorist would not necessarily offer an analysis of a piece so as to predict future pieces, but so as to better understand the cultural or social forces that created that piece. Wrestling with the human experience as part of academic inquiry dates back to at least the ancient Greeks, and we find the continuation of that long line of work most clearly in the humanities today.

Knowledge production in the humanities is driven by interpreting and analyzing human action, which typically starts with stories about people. Humanities papers broadly are filled with narrative stories about people and what they have done in the world: many papers have large sections or scattered chunks of the paper that are recounting what people, characters, or groups did. On average, humanities texts are made up of at least 2.9% explicitly narrative-based language (as opposed to only 1.6% in the sciences). These stories with characters and narrative language then form the basis of analyses, providing human actions and decisions to explain, interpret, and complicate. This type of work is especially supported by three words in Table 1: who, to, and but.

Who as a relative pronoun supports storytelling in texts. As a whole, humanities texts spend a large amount of time talking about people: on average, humanities papers have roughly three times the amount of character language as science papers (4.3% versus 1.3% of the text), with most of that space being used to name specific character types (1.9%) or proper names (0.9%). As a relative pronoun, who refers back to these specific names and characters in order to isolate them or add relevant detail that is useful for analysis. For example, in a history paper analyzing the collection and exhibition of photographs during the Second World War liberation of Paris, the author writes the following about German soldiers:

    But while public photography was thus restricted for the French, the German soldiers who visited Paris came armed with cameras (Clark, 2016, pp. 840–841).

The who here clarifies the specific group that Clark is talking about: not all German soldiers, but German soldiers who visited Paris. By using this construction, Clark is able to name a subgroup of people and thus enable further discussion on why the German soldiers in Paris had cameras and what they did with them. Casting various people as part of larger groups supports placing stories about specific people into larger narrative structures (e.g. placing a story about camera-armed soldiers in Paris into a larger Second World War narrative). The same construction can also be used to add analytical detail about a person, which allows more complex storytelling by connecting separate narratives together through the same character. For example, in a performing arts research article about the Shigang Mama Theatre Group in Taiwan and how their theatre explores their postcolonial cultural identities in the wake of an earthquake, the author describes scenes from a particular performance:

    The second scene, ‘The Raining Sound of the Temporary House,’ depicts the childhood memory of a Hakka woman, Yu-Chiung, who, along with her grandmother, was evacuated from their house after it collapsed during the earthquake (Hu, 2013, p. 452).

The use of who here allows this sentence to have two main statements of action: the second scene depicting the memory of a Hakka woman, and the woman being evacuated from her house after the earthquake. The statements of action are intimately connected by Yu-Chiung, who serves a vital role in both. The who serves as a relative pronoun that restates ‘a Hakka woman, Yu-Chiung’ that was established in the first main clause, in order to use that same character as the main subject in the second clause. These kinds of constructions support a complex storytelling because it


allows both the movement of characters from one statement of action to another, and the intimate connecting of those two statements of action. In this case, it is clear that the memory being depicted in the second scene is related to the evacuation of Yu-Chiung and her grandmother, primarily because of the connecting of the two sentences through the relative pronoun: if the clauses were split into two sentences, Hu would need to do more work to explain that Yu-Chiung’s evacuation is related to the memory, and not just a separate detail about her given for context. In this way, the relative pronoun is supporting more complex stories through allowing characters to move through many narratives.

While who supports naming people in complex narratives, to allows for an analysis of the actions of those people (as in, ‘she acted to represent her people’). But, then, supports the building of complexity in those actions, recognizing that one action can have multiple meanings. For in building an argument, a writer explores multiple paths and alternatives to come to a decision space (Kaufer and Butler, 2000), and but supports that type of exploration. The details of these two words are again elided here, but the function words here point to a process of knowledge-making in the humanities as one that begins with narrative and results in complex, nuanced analysis of what characters do.

4 Reflections and Conclusions

In this analysis, it became infeasible to validate that every (or even most) instance of a function word could be explained in the same way given both the size of the corpus and method of data analysis. While one analyst could hand-code through several texts and find patterns, there is no assurance that those patterns necessarily hold up as you move across disciplines or authors, or even that the texts being hand-coded which are average in frequency are necessarily average in how they use the words. Rather, the individual texts being analyzed for function word usage suggest certain understandings of the texts; here, for example, who suggested a focus on characters and narrative elements. This understanding, rather than particular analysis of who, was then validated with other analytical techniques that suggested a similar understanding.

This analysis produced humanistic insight into a large corpus that was not only feasible, but useful for understanding the broad knowledge-making processes that occur across disciplines and the rhetorical moves that undergird them. Moreover, though, this analysis illustrated both what it looks like to develop a defensible interpretation of a computational model of a corpus when we simply could not read it all or look at all of the data, as well as what kind of space there is for human close reading in that process. The movement between the stylometry results and other techniques like Docuscope or counting phrasal patterns demonstrated a movement between close and distant readings. With these new kinds of additional data though, the argument for an understanding of the corpus strengthened even without being able to account for every possible instance of who or any other word. This approach of focusing on understandings ultimately forms an argumentative strategy for an interpretation of computational data that can be used as a key knowledge-making process for the field of digital humanities, by moving the focus of the analysis away from being comprehensive in accounting for data and towards answering humanistic questions with a sufficient amount of evidence for a strong argument.

The human reader also functions crucially in this process. Focusing on the results of any computational model, but perhaps stylometry in particular with its focus on function words, certainly makes the text ‘strange’. A close reader would be incredibly unlikely to focus on the occurrences of who or the in academic writing without this model (indeed, other analyses of academic writing have tended to focus on more noticeable features like novelty moves, modality in claims, or citations, e.g. MacDonald, 1994; Hyland and Bondi, 2006), much as keeping a distant read without a close look at the texts might have been unlikely to realize that who forms such a crucial part of storytelling in these genres. By moving the focus to an understanding of this corpus though, the analysis had space for a close human reader to move beyond the computational data and make connections to outside theories and contexts to talk about the purpose of these function words in the overall argument of the texts. The close reading can complement the distant in the reader’s ability to draw connections and understand how a pattern might fit


within the overall argument of the text, while the distant reading can complement the close by drawing attention to patterns to be noticed at scale and giving additional evidence for possible understandings.

Refocusing hermeneutics for computational text analysis with a goal of gathering evidence for an understanding is a strategy to be developed more fully for different methods. Stylometry models in particular have been historically difficult to explain, because function words feature so prominently in them: Mosteller and Wallace (1964, p. 17) talk about function words as ‘the filler words of the language’ and Kestemont (2014) remarks that they are so unlikely to be noticed by writers or readers. Some other work (e.g. Craig, 1999) as well as this analysis demonstrates function words’ uniqueness in their ability to allow (or in their absence, disallow) certain types of language patterns and rhetorical moves. The patterns evidenced by function words may not necessarily be the ones that stick out to a reader as most significant in a text, but they are the patterns that make up the actual bulk of a text’s volume. By focusing on these words and not necessarily explaining every possible instance of how they might appear (an unwieldy task when working with grammar), but rather considering what kinds of argumentative patterns they allow, we have a strategy for interpretation that allows us to make claims about what these analyses show and back our interpretations up with evidence.

Moving forward, there is more work to be done on expanding this hermeneutics from theory to praxis, especially as it relates to other types of computational methods: how much evidence do we need to justify an understanding? What kinds of evidence might we be not accounting for that could change our understanding? How do you gather different types of evidence for an understanding with a logistic regression model, a keyword analysis, a topic model, or any other types of approaches? How might thinking about hermeneutics more carefully and fully influence the development and use of tools in machine learning fields? These are big questions, but questions whose answers can change the capabilities of computational humanities work in contributing to the humanities and computing fields alike. I am hopeful that this hermeneutical theory in particular helps to focus and allow analyses in a way that is feasible and defensible, while opening up distant reads to allow more clearly for close reading that can lead towards stronger humanistic insight of corpora.

Notes

1. Computer science was the only discipline gathered from conference papers as opposed to journal articles. This is because computer science, more than other disciplines, tends to value conference papers over journal articles (perhaps because of their timeliness) and tends to publish its higher quality and most novel research in them (Meyer et al., 2009; Vrettas and Sanderson, 2015).
2. The thirteen disciplines chosen include the following: computer science, chemical engineering, physics, cell and molecular biology, statistics, education, sociology, political science, linguistics, literature and literary theory, philosophy, history, and performing arts. The disciplines were chosen to span the range of fields represented at the four digit level of 2010 CIP codes provided by the National Center for Education Statistics (2010). For example, computer science accords to CIP code 11.07, chemical engineering to CIP code 14.07 and so on. Within each discipline, publication venues were chosen among those with the highest h-5 values for 2013–17 as provided by SCImago (n.d.) and Google Scholar (n.d.) for the discipline, while also trying to balance choosing journals that spanned the range of scholarship produced in a discipline.
3. Docuscope is now freely available to researchers to download. See: https://www.cmu.edu/dietrich/english/research-and-publications/docuscope.html.
4. These percentages, and others like it throughout the analysis, are calculated via Docuscope (Ishizaki and Kaufer, 2012). The noun phrases were isolated using spaCy (Honnibal and Montani, 2017) in Python.

References

Argamon, S. (2008). Interpreting Burrows’s Delta: geometric and probabilistic foundations. Literary and Linguistic Computing, 23(2): 131–47.
Argamon, S., Dodick, J., and Chase, P. (2008). Language use reflects scientific methodology: a corpus-based study of peer-reviewed journal articles. Scientometrics, 75(2): 203–38.


Bastian, M., Heymann, S., and Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media, 8: 361–62.
Baumann, A., Bazzi, S., Rompotis, D., et al. (2017). Weak-field few-femtosecond VUV photodissociation dynamics of water isotopologues. Physical Review A, 96(1): 1–7.
Bazerman, C. (1992). The interpretation of disciplinary writing. In Brown, R. (ed.), Writing the Social Text: Poetics and Politics in Social Science Discourse. New York: Routledge, pp. 31–8.
Bell, A. (2011). Re-constructing Babel: discourse analysis, hermeneutics, and the Interpretive Arc. Discourse Studies, 13(5): 519–68.
Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. (1999). Longman Grammar of Spoken and Written English. Harlow: Pearson.
Black, E. (1978). Rhetorical Criticism: A Study in Method. Madison: University of Wisconsin Press.
Breiman, L. (2001). Statistical modeling: the two cultures. Statistical Science, 16(3): 199–231.
Burrows, J. (2002). ‘Delta’: a measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3): 267–87.
Carter, M. (2007). Ways of knowing, doing, and writing in the disciplines. College Composition and Communication, 58(3): 385–418.
Clark, C. E. (2016). Capturing the moment, picturing history: photographs of the liberation of Paris. American Historical Review, 121(3): 824–60.
Craig, H. (1999). Authorial attribution and computational stylistics: if you can tell authors apart, have you learned anything about them? Literary and Linguistic Computing, 14(1): 103–13.
Eder, M. (2017). Visualization in stylometry: cluster analysis using networks. Digital Scholarship in the Humanities, 32(1): 50–64.
Eder, M., Rybicki, J., and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. R Journal, 8(1): 107–21.
Evert, S., Proisl, T., Jannidis, F., et al. (2017). Understanding and explaining delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(2): ii4–16.
Google Scholar. (n.d.) Top publications. https://scholar.google.com/citations?view_op=top_venues&hl=en (accessed 7 August 2019).
Hardie, A. (2014). Log ratio–an informal introduction. http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/ (accessed 19 February 2019).
Heuser, R. and Le-Khac, L. (2011). Learning to read data: bringing out the humanistic in the digital humanities. Victorian Studies, 54(1): 79–86.
Hockey, S. (2000). Electronic Texts in the Humanities: Principles and Practice. Oxford: Oxford University Press.
Honnibal, M. and Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
Hu, T. (2013). Hakka female identity in postcolonial Taiwan: the Shigang Mama Theatre Group and images of Hakka women. Asian Theatre Journal, 30(2): 445–65.
Hyland, K. and Bondi, M. (eds) (2006). Academic Discourse across Disciplines. Bern: Peter Lang.
Ishizaki, S. and Kaufer, D. (2012). Computer-aided rhetorical analysis. In McCarthy, P. M. and Boonthum-Denecke, C. (eds), Applied Natural Language Processing: Identification, Investigation, and Resolution. Hershey: IGI Global, pp. 276–96.
Jockers, M. L. (2013). Macroanalysis: Digital Methods and Literary History. Champaign: University of Illinois Press.
Juola, P. (2013). How a computer program helped reveal J.K. Rowling as author of A Cuckoo’s Calling. Scientific American. https://www.scientificamerican.com/article/how-a-computer-program-helped-show-jk-rowling-write-a-cuckoos-calling/ (accessed 11 September 2020).
Kaufer, D. and Butler, B. (2000). Designing Interactive Worlds with Words: Principles of Writing as Representational Composition. Mahwah: Lawrence Erlbaum Associates, Publishers.
Kestemont, M. (2014). Function Words in Authorship Attribution: from black magic to theory? Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL), Gothenburg, Sweden, April 2014.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
MacDonald, S. P. (1994). Professional Academic Writing in the Humanities and Social Sciences. Carbondale: Southern Illinois University Press.
Mailloux, S. (1991). Rhetorical hermeneutics revisited. Text and Performance Quarterly, 11(3): 233–48.
Meyer, B., Choppy, C., Staunstrup, J., and van Leeuwen, J. (2009). Research evaluation for computer science. Communications of the ACM, 53(4): 31–4.
Moretti, F. (2013). Distant Reading. London: Verso.


Mosteller, F. and Wallace, D. L. (1964). Inference and Disputed Authorship: The Federalist. Reading: Addison-Wesley.
National Center for Education Statistics. (2010). Classification of Instructional Programs – 2010. https://nces.ed.gov/ipeds/cipcode/resources.aspx?y=55 (accessed 29 September 2020).
Noecker, J., Jr., Ryan, M., and Juola, P. (2013). Psychological profiling through textual analysis. Literary and Linguistic Computing, 28(3): 382–87.
Piper, A. (2015). Novel devotions: conversional reading, computational modeling, and the modern novel. New Literary History, 46(1): 63–98.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Hutchinson.
Ramsay, S. (2011). Reading Machines: Toward an Algorithmic Criticism. Champaign: University of Illinois Press.
Richards, I. A. (1929). Practical Criticism: A Study of Literary Judgment. New York: Harcourt.
Ricoeur, P. (1976). Interpretation Theory: Discourse and the Surplus of Meaning. Fort Worth: The Texas Christian University Press.
Ricoeur, P. (1981). Paul Ricoeur: Hermeneutics and the Human Sciences – Essays on Language, Action and Interpretation, Thompson, J. B. (ed. and trans). Cambridge: Cambridge University Press.
Ricoeur, P. (1991). From Text to Action: Essays in Hermeneutics, II, Blamey, K. and Thompson, J. B. (eds and trans). Evanston: Northwestern University Press.
Rockwell, G. and Sinclair, S. (2016). Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge: MIT Press.
SCImago. (n.d.). SJR – SCImago Journal & Country Rank [Portal]. http://www.scimagojr.com (accessed 7 August 2019).
Underwood, T. (2016). Distant reading and recent intellectual history. In Gold, M. K. and Klein, L. F. (eds), Debates in the Digital Humanities 2016. Minneapolis: University of Minnesota Press, pp. 530–33.
van der Heijden, F., Poortman, A., and van der Lippe, T. (2016). Children’s postdivorce residence arrangements and parental experienced time pressure. Journal of Marriage and Family, 78: 468–81.
van Zundert, J. J. (2016). Screwmeneutics and hermenumericals: the computationality of hermeneutics. In Schreibman, S., Siemens, R., and Unsworth, J. (eds), A New Companion to Digital Humanities. New York: Wiley-Blackwell, pp. 331–47.
Vrettas, G. and Sanderson, M. (2015). Conferences versus journals in computer science. Journal of the Association for Information Science and Technology, 66(12): 2674–84.
