Literary and Linguistic Computing, Vol. 28, No. 4, 2013. © The Author 2013. Published by Oxford University Press on behalf of ALLC. All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/llc/fqt040 Advance Access published on 23 July 2013
Distant listening to Gertrude Stein’s ‘Melanctha’
1 Using Prosody Features in Similarity Metric and Author Attribution Studies

uses character n-grams as features (Juola, 2004; Juola et al., 2006).

Most of these studies do not use syntactical features such as part of speech or sentence and phrase structure, primarily because syntactic parsers produce many errors and therefore noise. On the other hand, Stamatatos cites several studies (Baayen et al., 1996; Gamon, 2004; Stamatatos et al., 2000, 2001) in which 'results have shown that this type of measure performs better than do vocabulary richness and lexical measures' (2009, p. 542). At the same time, Baayen et al. note that

Indeed, Efstathios Stamatatos calls the use of topic-independent words such as function words the 'pure stylistic choices of the authors across different topics' (2009, p. 540). Burrows describes author attribution studies that use 'weak discriminators' such

something goes awry, we may have difficulty in attributing this to problems with the collection process or the specification of the features (Weiss et al., 2005, p. 51).

On the other hand, Weiss et al. maintain (like Burrows and others) that the results of text mining procedures are easier for developers to understand than those of more quantitative data mining, because the results include whole words. 'For text mining', they write, 'we are much closer to understanding the data, and we all have some expertise. The document is text.
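Several of the studies cited above use character n-grams as features. The sketch below is a rough illustration only. It counts overlapping, case-folded character n-grams and compares two profiles by the Jaccard overlap of their most frequent n-grams. The function names, the default n = 3, and the top-k overlap score are assumptions made for illustration. They are not the procedure used by Juola or the other cited authors.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a case-folded copy of `text`."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile_overlap(a: Counter, b: Counter, top_k: int = 50) -> float:
    """Jaccard overlap of the top_k most frequent n-grams in two profiles.

    Illustrative scoring rule only (an assumption, not a cited method).
    """
    top_a = {gram for gram, _ in a.most_common(top_k)}
    top_b = {gram for gram, _ in b.most_common(top_k)}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0
```

For example, `char_ngrams("abab", 2)` counts `'ab'` twice and `'ba'` once. Two identical texts score 1.0 and texts with no shared frequent n-grams score 0.0.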
study, 'unmasking' refers to systematically diminishing the number of features for study to 'gauge the speed with which cross-validation accuracy degrades as more features are removed' to determine the depth of difference between texts (p. 1264). In other words, studies are strengthened by letting researchers slice or 'unmask' results in different ways. Authorship attribution studies and stylometric analysis in general point to the fact that, when working with advanced computational similarity metrics, it is essential to support the user's ability to interpret the process and the results and to ask iterative

using a supervised learning paradigm. In the supervised learning paradigm, the goal is to maximize predictive accuracy. For instance, suppose a researcher wants to determine whether Shakespeare wrote a given text. This can be modeled as a two-class prediction problem based on labeled examples. One class would be all Shakespeare documents, and the other class would be documents from the same period and location that Shakespeare did not write. The performance of the system would be measured by predictive accuracy, meaning how reliably the machine learning system can predict whether a new unseen text was
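The 'unmasking' procedure described above can be sketched in a few lines. This is a simplified stand-in rather than a reimplementation. Koppel et al. (2007) train linear support vector machines and remove the most heavily weighted features. To stay dependency-free, this sketch instead uses a nearest-centroid classifier with leave-one-out cross-validation and drops the features whose class centroids differ most. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Row = Sequence[float]


def centroid(rows: Sequence[Row], features: Sequence[int]) -> List[float]:
    """Mean value of each selected feature across `rows`."""
    return [sum(r[f] for r in rows) / len(rows) for f in features]


def l1(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def loo_accuracy(xa: Sequence[Row], xb: Sequence[Row], features: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (class A vs. class B)."""
    data = [(r, 0) for r in xa] + [(r, 1) for r in xb]
    correct = 0
    for i, (row, label) in enumerate(data):
        rest_a = [r for j, (r, y) in enumerate(data) if j != i and y == 0]
        rest_b = [r for j, (r, y) in enumerate(data) if j != i and y == 1]
        x = [row[f] for f in features]
        pred = 0 if l1(x, centroid(rest_a, features)) < l1(x, centroid(rest_b, features)) else 1
        correct += int(pred == label)
    return correct / len(data)


def unmask(xa: Sequence[Row], xb: Sequence[Row], k: int = 1, rounds: int = 3) -> List[float]:
    """Accuracy curve as the k most discriminating features are removed each round.

    Requires at least two rows per class so leave-one-out always has a centroid.
    """
    if len(xa) < 2 or len(xb) < 2:
        raise ValueError("need at least two examples per class")
    features = list(range(len(xa[0])))
    curve: List[float] = []
    for _ in range(rounds + 1):
        curve.append(loo_accuracy(xa, xb, features))
        if len(features) <= k:
            break
        ca, cb = centroid(xa, features), centroid(xb, features)
        # Rank positions in `features` by the gap between the two class centroids.
        ranked = sorted(range(len(features)), key=lambda i: abs(ca[i] - cb[i]), reverse=True)
        dropped = set(ranked[:k])
        features = [f for i, f in enumerate(features) if i not in dropped]
    return curve
```

The curve is read as follows. If accuracy stays high over several rounds, the two sets of texts differ in many features. If it collapses after a few removals, the difference rests on a handful of features.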
added a new document. By definition, predictive modeling asks a closed-set question: which of these particular texts did this phrase come from? In addition, our initial prosody research used supervised learning with bias optimization to determine the best system parameters, so it was computationally intensive. If the collection of documents changed, then the whole analysis would need to be run again. As a result, it also took days to discover the best system parameters. Besides the fact that scaling up was computationally expensive, this methodology did not facilitate the kind of iterative user

window size equal to 14 and uses accent, stress, and tone, then 14 × 3 is the number of features for each example). An example is a feature vector describing the window in terms of the chosen features. Distance is computed as the sum of the absolute values of the differences between all features in the feature vector.

First, we randomly chose 10,000 samples for each of the 150 FPN documents and for each Stein text,3 with each sample comprising the five-feature set described previously. Next, this random sample of 10,000 from each work was compared with the sam-
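The feature vectors and distance described above can be sketched as follows. The sketch assumes each sound has already been encoded as a tuple of small integer codes (for example accent, stress, and tone). The integer encoding, the function names, and drawing window start positions with replacement are all assumptions. The article does not say how OpenMary's categorical output is turned into numbers. With a 14-sound window and three features per sound, each vector has 14 × 3 = 42 entries.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: one tuple of integer codes per sound, e.g. (accent, stress, tone).
Sound = Tuple[int, ...]


def window_vector(sounds: Sequence[Sound], start: int, size: int = 14) -> List[int]:
    """Concatenate the feature codes of `size` consecutive sounds into one vector."""
    window = sounds[start:start + size]
    if len(window) != size:
        raise ValueError("window runs past the end of the document")
    return [code for sound in window for code in sound]


def l1_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Sum of the absolute differences between corresponding features."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def sample_windows(sounds: Sequence[Sound], n_samples: int, size: int = 14,
                   seed: int = 0) -> List[List[int]]:
    """Draw `n_samples` windows at random start positions (with replacement)."""
    last_start = len(sounds) - size
    if last_start < 0:
        raise ValueError("document is shorter than the window")
    rng = random.Random(seed)
    return [window_vector(sounds, rng.randint(0, last_start), size)
            for _ in range(n_samples)]
```

Under these assumptions, the study's setup corresponds to calling `sample_windows(doc, 10_000)` once per document and comparing the resulting vectors with `l1_distance`.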
similar to two texts by Stein and then interwoven between the FPN corpus and the other Stein works. The tables represent a range of documents written by a variety of men and women from different racial backgrounds. Specifically, in the FPN collection, there are 154 authors4: 49 are female, of whom 45 are white and 4 are former slaves; 105 are male, of whom 77 are white and 28 are former slaves. When we compared Three Lives with samples from FPN, the top 10 matches (listed in Table 1) included eight women and two slave authors. When we compared samples from Three Lives with the FPN texts, two female authors and five slave authors appeared in the top 10 matches. The system picked two of the four slave narratives written by women for this top list. This initial study provided an indication of

which texts to look at more closely for comparison within the ProseVis environment.

2.2 Using the Meandre/ProseVis discovery system

In the ProseVis webform5, the researcher is given the opportunity to upload a selection of texts and to control the features used for the analysis.6 The following are the parameters researchers can use to control the experiment:

Comparison Range—This is a comma-separated list of indices of the documents to be compared. For example, the user can choose to compare just the first document with the remaining documents in a set by using '1'.
Table 1 This list includes counts of FPN samples that are higher when FPN samples are compared with Three Lives samples^a

Author and title | #FPN like TL
Grimball, Margaret Ann Meta Morris, 1810–1881. Journal of Meta Morris Grimball: South Carolina, December 1860-February 186. | 73
Pringle, Elizabeth Waties Allston. A Woman Rice Planter. | 71
Avary, Myrta Lockett. A Virginia Girl in the Civil War, 1861-1865. | 69
Battle, Laura Elizabeth Lee. Forget-me-nots of the Civil War; A Romance, Containing Reminiscences and Original Letters of Two Confederate Soldiers. | 69
LeConte, Joseph. The Autobiography of Joseph LeConte. | 64
Dawson, Sarah Morgan. A Confederate Girl's Diary. | 63
Veney, Bethany. The Narrative of Bethany Veney: A Slave Woman. | 62

Table 2 This list includes counts of Three Lives samples when Three Lives samples are compared with FPN samples

Author and title | #TL like FPN
Malone, Bartlett Yancey. The Diary of Bartlett Yancey Malone. | 598
Horton, George. The Poetical Works of George M. Horton: The Colored Bard of North Carolina: To Which is Prefixed the Life of the Author, Written by Himself. | 390
Ward, Dallas T. The Last Flag of Truce. | 296
Patton, James. Biography of James Patton. | 249
McLeary, A. C. Humorous Incidents of the Civil War. | 220
A Georgia Negro Peon. The New Slavery in the South–An Autobiography. | 215
Jones, Thomas H. The Experience of Rev. Thomas H. Jones, Who Was a Slave for Forty-Three Years. Written by a Friend, as Related to Him by Brother Jones. | 206
Horton, George. The Life of George M. Horton. The Colored Bard of North Carolina. | 180
Mitchel, Cora. Reminiscences of the Civil War. | 179
Roper, Moses. A Narrative of the Adventures and Escape of Moses Roper, from American Slavery. | 157
Other Stein works |
Stein, Three Lives | 849
Stein, Four Saints | 719
Stein, 'Matisse' | 395
Stein, 'Picasso' | 371
Stein, 'Miss Furr and Miss Skeene' | 305
Stein, Making of Americans | 178
Using '1, 3, 7' means that the first, third, and seventh documents will be compared against each other and against all of the other documents. Using 'all' means that all documents will be compared with each other.

Window Size in Sounds—This is the number of phonemes to be considered a phrase for analysis. Because we are working on prosodic patterns that are affected by phrasal patterns (Clement et al., 2013), it makes sense for this value to represent the average number of sounds in a phrase. If texts use shorter phrases, then a smaller window better represents the average phrase size for a given text. If texts have longer phrases, then a larger window might yield more productive results.

Sound Features to Use—This refers to the attributes of the sounds, which are determined from the features extracted by OpenMary, a pre-processing module in Meandre. As described in Clement et al. (2013), this

used in image processing and statistics to 'blur' out more detailed features and emphasize the larger scale features. In ProseVis, smoothing is used to find longer patterns by averaging the similarity values over a neighborhood. Using the data produced through Meandre to compute document similarity based on prosody features, ProseVis allows researchers to explore these results mapped back to the original text with colors. By default, Meandre returns a collection of raw similarity values on a per-syllable-per-document basis that is often too small to display without some form of normaliza-
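The Comparison Range rules listed earlier in this section can be expressed as a small parser that turns the webform string into the set of document pairs to compare. This is a hypothetical reimplementation of the stated rules, not ProseVis or Meandre code. The function name is invented, indices are 1-based as in the examples, and each pair is stored once in (lower, higher) order.

```python
from typing import Set, Tuple


def parse_comparison_range(spec: str, n_docs: int) -> Set[Tuple[int, int]]:
    """Return the unordered document pairs implied by a Comparison Range string.

    '1'       -> document 1 against every other document
    '1, 3, 7' -> documents 1, 3 and 7 against each other and all other documents
    'all'     -> every document against every other document
    """
    spec = spec.strip().lower()
    if spec == "all":
        selected = set(range(1, n_docs + 1))
    else:
        selected = {int(tok) for tok in spec.split(",") if tok.strip()}
    out_of_range = sorted(i for i in selected if not 1 <= i <= n_docs)
    if out_of_range:
        raise ValueError(f"indices out of range: {out_of_range}")
    return {
        (min(i, j), max(i, j))
        for i in selected
        for j in range(1, n_docs + 1)
        if j != i
    }
```

For a set of eight documents, `parse_comparison_range("1, 3, 7", 8)` yields 18 pairs. Each of the three selected documents is paired with the seven others, and the three pairs among the selected documents are counted once.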
turned soldier and sergeant from North Carolina. The blue blocks indicate parts of 'Melanctha' that the system has determined sound most similar to The Poetical Works of George M. Horton: The Colored Bard of North Carolina: To Which is Prefixed the Life of the Author, Written by Himself (1845). The colors range in intensity based on the similarity value. Figure 3 shows the tool panel where a user can see which colors correspond to the texts in this similarity study, along with the check boxes that allow a user to deselect a text and remove it from the comparison. For instance, if the Horton document (labeled here as 'hortonpoem') were deselected, the blocks that are blue would change to reflect the color of the text with the next highest similarity value. Examples of this 'unmasking' are included in the third section of this article.

First, we tested various parameters such as weighting. Figure 4 shows three panels, each showing 'Melanctha' from Three Lives. These results are based on using a phrase window of 14 sounds with similarity analysis using the features discussed earlier—part of speech, accent, tone, stress, and break index. Each panel shows a different weighting power; from left to right, these are 16, 32, and 64. Although the blocks of colors (primarily blue and green) are the same in all three panels, the left panel shows larger blocks of color than the panel on the far right, where the colors are more varied. This might indicate that, to examine the texts in this sample for longer textual patterns (i.e., multi-phrasal blocks, sentences, or paragraphs) that make sense to readers, the lower weighting power will yield more productive visualizations.

Figure 5 is also a comparison of three versions of results on Three Lives. In this view, the researcher has chosen to vary which features are used. Each panel includes results produced using the 14-sound window and a weighting power of 16. The difference here is that the first panel includes all the features used previously, the second includes all but break index, and the third contains all but
Fig. 4 Three ProseVis panels, each with an excerpt from 'Melanctha' showing results based on a 14-sound window and different weighting powers (from left to right: 16, 32, and 64)
blood lines are often blurred and cultural traditions merged' (p. 144). Specifically, Peterson argues, Stein captures this perspective by using her prosody to appropriate African American musical traditions—coon songs, early folk blues, and ragtime music—that historically have been 'inextricably bound' to a variety of American ethnicities and cultural backgrounds. In addition, Peterson and Smedman point to Stein's 'double identity as a Jew and a lesbian' (Peterson, 1996, p. 155) as a thematic element in the text that is also inscribed in its racial discourse, specifically in its work to investigate racialized signifiers (Smedman, 1995, p. 570). 'Since Stein's linguistic tampering involved an erotics and experience outside of the normative heterosexual boundaries', Smedman writes, 'it is not surprising that she makes the link between "improper" racialized language and "taboo" sexuality so often in these texts' (p. 571). In other words, these critics are arguing that Stein uses stylistic features—specifically prosodic elements—to work with thematic and narrative elements to create an interplay between sounds and syntax that gestures toward a story of shared cultures.

Continuing the vein of inquiry suggested by Peterson and Smedman, we compare Gertrude Stein's 'Melanctha' with 150 FPN of the American South collection to interrogate how the system measures the extent to which Stein's 'Melanctha' sounds like or contains prosodic elements similar to those found in these narratives. Self-described, the FPN 'is a collection of diaries, autobiographies, memoirs, travel accounts, and ex-slave narratives written by Southerners. The majority of materials in this collection are written by those Southerners whose voices were less prominent in their time, including African Americans, women, enlisted men, laborers, and Native Americans'.8 Even though 'Melanctha' is written from the third-person perspective, it is written in the free indirect style. In the free indirect style, a character's way of speaking, either out loud or in
his or her thoughts, dictates the style of narration, making the narrative much like a first-person narrative. Using ProseVis to distant-listen to 'Melanctha' by comparing its prosodic elements with those in the FPN documents allows for new readings of the text's portrayal of identity construction as it corresponds to the sound of the text.

3.1 Discussion: Unmasking the sound of identity construction in 'Melanctha'

This study's driving question is not to ask the ques-

(p. 86), are facing the denouement of their relationship, the building of which has formed the central narrative of the story. After this point, their relationship begins to unravel. The breakup has been foreshadowed: from the beginning of the story, Jeff Campbell 'did not like Melanctha's ways,' and 'he did not think that she would ever come to any good' (Stein, 2004, p. 77). Melanctha's 'way' through the text is to 'wander' both sexually and intellectually, what the narrator calls 'wandering after wisdom'. At the point of the text shown in Fig. 7, Jeff has heard more rumors about Melanctha's past from
wandering ways. The following is an example of a green-colored section:

She began to tell everything she ever knew about you. She didn't know how well now I know you. I didn't tell her not to go on talking. I listened while she told me everything about you (Stein, 2004, p. 102).

The blue maps to the looser, multi-phrasal text that corresponds to a description of either Melanctha's actions and thoughts or Jeff's when he is feeling positive or affected by Melanctha. The following is an example of a blue-colored section:

I see that now, sometimes, the way you certainly been teaching me, Melanctha, really, and then I love you those times, Melanctha, like a real religion, and then it comes over me all sudden, I don't know anything real about you Melanctha, dear one, and then it comes over me sudden, perhaps I certainly am wrong now, thinking all this way so lovely, and not thinking now any more the old way I always before was always thinking, about what was the right way for me, to live regular . . . (Stein, 2004, p. 108)

Further, with a wider view of the entire story as visualized in Fig. 2, the researcher can see that the beginning of the story, which corresponds to the narrative of Melanctha's upbringing and the maturing and solidifying of her 'wandering' ways, is predominantly like Horton's document. By contrast, the end of the text shows more similarity with Malone's. This part corresponds with Melanctha's decline into despondency and ultimately sickness, the ceasing of her wandering. At first glance, these patterns would seem to support a simple assertion: the system found the slave narrative (Horton's document) to sound more like Melanctha's wandering narrative, whereas it found the narrative corresponding to Jeff's way of thinking to sound more like that told by the soldier (Malone's document).

As discussed earlier, however, interacting with the data is an important advancement in similarity testing. By 'unmasking' (Koppel et al., 2007) or deselecting texts in ProseVis, the researcher can quickly see that these similarity patterns are more complex.
For example, if the researcher starts by selecting all of the texts, as shown in Fig. 4, the green and blue blocks are evident.

In Fig. 8, the researcher has deselected the blue Horton text to reveal a purple pattern that indicates similarity with The Narrative of Bethany Veney: A Slave Woman (1889). This is because, for the same section of the text, the next highest value for these phrases corresponds to the Veney document. The green block remains mostly unchanged.

In Fig. 9, both Horton and Veney are deselected and a pale purple is revealed, indicating that the

Dallas T. Ward's The Last Flag of Truce, the story of a railroad conductor or merchant (the history is unclear) who was asked to make the Confederates' truce flag of surrender.

To summarize, by unmasking the blue Horton document (written by a male slave), the researcher reveals the purple Veney document (written by a female slave) and then the pale purple Dawson document (written by a female Confederate). By unmasking the green Malone document (written by a male Confederate), the researcher reveals the pink McLeary document (written by a male
Fig. 8 The Horton (blue) document comparison has been deselected to reveal the Veney (purple) document similarity
Fig. 10 The Malone document comparisons have been deselected and the McLeary similarity (pink) is revealed
the 'temporalities of identity' on 'conscious display' (Smith, 1993, p. 160) in the FPN and the 'Melanctha' documents. For instance, the blue group authors all taught themselves to write. Only Dawson had 10 months of formal schooling. The blue group documents are also self-proclaimed 'literary' documents. Horton, who was the first African American to publish a book in the American South, wrote poetry. His document is primarily a book of poetry with a long personal narrative as an introduction. Veney's narrative is a tract meant to illustrate Veney's Christian character to a Reconstruction-era readership still reeling from the war. It is introduced by Rev. Bishop Mallalieu and includes 'Commendatory Notices from Rev. V. A. Cooper, Superintendent of Home for Little Wanderers, Boston, Mass., and Rev. Erastus Spaulding, Millbury, Mass'. The Rev. Bishop Mallalieu writes:

It is greatly to be regretted that the language and personal characteristics of Bethany cannot be transcribed. The little particulars that give coloring and point, tone and expression, are largely lost. Only the outline can be given. As it is, possessing only the merit of a "plain, unvarnished tale," it asks for generous consideration and extended sale.

Taken at face value, this comment would seem to indicate that Veney's document was written in a plain style. Yet the intent of the text is to convince the audience that 'the biographies of saintly, enduring spirits like that of Betty Veney will be read, and will serve to inspire the discouraged and down-trodden to put their trust in the almighty arm of Jehovah'. It is clear that the bishop's introduction is meant to quell concerns that such a tale, written in a style to inspire empathy and religious zeal, might be untrue. Finally, the last of the blue group documents, Dawson's, is prefaced by a long introduction from her son. He describes the narrative's 'flowing sentences', its 'certain uses of words to which the twentieth century purist will take exception', and its likeness to Victorian literature as a 'remarkable feat of style' (p. xii). The authors' backgrounds and the assumed audiences for the green group of documents are remarkably different. Two of the writers (Malone and Ward) are
soldiers. Both Malone's and McLeary's diaries are reports of daily happenings. Malone's is described as 'Reported in a simple and matter-of-fact manner, and include notations on his diet, his regiment's marches, and biblical texts referred to in the sermons he hears'. Ward's tale is also matter-of-fact and dedicated 'To the Soldiers'. It is introduced with letters from a businessman and a judge who attest to its veracity. The similarities that tie the 'blue group' documents (Horton, Veney, and Dawson) together and the 'green group' documents (Malone, McLeary, Ward) together to the same spots in 'Melanctha'

to elude researchers who work in attribution studies corresponds to the mixed borderline of racialized and gendered identity construction to which Peterson and Smedman refer. Like voices and identities constructed with mixed histories and mixed influences, texts are often the result of collaborative authoring. The lack of studies that consider this aspect of texts in authorship attribution has been described as 'a pitfall' common to attribution studies (Eder, 2012). Eder contends that the future of such study rests in using 'stylometric techniques to trace stylistic imitations or unconscious inspirations
that close listening remind us that 'individual impulses need substantiality before unifying them can generate much dynamism' and 'the near language-like qualities of the musics of writing . . . gives them an outwardly blinking and scanning and surfing involvement with a body politic or political economy of sense' (Bernstein, 1998, p. 83). Likewise, tools that provide for readerly interactions such as the kind of distant (and close) listening (and reading) we have outlined here can advance the sensitivity of systems that use algorithms such as similarity metrics, but more importantly, we advance researchers' under-

Clement, T., Tcheng, D., Auvil, L., Capitanu, B., and Monroe, M. (2013). Sounding for Meaning: Using Theories of Knowledge Representation to Analyze Aural Patterns in Texts. Digital Humanities Quarterly, 7(1).

Dawson, S. M. (1913). A Confederate Girl's Diary. Cambridge, MA: The Riverside Press. Documenting the American South. http://docsouth.unc.edu/fpn/dawson/menu.html (accessed 11 November 2012).

Diederich, J., Kindermann, J., Leopold, E., and Paass, G. (2000). Authorship attribution with support vector machines. Applied Intelligence, 19(1–2): 109–23.
Juola, P. (2004). Ad-hoc authorship attribution competition. Proceedings of the Joint Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing. Göteborg, Sweden, pp. 175–6.

Juola, P., Sofko, J., and Brennan, P. (2006). A prototype for authorship attribution studies. Literary and Linguistic Computing, 21(2): 169–78.

Koppel, M., Schler, J., and Bonchek-Dokow, E. (2007). Measuring differentiability: unmasking pseudonymous authors. Journal of Machine Learning Research, 8: 1261–76.

Morristown, NJ: Association for Computational Linguistics, pp. 482–91.

Saldívar-Hull, S. (1989). 'Wrestling Your Ally: Stein, Racism, and Feminist Critical Practice'. In Broe, M. L. and Ingram, A. (eds), Women's Writing in Exile. Chapel Hill, NC: University of North Carolina Press, pp. 181–98.

Smedman, L. (1995). 'Cousin to Cooning': Relation, Difference, and Racialized Language in Stein's Nonrepresentational Texts. MFS Modern Fiction Studies, 42: 569–88.

Smith, S. (1993). Subjectivity, Identity, and the Body: Women's Autobiographical Practices in the Twentieth
Weiss, S. M., Indurkhya, N., Zhang, T., and Damerau, F. (2005). Text Mining: Predictive Methods for Analyzing Unstructured Information. New York: Springer.

Yu, B. (2008). An evaluation of text classification methods for literary study. Literary and Linguistic Computing, 23: 327–43.

Zhang, D. and Lee, W. S. (2006). Extracting key-substring-group features for text classification. In Proceedings of the 12th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, pp. 474–83.

boundary, and a paragraph-final boundary, is particularly important because phrasal boundaries determine the rise and fall or emphases of particular words based on their context within the phrase.

3 The texts by Gertrude Stein include Four Saints in Three Acts, 'Matisse', The Making of Americans, 'Miss Furr and Miss Skeene', 'Picasso', Three Lives, and Tender Buttons. All of these texts are freely available online from Project Gutenberg. The edition of The Making of Americans was published by Dalkey Archive Press (1995).

4 Some of the FPN documents have multiple authors. We do not include illustrators in this count.