
Vocabulary decay in category romance

Downloaded from https://academic.oup.com/dsh/article/31/2/321/2463027 by Shanghai International Studies University user on 22 November 2022
Jack Elliott
Centre for Literary and Linguistic Computing, University of Newcastle, Australia
Abstract
Writers of a best-selling category romance imprint share a common tendency to decrease their deployment of unique words over the span of their novels, a phenomenon of ‘vocabulary decay’. This tendency cannot be found in the novels of Jane Austen, suggesting this drop is not intrinsic to the romance genre itself, and is unlikely to have any true narrative purpose. A study of Charles Dickens shows that vocabulary decay extends beyond the romance genre. Closer examination reveals vocabulary decay is a result of progressive amounts of linguistic chunking, due to author fatigue or a desire to produce a more readable narrative.

Correspondence: Jack Elliott, Centre for Literary and Linguistic Computing, University of Newcastle, Australia.
E-mail: jack.elliott@uon.edu.au

1 Introduction

Romance is the most popular genre in the world, accounting for 14.3% of all fiction sold in 2011 (Romance Writers of America, 2012). The most efficient definition of a romance is given by Pamela Regis: ‘A romance novel is a work of prose fiction that tells the story of the courtship and betrothal of one or more heroines’ (Regis, 2003, p. 27).1

If romance is the most popular genre in the world, the most popular publisher of romance is Harlequin, which sells one book every five seconds, into 94 markets worldwide, in 25 different languages (Harlequin, 2012a). Harlequin specializes in the publication of branded romantic fiction, so-called category romances, and none of its categories is more popular, or more prolific, than Harlequin Presents. No other romance category from any publisher releases as many titles every month and only one (Harlequin Romance) has a longer history.

Harlequin Presents novels are shorter-form fiction of roughly 50,000 words each that promises ‘captivating internal emotional conflicts’ in ‘international settings from your every fantasy’ (Harlequin, 2012b). This is romance in an almost archetypal sense with an emphasis on the social class of the hero: ‘There’s nothing in the world his powerful authority and money can’t buy’ (Harlequin, 2012b). Although the authorship is relatively diverse (writers range from young to old, New Zealand to Canada), all must conform to the expectations of their readership and submit to the demands of the publisher.

Although criticism of the romance genre has traditionally focused on debates locating the genre within a feminist context (Makinen, 2001), in recent years it has moved to evaluating the novels as works of art in their own right (Selinger, 2007, p. 310). Statistical analysis of romance novels has shown that the genre is well marked and stable (Dillon, 2007, p. 181). It has also shown that different categories employ distinctive rhetorical stances in their language (Opas-Hänninen and Tweedie, 1999, pp. 97–8). The large quantity, well-defined nature, popularity, and international reach are the primary reasons for studying Harlequin Presents.

Digital Scholarship in the Humanities, Vol. 31, No. 2, 2016. © The Author 2014. Published by Oxford University Press on behalf of EADH. All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/llc/fqu069 Advance Access published on 8 December 2014

Studies of the Harlequin bibliographic data reveal twin impulses of diversity and conformity. On the one hand, novels are written around the globe by thousands of women2 who differ by generational cohort, national origin, and level of formal education. On the other, publication is controlled by a single editorial group in one country. Studying category romance raises the question of which must be accorded greater significance: the monolithic nature of the publisher and the strict rules of the category, or the diversity of the authors and the readership.

As a typical Harlequin Presents novel progresses, the working vocabulary contracts. This phenomenon, a sort of ‘vocabulary decay’, is driven by either the rapid speed of composition or the popular nature of the genre. Tight economic conditions imposed by Harlequin place a premium on the rapid completion of a novel. In this model, writers make their language less and less unique as they hurry through their novel, jettisoning vocabulary variation as they go. Vocabulary decay may also be a deliberate strategy to maintain a rapid narrative pace. Words that obscure understanding, or are potentially difficult, are metered down by the author as they seek to keep their readership engaged.

The most extreme example of vocabulary decay can be found in Sheikh Without A Heart, by Sandra Marton.3 In this case, the first section of the novel is consumed with extensive descriptions of surroundings and characters, whereas later segments are given over to long stretches of repetitive dialogue. Here, for example, is the second line of the novel, a reflection on the desert at night from the hero, Sheikh Karim al Safir, as he gazes out the window of his jet:

Black silk sky. Stars as brilliant as bonfires. An ivory moon that cast a milky glow over the endless sea of sand (Marton, 2012).

And here is the second last line of Sheikh Without A Heart, the response from the heroine to the proposal of the hero:

‘Yes’, she said ‘yes, yes, yes-’ (Marton, 2012)

The hero of Sheikh Without A Heart is the handsome, dutiful Karim, who broods over the deserts of the South-Western United States in the first chapter:

Alcantar was thousands of miles away and its endless miles of gently undulating sand ended at the turquoise waters of the Persian Sea. (Marton, 2012)

Deserts are mentioned throughout the novel, but the special vocabulary is reduced until, in the final chapter, they serve only as window dressing (only the words ‘overhead’ and ‘fan’ are unique to this part of the novel):

She woke abruptly, alone in a strange room, with a ceiling fan turning high overhead, rain pounding against the arched windows.
Rain in the desert.
It seemed appropriate.
She sat up and pushed her hair from her face (Marton, 2012)

The effect of vocabulary decay is observable, though at a weaker scale, in the work of Charles Dickens, who was also known to write quickly for a wide audience. It is not observable in the work of Jane Austen, who wrote at a more leisurely pace for a more limited readership. The fact that Dickens displays this effect while Austen does not suggests there is nothing intrinsic to the romance novel about vocabulary decay.

2 Methodology

As data, 181 texts from the Harlequin Presents back catalogue were downloaded from the Harlequin store (these were all originally published between 2000 and 2012). Principal components analysis revealed that the three novellas in the sample were very different from the others in the language they used. The novellas were then set aside, yielding 178 novels.

Some novels used US spelling, whereas others used British spelling. All spelling was normalized to US spelling by running the standard, open-source Aspell application twice on the novels; once using the British English dictionary, and again using the US dictionary. This provided two lists, and
where the lists could not agree on a spelling, a North American spelling was used. Novels were then stripped of their preamble (table of contents and author biography) and postscript (copyright notice).

Custom software based on the GATE system (Cunningham et al., 2011) was then used to extract each word in every novel and associate it with a location within the file.4 Each novel was then cut into 100 separate segments based on its overall length. Each segment was populated with an equal number of words. At the end of this process, each word for every novel was associated with a segment number, ranging from 1 (the first segment) to 100 (the last). Each segment in every novel contained roughly 600 words.

Each word was then associated with a string of 100 numbers, representing how many times that word appeared in each segment of every novel. A word appearing only in the first segment of a single novel would be represented by the number one, followed by ninety-nine zeros. A word appearing twice in the second segment of two novels would be represented by the number zero in the first position, four in the second, then ninety-eight zeros, and so on.

The Shannon Entropy of every segment was then calculated using maximum likelihood, and the means and standard deviations plotted in Fig. 1. Entropy is a measure of disorder, and when applied to word counts is a measure of vocabulary ‘richness’: the higher the entropy, the wider the vocabulary range. Unlike a typical stylometric study, we are focused here on the range of words used rather than the nature of the words themselves.

The average entropy declines as a novel progresses, showing the vocabulary becoming less diverse. Although each segment has the same total number of words, segments have progressively fewer and fewer different types of word. In other words, working vocabulary contracts over the course of these novels, with the exception of the final segment, which re-inflates the vocabulary slightly.

This effect is very strong throughout the genre. Only two novels, The Millionaire Affair and The Costerella Contract, do not exhibit this trend. Note that decreasing the number of segments to 10 evenly sized bins and recalculating the entropy makes the effect even stronger. Both The Millionaire Affair and The Costerella Contract remained negatively correlated.

3 Changing Nature of Vocabulary

Table 1 summarizes the top 30 words found only in the first segment of any novel. The second column shows how many novels use this word only in the first segment, and not thereafter.

The most obvious similarity of these words is that they have to do with introductions, or are primarily words associated with the billionaires, sheiks, and playboys who provide the magnetic attraction at the heart of these novels. The second most obvious trend in the words is their very low frequency: these are, at most, shared between six novels, not throughout the genre as a whole.

The lack of physical description in these words is surprising. Critics acknowledge that physical descriptions mark out the romance genre (Dillon, 2007, p. 173), but they are not unique to the first segment of the novel. These words seem more intent on socially positioning the hero and heroine. Words such as ‘financial’, ‘impressive’, ‘corporate’, and ‘owned’ dominate the list. The few physical description words either have a social connotation (‘aristocratic’, ‘broad-shouldered’) or are equally likely to be heroine words as hero words (‘dark-haired’, ‘British’).

The submissions guidelines suggest that Harlequin Presents is often thought of as being a genre about billionaires and oligarchs (Harlequin, 2012b). The uniqueness of these particular words modifies this understanding: the genre includes millionaires and Greek shipping tycoons, but is about the love story between hero and heroine. The first segment of the novel brings the social setting into focus: the hero’s wealth and power are central at this point in the novel, but they will drift to the sidelines for the rest of the novel. The physical attributes of the hero will recur in the novel, as he works his charms on the heroine (or succumbs to hers).
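The segmentation described in Section 2 and the ‘first segment only’ test behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GATE-based software used in the study: the function names are hypothetical, tokens are assumed to be already extracted and lower-cased, and any remainder left over when the novel is divided is spread across segments rather than dropped. The sketch handles a single novel; the study combines the per-segment counts across all novels.

```python
from collections import Counter


def split_into_segments(tokens, n_segments=100):
    """Cut a token list into n_segments contiguous, near-equal parts."""
    n = len(tokens)
    # Integer boundaries spread any remainder across the segments.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def segment_counts(tokens, n_segments=100):
    """Map each word to a list of n_segments per-segment counts."""
    table = {}
    for idx, segment in enumerate(split_into_segments(tokens, n_segments)):
        for word, count in Counter(segment).items():
            table.setdefault(word, [0] * n_segments)[idx] = count
    return table


def first_segment_only(table):
    """Words that occur in the first segment and never again."""
    return sorted(
        word for word, counts in table.items()
        if counts[0] > 0 and sum(counts[1:]) == 0
    )


# Toy "novel": 8 tokens cut into 4 segments of 2 tokens each.
toy = "corporate offices she said she said she smiled".split()
table = segment_counts(toy, n_segments=4)
print(table["she"])               # [0, 1, 1, 1]
print(first_segment_only(table))  # ['corporate', 'offices']
```

Each list in `table` plays the role of the ‘string of 100 numbers’ described in the methodology, here shortened to four positions for readability.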

Fig. 1 Box plot of Shannon Entropy of each segment (y-axis: entropy in nats, about 4.6 to 5.2; x-axis: segment). Curve has been fitted with Friedman’s supersmoother through the mean of each segment. The x-axis shows segments from 1 (the first) to 100 (the last)
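For reference, the maximum-likelihood (plug-in) estimate of Shannon entropy for one segment is H = -sum over word types w of p(w) ln p(w), where p(w) = count(w) / N and N is the number of tokens in the segment. Using the natural logarithm gives nats, the unit on the vertical axis of Fig. 1. The sketch below assumes tokenised input and uses a hypothetical function name; it illustrates the calculation and is not the study's own code.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Maximum-likelihood (plug-in) Shannon entropy of a token list, in nats."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    # p(w) is estimated as the relative frequency of w in the segment.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Eight different words: the widest possible vocabulary for eight tokens.
varied = "black silk sky stars brilliant bonfires ivory moon".split()
# Eight tokens drawn from only three word types.
repetitive = "yes she said yes yes yes she said".split()

print(round(shannon_entropy(varied), 3))      # 2.079 (= ln 8)
print(round(shannon_entropy(repetitive), 3))  # 1.04
```

Both toy segments have the same number of tokens, so the difference in entropy comes only from how the tokens are spread over word types, which is the sense of vocabulary ‘richness’ used in the text.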

Table 1 Words unique to first segments

Word               Number of novels
Dark-haired        6
Fail               6
Arrival            5
Exclusive          5
Loaded             5
Offices            5
Surveyed           5
Aristocratic       4
Assessing          4
British            4
Broad-shouldered   4
Corporate          4
Employees          4
Enter              4
Financial          4
Fools              4
Freedom            4
Groomed            4
Heads              4
Ill                4
Impressive         4
Magazine           4
Major              4
Manicured          4
Mission            4
Owned              4
Polish             4
Revealing          4
Services           4
Smart              4

Although the function of the first segment serves to explain why it has some unique words, it does not explain the composition of the rest of the novel.

3.1 Progressive linguistic chunking

In fact, unique words in the first segment of a Harlequin Presents novel are crowded out by sequences of more common words, especially those used to coordinate dialogue. Any given sequence of words typically has a much lower chance of appearing than those words taken individually, so comparison of these sequences is likely to produce a more readily interpretable result (Manning and Schütze, 1999, pp. 191–228). These sequences are known as N-grams.

Neuroscience supports the idea that language is produced in chunks of a certain size, beneath the level of a phrase, and that the brain has frequent recourse to producing the same type of chunk over and over again, in a phenomenon known as ‘syntactic priming’. Lancashire describes syntactic priming as ‘the tendency that uttering one syntactic construction increases the chances that the same grammatical structure, or a related one, will be used again soon’ (Lancashire, 2010, p. 83).

N-Grams appear to be a useful proxy for this sort of syntactic chunk. Although studies of syntactic priming in action typically restrict themselves to only a single author (Lancashire, 2010, pp. 116–93), or a handful of patterns (Gries, 2005, p. 366), no such luxury is afforded to a study of romance novelists as a group, as there is no consensus on what linguistic tricks they may prefer. N-Grams are a way to extract sequences from the text without prejudging any findings.

N-Grams were extracted from the Harlequin novels used in previous sections. A sequence of three tokens taken together defined a single N-Gram. The N-Grams included punctuation, as this would seem to be relevant information. N-Grams were assigned to segments in exactly the same way as individual words. Taking this information, the number of unique N-Grams per segment was calculated, and averaged out over the novels. This information is plotted in Fig. 2. As the number of unique words declines throughout the novels, so does the number of unique sequences of words.

If chunks of text are repeated throughout these novels, a quick examination of the most frequent chunks will show whether this is an example of syntactic priming and not a more complex phenomenon. After taking all N-Grams in every novel, those appearing in fewer than 100 novels were discarded. The remaining N-grams were then sorted by the average number of segments in which they appeared. Table 2 shows the top 30 most used N-Grams across all Harlequin novels in the sample.

In the later sections of the novels, unique words are crowded out by sequences of common text. Many of these are used to coordinate speech, but in each case, this speech has a very strongly interpersonal quality. Dialogue now flows directly on, from one character to another (row 1). The speech is declarative in nature (row 2). Questions are being
asked (row 4) by heroes (row 14) and heroines (row 12). Although romance is notable for the way its writers avoid using ‘he said’ and ‘she said’ (Dillon, 2007, p. 183), rows 17 and 18 show that this tendency declines over the course of a novel. Fragments such as ‘he was’, ‘she had’, and ‘it was’ also display evidence of priming. They also suggest that writers have a greater recourse to ‘telling’ rather than ‘showing’ the action. This appears to violate an aesthetic principle of romance writing to show rather than tell the action (De Geest and Goris, 2010, p. 97). In some cases, the author may have developed a narrative velocity so fast that mere showing is no longer enough.

Fig. 2 Mean number of unique three-count N-Grams per segment (y-axis: proportion of unique n-grams, about 0.66 to 0.74; x-axis: segments)

The other striking N-Gram here is the triple period, ‘. . .’. This is employed when a character is ‘lost for words’ or ‘trails off’ at the end of a sentence, leaving their utterance unfinished. The narrative voice sometimes deploys this usage, but the triple period is more frequently coordinated with speech (row 13). Perhaps this is an example of the interruption of syntactic flow where the writer, like his or her characters, is unable to go on.

One novel with a very strong decay, After the Greek Affair, by Chantelle Shaw, starts as strongly descriptive, with dialogue either using more imaginative inquit tags or eliding them altogether. Here is the introduction of the hero (note the many parallels with the words from Table 1):

‘I’m Loukas Christakis’, he announced, striding towards her. Supremely confident and self-assured, he moved with surprising grace for such a big man. He was well over six feet tall, Belle estimated, and narrow-hipped, his long legs encased in faded denims that moulded his powerful thighs. Through his close-fitting black tee shirt she could see the delineation of his abdominal muscles, and the shirt’s vee-shaped neckline revealed an expanse of bronzed skin and wiry black chest hair. (Shaw, 2012)

After the Greek Affair, like Sheikh Without A Heart, and like our N-Gram data would suggest, moves from more descriptive prose to long strings of interpersonal dialogue:

‘You wanted me to give up Wedding Belle, didn’t you? But why?’ she asked desperately. ‘I made it clear that I would always put the babies first’. She drew a shuddering breath, pain and anger ripping through her. ‘I thought you were different than John. I thought I could trust you. But you are just like him. You want your own way, and you
don’t care who you hurt as long as you are in control’. (Shaw, 2012)

Table 2 N-grams shared between segments

Row  N-gram (delimited with dashes)  Average number of segments  Number of novels
1    .-’-’                           68.70                       141
2    .-’-i                           68.39                       145
3    .-.-.                           56.10                       176
4    ?-’-’                           50.19                       140
5    ,-’-she                         50.10                       140
6    ,-’-he                          48.27                       141
7    .-’-you                         45.30                       142
8    .-’-she                         44.34                       141
9    .-’-he                          44.01                       142
10   ’-’-i                           37.57                       140
11   .-it-was                        36.03                       178
12   ?-’-she                         33.25                       140
13   .-.-’                           32.11                       141
14   ?-’-he                          30.80                       141
15   .-’-it                          29.40                       141
16   ’-i-’m                          29.21                       142
17   ’-she-said                      28.40                       138
18   ’-he-said                       28.01                       140
19   i-do-n’t                        27.98                       176
20   her-.-’                         26.08                       140
21   .-she-was                       25.69                       178
22   .-’-and                         25.35                       140
23   .-’-the                         24.10                       140
24   she-did-n’t                     23.87                       175
25   him-.-’                         22.34                       140
26   .-he-was                        21.87                       178
27   .-’-what                        21.07                       141
28   .-she-had                       20.55                       178
29   .-she-’d                        20.27                       161
30   ’-’-you                         19.94                       140

Vocabulary decay stretches beyond the increasing use of dialogue. Here is a paragraph from the final section of After the Greek Affair that uses only words found elsewhere in the novel (note the high density of function words):

He reached for her hand, and after a second she slipped her fingers in his and allowed him to lead her down to the shore, where lazy waves rippled onto the sand. (Shaw, 2012)

As a typical Harlequin Presents novel progresses, its language grows less unique, there is more dialogue, and the number of stock phrases (‘he was’, ‘she had’) increases. To these stock phrases can be added the repetitive nature of some of the dialogue (often a question, frequently a declarative). It is possible that this is a result of tiredness on the part of the writer: laboratory studies of syntactic priming typically control for tiredness (Hartsuiker and Kolk, 1998, p. 229). Additionally, evidence from brain scanning studies shows that neural activity decays with overstimulation of any one area, and fatigue in individual neurons is one possible explanation (Grill-Spector et al., 2006, pp. 18–19).

There is another explanation for this phenomenon: writing in the neurological models is also reading (Lancashire, 2010, p. 100). Just as it is easier to write a sentence by reusing a syntactic unit, such a sentence is easier to read because the reader has previously been exposed to that structure.

These results should be approached with caution: the science is still young, and consensus is not yet fully formed. The exact role fatigue plays in either reading or writing is yet to be authoritatively determined.

3.2 Category romance writers

Romance writers must write quickly because they write so much. For Harlequin Presents, the most important ‘super-authors’ are responsible for most of Harlequin’s line-up (Elliott, 2014). Some writers do not even stay in the same imprint, but publish two or three novels annually in each imprint to which they are committed (Krentz, 2011).

This tendency is exaggerated by the economics of the situation, and writers of category romances see themselves as businessmen and businesswomen (Krentz, 1992, p. 3). Novels must be written for a particular category from very early in the writing process (De Geest and Goris, 2010, p. 100), and such a novel will have limited use if it is rejected by the targeted publisher.5 The terms of the contracts are typically one-sided, and the authors are not always well paid for each novel (Flesch, 2004, p. 63). From this perspective, it makes sense for a romance writer to minimize the investment of time in a product that may be rejected, or for which they may be paid very little.

If the writers write quickly to make money and minimize their risk, they also write quickly to satisfy
a market. This market is insatiable: a global phenomenon with over 127 different markets that consumes the entire Harlequin line of thousands of novels, plus all single-title romantic fiction, in addition to blockbuster breakouts such as Twilight and Fifty Shades of Grey (Elliott, 2014). The form of romance found in the categories is designed to be enjoyed quickly, suggesting that vocabulary decay can be viewed as a successful narrative strategy.

3.3 Vocabulary decay in context

There are two alternative explanations for vocabulary decay. It is either an artefact of the rapid writing of the texts or it is a product of the genre itself. These possibilities can be narrowed by studying the works of Charles Dickens and Jane Austen. Dickens produced his serial novels mostly outside the romance genre, but wrote them quickly under extreme deadline pressures. Austen wrote her novels at a leisurely pace, but her novels would fall squarely into the romance genre.

Comparing contemporary literature to classic novels has several advantages: the novels themselves are familiar, the authors’ lives are reasonably well known, and the works are unburdened by copyright. They provide a control for the rather less well-known Harlequin novels. Indeed, Austen is important for being one of the few romance writers we can study directly using quantitative methods; some, such as Georgette Heyer, are protected by copyright, and others, such as Samuel Richardson, wrote in an epistolary style. Dickens is a good control for a writer under pressure because his life is so well documented; records exist of not only original manuscripts, printer’s receipts, and contracts but even working notes (Butt and Tillotson, 1957).

Jane Austen’s novels all sit easily inside the romance genre and are direct progenitors of Harlequin Presents (Regis, 2003, pp. 75–84). Her working habits were unusual for many reasons, but for these purposes, the most important is her low rate of publication. Years would pass between a novel’s first draft and publication. This time allowed extensive revision between drafts, although the exact methods remain ‘mostly guesswork’ (Tomalin, 1997, p. 154). If vocabulary decay is an intrinsic constraint of the romance genre, it should arise in Austen’s writing.

Charles Dickens, on the other hand, was a professional writer operating under enormous time pressures. His works were long narratives carved out in a serial form. Each had to be readied for the printer under considerable stress, and there is evidence that he struggled with the deadlines when working (Tomalin, 2011, p. 103). If vocabulary decay is a product either of the conscious choice to cater to a mass-market audience or of writer fatigue, it should be a feature of Dickens’ writing.

3.4 Jane Austen

Although Austen is not a contemporary author, her novels meet the definition of romance given by Regis. Certainly, the broad outlines of the novels are similar; the stories are about heroines, their struggles in love, and their final triumph in happiness. Possibly the best reason for not placing these classic novels within the romance genre is that the classics are taken seriously, whereas romance novels are not (Modleski, 1982, p. 1).

The inclusion of Austen as an author of romances is not a novel phenomenon. In fact, some of the first studies of the category romance did so explicitly (Modleski, 1982, p. 37), and it is now presented relatively uncritically (Pearce, 2004, p. 534). Indeed, one of the most popular current subgenres of the category romance is the historical romance set in Regency England, which parallels Austen’s own work. What none contest is that Austen’s pace of writing was far from hasty and the composition of her novels is masterly. If any writing in the English language exhibits precise control, it is here.

To test this assumption, the texts of all six completed Austen novels were taken from the Project Gutenberg website. Their headers and footers containing copyright and catalogue information were stripped, and the words were allocated into segments using exactly the same method described for the Harlequins. The entropy of each segment was then calculated, in exactly the same way as the Harlequin novels.

Fig. 3 shows the entropy of each segment across all Jane Austen’s novels. There is no real pattern to
this distribution. Unlike the Harlequin novels analyzed above, Austen’s vary in length, from 76,917 words in Northanger Abbey to 159,533 in Mansfield Park; a variation of roughly 50%.

Fig. 3 Sum of the Shannon Entropy of each segment across Austen’s novels (y-axis: entropy in nats, about 5.60 to 5.90; x-axis: segment, 1 to 100). The data are not normally distributed, so means and deviations are not shown

Intriguingly, it is the middle section as well as the conclusion that has the greatest chance of containing unique words. There is no clear progression in the enlargement or diminution of the Austen novel’s vocabulary.

If the phenomenon of vocabulary decay was intrinsic to the romance novel, it should be here, in Austen’s writing. Rather, it seems, some property of Austen’s writing allowed her to avoid the symptoms. In Austen, just as in contemporary romance, the hero and heroine must be introduced and sketched effectively. In both Austen and contemporary romance, the hero and heroine must meet, fall in love, not be together, and then, finally, live happily ever after. The narrative structure of Austen’s novels is the same as a contemporary romance, but they lack the progressive decline in unique vocabulary.

3.5 Charles Dickens

If Austen wrote within the romance genre, but wrote it at a steady, measured pace, Dickens wrote well outside it, but produced a startling amount of fiction on a tight schedule. His novels frequently lack happy endings, a requirement for a romance novel (Regis, 2003, p. 9), and the narrative focus is never mainly concerned with the betrothal of a heroine. Much of the drama in Dickens comes from not knowing what happens next, whereas the pleasure of romance is typically thought of as anticipation of the inevitable happy ending (Barlow and Krentz, 1992, pp. 16–17).

Once again, novels were downloaded from the Project Gutenberg website, segmented, and the entropy of each segment calculated across all novels. The results can be seen in Fig. 4.

Unlike Austen, the progression of the writing in Dickens shows a pattern: with the exception of the final segment, Dickens’ vocabulary trends slightly downwards throughout his novels. Two explanations of the anomalous final segment present themselves: pre-planning and post-editing (Butt and Tillotson, 1957, pp. 13–34). While Dickens frequently had no plan as to what would happen from installment to installment, he always plotted the last chapter well in advance. Similarly, Dickens would often revisit the final chapter when republishing a serial as a novel. In either case, the anomalous final segment is unable to arrest the general trend of Dickens’ vocabulary decay.

Fig. 4 Sum of the Shannon Entropy across the novels of Charles Dickens (y-axis: entropy in nats, about 5.95 to 6.30; x-axis: segment, 1 to 100). Dotted line is fitted with Friedman’s supersmoother. Data are not normally distributed, so means and deviations are not shown
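One simple way to put a number on the downward trend described for Dickens, and on its absence in Austen, is to correlate segment number with segment entropy. The text does not say which correlation statistic was used, so the Pearson coefficient below is an assumption, the function name is hypothetical, and the entropy values are invented toy data rather than results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented per-segment entropies (nats), for illustration only:
# a steady decline with a small rebound in the final segment.
segments = list(range(1, 11))
entropies = [5.20, 5.15, 5.12, 5.05, 5.01, 4.96, 4.93, 4.88, 4.85, 4.90]

r = pearson(segments, entropies)
print(r < 0)  # True: entropy falls as the novel progresses
```

A coefficient near zero would correspond to the patternless picture reported for Austen, while a clearly negative one corresponds to vocabulary decay; a rank-based statistic would be a reasonable alternative given that the data are described as not normally distributed.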

4 Conclusion

The progression of a Harlequin novel is marked by the steady shedding of unique words and recourse to repeated chunks of text, particularly interpersonal dialogue. While this lightens the cognitive load on the writer, allowing them to quickly write a novel with only limited sale value outside a single publisher, it also allows a reader to relax into the flow of the narrative as the linguistic demands are progressively reduced. Harlequin Presents may present an extreme variation, but it is not a trivial one; this is one of the most popular forms of fiction in the world.

Vocabulary decay in the Harlequin Presents novel highlights a shift from detailed specifics towards generic romance as the novel progresses. Words found only in the first segment (Table 1) relate particularly to the high status of the hero, allegedly the selling point of the Harlequin Presents (Harlequin, 2012b), showing that this element decreases in importance throughout the novels. Common N-Grams used to coordinate dialogue between a masculine hero and feminine heroine take the place of the aristocratically inflected unique words (Table 2).

Charles Dickens, another writer under pressure, also shows signs of vocabulary decay in his serial novels, suggesting that the phenomenon could be due to the speed of composition. Austen’s novels do not exhibit the pattern, making it clear that this is not an intrinsic demand of the romance novel more generally.

Vocabulary is one of the most fundamental aspects of writing style. This study demonstrates that vocabulary can not only vary throughout a novel, but does so in a predictable fashion across an entire genre. Machine analysis usually relies on word counts to discover variation between texts. When constructing these word counts, care should be taken to account for the variation within texts, particularly when comparing textual fragments. More strikingly, variation in vocabulary could be used to develop new algorithms that not only compensate for vocabulary decay, but also quantify it as another element of style.

References

Barlow, L. and Krentz, A. J. (1992). Beneath the surface: the hidden codes of romance. In Krentz, A. J. (ed.), Dangerous Men and Adventurous Women. Philadelphia, PA: University of Pennsylvania Press.

Butt, J. and Tillotson, K. (1957). Dickens at Work. London: Methuen.
Cunningham, H., Maynard, D., Bontcheva, K. et al. (2011). Text Processing with GATE (Version 6). Sheffield: University of Sheffield Department of Computer Science.
Dillon, G. L. (2007). The genres speak: using large corpora to profile generic registers. Journal of Literary Semantics, 36(2): 159–87.
Elliott, J. (2014). Patterns and trends in harlequin category romances. In Arthur, P. and Bode, K. (eds), Advancing Digital Humanities. London: Palgrave.
Flesch, J. (2004). From Australia with Love: A History of Modern Australian Popular Romance Novels. Fremantle, Australia: Fremantle Arts Centre Press.
Gries, S. (2005). Syntactic priming: a corpus-based approach. Journal of Psycholinguistic Research, 34(4): 365–99.
Grill-Spector, K., Henson, R., and Martin, A. (2006). Repetition and the brain: neurological models of specific effects. Trends in Cognitive Sciences, 10(1): 14–23.
De Geest, D. and Goris, A. (2010). Constrained writing, creative writing: the case of handbooks for writing romances. Poetics Today, 31(1): 81–106.
Harlequin. (2012a). Harlequin Company History. http://www.fundinguniverse.com/company-histories/Harlequin-Enterprises-Limited-Company-History.html (accessed 3 October 2012).
Harlequin. (2012b). Harlequin Presents Website. http://www.harlequin.com/articlepage.html?articleId=547&chapter=0 (accessed 3 October 2012).
Hartsuiker, R. and Kolk, H. (1998). Syntactic facilitation in agrammatic sentence production. Brain and Language, 62(2): 221–54.
Krentz, A. J. (1992). Introduction. In Krentz, A. J. (ed.), Dangerous Men and Adventurous Women. Philadelphia, PA: University of Pennsylvania Press.
Krentz, A. J. (2011). Jayne Ann Krentz Bibliography. http://www.krentz-quick.com/chronolog.html (accessed 3 October 2012).
Lancashire, I. (2010). Forgetful Muses: Reading the Author in the Text. Toronto: University of Toronto Press.
Makinen, M. (2001). Feminist Popular Fiction. London: Palgrave.
Manning, C. and Schütze, H. (1999). Foundations of Statistical Language Processing. Boston, MA: MIT Press.
Marton, S. (2012). Sheikh Without A Heart. Toronto: Harlequin.
Modleski, T. (1982). Loving with a Vengeance: Mass-Produced Fantasies for Women. Oxford: Clarendon Press.
Opas-Hänninen, L. L. and Tweedie, F. (1999). The magic carpet ride: reader involvement in romantic fiction. Literary and Linguistic Computing, 14(1): 89–101.
Pearce, L. Popular romance and its readers. In Saunders, C. (ed.), A Companion to Romance from Classical to Contemporary. London: Blackwell Publishing, pp. 521–39.
Rabine, L. (1985). Reading the Romantic Heroine. Ann Arbor: University of Michigan Press.
Regis, P. (2003). A Natural History of the Romance Novel. Philadelphia, PA: University of Pennsylvania Press.
Romance Writers of America. (2012). Romance Market Share Compared to Other Genres. http://www.rwa.org/p/cm/ld/fid=580 (accessed 3 October 2012).
Selinger, E. M. (2007). Rereading the Romance. Contemporary Literature, 48(2): 307–24.
Shaw, C. (2012). After the Greek Affair. Toronto: Harlequin.
Tomalin, C. (1997). Charles Dickens: A Life. London: Penguin.
Tomalin, C. (2011). Jane Austen: A Life. London: Penguin.
Vivanco, L. (2011). For Love and Money: The Literary Art of the Harlequin Mills and Boon Romance. Penrith: Humanities-Ebooks.

Notes

1 Other, more exotic, definitions have been advanced in the past (Makinen 2001), but no longer seem current. Category romances are published as part of a category; each category has a unique brand identity: romantic suspense, supernatural, or medical romance etc. The ‘brand’ of the category is given more prominence than the author or even the title. One reason for studying category romance is the convenience of having the object of study (the category) defined by the publisher.
2 Author pseudonyms in Harlequin Presents are more about marketing than anonymity. Virtually all authors from the past 10 years maintain websites with their various pseudonyms, publication history, and brief biography. Only three authors in the history of Harlequin Presents have been men (Elliott 2013).
3 Strength of effect is the Spearman correlation with the mean trend.
4 All tokens (words and punctuation) were extracted, but this article uses the terms interchangeably.
5 The precise degree of similarity between category romance novels from the same category is an ongoing research question. Some critics believe the genre to be radically constrained (Rabine 1985, pp. 164–5), whereas most critics understand the novels as a negotiation between the constraints of the category and the author’s creativity (Vivanco 2011, pp. 21–2). Category identity is strong enough, though, that it influences everything from the language in the novel to the formatting of the manuscript (De Geest 2010, pp. 101–2).
