Applied Linguistics: 31/2: 215–235 © Oxford University Press 2009

doi:10.1093/applin/amp023 Advance Access published on 18 June 2009

Probabilities and Surprises: A Realist Approach to Identifying Linguistic and
Social Patterns, with Reference to an Oral History Corpus

Downloaded from at Islamic Azad University on September 3, 2010

University of Birmingham, UK

The relationship between language and identity has been explored in a number
of ways in applied linguistics, and this article focuses on a particular aspect of
it: self-representation in the oral history interview. People from a wide range
of backgrounds, currently resident in one large city in England, were asked
to reflect on their lives as part of a project to celebrate the millennium, resulting
in a corpus of 144 transcribed interviews. The article considers the utility of
realist social theory and complexity theory in the analysis of patterns—and
deviations from those patterns—in both the linguistic features of these inter-
views and the social categories to which people are routinely ascribed. Corpus
linguistic software was used to identify discourse features of the corpus as a
whole, and to compare and contrast features produced by different speakers
with reference to the conventional social categories used in quantitative
research. These categories, with their homogenizing limitations, are challenged
with reference to complex causation. The article uses the category of gender
to exemplify the multi-method approach advocated.

This article is concerned with probabilistic patterns, including deviations
from them, in social behaviour in general and in language use in particular.
The approach conceives of social and linguistic processes as complex, agency-
driven, and susceptible to changing kinds of analysis as computer technology
develops. The data used to illustrate the discussion are a corpus of 1.8 million
words of transcribed speech, comprising 144 oral history interviews. As exam-
ples of discourse, they constitute a rich resource for exploring the probabilis-
tic—but not determined—linguistic patterns to be found in a set of texts
that have much in common with each other but that are each nevertheless
unique; each interviewee demonstrates the ever-present potential for linguis-
tic creativity while simultaneously contributing to the collective entity
that emerges as ‘the discourse of life histories’. Located in between the com-
posite identity of the whole set and the unique identity of the individual inter-
view are some patterns associated with the speakers’ membership of various
sub-categories, identifiable from both social and linguistic analysis.

The original aim of the project within which these interviews were gener-
ated was not specifically to contribute to linguistic research. Rather, it was
to record, from a socio-cultural perspective, the accounts of a diverse range
of residents of the large English city of Birmingham, in order to recognize
and celebrate the experiences of its citizens. From this angle, the corpus
represents a potential means of understanding complex social processes,
where, again, each individual is a social actor contextualized in a complex
web of structured social relations. As Uprichard and Byrne (2006: 668) put
it, ‘narratives are descriptions not of single systems but of the interweaving
of complex systems,’ because, ‘[p]eople never tell just the story of their own
life; nor do they project simply in terms of themselves. All lives are embedded
in the social; there is no personal without the social’.
The approach taken here aims to bring together insights from both realist
social theory and applied corpus linguistics. From the former comes an empha-
sis on the duality of structure and agency, along with the recognition that
while social actors have interests and intentions, their scope for realising
these is constrained. As Archer (2000: 262) expresses it, ‘Because of the pre-
existence of those structures which shape the situation in which we find
ourselves, they impinge upon us without our compliance, consent or compli-
city’. From the latter comes the recognition that in making meaning in
speech and writing, human beings are both constrained and enabled by
the linguistic choices available to them. That is, in the formulation closely
associated with Halliday (e.g. 1989, 1991), instances of discourse represent
choices from within the language system, and, as Sinclair (1991) suggests,
speakers deploy ready-made sequences (the ‘idiom principle’) but remain
able to exercise ‘open choice’ as well—although if speakers are to be
readily understood, they are constrained to choose from within pre-existing
grammatical and semantic resources.
These observations are consistent with a view of the social, discursive
world as systematic—patterned and often predictable, but where the systems
in play are open and dynamic, with human meanings and human agency
not only reproducing familiar patterns, but also generating novelty and
surprise. The approach to research explored in this article is concerned with
both trends and probabilities on the one hand, and variation on the other,
in relation to both biographical experience and the discursive representation
of that experience in the corpus of 144 transcribed oral history interviews.
Which people, with which kinds of attributes, use language in similar/
contrasting ways to tell their life histories? The article concludes with a case
study of a sub-group identified by the method presented.

The data
The interviews which comprise the data in this study were recorded in 2000–
2001 by two oral historians as part of the ‘Millennibrum’ Project and deposited
in the Local Studies and History section of Birmingham Central Library.

The aim was to preserve the narrative accounts of a diverse range of residents
of the city at the turn of the millennium, for local people to participate
‘in presenting and recording their experiences, beliefs, contributions to the
community and hopes for the future’ (Dick 2002). Of these 150 interviews,
each lasting up to 90 minutes, 144 have been released for further research in
accordance with the ethical consent procedures employed by the library. For
each interviewee, information is available about place of birth, age, sex, occu-
pation, level of education, marital status, and religious affiliation—all topics
which are usually covered in the interviews themselves.

The interviews were transcribed as part of the original project, in which
I was not directly involved, but (with the support of a small grant) I oversaw
the post-editing of the texts to make them suitable for corpus analysis. This
included checking samples of the transcripts against the sound recordings,
standardizing transcript conventions, indicating speaker turns, so that the
(usually brief) contributions of the interviewer could be excluded from
the analysis when desirable, and anonymizing the interviews and the spread-
sheet of meta-data which records demographic details such as age, sex, place
of birth, and so on.
The oral historians who conducted the interviews are skilled in encouraging
people to articulate their memories, views, and beliefs. Each interview was
conducted by one of the two interviewers, usually in the interviewee’s home,
and lasted up to 90 minutes, so that the full corpus represents about 250 hours of
recording. During the conversation, both interviewer and interviewee held
a postcard, with a few single-word subject headings on it. The topics covered
in the interviews varied slightly, but generally included, to reflect the funders’
goals for this project: the interviewees’ childhood memories and experiences of
school; first experiences of work and subsequent jobs; family life before leaving
home, and relationships with parents and siblings; adult relationships, includ-
ing courtship, marriage and, in some cases, the breakdown of relationships;
experiences of moving and migration; parenting and hopes for their children’s
future. In addition, interviewees were asked about the role of religion in their
lives, about social changes they had witnessed in their own lifetimes, and
particularly about changes in Birmingham they had noticed during the time
they had lived in the city. All had the option, of course, to omit any of these
topics from their account if they wished.
Despite the similarities in themes, each interview is a record of a specific
social interaction, and each interviewee interprets this in his or her own
way. Interviewees inevitably make judgements about the interviewer and
her expectations, including about how far she shares their knowledge about
the things they reference. For example, interviewees who are much older
than their interlocutor, or who have lived in places beyond Birmingham,
tend to assume that some of their experiences will be unfamiliar to the inter-
viewer, and so they explain them in greater detail. In addition, these inter-
views cannot be neutral descriptions, or representations, of each ‘self’ and
its history, as they are interactional tellings, produced in a context of
interpretation and negotiation (Wortham 2000; Pavlenko 2007). Nevertheless,
there is sufficient homogeneity about the interviews for them to have certain
features in common, including linguistic features.


The shared linguistic features of the corpus have been identified using a
range of techniques, including software developed specifically for the analysis
of electronic corpora, mainly the suite of applications, WordSmith Tools (Scott
2004), and WMatrix (Rayson 2008a). From a purely lexical point of view,
the detailed consistency wordlist generated by WordSmith Tools can be con-
sulted to identify those items that are found in all—or the majority—of
the 144 transcripts. There are 46 items that occur in every text, of which
a large proportion, predictably, comprises grammatical words. Of the rest,
all are core vocabulary, but a glimmer of the ‘genre’ of ‘life history’ is suggested
by the list: good, home, know, like, old, school, see, still, things, think, time, way.
More revealing, perhaps, is the list of frequent n-grams (i.e. sequences
of ‘n’ consecutive characters, including spaces, such that recurring ‘chunks’
of text longer than single words can be identified). Strings of three words
occurring in 130 or more of the texts include: a lot of/to go to/I went to/and
it was/it was a/and I was/when I was/one of the/I don’t know/there was a/and that
was/I used to/and I think/I think it/I was born/I had a/but it was/and there was.
Strings of four words occurring in 100 or more of the texts include: I think it
was/I was born in/and I used to/and we used to. Two features are striking about
these lists, both, unsurprisingly, concerned with the articulation of memories.
There is a preponderance of expressions concerned with past time, and, in
the use of ‘private’ verbs, some hints of tentativeness about relating what
happened (‘I think’, ‘I don’t know’).
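The kind of count behind these lists can be sketched in a few lines of Python. This is a minimal illustration, not the procedure WordSmith Tools itself applies: it treats an n-gram as a sequence of n consecutive words, as in the three- and four-word strings listed above, tokenizes naively on whitespace, and uses invented toy texts rather than material from the corpus. The function names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the n-word sequences in a text (lower-cased, split on whitespace)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_text_range(texts, n):
    """Count, for each n-gram, the number of texts in which it occurs at least once."""
    text_range = Counter()
    for text in texts:
        # set() ensures each text contributes at most 1 to an n-gram's range
        text_range.update(set(word_ngrams(text, n)))
    return text_range


def shared_ngrams(texts, n, min_texts):
    """Return the n-grams found in at least min_texts texts, sorted alphabetically."""
    counts = ngram_text_range(texts, n)
    return sorted(gram for gram, k in counts.items() if k >= min_texts)


# Toy texts, invented for illustration only
toy_texts = [
    "I was born in the city and I used to walk to school",
    "when I was born in the village we used to play outside",
    "I think it was a long time ago but I was born in a small town",
]
print(shared_ngrams(toy_texts, 3, 3))
```

Applied to the full corpus, the threshold `min_texts` would correspond to the cut-offs reported above (130 texts for three-word strings, 100 for four-word strings). A realistic tokenizer would also need to handle punctuation and the transcription conventions.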
A third way of determining what is distinctive about these texts as a whole
is by comparing this corpus with a reference corpus. Different analysts use
slightly different calculations to identify items which are statistically signifi-
cantly more frequent in the target corpus than the reference corpus, but both
Scott’s (2004) WordSmith Tools and Rayson’s (2008a) WMatrix use log likeli-
hood (LL) values to generate lists of ‘key’ items. In this operation, ‘the word
that has the most significant relative frequency difference between the two
corpora’ has the largest LL value, and thus ‘the words most indicative (or
distinctive) of one corpus, as compared to the other corpus, occur at the top
of the list’ (Rayson 2008b: 528).
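As a rough illustration of how such LL values arise, the sketch below uses the widely cited two-corpus log-likelihood formulation, in which expected frequencies are derived from the combined corpora. It is an assumption that this matches the calculation in either tool in every detail, since implementations may differ (for example, in how zero frequencies are treated). The function names and the toy frequency counts are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood for one item, given its frequency in each corpus and the corpus sizes."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the item were equally likely in both corpora
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0)
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def rank_by_keyness(target_counts, ref_counts):
    """Order the items of the target corpus from highest to lowest LL value."""
    size_target = sum(target_counts.values())
    size_ref = sum(ref_counts.values())
    scores = {
        word: log_likelihood(freq, size_target, ref_counts.get(word, 0), size_ref)
        for word, freq in target_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy frequency lists, invented for illustration only
target = {"school": 50, "the": 500, "was": 200}
reference = {"school": 5, "the": 520, "was": 60, "is": 165}
print(rank_by_keyness(target, reference))
```

The LL value on its own does not show the direction of the difference. A keyword list of items that are more frequent in the target corpus would also compare the relative frequencies in the two corpora.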
In the present study, WMatrix was used to identify which items are ‘key’
in comparison with the Spoken Data in the BNC Sampler Corpus. The top 38
items (those with a log likelihood value of more than 450) include: was,
and, my, $place (the replacement term, to preserve anonymity, for specific
places mentioned in the interviews), used_to, had, school, because, Birmingham,
I, were, very, people, me, went, am, mom, remember, in, in_those_days, children,
years, family, to, had_to, father, life, always, mother, friends, university, parents,
road, at_the_time, worked, at, quite, city. The semantic fields connoted by the
more content-carrying of these items are fairly self-evident, and accord with
the topics chosen for this project, but another way of representing them is
using the semantic ‘domain clouds’ available in WMatrix (Figure 1).

Figure 1: Screenshot of part of the output from the ‘domain cloud’ identification
by WMatrix of semantic themes in the oral history corpus compared
with the BNC Sampler Spoken Corpus

The WMatrix tool can be used to illustrate the relative frequency differences
between the Millennibrum corpus and a reference corpus in a similar
manner to the ‘tag clouds’ employed in some social networking web sites.
In those, ‘an alphabetically sorted list of words (confusingly for this context
called tags) are shown in a larger font if they are (manually) assigned more
frequently to shared digital photographs . . . or web site bookmarks . . .’ (Rayson
2008b: 533). WMatrix incorporates the USAS tagger (Rayson et al. 2004, cited
in Rayson 2008b), which automatically assigns semantic fields (domains) to
each word or multiword expression in a corpus. The clouds it produces
use larger fonts to indicate greater keyness, so that, in this case, the semantic
domains of ‘moving,_coming_and_going’ and ‘personal_relationship:_
general’, among others, are shown to be significantly more frequent in this
corpus than in the spoken component of the BNC.
From each of these sets of findings, we can conclude that there is
a commonality about these texts, despite the fact that they represent 144
different and, in some ways, strikingly contrasting life stories. Among these
interviewees are individuals who are young and old (their years of birth span
1896–1985), migrants from across the world and ‘Brummies born-and-bred’,
employees from all the major occupation categories of the SOC2000 classifications
(ONS 2000), conventional heterosexual family members and people
with ‘alternative’ lifestyles—and yet all have produced a text in this context
which contributes to a patterned discourse with identifiable features.


One way to account for this relative homogeneity is with reference to
complex systems theory, whose application to empirically observed language
behaviour has been most thoroughly developed by Larsen-Freeman and
Cameron (2008). From this perspective, the production in real time of
interactions such as these oral history interviews is accomplished by the speakers
within a nested complex of contexts, by means of a potentially extensive
set of linguistic resources that are nevertheless constrained in various ways.
As outlined elsewhere (Sealey and Carter 2004; Carter and Sealey in press),
these interactions can be conceptualized and described from different analyt-
ical perspectives. Each speaker has their own ‘psychobiography’ (Layder
1997), the unique complex of experiences—and reflections on those experi-
ences—that are ‘integral to the experiencing individual’, each person’s ‘indi-
vidual truth’ (Craib 1998: 31). (In the case of these interviews, however,
the interviewer takes steps to downplay, or mute, her individuality and provide
maximum space for the production of the interviewee’s contributions.)
A person’s lifespan may be considered in terms of the ‘ontogenetic timescale’
(Larsen-Freeman and Cameron 2008: 168–9), whereas each moment of
‘online talk’ is experienced on the ‘microgenetic timescale’. In the collabora-
tive event that is the interview, which in Layder’s (1997) classification occu-
pies the ‘domain’ of ‘situated interaction’, the interlocutors ‘soft assemble
their contribution’, as Larsen-Freeman and Cameron (2008: 169) express
it. ‘Soft assembly,’ they explain, ‘describes an adaptive action in which all
aspects of context can influence what happens at all levels of activity’.
Operating on different timescales still, but influencing each interview—and
to some extent acting as components of them—are what Layder (1997) terms
the ‘domains’ of social settings and contextual resources, highlighting
the socio-historical context of structured social and economic relations. This
approach is compatible with Craib’s (1998: 28) acknowledgement of ‘the wider
social structure and . . . the wider historical processes which provide us with
the stage on which we act out our lives’. Larsen-Freeman and Cameron (2008:
169), similarly, situate interactions within various ‘levels’ of social systems,
from the dyad of this kind of interaction, to ‘socio-cultural groups and institu-
tions of various types and sizes all the way up to the society of the speech
These insights from realist social theory and complexity theory present us
with the beginnings of an explanation for the findings from this corpus analysis
of linguistic features that are common to these examples of the life histories
collected in one large English city at the turn of the millennium. Out of all the
possible human sounds these speakers might have recorded, only a circumscribed
sub-set of words, phrases and grammatical constructions, intelligible
to speakers of English, was actually drawn on. From within that stock of
resources, a further sub-set accounts for all of what was produced, and this
can be partially explained with reference to insights from psycholinguistics
(e.g. scripts, schemas, and the ‘economic’ advantages of processing expected,
rather than novel, input and output; e.g. Rumelhart 1975; 1984) and prag-
matics (e.g. the co-operation principle (Grice 1975) and theories of relevance
(Sperber and Wilson 1986)); also helpful are sociologically derived under-
standings about sedimented patterns of social behaviour, with their parallels
in corpus linguistic findings about ‘discourse patterns’ that cluster around
the ‘meanings [that] are repeatedly expressed in a discourse community,’
and that work by ‘filtering and crystallizing ideas, and by providing pre-
fabricated means by which ideas can be easily conveyed and grasped’
(Stubbs 1996: 158). One indicative finding here is that, from the total vocab-
ulary stock of the English language, just 24,276 items (types) feature in this
corpus of just under 1.8 million words (using WordSmith Tools’ Wordlist func-
tion, which also calculates the standardized type/token ratio for this corpus
as 32.89).
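The standardized measure is reported because a raw type/token ratio falls as a text grows longer: 24,276 types in roughly 1.8 million tokens would give a raw ratio of only about 1.3 per cent. The sketch below illustrates one common way of standardizing, which is to compute the ratio for successive fixed-size chunks and average the results. The 1,000-word default is assumed here to mirror the WordSmith Tools default, and the function is an illustration rather than a reproduction of that tool's calculation.

```python
def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio (as a percentage) over successive full chunks of tokens.

    A trailing chunk shorter than chunk_size is ignored. Returns None when the
    token list is shorter than one full chunk.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        # Types are the distinct word forms within this chunk only
        ratios.append(100.0 * len(set(chunk)) / chunk_size)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


# Toy token list, invented for illustration only, with a chunk size of 4
toy_tokens = ["a", "b", "a", "b", "c", "d", "e", "f", "x"]
print(standardized_ttr(toy_tokens, chunk_size=4))
```

Because every chunk has the same length, the resulting figure can be compared across texts or corpora of different sizes, which a raw ratio does not allow.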
In the account I am seeking to provide, the interviews represent examples
of discourse as a complex system, which, as Larsen-Freeman and Cameron
(2008: 175) suggest:
. . . invokes the image of a multidimensional landscape with
hills and valleys, over which the system roams, creating a
trajectory . . .. The landscape represents the probabilities of various
modes or phases of discourse behaviour, and the trajectory is
carved out as a particular conversation moves from one mode
to another.
Crucial to the description I am proposing is the centrality of human agency,
where each interviewee, in the context of the complex, dynamic, interactional
processes outlined above, makes choices about how to formulate this specific
account of their life experience. As Craib (1998: 28) puts it, ‘We are each
of us given a starting point and we do something with it.’ If the generic oral
life history has certain features, then, in accordance with the fractal patterns
typically found in complex systems (Larsen-Freeman and Cameron 2008:
109–111), each individual instance has its own pattern that potentially
mimics the overall composite on a smaller scale.

By means of corpus analysis, it is possible to identify both similarities and
differences between the whole set of interviews and each individual example,
and also to find patterns among sub-sets of the corpus as a whole. In the
tradition of quantitative, variationist analysis, the conventional approach
involves predetermining which demographic ‘variables’ are likely to correlate
with the differential use of particular linguistic features, and then using some kind
of regression analysis to identify correlations. There are several problems
with this, however, given the claims made above about the nature of social
and linguistic phenomena as complex systems, and the notion of causality
implied by the theory deployed here. Models which assume that speaker char-
acteristics are ‘independent variables’, with linguistic features as ‘dependent
variables,’ imply a linear model of causality. Such models do not allow
for the interaction between the variables, and they do not model well the
dynamic, systems-based realities with which we are concerned in this kind
of data. The limitations of linear modelling are succinctly summarized by
Byrne (2002a: 112):
It may well have a more important role in relation to the interpre-
tation of the products of experiments in domains where experi-
ments are useful, but when we deal with the products of surveys,
with accounts of the world as becoming—that is, as a set of nested
complex evolutionary systems, as inherently dynamic—then mun-
dane exploration is as far as it goes.
Methods of analysis that are suitable for data collected in experiments are
unlikely to be appropriate for the analysis of linguistic data such as the inter-
view transcripts described above. For example, derived from an experimental
paradigm, sex (or gender) would be thought of as a ‘variable’; the interviews
of the male speakers would be separated from those of the females and
compared with them on various linguistic dimensions. Indeed, Cameron
(2005: 484), in the context of advocating a post-modern perspective on
gender, maintains that even a ‘modern feminist approach’ to language and
gender research ‘presupposes the existence of two internally homogeneous
groups, ‘‘men’’ and ‘‘women’’, and looks for differences between them’,
and the Keyword tool in WordSmith Tools allows a corpus analyst, should
they choose, to identify words which occur with significantly greater
frequencies in either the men’s or the women’s interviews. Results from this
operation could readily be used to confirm gender stereotypes, as Table 1 shows.
While the definition and interpretation of this version of ‘keyness’ are not
without their critics (e.g. Moon 2007), it is clear that this variable-based
analysis does tell us something about the data. It is a fairly blunt instrument,
however, and is very prone to the methodological implication—even if this
is not explicit—that there is some kind of causal link between the attribute
of being male (or female) and using, in the context of telling your life story,
more words from the domains of work and sport (or of the home and relation-
ships). As previous researchers into language and gender have suggested, there
may indeed be correlations between gender and conversational topic (e.g.
Kipers 1987; Bischoping 1993; Tannen 1994)—and hence contrasting frequen-
cies of words from particular domains. However, the realist social theory
with which this analysis is concerned would seek to do something rather
different from establishing such tendencies.

Table 1: Selected lexical items from the top of the lists of words defined by
WordSmith Tools as key in a comparison of the male and female interviews

Males      Keyness    Females    Keyness
wife       200.54     husband    690.67
period     170.11     children   393.69
trade      139.06     mom        247.78
football   128.04     lovely     209.22
film       124.09     baby       183.93
people     119.78     home       154.47
boats      119.34     nursery    146.31
number     114.89     know       122.61
british    106.03     went       114.48
country    98.606     mother     112.4
city       93.962     pregnant   108.65
company    89.773     sister     105.03

For an explanation, rather than a description, of the variability in these
different speakers’ accounts of their lives, a different methodology is called
for. This is because:
In the realist frame of reference we do not see causes as single
factors whose presence inevitably generates an effect and whose
absence means that the effect does not occur. Rather cause is a
property of complex and contingent mechanisms in reality and
such mechanisms, moreover, are not universal but only relatively
permanent—inherently local. (Byrne 2002a: 105)
Byrne’s research orientation derives from the sociological realism stimulated
by Bhaskar and developed in various ways by Archer, Layder, Pawson, Sayer,
and others. These writers ‘reject the conception of society . . . as [a] closed
system, arguing instead that reality is a structured open system’ (Downward
and Mearman 2007: 87). There is not the space here to elucidate the theoret-
ical underpinnings of this approach (for further discussion see Sealey and
Carter 2004; Sealey 2007), but an example of the method in action is


Realist explanations tend to be ‘theory driven approaches’ which seek
to understand the mechanisms which lead to differential outcomes—why,
for example, social or educational policy interventions turn out differently,
appearing to ‘work’ for some people and not others, and in some circumstances
and not others.

An excellent summary and explanation of this approach is found in
Pawson (2002a). In public policy, meta-analyses of previous studies are
often appealed to for answers to the question ‘what works’, but this is refor-
mulated by Pawson as ‘what works for whom in what circumstances?’.
He advocates a ‘realist synthesis’ perspective, according to which:
. . . it is not ‘programmes’ that work: rather it is the underlying
reasons or resources that they offer subjects that generate
change . . .. Whether the choices or capacities on offer in an initiative
are acted upon depends on the nature of their subjects and
the circumstances of the initiative . . .. (Pawson 2002a: 342)
Pawson describes the pattern in policy-making whereby a particular approach
is tried in some area of public policy—such as the ‘naming and shaming’
of offenders, for example—and is then adopted in a whole range of other
areas, as though the policy, the naming and shaming, is causally effective
(see also Pawson 2002b). This fallacious reasoning can lead to contradictory
outcomes: for example, the car manufacturer whose products are publicized
for their poor safety or security records is quick to improve their performance
(‘successful’ naming and shaming), whereas the protesters who refuse to
pay the community charge (a controversial local tax) find welcome celebrity
in publicity surrounding their court appearances and are not persuaded to
change their behaviour (‘unsuccessful’ naming and shaming).
Pawson’s approach provides a persuasive explanation of the discrepancies,
because it:
. . . adopts a ‘generative’ understanding of causation. What this
tries to break is the lazy linguistic habit of basing evaluation
on the question of whether ‘programmes work’. In fact, it is not
programmes that work but the resources they offer to enable
their subjects to make them work. This process of how subjects
interpret the intervention stratagem is known as the programme
‘mechanism’ and it is the pivot around which realist evaluation
revolves. (Pawson 2002a: 342)
Studies of the social world from this kind of perspective offer an explanatory
power that those based on variables do not. Applied linguists will be very
familiar with studies designed to identify whether particular language teaching
strategies ‘work’—and with the range of conflicting findings that they gener-
ate. The realist approach, by contrast, does not begin with social categories
decided a priori—gender (or sex), or ethnicity, or age-group or social class.
Instead, it assumes that social phenomena are characterized by processes and
relations, and therefore ways are sought to investigate and describe these and
their effects. If social phenomena are complex systems, then they:
. . . are not made up of pre-existing variables, although we can
properly describe them through the measurement of variate
traces. In realist terms, the traces are actual things in the world
that are the products of the generative real system, and the inte-
rior working of the system is not reducible to elements existing
separate and analyzable outside the system. (Byrne 2001: 64)
It is appropriate at this point to summarize the methodological implications
of the social realist theory within which the present study is situated. From
Pawson’s work (which is predominantly concerned with the evaluation of
interventions in social policy), an obvious priority is the identification of con-
texts and mechanisms to explain stability and variation in outcomes. The
interpretations of experience by social actors may be consistent or varied,
and patterns in both experience and interpretation are sought in the research
process. According to Byrne (2001), each of the following is important:
(i) exploration, ‘which involves both descriptive measurement of variate
traces of complex systems and examination of the patterns generated
by those measurements,’ including ‘the exploration of qualitative materials
presented as texts or in other documentary forms’; (ii) classification, including
sorting things into kinds, ‘using, inter alia, numerical taxonomy procedures’,
as well as ‘the identification, however temporary, of what constitutes mean-
ingful boundaries’; (iii) interpretation—both of measures and of ‘ ‘‘natural
language’’ descriptions of qualitative form’ (in relation to this point, Byrne
stresses that he is not referring to ‘post-modern eclecticism, but rather to
the originary conception of hermeneutics in which there is a search for meaning
as truth’); (iv) ordering, so that things are ‘sorted and positioned along
a dimension of time’.


Readers who are familiar with corpus linguistics may identify some echoes
here in the challenge to traditional descriptions of language that its methods
and findings have generated. Language corpora (large collections of authentic
writing or transcribed utterances, electronically stored) are analysed using
dedicated software that reveals patterns not usually available from intuition,
introspection or even text-by-text discourse analysis. Reliance on intuition
rather than on the large quantities of empirical evidence available in
the corpus, says Sinclair, meant that the pre-corpus situation in linguistics
‘was similar to that of the physical sciences some 250 years ago’ (1991: 1).
In social science, maintains Byrne (2002a: 63), ‘our science for 300 years has
been a science of analytical reduction to the simple. Now we can address
the complex’, as computers ‘enhanc[e] our capacity to explore’—as they do
with language. As realist theorists do not expect to observe directly the gen-
erative mechanisms responsible for differential social behaviour, so corpus
linguists are familiar with the ‘tools of indirect observation’ (Sinclair 2004:
189) in their analytic software.

The social researchers on whose work I am drawing are sympathetic to
many different research methods, and several stress the advantages of
moving between statistical analysis and local interpretation. For social scien-
tists who want to know the reasons for patterned behaviours, ‘there is no
logical difference between the work of an interpretive researcher conducting
detailed observations of a social setting and a large scale national survey’
(Williams 2002: 128). Byrne (2002b: 68–69) sees a particular role for informa-
tion technology here, ‘as a cybernetic extension of human cognitive capacity
which enables us to get to grips with the big pictures of macroscopic social
change through the management and interpretation of both quantitative
and qualitative data flows’. Similarly, corpus linguists often emphasize the
importance of moving between the corpus and the instance, combining auto-
mated searches and interpretive analysis (e.g. Upton and Connor 2001;
Hunston 2002; Carter 2004: 219–221).
Where conventional accounts of language systems separate lexis and syn-
tax (‘a highly generalized formal syntax, with slots into which fall neat lists
of words’ (Sinclair 1991: 108)), corpus analysis, which allows the researcher
to review a lot of linguistic evidence at once, challenges traditional categories
including word classes (or ‘parts of speech’). It also identifies trends and
probabilities, which, according to Stubbs (2001), are attributable to the constraints
that derive from both linguistic and social expectations. While not
deterministic, these constraints ‘. . . mean that, although we are in principle
free to say whatever we want, in practice what we say is constrained in many
ways. The main evidence for these constraints comes from observations
of what is frequently said, and this can be observed, with computational
help, in large text collections’ (Stubbs 2001: 19). de Beaugrande (1997: 130)
posits something similar—‘a convergence of data making some meanings or
understandings much more probable than others’, and an explanation for ‘the
rich global complexity of real language data’ as generated by ‘the interaction
of multiple local constraints that are essentially simple’ (de Beaugrande 1999:
131). Complexity, dynamic processes, and relational units of analysis are
critical to understanding both linguistic and social systems—yet both demon-
strate stability: ‘as insights from corpus linguistics show, the stabilities
that speakers employ are diverse—words, phrases, idioms, metaphors, non-
canonical collocations, grammar structures—a much more complex and
diverse set of language-using patterns than the ‘core grammar’ of formal
approaches’ (Larsen-Freeman and Cameron 2008: 99).
I am suggesting, then, that there are some parallels to be drawn between
developments in researching social and linguistic processes, respectively,
with reference to realist social theory and methodologies that are consistent
with it. This is, of course, not to be blind to the dangers of ‘illegitimate cross-
disciplinarity’, and indeed, as I have noted elsewhere (Sealey 2007), since
different kinds of things have different properties and propensities, it is impor-
tant to be mindful of those characteristics of language that are distinctive from
those of social actors and of social structures. Nevertheless, the enterprise
described here aims to explore how far the analysis of a corpus of transcribed
oral histories can be enhanced by the application of the integrative method-
ological approaches advocated by social realism within a corpus linguistic


One of the strategies proposed by Byrne is to start from the case, and to
find ‘ways of sorting cases into categories’, looking for ‘category sets which
emerge from the exploration of our data’ (Byrne 2002a: 100). With corpus
data, such an approach can be achieved in various ways, and indeed Byrne’s
injunction is consistent with Sinclair’s plea to ‘trust the text,’ including allow-
ing computers to ‘. . . show us things that we may not already know and
even things that shake our faith quite a bit in established models, and which
may cause us to revise our ideas very substantially’ (Sinclair 2004: 23).
Sinclair’s corpus-driven approach to language analysis mistrusts the premature
tagging of corpus texts because of the danger of imposing prior models (or,
as Byrne might say, ‘category sets’) and obscuring ‘the clarity of the categories
in the data’ (Sinclair 2004: 191). The demographic categories associated
with the corpus in the present study should therefore be treated with some
caution, although it has to be noted that it is challenging to find useful groupings
that maintain an appropriate balance between the distortion of over-generalization
and the unwieldy and unhelpful method of treating each case
as unique. (A similar problem faces lexico-grammarians trying to be sensitive
to specific patterning while simultaneously identifying groups with similar
characteristics (Hunston and Francis 2000).)
The longer term goal of the work under discussion is to combine corpus
analysis and realist-derived sociological analysis through software adapted
to accommodate both kinds of data. In the meantime, the approach is exem-
plified on a more modest scale using currently available tools.
As a first step to a case-oriented methodology, the demographic meta-data
about the 144 interviewees is entered into a table, with each row representing
a case. This is consistent with Ragin’s (1987) advocacy of the comparative
method to ‘compare cases with different combinations holistically’ (p. 101).
‘The case-oriented approach,’ he maintains, ‘. . . allows investigators to com-
prehend diversity and address causal complexity’ (p. 168).
The next stage is to make use of established corpus analytic techniques
to identify potentially fruitful areas of inquiry. To illustrate this, let us return
to the contrasting keywords between male and female interviewees. A lexical
word identified in this search was lovely, used four times as often by the female
speakers as by the men, at 441 occurrences to the males’ 110. It is customary
for quantitative researchers to stress that such findings indicate an associa-
tion and not an explanation, and yet the inference that the attribute of
being female is in some way causally linked with this language practice is
not unreasonable. However, if being female was a direct ‘cause’ of the frequent
use of this word, then no men would use it often, and all the women would.
This is patently absurd, but we can look somewhat differently at this
Counter-posed to the implicit determinism of the variables-based approach
are the more interpretive research traditions. Instead of inferring that posses-
sion of the attribute [variable] female leads to an increased use of items such
as lovely, some analysts would stress the performative dimension of gender
through language (e.g. Ochs 1992; Meyerhoff 1996; Holmes and Meyerhoff
1999; Weatherall 2002). Holmes (1997: 203), for example, suggests that attention
should be on ‘the linguistic realizations of gender’, through an examination
of ‘the way individuals express or construct their gender identities
in specific interactions in particular social contexts’. More ethnographic, qual-
itative studies, such as those advocated by Holmes, focus on ‘how people
use language to create, construct, and reinforce particular social identities’
(Holmes 1997: 204)—and women’s frequent use of positive evaluators
such as lovely may well index femininity (cf. Lakoff 1975). However, such
approaches are rarely concerned with the probabilities identified in quantita-
tive analysis, and, while they do shed light on the local development of com-
munities of (more or less gendered) practices, their accounts of speakers
resisting the reproduction of gendered patterns of language behaviour tend
not to propose causal explanations.
Returning to the individual cases in my corpus, we can look more closely
at the counter-examples, the cases that seem to buck the trend—rather than
regarding them as ‘outliers’, as conventional statistical procedures would
require. These ‘outliers’ include, from the 16 speakers who use lovely in
their interviews most frequently (10 times or more), the two who are
men. From a case perspective, it transpires that there are similarities across
several dimensions. Both were born in Birmingham in the 1920s; neither
continued their education beyond secondary school; both are fathers to two
or more children and both are married. Further exploration could establish
whether there are additional features, linguistic or demographic, which would
identify reasons to see these speakers as belonging to a sub-group linked
in meaningful ways; alternatively, it may be that choosing lovely to describe
elements of their experience, when most men do not do so in these interviews,
is a weak link, not reinforced by any others.
Looking at the same initial findings from the other direction, so to speak,
it transpires that a significant number—59—of the 144 speakers fail to use
the word lovely in their interviews at all. Are these non-users all men? The
majority—44—are, but the remaining 15 are women. From a case-based
perspective, we can investigate whether these speakers have anything in
common, looking for both linguistic and demographic patterns.
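The case-based step described here can be illustrated with a minimal sketch in Python. It assumes a hypothetical table in which each row records a speaker identifier, the recorded sex, and the number of times lovely occurs in that speaker's transcript. The field names, the threshold parameter and the toy rows are invented for illustration and are not the interview metadata.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One row of a hypothetical case table (field names are assumptions)."""
    speaker_id: str
    sex: str            # 'F' or 'M', as recorded in the metadata
    lovely_count: int   # occurrences of 'lovely' in this speaker's transcript


def trend_buckers(cases, high_threshold=10):
    """Return the cases that run against the overall association:
    men who use the word at least `high_threshold` times, and
    women who never use it."""
    frequent_men = [c for c in cases
                    if c.sex == "M" and c.lovely_count >= high_threshold]
    non_using_women = [c for c in cases
                       if c.sex == "F" and c.lovely_count == 0]
    return frequent_men, non_using_women


def non_users_by_sex(cases):
    """Count the speakers who never use the word, broken down by sex."""
    return dict(Counter(c.sex for c in cases if c.lovely_count == 0))


# Invented toy rows, for illustration only.
cases = [
    Case("s01", "F", 14),
    Case("s02", "F", 0),
    Case("s03", "M", 0),
    Case("s04", "M", 12),
    Case("s05", "M", 1),
    Case("s06", "F", 3),
]

frequent_men, non_using_women = trend_buckers(cases)
print([c.speaker_id for c in frequent_men])     # men who use the word often
print([c.speaker_id for c in non_using_women])  # women who never use it
print(non_users_by_sex(cases))
```

Keeping each speaker as a whole row, rather than reducing the table to a single count per variable, is what allows the counter-examples to be pulled out and inspected alongside their other attributes.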
One possibility that suggests itself, of course, is that these particular
women have not experienced anything that could be described as ‘lovely’.
Perhaps (their accounts of) their biographies are distinctively negative. To
investigate this possibility, I used WMatrix to compare each of these 15 texts
with the corpus as a whole, using the semantic domains tool described
above. This allows the user to see in a concordance line view each of the
items assigned to the semantic domains identified by the tagger. From this,
it is apparent that some of these speakers seem to be very positive in their
narratives, using items classified in the ‘evaluation_good’ category and in the
‘happy’ category significantly more frequently than is found in the corpus
as a whole. Some examples from the concordance lines for these data are
given below (see Figure 2).
This suggests that the interplay between these women’s oral histories and
their lexical choices is more subtle than an absence of lovely simply denoting
an absence of positive evaluation in general. From the case-based perspective,
the table of metadata can be reviewed manually, because the number of cases
is so small, although it would be desirable to use appropriate software for
a larger-scale analysis. Of all the attributes recorded in the database, just one
emerges as shared among all but two of this particular group of women:
their classification by ‘marital status’ is single or widowed. The remaining
two have been classified as ‘other’: one is a gay woman who describes
in her interview the difficulties she has faced because of her sexuality, and

(a)else, kind of thing. We had fun when we moved here obviously so

all that. I just go out for a laugh with my friends and then just g
oose to go somewhere where you enjoy yourself rather than the fact of
a day out and that was really good fun because it was really relaxed
ried and having kids and being happy with it because that ’s not me
s quite good, but it was just fun but then you got sick of it in
ple that I know so it ’s quite fun in that respect but as the job
we have kind of got the same sense of humour, so we hang round mostly
at you want to do, so you can enjoy yourself a lot more this time in
so it was quite good. It was enjoyable because it’s in the country an

(b)the drama department is brilliant at my school and if

a for a month which was brilliant and we got to travel
out walking and it was brilliant and we flew &; I think
was Amsterdam which was brilliant and I went to see Anne
it ’s always just been brilliant , there ’s never been

(c) to accept it. And my mom who’s fab was like I don’t care
e her and she loves you. She was fabulous but found it
modation we wanted, that was the best route, doing that.
award and I won which was really fabulous, probably one
probably one of the best moments of my life which was
been to get there really. It was fab .

Figure 2: (a) Selected concordance lines for semantic domain of ‘happy’ in

one interviewee’s transcript. (b) Selected concordance lines for semantic
domain of ‘evaluation_good’ in another interviewee’s transcript. (c) Selected
concordance lines for semantic domain of ‘evaluation_good’ in a third inter-
viewee’s transcript

another, who is in a stable heterosexual relationship, reflects explicitly on

her resistance to getting married and to conforming to conventional gender
roles. Thus the combination of these two approaches, a case-based consider-
ation of the interviewees’ attributes, and a corpus-based linguistic analysis,
generates a potential sub-group who share the classification ‘female’, who
significantly under-use the item lovely in their interviews, relative to other
females in the corpus, and who are not currently in conventional marital
A further iteration of the approach makes use of the keyword facility
to compare the sub-corpus of the interviews of these women with the
corpus as a whole. This makes possible, for example, the identification of
other items, in addition to lovely, that are frequent in the accounts of the
other female speakers, but much less so in these interviews. Items that
WordSmith identifies as ‘negatively key’ (that is, significantly under-used
by this sub-group in comparison with the sub-corpus of the other female
interviewees) include child, mother, wonderful, husband, children—and also
him, his and he.
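The notion of an item being positively or negatively ‘key’ can be made concrete with a small sketch. It assumes a log-likelihood keyness statistic, a measure widely used in corpus linguistics; this section does not say which statistic was selected in WordSmith, so the choice of formula, the function name and the example figures are all assumptions made for illustration. The sign convention (negative for under-use) is likewise a convenience of this sketch.

```python
import math


def signed_log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one item in a study corpus relative to a
    reference corpus. The value is returned with a negative sign when the
    item is relatively LESS frequent in the study corpus ('negatively key')."""
    total_size = size_study + size_ref
    combined_freq = freq_study + freq_ref

    # Frequencies expected if the item were spread evenly across both corpora.
    expected_study = size_study * combined_freq / total_size
    expected_ref = size_ref * combined_freq / total_size

    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2

    underused = (freq_study / size_study) < (freq_ref / size_ref)
    return -ll if underused else ll


# Invented figures: an item occurring 5 times in a 100,000-word sub-corpus
# and 60 times in a 100,000-word comparison sub-corpus.
score = signed_log_likelihood(5, 100_000, 60, 100_000)
print(round(score, 2))  # a negative value, i.e. under-use in the study corpus
```

With one degree of freedom, an absolute value above 3.84 corresponds to p < 0.05. In practice keyword tools are often run with much stricter thresholds, because many items are tested at once.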
Very near the top of the list of items they use more frequently than the
other female speakers is don’t, and a concordance of I don’t is interesting.
There are 427 occurrences in the interview transcripts of these 15 women,
many of which occur in strings such as ‘I don’t know’. However,
the most frequent four-word string (27 occurrences) is ‘I don’t want to’, while
there are 29 occurrences of the three-word ‘I don’t like’. All 15 speakers use
at least one of these strings: 10 use only one of them (either ‘I don’t like’ or
‘I don’t want to’), and the other five use both. Looking in detail at the concordance
lines for these strings, it becomes apparent that these particular women tell
their life stories with repeated references to the differences between them-
selves and others, or between their priorities and the expectations or priorities
of the wider society. To illustrate this, one example of each expression has
been extracted from each speaker in this group who uses it
(see Figure 3).
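The string counts and concordance lines discussed in this paragraph can be approximated with a short sketch. It is not the procedure used by WordSmith or WMatrix: the tokenizer is a deliberately simple assumption, the function names are hypothetical, and the two ‘transcripts’ are invented sentences rather than interview data.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens, keeping
    contractions such as "don't" as single tokens."""
    normalized = text.lower().replace("’", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", normalized)


def count_string(tokens, phrase):
    """Count the occurrences of a multi-word string in a token list."""
    target = tokenize(phrase)
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tokens[i:i + n] == target)


def kwic(tokens, phrase, width=4):
    """Return key-word-in-context lines: up to `width` tokens of context
    on either side of each occurrence of the phrase."""
    target = tokenize(phrase)
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append(f"{left} [{' '.join(target)}] {right}".strip())
    return lines


def strings_used(transcripts, phrases):
    """Map each speaker to the set of phrases found in their transcript."""
    return {speaker: {p for p in phrases if count_string(tokenize(text), p) > 0}
            for speaker, text in transcripts.items()}


# Invented example sentences, for illustration only.
transcripts = {
    "A": "I don't like ironing and I don't want to go",
    "B": "I don't know but I don’t like the sea",
}
phrases = ["I don't like", "I don't want to"]

usage = strings_used(transcripts, phrases)
both = sorted(s for s, used in usage.items() if len(used) == 2)
print(both)
print(kwic(tokenize(transcripts["B"]), "I don't like", width=2))
```

Mapping each speaker to the set of strings they use makes it straightforward to separate those who use only one expression from those who use both, which is the distinction drawn in the paragraph.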
This finding receives further confirmation from the analysis of semantic
domains facilitated by WMatrix. Domains identified as occurring significantly
more frequently in the interviews of some of these 15 speakers include
‘not part of a group’ and ‘different’. Some examples of the concordance lines
classified in this way are presented in Figure 4.
Much more analysis would be needed to draw any definitive conclusions,
but this case-based, corpus-informed analysis has begun to identify a sub-
group of speakers who tell their life stories in a way that would not have
been identified using any of these analytic approaches on its own. As
women, they are somewhat unusual in failing to use in their interviews
such positive evaluators as lovely and wonderful. None of them is currently
in a conventional marital relationship, and they recognize and describe them-
selves as often different in their behaviour or outlook from the norms—though
not exclusively the gender norms—presented by others.

(a) I don’t like the current English culture
I don’t like housework very much. I don’t like ironing, I don’t like washing
I’m very direct and I don’t like to hide because I’m not embarrassed about who I am.
I’ll move away from Birmingham anyway, just because I don’t like the city.
I like open spaces but I don’t like the sea
Birmingham is a very, and I don’t like to use the word, multi-cultural environment.
I don’t like that my disability has [been?] made the subject of competitive spirit
I don’t like fighting or anything. I don’t mind drinking but I don’t like fighting, so that’s it.
I thought I am not going back to this, so I thought, no, I don’t like this
I don’t like to mix much, you know

(b) I’ve never visited the country and I really think that I don’t want to because I’ve seen too much of a bad side
So, when he came I said I don’t believe in unions so I don’t want to join. Well most of the others had joined
fair enough, people do want to get married, but personally I don’t want to waste all that money on getting married
I don’t know how you’re supposed to change that and I don’t want to change being a confident person
I don’t want to be ignored, but I don’t want to be looked after either
I was thinking but I don’t want to be like that, I want to be myself
I don’t want to set my sights too high
I didn’t want to do any of that, I just thought no, I don’t want to be anything in medical
oh, oh no, oh no, I don’t want to go to work
it just feels like you are not really needed, and so whatever I decide to do, I don’t want to be a university lecturer

Figure 3: (a) Concordance lines for ‘I don’t like’ from the interviews of ten
different speakers. (b) Concordance lines for ‘I don’t want to’ from the
interviews of ten different speakers

In these brief extracts, I suggest, we gain some glimpses into what
Archer (2000: 163) identifies as ‘one of the most important things to probe’,
namely ‘how the self-conscious human being reflects upon his or her invol-
untary placement’. A realist, agency-based perspective has no problem
accounting for this complex configuration of findings as an example of
people making choices from the resources and options available to them—
and in a patterned way that does not necessarily conform to the rather inflex-
ible categories conventionally deployed in quantitative studies. It is more con-
sistent with the ‘performance of gender’ perspective familiar from qualitative
research, but allows for different levels of analytic purchase on the data.

In this article I have sketched briefly the ways in which social researchers
seek to account for probabilistic patterns with reference to realist social
theory. This theory has several important features. It conceives of the social
world as an open, complex, dynamic set of inter-related systems. Human
behaviour is understood to be explicable not with reference to single causes
that are effective in categorical ways, but with reference instead to nested
and interacting sets of interests and circumstances, some of them involuntary
and perhaps even unknown to the people affected by them, others the results
of choices made with reference to what people perceive as in their interests.
In this approach, social category labels are seen as neither discursive ephemera
nor deterministic causal variables. Many of the categories routinely used in
survey research, policy evaluations and monitoring practices are less stable
than they may seem at first sight (Sealey and Carter 2001; Carter and Sealey
in press). The social category used as an illustration here was that of ‘gender’.
Once assumed to be an essential, deterministic attribute firmly linked to
biological sex, gender later came to be understood as a product of socialization
and acculturation, while current perspectives emphasize the notion of performed,
‘gendered’ identities, and of diversity in place of binary difference
(Cameron 2005). The detailed life histories discussed above reveal diversity
in relation not only to gender identities, but also religious affiliation,
interpretations of ‘ethnicity’, people’s status as ‘parents’ (where the role may or
may not be biological, legal, adoptive, temporary and so on) and many other
social categories which are often used to assign an individual to membership
of a group.

(a) felt like strangers in a way or outsiders because we didn’t realise
t once, because I felt such an outsider because I didn’t listen to
ulated this feeling of being an outsider because they obviously had a
overwhelming sense of being an outsider and not wanting to be an
I was much more solitary in that sense and it was in 1988
I felt like even more of an outsider because I had a different perspe

(b) really. Independence was the one thing I enjoyed at s
‘t think that was right, but I personally feel being at an all girls
n all girls ‘ school I was more independent, more so than going to a
I find that I was more independent than some people that I ‘d met w
anything individually or personally but I believe there is something
o get the experience. For me, personally, a lot of people who have
e that in London, it ‘s a very lonely place . And the other thing
Television and $pla and $pla as freelancer in camera production

(c) I always had a sense that I was different from everyone else from
asked me to ask you why you’re different to us. And I think that
colour. I knew I was different and at home, our life at home wa
communicated with her family was very different to the way that I’d
ifferent, you know language was different and even then, I mean I
but I knew things were different in the way that they acted within
great deals of class differences and certain children doing better
you get compared to the other and that might have given her a b
university, that makes you different and you’re not the same person a
and so I was quite horrified at other people’s personal habits, I thi
r in’92, I did n’t realise how segregated the city was and did n’t

Figure 4: (a) Selected concordance lines for semantic domain of ‘not part of a
group’ in one interviewee’s transcript. (b) Selected concordance lines for
semantic domain of ‘not part of a group’ in a second interviewee’s transcript.
(c) Selected concordance lines for semantic domain of ‘comparing_different’ in
a third interviewee’s transcript

Recognition of both the diversity and the commonalities that underlie such
social categories raises a crucial question: to what extent are we ‘positioned’
by ‘external socio-cultural factors which . . . predispose us to various courses
of action’ (Archer 2000: 12), and how much scope do individuals have to
choose and fashion for themselves one or more identities? Archer (2000: 13)
puts it this way: ‘Society enters into us, but we can reflect upon it, just as
we reflect upon nature and upon practice. Without such referential reality
there would be nothing substantive to reflect upon; but without our
reflections we would have only a physical impact upon reality.’
Contemporary research that is consistent with this theoretical outlook
makes use of various methods, including the ever-increasing processing
capacity of computers. The methods used are iterative, applying different
scales of analysis, as is consistent with a view of the social world as comprising
different kinds of entities with different properties and powers that operate
on different timescales.

I have suggested that language and discourse, as indispensable elements
of both social processes and the researching of them, can yield particular
kinds of insight, and that there are parallels between corpus-driven linguis-
tic analysis and research into social processes that, like corpus linguistics,
is open to the data-driven identification of categories, patterns, and
With reference to the transcribed life histories of a particular collection
of people, I have demonstrated how these approaches to research have
the potential to be mutually informative. The illustrations provided are nec-
essarily selective and tentative, and much more will need to be achieved
before the full potential of the optimum software for bringing together linguis-
tic and sociological analysis is likely to be realized. For example, the ‘variable’
of sex (or gender) was chosen as a convenient initial category to interrogate,
and others from the demographic data would no doubt generate equivalent
lines of inquiry. The ‘positive evaluators’ chosen as the starting point for
the linguistic analysis could have been replaced by any number of alternative
patterned linguistic features (current work in progress is looking at hedges
and boosters, vague expressions, significantly frequent grammatical words
and so on). And the tools of analysis, including calculators of ‘keywords’,
will ideally be extended as this approach develops.
Nevertheless, the potential identified for linguistic and social researchers
to get the measure of their data in innovative ways is an exciting prospect.

I am grateful to Sian Roberts, Helen Lloyd, and Malcolm Dick for access to, and information about,
the Millennibrum interviews, as well as to the people who recorded them for posterity. I should
also like to thank Pernilla Danielsson, Paul Rayson, and the Collaborative Research Network
at the University of Birmingham which funded the post-editing of the transcripts, most ably
carried out by Juliet Herring. Grateful acknowledgements are also due to Bob Carter, three anon-
ymous referees and the editors of the journal for helpful comments on an earlier version of
this article.

Archer, M. 2000. Being Human: The Problem of Agency. Cambridge University Press.
de Beaugrande, R. 1997. New Foundations for a Science of Text and Discourse: Cognition, Communication, and the Freedom of Access to Knowledge and Society. Ablex.
de Beaugrande, R. 1999. ‘Linguistics, sociolinguistics, and corpus linguistics: ideal language versus real language,’ Journal of Sociolinguistics 3/1: 128–139.
Bischoping, K. 1993. ‘Gender differences in conversation topics, 1922–1990,’ Sex Roles 28/1–2: 1–18.
Byrne, D. 2001. ‘What is complexity science? Thinking as a realist about measurement and cities and arguing for natural history,’ Emergence 3/1: 61–76.
Byrne, D. 2002a. Interpreting Quantitative Data. Sage.
Byrne, D. 2002b. ‘Platonic forehand versus Aristotelian smash—the use of computers as macroscopes in knowing the social world,’ International Journal of Social Research Methodology 5/1: 61–69.
Cameron, D. 2005. ‘Language, gender and sexuality,’ Applied Linguistics 26/4: 482–502.
Carter, B. and A. Sealey. in press. ‘Reflexivity, realism and the process of casing’ in D. Byrne and C. C. Ragin (eds): Handbook of Case-based Research Methods. Sage.
Carter, R. 2004. Language and Creativity: The Art of Common Talk. Routledge.
Craib, I. 1998. Experiencing Identity. Sage.
Dick, M. (ed.) 2002. Millennibrum: Bringing Birmingham History to Life CD-ROM. Birmingham City Council.
Downward, P. and A. Mearman. 2007. ‘Retroduction as mixed-methods triangulation in economic research: Reorienting economics into social science,’ Cambridge Journal of Economics 31: 77–99.
Grice, H. P. 1975. ‘Logic and conversation’ in P. Cole and J. L. Morgan (eds): Syntax and Semantics; Vol 3: Speech Acts. Academic Press.
Halliday, M. A. K. 1989. ‘Towards probabilistic interpretations’ in E. Ventola (ed.): Functional and Systemic Linguistics: Approaches and Uses. Mouton de Gruyter.
Halliday, M. A. K. 1991. ‘Corpus studies and probabilistic grammar’ in K. Aijmer and B. Altenberg (eds): English Corpus Linguistics. Longman.
Holmes, J. 1997. ‘Women, language and identity,’ Journal of Sociolinguistics 1/2: 195–223.
Holmes, J. and M. Meyerhoff. 1999. ‘The community of practice: theories and methodologies in language and gender research,’ Language in Society 28: 173–183.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge University Press.
Hunston, S. and G. Francis. 2000. Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. John Benjamins.
Kipers, P. S. 1987. ‘Gender and topic,’ Language in Society 16: 543–557.
Lakoff, R. 1975. Language and Woman’s Place. Harper and Row.
Larsen-Freeman, D. and L. Cameron. 2008. Complex Systems and Applied Linguistics. Oxford University Press.
Layder, D. 1997. Modern Social Theory: Key Debates and New Directions. UCL Press.
Meyerhoff, M. 1996. ‘Dealing with gender identity as a sociolinguistic variable’ in V. L. Bergvall, J. M. Bing, and A. F. Freed (eds): Rethinking Language and Gender Research: Theory and Practice. Addison Wesley Longman.
Moon, R. 2007. ‘Words, frequencies, and texts (particularly Conrad): A stratified approach,’ Journal of Literary Semantics 36: 1–34.
Ochs, E. 1992. ‘Indexing gender’ in A. Duranti and C. Goodwin (eds): Rethinking Context: Language as an Interactive Phenomenon. Cambridge University Press.
ONC (Office for National Statistics) http://www. current/SOC2000/about-soc2000/index.html. Accessed July 2008.
Pavlenko, A. 2007. ‘Autobiographic narratives as data in applied linguistics,’ Applied Linguistics 28/2: 163–188.
Pawson, R. 2002a. ‘Evidence-based policy: The promise of ‘realist synthesis’,’ Evaluation 8/3: 340–358.
Pawson, R. 2002b. ‘Evidence and policy and naming and shaming,’ Policy Studies 23/3–4: 211–230.
Ragin, C. C. 1987. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. University of California Press.
Rayson, P. 2008a. Wmatrix: A Web-Based Corpus Processing Environment. Computing Department, Lancaster University. wmatrix/
Rayson, P. 2008b. ‘From key words to key semantic domains,’ International Journal of Corpus Linguistics 13/4: 519–550.
Rayson, P., D. Archer, S. L. Piao and T. McEnery. 2004. The UCREL semantic analysis system. In Proceedings of the Workshop on Beyond Named Entity Recognition Semantic Labelling for NLP Tasks in association with 4th International Conference on Language Resources and Evaluation (LREC 2004), 25th May 2004, European Language Resources Association, pp. 7–12.
Rumelhart, D. E. 1975. ‘Notes on a schema for stories’ in D. G. Bobrow and A. M. Collins (eds): Representation and Understanding: Studies in Cognitive Science. Academic Free Press, pp. 211–236.
Rumelhart, D. E. 1984. ‘Schemata and the cognitive system’ in R. S. Wyer and T. K. Srull (eds): Handbook of Social Cognition. Lawrence Erlbaum, pp. 161–88.
Scott, M. 2004. WordSmith Tools Version 4. Oxford University Press.
Sealey, A. 2007. ‘Linguistic ethnography in realist perspective,’ Journal of Sociolinguistics 11/5: 641–660.
Sealey, A. and B. Carter. 2001. ‘Social categories and sociolinguistics: Applying a realist approach,’ International Journal of the Sociology of Language 152: 1–19.
Sealey, A. and B. Carter. 2004. Applied Linguistics as Social Science. Continuum.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford University Press.
Sinclair, J. M. 2004. Trust the Text: Language, Corpus and Discourse. Routledge.
Sperber, D. and D. Wilson. 1986. Relevance. Blackwell.
Stubbs, M. 1996. Text and Corpus Analysis. Blackwell.
Stubbs, M. 2001. ‘Texts, corpora and problems of interpretation,’ Applied Linguistics 22/2: 149–172.
Tannen, D. 1994. Gender and Discourse. Oxford University Press.
Uprichard, E. and D. Byrne. 2006. ‘Representing complex places: A narrative approach,’ Environment and Planning A 38: 665–676.
Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–329.
Weatherall, A. 2002. Gender, Language and Discourse. Routledge.
Williams, M. 2002. ‘Generalization in interpretive research’ in T. May (ed.): Qualitative Research in Action. Sage.
Wortham, S. 2000. ‘Interactional positioning and narrative self-construction,’ Narrative Inquiry 10: 157–184.