
Downloaded from applij.oxfordjournals.org at Islamic Azad University on September 3, 2010

Applied Linguistics: 31/2: 215–235


Oxford University Press 2009

Advance Access published on 18 June 2009

Probabilities and Surprises: A Realist Approach to Identifying Linguistic and Social Patterns, with Reference to an Oral History Corpus


A. Sealey
University of Birmingham, UK

The relationship between language and identity has been explored in a number of ways in applied linguistics, and this article focuses on a particular aspect of it: self-representation in the oral history interview. People from a wide range of backgrounds, currently resident in one large city in England, were asked to reflect on their lives as part of a project to celebrate the millennium, resulting in a corpus of 144 transcribed interviews. The article considers the utility of realist social theory and complexity theory in the analysis of patterns—and deviations from those patterns—in both the linguistic features of these interviews and the social categories to which people are routinely ascribed. Corpus linguistic software was used to identify discourse features of the corpus as a whole, and to compare and contrast features produced by different speakers with reference to the conventional social categories used in quantitative research. These categories, with their homogenizing limitations, are challenged with reference to complex causation. The article uses the category of gender to exemplify the multi-method approach advocated.


This article is concerned with probabilistic patterns, including deviations from them, in social behaviour in general and in language use in particular. The approach conceives of social and linguistic processes as complex, agency-driven, and susceptible to changing kinds of analysis as computer technology develops. The data used to illustrate the discussion are a corpus of 1.8 million words of transcribed speech, comprising 144 oral history interviews. As examples of discourse, they constitute a rich resource for exploring the probabilistic—but not determined—linguistic patterns to be found in a set of texts that have much in common with each other but that are each nevertheless unique; each interviewee demonstrates the ever-present potential for linguistic creativity while simultaneously contributing to the collective entity that emerges as ‘the discourse of life histories’. Located in between the composite identity of the whole set and the unique identity of the individual interview are some patterns associated with the speakers’ membership of various sub-categories, identifiable from both social and linguistic analysis.



The original aim of the project within which these interviews were generated was not specifically to contribute to linguistic research. Rather, it was to record, from a socio-cultural perspective, the accounts of a diverse range of residents of the large English city of Birmingham, in order to recognize and celebrate the experiences of its citizens. From this angle, the corpus represents a potential means of understanding complex social processes, where, again, each individual is a social actor contextualized in a complex web of structured social relations. As Uprichard and Byrne (2006: 668) put it, ‘narratives are descriptions not of single systems but of the interweaving of complex systems,’ because, ‘[p]eople never tell just the story of their own life; nor do they project simply in terms of themselves. All lives are embedded in the social; there is no personal without the social’.

The approach taken here aims to bring together insights from both realist social theory and applied corpus linguistics. From the former comes an emphasis on the duality of structure and agency, along with the recognition that while social actors have interests and intentions, their scope for realising these is constrained. As Archer (2000: 262) expresses it, ‘Because of the pre-existence of those structures which shape the situation in which we find ourselves, they impinge upon us without our compliance, consent or complicity’. From the latter comes the recognition that in making meaning in speech and writing, human beings are both constrained and enabled by the linguistic choices available to them. That is, in the formulation closely associated with Halliday (e.g. 1989, 1991), instances of discourse represent choices from within the language system, and, as Sinclair (1991) suggests, speakers deploy ready-made sequences (the ‘idiom principle’) but remain able to exercise ‘open choice’ as well—although if speakers are to be readily understood, they are constrained to choose from within pre-existing grammatical and semantic resources. These observations are consistent with a view of the social, discursive world as systematic—patterned and often predictable, but where the systems in play are open and dynamic, with human meanings and human agency not only reproducing familiar patterns, but also generating novelty and surprise.

The approach to research explored in this article is concerned with both trends and probabilities on the one hand, and variation on the other, in relation to both biographical experience and the discursive representation of that experience in the corpus of 144 transcribed oral history interviews. Which people, with which kinds of attributes, use language in similar/contrasting ways to tell their life histories? The article concludes with a case study of a sub-group identified by the method presented.

The data

The interviews which comprise the data in this study were recorded in 2000– 2001 by two oral historians as part of the ‘Millennibrum’ Project and deposited in the Local Studies and History section of Birmingham Central Library.




The aim was to preserve the narrative accounts of a diverse range of residents of the city at the turn of the millennium, for local people to participate ‘in presenting and recording their experiences, beliefs, contributions to the community and hopes for the future’ (Dick 2002). Of these 150 interviews, each lasting up to 90 minutes, 144 have been released for further research in accordance with the ethical consent procedures employed by the library. For each interviewee, information is available about place of birth, age, sex, occupation, level of education, marital status, and religious affiliation—all topics which are usually covered in the interviews themselves.

The interviews were transcribed as part of the original project, in which I was not directly involved, but (with the support of a small grant) I oversaw the post-editing of the texts to make them suitable for corpus analysis. This included checking samples of the transcripts against the sound recordings, standardizing transcript conventions, indicating speaker turns, so that the (usually brief) contributions of the interviewer could be excluded from the analysis when desirable, and anonymizing the interviews and the spreadsheet of meta-data which records demographic details such as age, sex, place of birth, and so on.

The oral historians who conducted the interviews are skilled in encouraging people to articulate their memories, views, and beliefs. The interviews were conducted, by one of the two interviewers, usually in the interviewees’ homes, lasting up to 90 minutes, so that the full corpus represents about 250 hours of recording. During the conversation, both interviewer and interviewee held a postcard, with a few single-word subject headings on it. The topics covered in the interviews varied slightly, but generally included, to reflect the funders’ goals for this project: the interviewees’ childhood memories and experiences of school; first experiences of work and subsequent jobs; family life before leaving home, and relationships with parents and siblings; adult relationships, including courtship, marriage and, in some cases, the breakdown of relationships; experiences of moving and migration; parenting and hopes for their children’s future. In addition, interviewees were asked about the role of religion in their lives, about social changes they had witnessed in their own lifetimes, and particularly about changes in Birmingham they had noticed during the time they had lived in the city. All had the option, of course, to omit any of these topics from their account if they wished.

Despite the similarities in themes, each interview is a record of a specific social interaction, and each interviewee interprets this in his or her own way. Interviewees inevitably make judgements about the interviewer and her expectations, including about how far she shares their knowledge about the things they reference. For example, interviewees who are much older than their interlocutor, or who have lived in places beyond Birmingham, tend to assume that some of their experiences will be unfamiliar to the interviewer, and so they explain them in greater detail. In addition, these interviews cannot be neutral descriptions, or representations, of each ‘self’ and its history, as they are interactional tellings, produced in a context of interpretation and negotiation (Wortham 2000; Pavlenko 2007). Nevertheless, there is sufficient homogeneity about the interviews for them to have certain features in common, including linguistic features.


The shared linguistic features of the corpus have been identified using a range of techniques, including software developed specifically for the analysis of electronic corpora, mainly the suite of applications, WordSmith Tools (Scott 2004), and WMatrix (Rayson 2008a). From a purely lexical point of view, the detailed consistency wordlist generated by WordSmith Tools can be consulted to identify those items that are found in all—or the majority—of the 144 transcripts. There are 46 items that occur in every text, of which a large proportion, predictably, comprises grammatical words. Of the rest, all are core vocabulary, but a glimmer of the ‘genre’ of ‘life history’ is suggested by the list: good, home, know, like, old, school, see, still, things, think, time, way. More revealing, perhaps, is the list of frequent n-grams (i.e. sequences of ‘n’ consecutive characters, including spaces, such that recurring ‘chunks’ of text longer than single words can be identified). Strings of three words occurring in 130 or more of the texts include: a lot of/to go to/I went to/and it was/it was a/and I was/when I was/one of the/I don’t know/there was a/and that was/I used to/and I think/I think it/I was born/I had a/but it was/and there was. Strings of four words occurring in 100 or more of the texts include: I think it was/I was born in/and I used to/and we used to. Two features are striking about these lists, both, unsurprisingly, concerned with the articulation of memories. There is a preponderance of expressions concerned with past time, and, in the use of ‘private’ verbs, some hints of tentativeness about relating what happened (‘I think’, ‘I don’t know’).

A third way of determining what is distinctive about these texts as a whole is by comparing this corpus with a reference corpus. Different analysts use slightly different calculations to identify items which are statistically significantly more frequent in the target corpus than the reference corpus, but both Scott’s (2004) WordSmith Tools and Rayson’s (2008a) WMatrix use log likelihood (LL) values to generate lists of ‘key’ items. In this operation, ‘the word that has the most significant relative frequency difference between the two corpora’ has the largest LL value, and thus ‘the words most indicative (or distinctive) of one corpus, as compared to the other corpus, occur at the top of the list’ (Rayson 2008b: 528).
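For readers unfamiliar with the calculation, the keyness ranking described above can be sketched in a few lines of Python. This is a minimal illustration of the widely used two-corpus log-likelihood statistic, not a reproduction of either tool's internals: the function names, the toy frequency counts and the decision to keep only items that are relatively more frequent in the target corpus are assumptions made for the example.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-corpus log-likelihood for one item.

    freq_* are the item's frequencies; size_* are the corpus sizes in tokens.
    """
    total = freq_target + freq_ref
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def key_items(target_counts, ref_counts, threshold=0.0):
    """Rank items that are relatively more frequent in the target corpus.

    Both arguments map items to frequencies; both corpora must be non-empty.
    """
    size_target = sum(target_counts.values())
    size_ref = sum(ref_counts.values())
    ranked = []
    for item, freq in target_counts.items():
        ref_freq = ref_counts.get(item, 0)
        if freq / size_target <= ref_freq / size_ref:
            continue  # not over-represented in the target corpus
        ll = log_likelihood(freq, size_target, ref_freq, size_ref)
        if ll > threshold:
            ranked.append((item, ll))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy counts, for illustration only.
toy_target = {"school": 30, "the": 70}
toy_reference = {"school": 5, "the": 95}
toy_keys = key_items(toy_target, toy_reference)
```

Raising `threshold` mimics the practice of reporting only items above a chosen LL value, as with the cut-off of 450 used in the present study.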

In the present study, WMatrix was used to identify which items are ‘key’ in comparison with the Spoken Data in the BNC Sampler Corpus. The top 38 items (those with a log likelihood value of more than 450) include: was, and, my, $place (the replacement term, to preserve anonymity, for specific places mentioned in the interviews), used_to, had, school, because, Birmingham, I, were, very, people, me, went, am, mom, remember, in, in_those_days, children, years, family, to, had_to, father, life, always, mother, friends, university, parents, road, at_the_time, worked, at, quite, city. The semantic fields connoted by the more content-carrying of these items are fairly self-evident, and accord with the topics chosen for this project, but another way of representing them is using the semantic ‘domain clouds’ available in WMatrix (Figure 1). The WMatrix tool can be used to illustrate the relative frequency differences between the Millennibrum corpus and a reference corpus in a similar manner to the ‘tag clouds’ employed in some social networking web sites. In those, ‘an alphabetically sorted list of words (confusingly for this context called tags) are shown in a larger font if they are (manually) assigned more frequently to shared digital photographs or web site bookmarks’ (Rayson 2008b: 533). WMatrix incorporates the USAS tagger (Rayson et al. 2004, cited in Rayson 2008b), which automatically assigns semantic fields (domains) to each word or multiword expression in a corpus. The clouds it produces use larger fonts to indicate greater keyness, so that, in this case, the semantic domains of ‘moving,_coming_and_going’ and ‘personal_relationship:_general’, among others, are shown to be significantly more frequent in this corpus than in the spoken component of the BNC.

Figure 1: Screenshot of part of the output from the ‘domain cloud’ identification by WMatrix of semantic themes in the oral history corpus compared with the BNC Sampler Spoken Corpus

From each of these sets of findings, we can conclude that there is a commonality about these texts, despite the fact that they represent 144 different and, in some ways, strikingly contrasting life stories. Among these interviewees are individuals who are young and old (their years of birth span 1896–1985), migrants from across the world and ‘Brummies born-and-bred’, employees from all the major occupation categories of the SOC2000 classifications (ONC 2000), conventional heterosexual family members and people with ‘alternative’ lifestyles—and yet all have produced a text in this context which contributes to a patterned discourse with identifiable features.
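The first of these sets of findings, the lists of words and word strings shared by most or all of the transcripts, rests on a simple count of how many texts each sequence appears in. The following Python sketch shows one way such a count could be made. It is an illustration under stated assumptions rather than a description of how WordSmith Tools works: it tokenizes naively on whitespace, treats n-grams as word sequences (matching the three- and four-word strings listed earlier), and uses invented function names and toy texts.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_range(texts, n):
    """Count, for each n-gram, the number of texts it occurs in at least once."""
    text_counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumption: naive whitespace tokens
        # A set ensures each text contributes at most 1 to any n-gram's count.
        text_counts.update(set(word_ngrams(tokens, n)))
    return text_counts


def shared_ngrams(texts, n, min_texts):
    """List the n-grams found in at least min_texts of the texts."""
    counts = ngram_range(texts, n)
    return sorted(gram for gram, count in counts.items() if count >= min_texts)


# Invented toy 'transcripts', for illustration only.
toy_texts = [
    "I was born in Leeds and I used to walk",
    "I was born in Cork",
    "when I was born in 1950 I used to sing",
]
in_every_text = shared_ngrams(toy_texts, 3, min_texts=3)
in_most_texts = shared_ngrams(toy_texts, 3, min_texts=2)
```

Setting `min_texts` to 130 with `n=3`, or to 100 with `n=4`, would correspond to the thresholds reported for the Millennibrum transcripts.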


One way to account for this relative homogeneity is with reference to complex systems theory, whose application to empirically observed language behaviour has been most thoroughly developed by Larsen-Freeman and Cameron (2008). From this perspective, the production in real time of interactions such as these oral history interviews is accomplished by the speakers within a nested complex of contexts, by means of a potentially extensive set of linguistic resources that are nevertheless constrained in various ways. As outlined elsewhere (Sealey and Carter 2004; Carter and Sealey in press), these interactions can be conceptualized and described from different analytical perspectives. Each speaker has their own ‘psychobiography’ (Layder 1997), the unique complex of experiences—and reflections on those experiences—that are ‘integral to the experiencing individual’, each person’s ‘individual truth’ (Craib 1998: 31). (In the case of these interviews, however, the interviewer takes steps to downplay, or mute, her individuality and provide maximum space for the production of the interviewee’s contributions.)

A person’s lifespan may be considered in terms of the ‘ontogenetic timescale’ (Larsen-Freeman and Cameron 2008: 168–9), whereas each moment of ‘online talk’ is experienced on the ‘microgenetic timescale’. In the collaborative event that is the interview, which in Layder’s (1997) classification occupies the ‘domain’ of ‘situated interaction’, the interlocutors ‘soft assemble their contribution’, as Larsen-Freeman and Cameron (2008: 169) express it. ‘Soft assembly,’ they explain, ‘describes an adaptive action in which all aspects of context can influence what happens at all levels of activity’.

Operating on different timescales still, but influencing each interview—and to some extent acting as components of them—are what Layder (1997) terms the ‘domains’ of social settings and contextual resources, highlighting the socio-historical context of structured social and economic relations. This approach is compatible with Craib’s (1998: 28) acknowledgement of ‘the wider social structure and the wider historical processes which provide us with the stage on which we act out our lives’. Larsen-Freeman and Cameron (2008: 169), similarly, situate interactions within various ‘levels’ of social systems, from the dyad of this kind of interaction, to ‘socio-cultural groups and institutions of various types and sizes all the way up to the society of the speech community’.

These insights from realist social theory and complexity theory present us with the beginnings of an explanation for the findings from this corpus analysis of linguistic features that are common to these examples of the life histories collected in one large English city at the turn of the millennium. Out of all the possible human sounds these speakers might have recorded, only a circumscribed sub-set of words, phrases and grammatical constructions, intelligible to speakers of English, was actually drawn on. From within that stock of resources, a further sub-set accounts for all of what was produced, and this can be partially explained with reference to insights from psycholinguistics (e.g. scripts, schemas, and the ‘economic’ advantages of processing expected, rather than novel, input and output; e.g. Rumelhart 1975; 1984) and pragmatics (e.g. the co-operative principle (Grice 1975) and theories of relevance (Sperber and Wilson 1986)); also helpful are sociologically derived understandings about sedimented patterns of social behaviour, with their parallels in corpus linguistic findings about ‘discourse patterns’ that cluster around the ‘meanings [that] are repeatedly expressed in a discourse community,’ and that work by ‘filtering and crystallizing ideas, and by providing prefabricated means by which ideas can be easily conveyed and grasped’ (Stubbs 1996: 158). One indicative finding here is that, from the total vocabulary stock of the English language, just 24,276 items (types) feature in this corpus of just under 1.8 million words (using WordSmith Tools’ Wordlist function, which also calculates the standardized type/token ratio for this corpus as 32.89).

In the account I am seeking to provide, the interviews represent examples of discourse as a complex system, which, as Larsen-Freeman and Cameron (2008: 175) suggest:

invokes the image of a multidimensional landscape with hills and valleys, over which the system roams, creating a

The landscape represents the probabilities of various modes or phases of discourse behaviour, and the trajectory is carved out as a particular conversation moves from one mode to another.

Crucial to the description I am proposing is the centrality of human agency, where each interviewee, in the context of the complex, dynamic, interactional processes outlined above, makes choices about how to formulate this specific account of their life experience. As Craib (1998: 28) puts it, ‘We are each of us given a starting point and we do something with it.’ If the generic oral life history has certain features, then, in accordance with the fractal patterns typically found in complex systems (Larsen-Freeman and Cameron 2008: 109–111), each individual instance has its own pattern that potentially mimics the overall composite on a smaller scale.
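The type and token figures cited earlier in this section can be related to each other with a short calculation. On the raw counts, 24,276 types in just under 1.8 million tokens gives a ratio of roughly 1.35 per cent, far below the standardized figure of 32.89, because the raw ratio falls as a text grows longer. A standardized ratio avoids this by averaging the ratio over consecutive chunks of equal length. The Python sketch below illustrates the idea; the chunk size of 1,000 tokens, the function names and the handling of a short final chunk (it is simply dropped) are assumptions for the example and may not match WordSmith Tools' settings for this corpus.

```python
def type_token_ratio(tokens):
    """Distinct word forms as a percentage of running words."""
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)


def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of chunk_size tokens.

    Any tokens left over after the last full chunk are ignored (an assumption).
    """
    ratios = [
        type_token_ratio(tokens[start:start + chunk_size])
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:
        return 0.0  # fewer tokens than one full chunk
    return sum(ratios) / len(ratios)


# Toy token list: the raw ratio is lower than the chunked mean
# because the second chunk repeats a word already seen in the first.
toy_tokens = ["a", "b", "c", "d", "a", "a", "a", "a"]
raw_ratio = type_token_ratio(toy_tokens)
chunked_ratio = standardized_ttr(toy_tokens, chunk_size=4)
```

Because each interview could be scored in the same way, a measure of this kind offers one simple means of comparing an individual transcript with the composite corpus.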


By means of corpus analysis, it is possible to identify both similarities and differences between the whole set of interviews and each individual example, and also to find patterns among sub-sets of the corpus as a whole. In the tradition of quantitative, variationist analysis, the conventional approach involves predetermining which demographic ‘variables’ are likely to correlate with the differential use of particular linguistic features, and using some kind of regression analysis to identify correlations. There are several problems with this, however, given the claims made above about the nature of social and linguistic phenomena as complex systems, and the notion of causality implied by the theory deployed here. Models which assume that speaker characteristics are ‘independent variables’, with linguistic features as ‘dependent variables,’ imply a linear model of causality. Such models do not allow for the interaction between the variables, and they do not model well the dynamic, systems-based realities with which we are concerned in this kind of data. The limitations of linear modelling are succinctly summarized by Byrne (2002a: 112):

It may well have a more important role in relation to the interpretation of the products of experiments in domains where experiments are useful, but when we deal with the products of surveys, with accounts of the world as becoming—that is, as a set of nested complex evolutionary systems, as inherently dynamic—then mundane exploration is as far as it goes.

Methods of analysis that are suitable for data collected in experiments are unlikely to be appropriate for the analysis of linguistic data such as the interview transcripts described above. For example, in an approach derived from an experimental paradigm, sex (or gender) would be thought of as a ‘variable’; the interviews of the male speakers would be separated from those of the females and compared with them on various linguistic dimensions. Indeed, Cameron (2005: 484), in the context of advocating a post-modern perspective on gender, maintains that even a ‘modern feminist approach’ to language and gender research ‘presupposes the existence of two internally homogeneous groups, “men” and “women”, and looks for differences between them’, and the Keyword tool in WordSmith Tools allows a corpus analyst, should they choose, to identify words which occur with significantly greater frequencies in either the men’s or the women’s interviews. Results from this operation could readily be used to confirm gender stereotypes, as Table 1 illustrates.

Table 1: Selected lexical items from the top of the lists of words defined by WordSmith Tools as key in a comparison of the male and female interviews

While the definition and interpretation of this version of ‘keyness’ are not without their critics (e.g. Moon 2007), it is clear that this variable-based analysis does tell us something about the data. It is a fairly blunt instrument, however, and is very prone to the methodological implication—even if this is not explicit—that there is some kind of causal link between the attribute of being male (or female) and using, in the context of telling your life story, more words from the domains of work and sport (or of the home and relationships). As previous researchers into language and gender have suggested, there may indeed be correlations between gender and conversational topic (e.g. Kipers 1987; Bischoping 1993; Tannen 1994)—and hence contrasting frequencies of words from particular domains. However, the realist social theory with which this analysis is concerned would seek to do something rather different from establishing such tendencies. For an explanation, rather than a description, of the variability in these different speakers’ accounts of their lives, a different methodology is called for. This is because:

In the realist frame of reference we do not see causes as single factors whose presence inevitably generates an effect and whose absence means that the effect does not occur. Rather cause is a property of complex and contingent mechanisms in reality and such mechanisms, moreover, are not universal but only relatively permanent—inherently local. (Byrne 2002a: 105)

Byrne’s research orientation derives from the sociological realism stimulated by Bhaskar and developed in various ways by Archer, Layder, Pawson, Sayer, and others. These writers ‘reject the conception of society as [a] closed system, arguing instead that reality is a structured open system’ (Downward and Mearman 2007: 87). There is not the space here to elucidate the theoretical underpinnings of this approach (for further discussion see Sealey and Carter 2004; Sealey 2007), but an example of the method in action is instructive.


Realist explanations tend to be ‘theory driven approaches’ which seek to understand the mechanisms which lead to differential outcomes—why, for example, social or educational policy interventions turn out differently, appearing to ‘work’ for some people and not others, and in some circumstances and not others.



An excellent summary and explanation of this approach is found in Pawson (2002a). In public policy, meta-analyses of previous studies are often appealed to for answers to the question ‘what works’, but this is reformulated by Pawson as ‘what works for whom in what circumstances?’. He advocates a ‘realist synthesis’ perspective, according to which:

it is not ‘programmes’ that work: rather it is the underlying reasons or resources that they offer subjects that generate

Whether the choices or capacities on offer in an initiative are acted upon depends on the nature of their subjects and the circumstances of the initiative

(Pawson 2002a: 342)

Pawson describes the pattern in policy-making whereby a particular approach is tried in some area of public policy—such as the ‘naming and shaming’ of offenders, for example—and is then adopted in a whole range of other areas, as though the policy, the naming and shaming, is itself causally effective (see also Pawson 2002b). This fallacious reasoning can lead to contradictory outcomes: for example, the car manufacturer whose products are publicized for their poor safety or security records is quick to improve their performance (‘successful’ naming and shaming), whereas the protesters who refuse to pay the community charge (a controversial local tax) find welcome celebrity in publicity surrounding their court appearances and are not persuaded to change their behaviour (‘unsuccessful’ naming and shaming). Pawson’s approach provides a persuasive explanation of the discrepancies, because it:

adopts a ‘generative’ understanding of causation. What this tries to break is the lazy linguistic habit of basing evaluation on the question of whether ‘programmes work’. In fact, it is not programmes that work but the resources they offer to enable their subjects to make them work. This process of how subjects interpret the intervention stratagem is known as the programme ‘mechanism’ and it is the pivot around which realist evaluation

(Pawson 2002a: 342)


Studies of the social world from this kind of perspective offer an explanatory power that those based on variables do not. Applied linguists will be very familiar with studies designed to identify whether particular language teaching strategies ‘work’—and with the range of conflicting findings that they generate. The realist approach, by contrast, does not begin with social categories decided a priori—gender (or sex), or ethnicity, or age-group or social class. Instead, it assumes that social phenomena are characterized by processes and relations, and therefore ways are sought to investigate and describe these and their effects. If social phenomena are complex systems, then they:

are not made up of pre-existing variables, although we can properly describe them through the measurement of variate traces. In realist terms, the traces are actual things in the world that are the products of the generative real system, and the interior working of the system is not reducible to elements existing separate and analyzable outside the system. (Byrne 2001: 64)

It is appropriate at this point to summarize the methodological implications of the social realist theory within which the present study is situated. From Pawson’s work (which is predominantly concerned with the evaluation of interventions in social policy), an obvious priority is the identification of contexts and mechanisms to explain stability and variation in outcomes. The interpretations of experience by social actors may be consistent or varied, and patterns in both experience and interpretation are sought in the research process. According to Byrne (2001), each of the following is important:

(i) exploration, ‘which involves both descriptive measurement of variate traces of complex systems and examination of the patterns generated by those measurements,’ including ‘the exploration of qualitative materials presented as texts or in other documentary forms’;
(ii) classification, including sorting things into kinds, ‘using, inter alia, numerical taxonomy procedures’, as well as ‘the identification, however temporary, of what constitutes meaningful boundaries’;
(iii) interpretation—both of measures and of ‘“natural language” descriptions of qualitative form’ (in relation to this point, Byrne stresses that he is not referring to ‘post-modern eclecticism, but rather to the originary conception of hermeneutics in which there is a search for meaning as truth’);
(iv) ordering, so that things are ‘sorted and positioned along a dimension of time’.


Readers who are familiar with corpus linguistics may identify some echoes here in the challenge to traditional descriptions of language that its methods and findings have generated. Language corpora (large collections of authentic writing or transcribed utterances, electronically stored) are analysed using dedicated software that reveals patterns not usually available from intuition, introspection or even text-by-text discourse analysis. Reliance on intuition rather than on the large quantities of empirical evidence available in the corpus, says Sinclair, meant that the pre-corpus situation in linguistics ‘was similar to that of the physical sciences some 250 years ago’ (1991: 1). In social science, maintains Byrne (2002a: 63), ‘our science for 300 years has been a science of analytical reduction to the simple. Now we can address the complex’, as computers ‘enhanc[e] our capacity to explore’—as they do with language. As realist theorists do not expect to observe directly the generative mechanisms responsible for differential social behaviour, so corpus linguists are familiar with the ‘tools of indirect observation’ (Sinclair 2004: 189) in their analytic software.



The social researchers on whose work I am drawing are sympathetic to many different research methods, and several stress the advantages of moving between statistical analysis and local interpretation. For social scientists who want to know the reasons for patterned behaviours, ‘there is no logical difference between the work of an interpretive researcher conducting detailed observations of a social setting and a large scale national survey’ (Williams 2002: 128). Byrne (2002b: 68–69) sees a particular role for information technology here, ‘as a cybernetic extension of human cognitive capacity which enables us to get to grips with the big pictures of macroscopic social change through the management and interpretation of both quantitative and qualitative data flows’. Similarly, corpus linguists often emphasize the importance of moving between the corpus and the instance, combining automated searches and interpretive analysis (e.g. Upton and Connor 2001; Hunston 2002; Carter 2004: 219–221).

Where conventional accounts of language systems separate lexis and syntax (‘a highly generalized formal syntax, with slots into which fall neat lists of words’ (Sinclair 1991: 108)), corpus analysis, which allows the researcher to review a lot of linguistic evidence at once, challenges traditional categories including word classes (or ‘parts of speech’). It also identifies trends and probabilities, which, according to Stubbs (2001), are attributable to the constraints that derive from both linguistic and social expectations. While not deterministic, these constraints ‘mean that, although we are in principle free to say whatever we want, in practice what we say is constrained in many ways. The main evidence for these constraints comes from observations of what is frequently said, and this can be observed, with computational help, in large text collections’ (Stubbs 2001: 19). de Beaugrande (1997: 130) posits something similar: ‘a convergence of data making some meanings or understandings much more probable than others’, and an explanation for ‘the rich global complexity of real language data’ as generated by ‘the interaction of multiple local constraints that are essentially simple’ (de Beaugrande 1999: 131). Complexity, dynamic processes, and relational units of analysis are critical to understanding both linguistic and social systems, yet both demonstrate stability: ‘as insights from corpus linguistics show, the stabilities that speakers employ are diverse—words, phrases, idioms, metaphors, non-canonical collocations, grammar structures—a much more complex and diverse set of language-using patterns than the ‘core grammar’ of formal approaches’ (Larsen-Freeman and Cameron 2008: 99).

I am suggesting, then, that there are some parallels to be drawn between developments in researching social and linguistic processes, respectively, with reference to realist social theory and methodologies that are consistent with it. This is, of course, not to be blind to the dangers of ‘illegitimate cross-disciplinarity’, and indeed, as I have noted elsewhere (Sealey 2007), since different kinds of things have different properties and propensities, it is important to be mindful of those characteristics of language that are distinctive from those of social actors and of social structures. Nevertheless, the enterprise described here aims to explore how far the analysis of a corpus of transcribed oral histories can be enhanced by the application of the integrative methodological approaches advocated by social realism within a corpus linguistic framework.


One of the strategies proposed by Byrne is to start from the case, and to find ‘ways of sorting cases into categories’, looking for ‘category sets which emerge from the exploration of our data’ (Byrne 2002a: 100). With corpus data, such an approach can be achieved in various ways, and indeed Byrne’s injunction is consistent with Sinclair’s plea to ‘trust the text,’ including allowing computers to ‘show us things that we may not already know and even things that shake our faith quite a bit in established models, and which may cause us to revise our ideas very substantially’ (Sinclair 2004: 23). Sinclair’s corpus-driven approach to language analysis mistrusts the premature tagging of corpus texts because of the danger of imposing prior models (or, as Byrne might say, ‘category sets’) and obscuring ‘the clarity of the categories in the data’ (Sinclair 2004: 191). The demographic categories associated with the corpus in the present study should therefore be treated with some caution, although it has to be noted that it is challenging to find useful groupings that maintain an appropriate balance between the distortion of over-generalization and the unwieldy and unhelpful method of treating each case as unique. (A similar problem faces lexico-grammarians trying to be sensitive to specific patterning while simultaneously identifying groups with similar characteristics (Hunston and Francis 2000).)

The longer term goal of the work under discussion is to combine corpus analysis and realist-derived sociological analysis through software adapted to accommodate both kinds of data. In the meantime, the approach is exemplified on a more modest scale using currently available tools. As a first step to a case-oriented methodology, the demographic meta-data about the 144 interviewees is entered into a table, with each row representing a case. This is consistent with Ragin’s (1987) advocacy of the comparative method to ‘compare cases with different combinations holistically’ (p. 101). ‘The case-oriented approach,’ he maintains, ‘allows investigators to comprehend diversity and address causal complexity’ (p. 168).

The next stage is to make use of established corpus analytic techniques to identify potentially fruitful areas of inquiry. To illustrate this, let us return to the contrasting keywords between male and female interviewees. A lexical word identified in this search was lovely, used four times as often by the female speakers as by the men, at 441 occurrences to the males’ 110. It is customary for quantitative researchers to stress that such findings indicate an association and not an explanation, and yet the inference that the attribute of being female is in some way causally linked with this language practice is not unreasonable. However, if being female was a direct ‘cause’ of the frequent use of this word, then no men would use it often, and all the women would. This is patently absurd, but we can look somewhat differently at this correlation.

Counter-posed to the implicit determinism of the variables-based approach are the more interpretive research traditions. Instead of inferring that possession of the attribute [variable] female leads to an increased use of items such as lovely, some analysts would stress the performative dimension of gender through language (e.g. Ochs 1992; Meyerhoff 1996; Holmes and Meyerhoff 1999; Weatherall 2002). Holmes (1997: 203), for example, suggests that attention should be on ‘the linguistic realizations of gender’, through an examination of ‘the way individuals express or construct their gender identities in specific interactions in particular social contexts’. More ethnographic, qualitative studies, such as those advocated by Holmes, focus on ‘how people use language to create, construct, and reinforce particular social identities’ (Holmes 1997: 204), and women’s frequent use of positive evaluators such as lovely may well index femininity (cf. Lakoff 1975). However, such approaches are rarely concerned with the probabilities identified in quantitative analysis, and, while they do shed light on the local development of communities of (more or less gendered) practices, their accounts of speakers resisting the reproduction of gendered patterns of language behaviour tend not to propose causal explanations.

Returning to the individual cases in my corpus, we can look more closely at the counter-examples, the cases that seem to buck the trend, rather than regarding them as ‘outliers’, as conventional statistical procedures would require. These ‘outliers’ include, from the 16 speakers who use lovely in their interviews most frequently (10 times or more), the two who are men. From a case perspective, it transpires that there are similarities across several dimensions.
Both were born in Birmingham in the 1920s; neither continued their education beyond secondary school; both are fathers to two or more children and both are married. Further exploration could establish whether there are additional features, linguistic or demographic, which would identify reasons to see these speakers as belonging to a sub-group linked in meaningful ways; alternatively, it may be that choosing lovely to describe elements of their experience, when most men do not do so in these interviews, is a weak link, not reinforced by any others.

Looking at the same initial findings from the other direction, so to speak, it transpires that a significant number (59) of the 144 speakers fail to use the word lovely in their interviews at all. Are these non-users all men? The majority (44) are, but the remaining 15 are women. From a case-based perspective, we can investigate whether these speakers have anything in common, looking for both linguistic and demographic patterns. One possibility that suggests itself, of course, is that these particular women have not experienced anything that could be described as ‘lovely’. Perhaps (their accounts of) their biographies are distinctively negative. To investigate this possibility, I used WMatrix to compare each of these 15 texts with the corpus as a whole, using the semantic domains tool described above. This allows the user to see in a concordance line view each of the items assigned to the semantic domains identified by the tagger. From this, it is apparent that some of these speakers seem to be very positive in their narratives, using items classified in the ‘evaluation_good’ category and in the ‘happy’ category significantly more frequently than is found in the corpus as a whole. Some examples from the concordance lines for these data are given below (see Figure 2). This suggests that the interplay between these women’s oral histories and their lexical choices is more subtle than an absence of lovely simply denoting an absence of positive evaluation in general.

(a) else, kind of thing. We had fun when we moved here obviously so all that. I just go out for a laugh with my friends and then just g oose to go somewhere where you enjoy yourself rather than the fact of
a day out and that was really good fun because it was really relaxed
ried and having kids and being happy with it because that ’s not me
s quite good, but it was just fun but then you got sick of it in
ple that I know so it ’s quite fun in that respect but as the job we have kind of got the same sense of humour, so we hang round mostly at you want to do, so you can enjoy yourself a lot more this time in so it was quite good. It was enjoyable because it’s in the country an

(b) the drama department is brilliant at my school and if
a for a month which was brilliant and we got to travel out walking and it was brilliant and we flew &; I think was Amsterdam which was brilliant and I went to see Anne it ’s always just been brilliant , there ’s never been

(c) to accept it. And my mom who’s fab was like I don’t care
e her and she loves you. She was fabulous but found it
modation we wanted, that was the best route, doing that. award and I won which was really fabulous, probably one probably one of the best moments of my life which was
been to get there really. It was fab .

Figure 2: (a) Selected concordance lines for semantic domain of ‘happy’ in one interviewee’s transcript. (b) Selected concordance lines for semantic domain of ‘evaluation_good’ in another interviewee’s transcript. (c) Selected concordance lines for semantic domain of ‘evaluation_good’ in a third interviewee’s transcript

From the case-based perspective, the table of metadata can be reviewed manually, because the number of cases is so small, although it would be desirable to use appropriate software for a larger-scale analysis. Of all the attributes recorded in the database, just one emerges as shared among all but two of this particular group of women: their classification by ‘marital status’ is single or widowed. The remaining two have been classified as ‘other’: one is a gay woman who describes in her interview the difficulties she has faced because of her sexuality, and another, who is in a stable heterosexual relationship, reflects explicitly on her resistance to getting married and to conforming to conventional gender roles. Thus the combination of these two approaches, a case-based consideration of the interviewees’ attributes, and a corpus-based linguistic analysis, generates a potential sub-group who share the classification ‘female’, who significantly under-use the item lovely in their interviews, relative to other females in the corpus, and who are not currently in conventional marital partnerships.

A further iteration of the approach makes use of the keyword facility to compare the sub-corpus of the interviews of these women with the corpus as a whole. This makes possible, for example, the identification of other items, in addition to lovely, that are frequent in the accounts of the other female speakers, but much less so in these interviews. Items that WordSmith identifies as ‘negatively key’ (that is, significantly under-used by this sub-group in comparison with the sub-corpus of the other female interviewees) include child, mother, wonderful, husband, children, and also him, his and he. Very near the top of the list of items they use more frequently than the other female speakers is don’t, and a concordance of I don’t is interesting. There are 427 occurrences in the interview transcripts of these 15 women, many of which frequently occur in strings such as ‘I don’t know’. However, the most frequent four-word string (27 occurrences) is ‘I don’t want to’, while there are 29 occurrences of the three-word ‘I don’t like’. All speakers use one of these strings, 10 of them using only either ‘I don’t like’ or ‘I don’t want to’, and the other five using both.
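The ‘negatively key’ comparison described above can be made concrete with a short sketch. The log-likelihood statistic below is one measure commonly used for keyword calculations in corpus software; that it corresponds to the settings used for this study is an assumption, and the function name and all the frequencies and corpus sizes are invented for illustration rather than taken from the interview data.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Signed log-likelihood keyness of a word in corpus A relative to corpus B.

    freq_a, freq_b: occurrences of the word in each corpus.
    size_a, size_b: total running words in each corpus.
    The result is negative when the word is relatively under-used in A
    (a sign convention chosen here for readability).
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    under_used = freq_a / size_a < freq_b / size_b
    return -ll if under_used else ll


# Hypothetical figures: a word occurring 10 times in a 150,000-word
# sub-corpus and 400 times in a 750,000-word comparison sub-corpus.
score = log_likelihood(10, 150_000, 400, 750_000)
print(f"keyness = {score:.2f}")
```

On this convention a large negative score marks an item as ‘negatively key’ for the sub-group. A magnitude above about 3.84 corresponds to p < 0.05 on one degree of freedom, although keyword tools are often run with stricter thresholds.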
Looking in detail at the concordance lines for these strings, it becomes apparent that these particular women tell their life stories with repeated references to the differences between themselves and others, or between their priorities and the expectations or priorities of the wider society. To illustrate this, just one example has been extracted of these two expressions from each of the speakers in this group who uses them (see Figure 3). This finding receives further confirmation from the analysis of semantic domains facilitated by WMatrix. Domains identified as occurring significantly more frequently in the interviews of some of these 15 speakers include ‘not part of a group’ and ‘different’. Some examples of the concordance lines classified in this way are presented in Figure 4.

Much more analysis would be needed to draw any definitive conclusions, but this case-based, corpus-informed analysis has begun to identify a sub-group of speakers who tell their life stories in a way that would not have been identified using any of these analytic approaches on its own. As women, they are somewhat unusual in failing to use in their interviews such positive evaluators as lovely and wonderful. None of them is currently in a conventional marital relationship, and they recognize and describe themselves as often different in their behaviour or outlook from the norms (though not exclusively the gender norms) presented by others.
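The case-based step in this analysis (a table with one row per interviewee, reviewed for attributes shared by the speakers who do or do not use a given word) can be sketched in a few lines of Python. The records, attribute names and counts below are hypothetical and are not drawn from the interview metadata; the two helper functions are likewise illustrative and do not belong to any corpus tool.

```python
from collections import Counter

# Hypothetical case table: one row per interviewee (all values invented).
cases = [
    {"id": "F01", "sex": "female", "marital_status": "single", "lovely_count": 0},
    {"id": "F02", "sex": "female", "marital_status": "married", "lovely_count": 12},
    {"id": "F03", "sex": "female", "marital_status": "widowed", "lovely_count": 0},
    {"id": "M01", "sex": "male", "marital_status": "married", "lovely_count": 11},
    {"id": "M02", "sex": "male", "marital_status": "single", "lovely_count": 0},
    {"id": "M03", "sex": "male", "marital_status": "married", "lovely_count": 1},
]


def select(cases, **conditions):
    """Return the cases meeting every condition.

    A condition is either a value to match exactly or a one-argument
    function applied to the attribute value.
    """
    selected = []
    for case in cases:
        matches = all(
            test(case[key]) if callable(test) else case[key] == test
            for key, test in conditions.items()
        )
        if matches:
            selected.append(case)
    return selected


def configurations(cases, attributes):
    """Count how many cases share each combination of attribute values."""
    return Counter(tuple(case[a] for a in attributes) for case in cases)


# Speakers who never use the word, then the women among them.
non_users = select(cases, lovely_count=0)
female_non_users = select(non_users, sex="female")

# Cases that run against the trend: men among the most frequent users.
frequent_male_users = select(cases, sex="male", lovely_count=lambda n: n >= 10)

print(configurations(female_non_users, ["marital_status"]))
print([case["id"] for case in frequent_male_users])
```

Counting combinations of attributes, rather than one variable at a time, keeps each case visible as a configuration, which is the point of starting from the case.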





(a) I don’t like the current English culture
I don’t like housework very much. I don’t like ironing, I don’t like washing
I’m very direct and I don’t like to hide because I’m not embarrassed about who I am.
I’ll move away from Birmingham anyway, just because I don’t like the city.
I like open spaces but I don’t like the sea
Birmingham is a very, and I don’t like to use the word, multi-cultural environment.
I don’t like that my disability has [been?] made the subject of competitive spirit
I don’t like fighting or anything. I don’t mind drinking but I don’t like fighting, so that’s it.
I thought I am not going back to this, so I thought, no, I don’t like this
I don’t like to mix much, you know

(b) I’ve never visited the country and I really think that I don’t want to because I’ve seen too much of a bad side
So, when he came I said I don’t believe in unions so I don’t want to join. Well most of the others had joined
fair enough, people do want to get married, but personally I don’t want to waste all that money on getting married
I don’t know how you’re supposed to change that and I don’t want to change being a confident person
I don’t want to be ignored, but I don’t want to be looked after either
I was thinking but I don’t want to be like that, I want to be myself
I don’t want to set my sights too high
I didn’t want to do any of that, I just thought no, I don’t want to be anything in medical
oh, oh no, oh no, I don’t want to go to work
it just feels like you are not really needed, and so whatever I decide to do, I don’t want to be a university lecturer

Figure 3: (a) Concordance lines for ‘I don’t like’ from the interviews of ten different speakers. (b) Concordance lines for ‘I don’t want to’ from the interviews of ten different speakers
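Counts of recurrent strings and concordance displays such as those in Figure 3 are normally produced by dedicated software, but the underlying operations are simple. The sketch below, which is not the procedure used by WordSmith or WMatrix, counts n-word strings and prints keyword-in-context lines for a short sample assembled from three of the extracts in Figure 3. The function names and the tokenization rule are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens.

    Apostrophes inside a word (straight or curly) are kept, so that
    "don't" stays a single token.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kwic(tokens, phrase, width=5):
    """Return (left context, node, right context) for each hit of phrase."""
    target = tokenize(phrase)
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(target), right))
    return lines


# Sample text built from three extracts shown in Figure 3.
sample = (
    "I like open spaces but I don't like the sea. "
    "I don't want to set my sights too high. "
    "I don't want to be ignored, but I don't want to be looked after either."
)
tokens = tokenize(sample)

four_word = ngram_counts(tokens, 4)
print(four_word.most_common(1))

for left, node, right in kwic(tokens, "I don't want to"):
    print(f"{left:>30}  [{node}]  {right}")
```

Run over a whole sub-corpus, the same counting step would rank strings by frequency, and the concordance step would supply the lines from which illustrative examples are then selected by hand.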

In these brief extracts, I suggest, we gain some brief glimpses into what Archer (2000: 163) identifies as ‘one of the most important things to probe’, namely ‘how the self-conscious human being reflects upon his or her involuntary placement’. A realist, agency-based perspective has no problem accounting for this complex configuration of findings as an example of people making choices from the resources and options available to them, and in a patterned way that does not necessarily conform to the rather inflexible categories conventionally deployed in quantitative studies. It is more consistent with the ‘performance of gender’ perspective familiar from qualitative research, but allows for different levels of analytic purchase on the data.


(a) felt like strangers in a way or outsiders because we didn’t realise
t once, because I felt such an outsider because I didn’t listen to
ulated this feeling of being an outsider because they obviously had a overwhelming sense of being an outsider and not wanting to be an
I was much more solitary in that sense and it was in 1988
I felt like even more of an outsider because I had a different perspe

(b) really. Independence was the one thing I enjoyed at s
‘t think that was right, but I personally feel being at an all girls
n all girls ‘ school I was more independent, more so than going to a
I find that I was more independent than some people that I ‘d met w
anything individually or personally but I believe there is something
o get the experience. For me, personally, a lot of people who have
e that in London, it ‘s a very lonely place . And the other thing Television and $pla and $pla as freelancer in camera production

(c) I always had a sense that I was different from everyone else from asked me to ask you why you’re different to us. And I think that colour. I knew I was different and at home, our life at home wa communicated with her family was very different to the way that I’d ifferent, you know language was different and even then, I mean I
but I knew things were different in the way that they acted within great deals of class differences and certain children doing better you get compared to the other and that might have given her a b university, that makes you different and you’re not the same person a and so I was quite horrified at other people’s personal habits, I thi
r in’92, I did n’t realise how segregated the city was and did n’t

Figure 4: (a) Selected concordance lines for semantic domain of ‘not part of a group’ in one interviewee’s transcript. (b) Selected concordance lines for semantic domain of ‘not part of a group’ in a second interviewee’s transcript. (c) Selected concordance lines for semantic domain of ‘comparing_different’ in a third interviewee’s transcript

In this article I have sketched briefly the ways in which social researchers seek to account for probabilistic patterns with reference to realist social theory. This theory has several important features. It conceives of the social world as an open, complex, dynamic set of inter-related systems. Human behaviour is understood to be explicable not with reference to single causes that are effective in categorical ways, but with reference instead to nested and interacting sets of interests and circumstances, some of them involuntary and perhaps even unknown to the people affected by them, others the results of choices made with reference to what people perceive as in their interests. In this approach, social category labels are seen as neither discursive ephemera nor deterministic causal variables. Many of the categories routinely used in survey research, policy evaluations and monitoring practices are less stable than they may seem at first sight (Sealey and Carter 2001; Carter and Sealey in press).

The social category used as an illustration here was that of ‘gender’. Once assumed to be an essential, deterministic attribute firmly linked to biological sex, gender later came to be understood as a product of socialization and acculturation, while current perspectives emphasize the notion of performed, ‘gendered’ identities, and of diversity in place of binary difference (Cameron 2005). The detailed life histories discussed above reveal diversity in relation not only to gender identities, but also religious affiliation, interpretations of ‘ethnicity’, people’s status as ‘parents’ (where the role may or may not be biological, legal, adoptive, temporary and so on) and many other social categories which are often used to assign an individual to membership of a group.

Recognition of both the diversity and the commonalities that underlie such social categories raises a crucial question: to what extent are we ‘positioned’ by ‘external socio-cultural factors which predispose us to various courses of action’ (Archer 2000: 12), and how much scope do individuals have to choose and fashion for themselves one or more identities? Archer (2000: 13) puts it this way: ‘Society enters into us, but we can reflect upon it, just as we reflect upon nature and upon practice. Without such referential reality there would be nothing substantive to reflect upon; but without our reflections we would have only a physical impact upon reality.’

Contemporary research that is consistent with this theoretical outlook makes use of various methods, including the ever-increasing processing capacity of computers. The methods used are iterative, applying different scales of analysis, as is consistent with a view of the social world as comprising different kinds of entities with different properties and powers that operate on different timescales. I have suggested that language and discourse, as indispensable elements of both social processes and the researching of them, can yield particular kinds of insight, and that there are parallels between corpus-driven linguistic analysis and research into social processes that, like corpus linguistics, is open to the data-driven identification of categories, patterns, and probabilities.

With reference to the transcribed life histories of a particular collection of people, I have demonstrated how these approaches to research have the potential to be mutually informative. The illustrations provided are necessarily selective and tentative, and much more will need to be achieved before the full potential of the optimum software for bringing together linguistic and sociological analysis is likely to be realized. For example, the ‘variable’ of sex (or gender) was chosen as a convenient initial category to interrogate, and others from the demographic data would no doubt generate equivalent lines of inquiry. The ‘positive evaluators’ chosen as the starting point for the linguistic analysis could have been substituted by any number of alternative patterned linguistic features (current work in progress is looking at hedges and boosters, vague expressions, significantly frequent grammatical words and so on). And the tools of analysis, including calculators of ‘keywords’, will ideally be extended as this approach develops. Nevertheless, the potential identified for linguistic and social researchers to get the measure of their data in innovative ways is an exciting prospect.


I am grateful to Sian Roberts, Helen Lloyd, and Malcolm Dick for access to, and information about, the Millennibrum interviews, as well as to the people who recorded them for posterity. I should also like to thank Pernilla Danielsson, Paul Rayson, and the Collaborative Research Network at the University of Birmingham which funded the post-editing of the transcripts, most ably carried out by Juliet Herring. Grateful acknowledgements are also due to Bob Carter, three anonymous referees and the editors of the journal for helpful comments on an earlier version of this article.


Archer, M. 2000. Being Human: The Problem of Agency. Cambridge University Press.

de Beaugrande, R. 1997. New Foundations for a Science of Text and Discourse: Cognition, Communication, and the Freedom of Access to Knowledge and Society. Ablex.

de Beaugrande, R. 1999. ‘Linguistics, sociolinguistics, and corpus linguistics: ideal language versus real language,’ Journal of Sociolinguistics 3/1: 128–139.

Bischoping, K. 1993. ‘Gender differences in conversation topics, 1922–1990,’ Sex Roles 28/1–2: 1–18.

Byrne, D. 2001. ‘What is complexity science? Thinking as a realist about measurement and cities and arguing for natural history,’ Emergence 3/1: 61–76.

Byrne, D. 2002a. Interpreting Quantitative Data. Sage.

Byrne, D. 2002b. ‘Platonic forehand versus Aristotelian smash—the use of computers as macroscopes in knowing the social world,’ International Journal of Social Research Methodology 5/1: 61–69.

Cameron, D. 2005. ‘Language, gender and sexuality,’ Applied Linguistics 26/4: 482–502.

Carter, B. and Sealey, A. in press. ‘Reflexivity, realism and the process of casing’ in D. Byrne and C. C. Ragin (eds): Handbook of Case-based Research Methods. Sage.

Carter, R. 2004. Language and Creativity: The Art of Common Talk. Routledge.

Craib, I. 1998. Experiencing Identity. Sage.

Dick, M. (ed.) 2002. Millennibrum: Bringing Birmingham History to Life CD-ROM. Birmingham City Council.

Downward, P. and A. Mearman. 2007. ‘Retroduction as mixed-methods triangulation in economic research: Reorienting economics into social science,’ Cambridge Journal of Economics 31: 77–99.

Grice, H. P. 1975. ‘Logic and conversation’ in P. Cole and J. L. Morgan (eds): Syntax and Semantics; Vol 3: Speech Acts. Academic Press.

Halliday, M. A. K. 1989. ‘Towards probabilistic interpretations’ in E. Ventola (ed.): Functional and Systemic Linguistics: Approaches and Uses. Mouton de Gruyter.

Halliday, M. A. K. 1991. ‘Corpus studies and probabilistic grammar’ in K. Aijmer and B. Altenberg (eds): English Corpus Linguistics. Longman.

Holmes, J. 1997. ‘Women, language and identity,’ Journal of Sociolinguistics 1/2:

Holmes, J. and M. Meyerhoff. 1999. ‘The community of practice: theories and methodologies in language and gender research,’ Language in Society 28: 173–183.

Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge University Press.

Hunston, S. and G. Francis. 2000. Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. John Benjamins.

Kipers, P. S. 1987. ‘Gender and topic,’ Language in Society 16: 543–557.

Lakoff, R. 1975. Language and Woman’s Place. Harper and Row.

Larsen-Freeman, D. and L. Cameron. 2008. Complex Systems and Applied Linguistics. Oxford University Press.

Layder, D. 1997. Modern Social Theory: Key Debates and New Directions. UCL Press.

Meyerhoff, M. 1996. ‘Dealing with gender identity as a sociolinguistic variable’ in V. L. Bergvall, J. M. Bing, and A. F. Freed (eds): Rethinking Language and Gender Research: Theory and Practice. Addison Wesley Longman.

Moon, R. 2007. ‘Words, frequencies, and texts (particularly Conrad): A stratified approach,’ Journal of Literary Semantics 36: 1–34.

Ochs, E. 1992. ‘Indexing gender’ in A. Duranti and C. Goodwin (eds): Rethinking Context: Language as an Interactive Phenomenon. Cambridge University Press.

ONC (Office for National Statistics) http://www. Accessed July 2008.

Pavlenko, A. 2007. ‘Autobiographic narratives as data in applied linguistics,’ Applied Linguistics 28/2: 163–188.

Pawson, R. 2002a. ‘Evidence-based policy: The promise of ‘realist synthesis’,’ Evaluation 8/3:

Pawson, R. 2002b. ‘Evidence and policy and naming and shaming,’ Policy Studies 23/3–4:

Ragin, C. C. 1987. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. University of California Press.

Rayson, P. 2008a. Wmatrix: A Web-Based Corpus Processing Environment. Computing Department, Lancaster University. wmatrix/

Rayson, P. 2008b. ‘From key words to key semantic domains,’ International Journal of Corpus Linguistics 13/4: 519–550.

Rayson, P., D. Archer, S. L. Piao and T. McEnery. 2004. The UCREL semantic analysis system. In Proceedings of the Workshop on Beyond Named Entity Recognition Semantic Labelling for NLP Tasks in association with 4th International Conference on Language Resources and Evaluation (LREC 2004), 25th May 2004, European Language Resources Association, pp. 7–12.

Rumelhart, D. E. 1975. ‘Notes on a schema for stories’ in D. G. Bobrow and A. M. Collins (eds): Representation and Understanding: Studies in Cognitive Science. Academic Free Press, pp. 211–236.

Rumelhart, D. E. 1984. ‘Schemata and the cognitive system’ in R. S. Wyer and T. K. Srull (eds): Handbook of Social Cognition. Lawrence Erlbaum, pp. 161–88.

Scott, M. 2004. WordSmith Tools Version 4. Oxford University Press.

Sealey, A. 2007. ‘Linguistic ethnography in realist perspective,’ Journal of Sociolinguistics 11/5: 641–660.

Sealey, A. and B. Carter. 2001. ‘Social categories and sociolinguistics: Applying a realist approach,’ International Journal of the Sociology of Language 152: 1–19.

Sealey, A. and B. Carter. 2004. Applied Linguistics as Social Science. Continuum.

Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford University Press.

Sinclair, J. M. 2004. Trust the Text: Language, Corpus and Discourse. Routledge.

Sperber, D. and D. Wilson. 1986. Relevance. Blackwell.

Stubbs, M. 1996. Text and Corpus Analysis. Blackwell.

Stubbs, M. 2001. ‘Texts, corpora and problems of interpretation,’ Applied Linguistics 22/2:

Tannen, D. 1994. Gender and Discourse. Oxford University Press.

Uprichard, E. and D. Byrne. 2006. ‘Representing complex places: A narrative approach,’ Environment and Planning A 38: 665–676.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–329.

Weatherall, A. 2002. Gender, Language and Discourse. Routledge.

Williams, M. 2002. ‘Generalization in interpretive research’ in T. May (ed.): Qualitative Research in Action. Sage.

Wortham, S. 2000. ‘Interactional positioning and narrative self-construction,’ Narrative Inquiry 10: 157–184.