1. Introduction
When I first read the manuscript of Mona Baker's article "Corpora in Transla-
tion Studies: An Overview and Some Suggestions for Future Research"
(1995), I was inspired by the challenge of working towards developing a
coherent methodology for corpus-based translation studies, because I believed
then (and still do) that this is an essential step for realising the potential
envisaged in this new field of research. In October 1994, I began working on
the creation of a monolingual comparable corpus of English, which was
In order to categorise both the corpus as a whole and each of its components in
a way that is consistent with existing corpus typologies, I have adapted and
expanded the groups of contrastive parameters proposed by Atkins et al.
(1992) in their typology of corpora.
The typology presented below is organised along four hierarchical levels.
The first level consists of six sets of contrastive parameters that relate to the
most general features of a text corpus. Subsequent levels concern increasingly
specific groups of parameters relevant to the corpus type designed in the
present study. The proposed typology is not intended to be exhaustive. Its
function is to provide a common framework within which the English Compa-
rable Corpus can be described in relation to other corpus types. It also
constitutes the first stage in the design process, during which the general
features of the corpus were established.
1) Level I:
Corpus Types: FULL-TEXT
SAMPLE
MIXED (FULL-TEXT AND SAMPLE)
MONITOR
A full-text corpus contains unabridged texts, while a sample corpus is made
up of portions of texts selected according to stated design principles concern-
ing size, location of the sample within the full text and method of selection. A
mixed corpus contains both unabridged texts and portions of others. Finally,
a monitor corpus is made up of full texts scanned continuously and passed
through a filter to keep the data on a given language up-to-date.
Corpus Types: SYNCHRONIC
DIACHRONIC
A synchronic corpus contains texts produced within a restricted period of
time, while a diachronic corpus is made up of texts produced over a long period.
292 SARA LAVIOSA
A single monolingual corpus consists of one set of texts all in the same lan-
guage. A monolingual comparable corpus is made up of two single mono-
lingual corpora: one translational, the other non-translational. The two corpora
are set up according to similar design criteria. In corpus linguistics the term
'comparable corpus' is generally used to refer to a bi/multilingual corpus
made up of two or more sets of texts from the same subject domain(s) (Sinclair
1991b; Teubert 1994; Peters and Picchi 1996), while the term 'parallel corpus'
refers to a corpus of original texts in language A and their translations into
language B. However, in translation studies and contrastive linguistics the
terminology is not always consistent; some scholars use 'parallel corpus' to
cover both types of bilingual corpora (Johansson and Hofland 1994;
Hartmann 1994; Gellerstam 1996), while others follow the traditional terminology
of contrastive analysis (Aijmer et al. 1996; Granger 1996) and differentiate
between a 'translation corpus' (original texts in language A and their
translations in language B) and a 'parallel corpus' (original texts in languages
A and B). The terminology I employ in this study overlaps with existing
categorisations in corpus linguistics in order to facilitate the comparison of my
research results with those of other studies in this field. I think it would be
useful to aim towards adopting a consistent terminology in corpus-based
translation studies because this would aid the systematic accumulation of new
data and facts about translation and translating.
HOW COMPARABLE CAN 'COMPARABLE CORPORA' BE? 293
3) Level IIIa
Single Corpus Types: TRANSLATIONAL
NON-TRANSLATIONAL
A translational corpus is made up of texts that are known to have been
translated into a given language. A non-translational corpus consists of
original texts in a given language.
4) Level IIIb
Comparable Corpus Types: TRANSLATION-DEPENDENT
NON-TRANSLATION-DEPENDENT
INDEPENDENT
A translation-dependent comparable corpus is one in which the non-
translational component is modelled on the composition of the translational
set. A non-translation-dependent comparable corpus is one where the
composition of the translational set is modelled on the non-translational
corpus. In an independent comparable corpus the two components are
designed separately and subsequently linked on the basis of independently
established criteria of comparability.
On the basis of the corpus typology so far delineated, ECC is classified as
a monolingual, mixed full-text and sample, synchronic, translation-dependent,
comparable corpus of written general English. TEC is generally described at
this stage as a monolingual, single, full-text, synchronic corpus of written
general translational English. NON-TEC is categorised as a monolingual,
single, mixed full-text and sample, synchronic corpus of written general non-
translational English.
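Because the typology combines independent sets of contrastive parameters, it can be read as a small data model in which each corpus is described by one value from each applicable set. The following is a minimal sketch under that assumption, not part of the ECC tooling: the class and field names are hypothetical, only the parameter sets spelled out in this excerpt are modelled, and the rule that Level IIIa applies to single corpora while Level IIIb applies to comparable ones is inferred from the definitions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Extent(Enum):
    """Level I: how much of each text is included."""
    FULL_TEXT = "full-text"
    SAMPLE = "sample"
    MIXED = "mixed full-text and sample"
    MONITOR = "monitor"


class Time(Enum):
    """Level I: period covered by the texts."""
    SYNCHRONIC = "synchronic"
    DIACHRONIC = "diachronic"


class Composition(Enum):
    """Single monolingual corpus vs. monolingual comparable corpus."""
    SINGLE = "single"
    COMPARABLE = "comparable"


class Translational(Enum):
    """Level IIIa: status of a single corpus."""
    TRANSLATIONAL = "translational"
    NON_TRANSLATIONAL = "non-translational"


class Dependency(Enum):
    """Level IIIb: how the two components of a comparable corpus were designed."""
    TRANSLATION_DEPENDENT = "translation-dependent"
    NON_TRANSLATION_DEPENDENT = "non-translation-dependent"
    INDEPENDENT = "independent"


@dataclass(frozen=True)
class CorpusDescription:
    name: str
    extent: Extent
    time: Time
    composition: Composition
    translational: Optional[Translational] = None  # Level IIIa, single corpora only
    dependency: Optional[Dependency] = None        # Level IIIb, comparable corpora only

    def __post_init__(self) -> None:
        # Exactly the Level III set that matches the composition must be filled in.
        if self.composition is Composition.SINGLE:
            if self.translational is None or self.dependency is not None:
                raise ValueError("a single corpus takes a Level IIIa value only")
        elif self.dependency is None or self.translational is not None:
            raise ValueError("a comparable corpus takes a Level IIIb value only")

    def describe(self) -> str:
        """Join the chosen parameter values into a short label."""
        level3 = self.translational if self.translational is not None else self.dependency
        return ", ".join([self.extent.value, self.time.value, level3.value,
                          self.composition.value])


# The three classifications stated in the text. "Monolingual", "written" and
# "general English" are shared by all three and are not modelled here.
ECC = CorpusDescription("ECC", Extent.MIXED, Time.SYNCHRONIC, Composition.COMPARABLE,
                        dependency=Dependency.TRANSLATION_DEPENDENT)
TEC = CorpusDescription("TEC", Extent.FULL_TEXT, Time.SYNCHRONIC, Composition.SINGLE,
                        translational=Translational.TRANSLATIONAL)
NON_TEC = CorpusDescription("NON-TEC", Extent.MIXED, Time.SYNCHRONIC, Composition.SINGLE,
                            translational=Translational.NON_TRANSLATIONAL)
```

Validating the Level III choice in `__post_init__` makes inconsistent descriptions, such as a single corpus labelled "translation-dependent", fail at construction time rather than pass silently.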
The order in which I present the two components of ECC (TEC first,
NON-TEC second) is not arbitrary; rather, it reflects an important aspect of the
methodology, namely the priority given to the design of TEC, since it is
2.3. The Theoretical and Practical Motivation for the TEC Design
The decision to create a corpus of general language has been taken mainly
because this is considered more representative of the translational language
population, particularly in terms of reception.2 Moreover, since it was to be a
resource for the systematic study of translated text, I have assumed3 that a
general rather than a specialised corpus would interest a larger community of
scholars. The exclusion of technical texts has led to the creation of a mono-
translation-method corpus (human translation) since both Machine Transla-
tion and Computer Assisted Translation systems are used mainly in
specialised subject fields (Sager 1994: 300; 303). Preference has been given to
a multi-source-language corpus because this makes it possible to test and
develop a methodology for the identification of features of translational
language which are assumed to be independent of the influence of the specific
source language involved in the translation process. Preference has also been
given to published translations carried out by professional translators since
these can be regarded as more representative of translational language, given
that one can assume that they are read by a wider audience. Full texts, rather than
samples, have been included in order to obtain reliable data on the various
measures of lexical simplification selected for the investigation of the corpus,
such as type/token ratio and lexical density. These may in fact vary from one
section of the text to another (Jeremy Clear, personal communication, 1996),
consistent with Sinclair's claim that few linguistic features of a book-length
text are distributed evenly throughout (Sinclair 1991a: 19). Moreover, a full-
text corpus is a much more useful resource since it permits a greater variety of
linguistic analyses, such as the investigation of large patterns of text and of the
development of characters in a novel (Baker 1995: 240). A full-text corpus is
also invaluable to the researcher who wishes to compare a particular transla-
tion with its source text by creating a parallel corpus alongside the initial
comparable one. The reason why a synchronic rather than a diachronic corpus
was created lies in the nature of the hypotheses being tested on the corpus,
which do not concern the development of linguistic patterns over time, but the
regularities of current linguistic behaviour in translation (Laviosa forthcoming;
Laviosa-Braithwaite forthcoming a; b). Both the mode of the text and the
translating mode are written, for entirely practical reasons: written texts are
more widely and variously available, and they take less time and cost less both
to acquire and to convert into computer-readable
form. It is also comparatively easier to establish who holds the copyright for a
written translation than for an interpreted text. Finally, English has been
chosen as the language of the corpus mainly because of the existence of a very
large corpus of general language — the British National Corpus — which has
been made available to the academic community since March 1995 and from
which suitable texts can be extracted for the design of NON-TEC. English is
also the world's best-described language (Sinclair 1991a); it is therefore
reasonable to assume that a corpus of English would attract more scholars to the emerging
corpus-based approach in translation studies. The choice of English has in turn
led to the exclusion of mediated translations since this phenomenon is rare in
current translational English, given the hegemonic status of the English lan-
guage world-wide.
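The two measures of lexical simplification mentioned above, type/token ratio and lexical density, can be sketched in a few lines to show why their values depend on which stretch of a text is measured. This is a minimal sketch under stated assumptions, not the procedure used for ECC: it assumes a naive regular-expression tokeniser, and it approximates lexical words as any word not found in a small hypothetical list of function words, where a real study would rely on a part-of-speech tagger. Type/token ratio is taken as distinct word forms divided by running words, and lexical density as the proportion of running words that are lexical.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small list of English function words. A real
# study would separate lexical from grammatical words with a POS tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "as", "is", "was", "are", "were", "be", "been",
    "has", "have", "had", "do", "does", "did", "can", "would", "will", "not",
    "he", "she", "it", "they", "we", "i", "you", "his", "her", "its", "their",
    "this", "that", "there", "which", "who",
}


def tokenise(text: str) -> List[str]:
    """Lower-cased orthographic words; hyphenated forms are split."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def type_token_ratio(words: List[str]) -> float:
    """Number of distinct word forms (types) divided by running words (tokens)."""
    return len(set(words)) / len(words) if words else 0.0


def lexical_density(words: List[str]) -> float:
    """Proportion of running words that are not in FUNCTION_WORDS."""
    if not words:
        return 0.0
    return sum(1 for w in words if w not in FUNCTION_WORDS) / len(words)


def profile_by_section(text: str, size: int) -> List[Tuple[float, float]]:
    """(TTR, lexical density) for consecutive sections of `size` tokens each."""
    words = tokenise(text)
    sections = [words[i:i + size] for i in range(0, len(words), size)]
    return [(type_token_ratio(s), lexical_density(s)) for s in sections]
```

Applied to a whole novel with equal-sized sections, the per-section figures would be expected to fluctuate, which is the point made above about samples: a portion taken from one part of a book need not reflect the book as a whole. Equal section sizes matter because type/token ratio falls as the number of tokens grows, so sections of different lengths are not directly comparable.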
Once the additional dimensions outlined in the previous section are taken
into account, TEC can also be categorised, if somewhat arbitrarily, as a multi-
source-language, mono-translation-mode (written-to-be-read mode),
mono-translation-method (human translation), largely into-mother-
tongue, professional, published corpus. These characteristics of TEC are
the result of choices made on the basis of both theoretical and practical
considerations.
Once the general features of TEC have been established, the next step in the
design process consists of identifying the text categories which best fit these
characteristics.
Moreover, the choice of suitable text genres is partly governed by the
likelihood of their representing a variety of female and male translators and
authors. At this stage, as in the previous one in which the general features of
TEC were established, both a priori criteria and practical considerations are
taken into account. In case of conflict, priority is generally given to theoretical
principles.
The a priori criteria for identifying the text genres of TEC are as follows:
• General English (not restricted to any particular regional variety)
• Full-texts
• Synchronic: produced within the last 15 years
• Written
• Published
The reasons for carrying out these additional searches were both practical —
reduction of the initial lists to a manageable number of texts which are not
exceedingly expensive — and theoretical — the most recent books should be
more representative of current translational language and the lower price is
taken as an indicator of wider reception.
As regards General Fiction, which covers the largest number of texts, a
further manual selection was carried out to exclude books whose price is less
than £4.95 in an attempt to weed out pulp fiction. The entire selection stage
ended with the compilation of a single list of suitable texts derived from both
the Whitaker's Bookbank and the other sources.
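The selection stage described above amounts to a sequence of filters applied to a bibliographic list. The sketch below is illustrative only and assumes that each candidate title is available as a record with a genre, a year of publication, a price and a flag saying whether it is a translation; the record fields and the function name are hypothetical. The 15-year window and the £4.95 floor for General Fiction come from the text, and the optional per-genre ceiling mirrors the £20.00 limit described in note 7. The "general English" and "written" criteria are not modelled.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """One entry of a bibliographic list (hypothetical fields)."""
    title: str
    genre: str
    year: int
    price: Decimal          # pounds sterling
    translated: bool
    full_text: bool = True
    published: bool = True


def select(candidates: List[Candidate], current_year: int, window: int = 15,
           price_floor: Optional[Dict[str, Decimal]] = None,
           price_ceiling: Optional[Dict[str, Decimal]] = None) -> List[Candidate]:
    """Apply the a priori criteria first, then the per-genre price filters."""
    if price_floor is None:
        price_floor = {"General Fiction": Decimal("4.95")}
    if price_ceiling is None:
        price_ceiling = {}
    selected = []
    for c in candidates:
        # A priori criteria: translated, full, published texts...
        if not (c.translated and c.full_text and c.published):
            continue
        # ...published at most `window` years before `current_year`.
        if current_year - c.year > window:
            continue
        # Practical filter: drop likely pulp fiction below the genre's floor.
        floor = price_floor.get(c.genre)
        if floor is not None and c.price < floor:
            continue
        # Optional filter: keep only titles priced below the genre's ceiling.
        ceiling = price_ceiling.get(c.genre)
        if ceiling is not None and c.price >= ceiling:
            continue
        selected.append(c)
    return selected
```

Calling `select(books, 1996, price_ceiling={"General Knowledge": Decimal("20.00")})` would keep only translated full texts from the last 15 years, excluding General Fiction under £4.95 and General Knowledge titles at £20.00 or more. Prices are held as `Decimal` so that threshold comparisons on pence are exact.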
role, the man leading the Red Cross delegation in Kigali has
decided to return to France. He says there is nothing more his team
can do. Frederic Fischer reports — reason=not-te extent=39xomit
desc=stnewspaper — Le Monde -> SMALL and slim, almost
skinny, Philippe Gaillard is not exactly the kind of man you would
mistake for Rambo. Yet he has been running a delegation of the
International Committee of the Red Cross in Kigali almost every
day for a year. Yesterday, he left Rwanda for good. He has decided
never to return.
Attribute: Mode
Note: This is the mode in which the textual content is delivered
Values: Written-to-be-read, Written-to-be-spoken, Spoken, Spoken-to-be-written
Attribute: Word-count
Values: The actual number of orthographic words counted by the
word-count facility of WordSmith Tools (Scott 1996)
Attribute: Special features
Values: Pictures and/or Diagrams and/or Tables and/or Other (to
specify ad hoc)
Attribute: Date of publication
Values: The date printed in the published translation
Attribute: Place of publication
Values: Country where the translation is published
Attribute: Publisher
Values: The name of the publisher of the translation
Attribute: Publication of the name of the translator(s)
Note: This refers to whether or not the name of the translator is
visible anywhere in the text
Values: Yes, No
Attribute: Copyright
Note: Who holds the copyright for the translated text
Values: Translator, Publisher of the translation, Author, Publisher
of the source text, Other (specify ad hoc)
TRANSLATION PROCESS
Attribute: Relation between translation and source text
Note: This concerns the final status of the target-language text in
relation to that of the source-language text
Values: Complete, Excerpt,9 Direct, Indirect (Mediated)
Attribute: Direction of translation
Note: This is inferred from the data regarding the nationality of
the translator at birth and her/his current nationality. If the
nationality is that of an English-speaking country both at
SOURCE TEXT
Attribute: Language
Attribute: Status
Values: Original, Translation, Excerpt
Attribute: Name of the author(s)
Values: Full name(s)
Attribute: Gender of the author(s)
Values: Female, Male
Attribute: Sexual Orientation of the author(s) (only for literary
texts)
Values: Lesbian, Gay, Heterosexual, Immaterial, Unknown
Attribute: Date of publication
Attribute: Place of publication
Values: Country where the source text was published or produced
Attribute: Publisher
Values: Name of the publisher
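The attribute lists above lend themselves to a simple record structure, with one record per text in the database file described below. The following is a minimal sketch that uses only a handful of the attributes. The field names, the validation rule and the CSV output (a stand-in for whatever database format is actually used) are all assumptions; the closed value lists and the "X" marker for unknown data (note 11) come from the text.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable

UNKNOWN = "X"  # unknown data is recorded as "X" (note 11)

# Closed value lists taken from the attribute definitions above.
ALLOWED = {
    "mode": {"Written-to-be-read", "Written-to-be-spoken", "Spoken",
             "Spoken-to-be-written"},
    "translator_named": {"Yes", "No", UNKNOWN},
    "author_gender": {"Female", "Male", UNKNOWN},
}


@dataclass
class TecTextRecord:
    """A hypothetical subset of the attributes recorded for one TEC text."""
    text_id: str
    mode: str = "Written-to-be-read"
    word_count: str = UNKNOWN        # stored as text so that "X" fits too
    publisher: str = UNKNOWN
    translator_named: str = UNKNOWN
    copyright_holder: str = UNKNOWN
    source_language: str = UNKNOWN
    author_gender: str = UNKNOWN

    def __post_init__(self) -> None:
        # Reject values outside the closed lists for these attributes.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}: unexpected value {value!r}")


def to_csv(records: Iterable[TecTextRecord]) -> str:
    """Write one row per text, with a header row naming the attributes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(TecTextRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Defaulting every field to "X" means that a record can be created as soon as a text enters the corpus and completed as questionnaire replies arrive.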
The examination of the actual data relating to these attributes, provided the
corpus is fairly large and representative, may reveal extralinguistic patterning
from which various interrelated external features of current translational
practice in the English-speaking culture may be inferred. For example, they
may throw light on aspects of "translation policy" (Toury 1995: 58), such as
the preference for certain text categories, or they may reveal trends in the
application of copyright laws. Patterns may also emerge as regards the transla-
tion process, for example the type of editing that is generally carried out or the
procedures underlying commissioning, and so on. Such information based on
external evidence may then feed into the theoretical branch of translation
studies and either be integrated into existing models or give rise to new
specific hypotheses which can then be tested with linguistic analyses carried
out with a corpus-based methodology. At the same time, the attributes re-
corded for each TEC text constitute independent variables which have been
derived from existing theories — for example Sager's communicative theory
of translation (Sager 1994), or Toury's conditional laws of translational
behaviour (Toury 1995: 259-279) — and can be used to test the validity of
some aspects of these models. This illustration of the possible uses of text
attributes within the proposed corpus-based methodology highlights an essen-
tial feature of the corpus-based approach to translation studies; namely, the
interrelationship between description, testing of a priori hypotheses and meth-
odology.
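The search for extralinguistic patterning described above starts with little more than frequency counts and cross-tabulations of the recorded attribute values. The sketch below is illustrative only. It assumes that records are plain dictionaries keyed by attribute name, and the function names are hypothetical. Missing values are counted under "X", following note 11.

```python
from collections import Counter
from typing import Dict, Iterable

Record = Dict[str, str]


def frequencies(records: Iterable[Record], attribute: str) -> Counter:
    """Count each value of one attribute; missing values count as "X"."""
    return Counter(r.get(attribute, "X") for r in records)


def cross_tab(records: Iterable[Record], row: str, col: str) -> Counter:
    """Joint counts of two attributes, keyed by (row value, column value)."""
    return Counter((r.get(row, "X"), r.get(col, "X")) for r in records)
```

For instance, `frequencies(records, "copyright_holder")` would show who typically holds the copyright, and `cross_tab(records, "genre", "translator_gender")` would show how female and male translators are distributed across text categories. Both counts are candidate independent variables for later linguistic analyses.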
Acquiring and Recording the Extralinguistic Features of TEC Texts. The
features of TEC texts are recorded in a database file. Information about the
actual values of the attributes is collected from three sources:11
• a questionnaire sent with a standard explanatory letter to one of the
following:
  - the translator of an individual text or group of texts
  - the editor of a collection of texts
  - the translation agency subcommissioning a collection of texts
• inspection of the relevant sections of the printed or electronic copies of
the texts
• direct questioning of the professionals involved in the translation
process.
Special mention needs to be made in respect of the problematic acquisition of
the data regarding the sexual orientation of both the translator and the author
of literary translations. Although this is considered valuable information that
may interest scholars concerned with the study of translational language and
gender, it is also recognised that it is a highly sensitive and controversial area
of research. Despite these challenges, I have attempted to develop a method
for eliciting and collecting this data. This involves drafting a different letter
for literary translators which contains a short note explaining the rationale for
eliciting data on various extra-textual features of the translation and an invita-
tion to comment on their own sexual preferences and those of the author if
they feel they can or want to do so.12 Other sources consist of published and/or
publicly available information such as statements made by the professionals
concerned, editorial comments, public speeches and interviews. The latter
method has been employed in the case of two autobiographies by Juan
Goytisolo: Realms of Strife and Forbidden Territory, both translated by Peter
Bush. In this instance the information about the homosexuality of the author is
published on the cover of the books and in the texts themselves, as well as
having been made publicly known by the translator in the course of a talk
delivered at a recent conference on translation studies (Bush 1995).
• number of articles
• word count
• text extent
This means that a relatively higher level of comparability has been sought and
established in the Newspaper subcorpora, where the collections consist of the
same number of articles; the translational and non-translational texts come
from the same newspaper and newspaper section; they deal with similar topics
and they are both full texts of a similar size.
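The Newspaper dimensions listed above can be checked mechanically once each collection has been summarised. The sketch below rests on several assumptions. Each collection is described by its newspaper, its section, the word count of each article, and whether the articles are full texts. The 10% tolerance for "similar size" is a hypothetical threshold, since the text gives no numerical criterion. Topic similarity, which the compiler judges from titles and subtitles, is not modelled.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Collection:
    """Summary of one newspaper collection (hypothetical structure)."""
    newspaper: str
    section: str
    word_counts: List[int]      # one entry per article
    full_texts: bool = True     # text extent: whole articles, not excerpts


def comparability(tec: Collection, non_tec: Collection,
                  tolerance: float = 0.10) -> Dict[str, bool]:
    """Check the Newspaper dimensions; `tolerance` is an assumed threshold."""
    tec_mean, non_tec_mean = mean(tec.word_counts), mean(non_tec.word_counts)
    return {
        "same newspaper": tec.newspaper == non_tec.newspaper,
        "same section": tec.section == non_tec.section,
        "same number of articles": len(tec.word_counts) == len(non_tec.word_counts),
        "both full texts": tec.full_texts and non_tec.full_texts,
        "similar mean word count":
            abs(tec_mean - non_tec_mean) <= tolerance * max(tec_mean, non_tec_mean),
    }
```

Returning one flag per dimension, rather than a single yes/no verdict, keeps visible which dimensions a pair of collections fails on, as in the discrepancies reported below for The European.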
The synopses, on the other hand, have been included in the text, because,
unlike those that form part of TEC articles, there does not seem to be, on the
whole, a clear division between them and the main body of the article.
The conversion procedure from electronic text to NON-TEC text is the
same as the one adopted for TEC.
There are differences between Narrative works (i.e. Biography and Fiction) on
the one hand and Newspapers on the other, with regard to the dimensions and
consequent level of comparability. This is partly because of intrinsic differ-
ences between the text genres (for example it is arguably more difficult to
identify a unified topic in narrative works), and partly because, in the case of
NON-TEC narrative publications, I had to rely on texts that were already
available in machine-readable form. The combination of these two factors has
resulted in my proposing only a minimal set of dimensions for Biography and
Fiction which concern mainly global, extra-textual features. One dimension
involves a degree of subjective evaluation. This is the target-audience level
which is assessed for NON-TEC narrative texts by the designers of the BNC
on the basis of the perceived difficulty of the text, while for the TEC texts, it is
established by me on the basis of my own reading of these publications.
Given these constraints, my attempt to seek an adequate level of similarity
between TEC and NON-TEC narrative texts has proved to be highly problem-
atic, particularly in the present initial stages of corpus design when the
methodology is still experimental.
The dimensions of comparability put forward for newspapers are, on the
other hand, greater in number and relatively less problematic to apply, given
the greater availability of translational and non-translational texts within the
same newspaper and the possibility of identifying the topic of each article
from the titles and subtitles with a reasonable degree of accuracy. The level of
comparability pursued with this text genre can therefore be considered reason-
ably adequate. There are however differences between the two collections
within the Newspaper subcorpus. The Guardian articles are on the whole
more similar than those selected from The European, particularly with regard
to the average word-count, distribution of female and male authorship and
time span. Discrepancies in the case of The European are caused by restrictions
on the availability of machine-readable texts at the time when the articles
were being selected and downloaded. In future studies, provided permission
is granted by the publishers, more texts could be extracted from existing on-
line services and a more adequate level of comparability could be sought for
The European collections.
The discrepancy in comparability between Biography and Fiction, on the
one hand, and Newspapers, on the other, could be partly reduced if one had
access to a very large, full-text corpus of general English, which included a
large portion of fiction and biography published not only in the UK but also in
the USA. This would ensure comparability on two additional dimensions: text
extent and place of publication. Moreover, the selection of suitable texts could
be refined if the corpus compiler read the original works initially earmarked
according to external categorisations, and then assessed their comparability on
the basis of general criteria, such as the level of difficulty of the language,
intended target audience, and style, in order to supplement, with her/his own
impressions, the classification provided by the team responsible for the cre-
ation of the parent corpus. The application of these criteria would, in my view,
ensure a more accurate evaluation of the individual texts and increase the
comparability of the translated and non-translated components of the corpus.
restricted to two text genres, but may prove typical of translated text in
general.
It follows from the present analysis of the design of a monolingual multi-
source-language comparable corpus of English that the strength of any future
evidence which may or may not confirm the existing findings will depend to a
significant extent on the level of comparability that the researcher will have
established during the two crucial phases of corpus design.
Author's address:
Sara Laviosa • Department of Language Engineering • UMIST • PO Box 88 •
MANCHESTER M60 1QD • United Kingdom
Notes
1. As Atkins et al. point out (1992: 7), texts written to be spoken overlap with spoken
texts. The written-to-be-spoken mode could therefore be considered a spoken mode or
regarded as a separate class altogether.
2. According to UNESCO statistics (van Slype et al. 1983 quoted in Sager 1994: 297)
scientific, industrial and legislative translation represent 20%, 21% and 9% respectively of the entire
volume of translation activities. The remaining amount is distributed as follows: commer-
cial (35%), press and current affairs (3.5%), audio-visual (3.5%), educational (1.5%),
literary (0.3%), miscellaneous (6.7%). In the UK, the largest share of translation produc-
tion — circa 90% — is technical and scientific (Francis Sutcliffe, ALPNET, personal
communication, 1996). However, I have assumed that the readership of these specialised
publications is rather narrow, compared with that of general translation.
3. This assumption is based largely on the general impression derived from participating, in
the last two years, in several national and international conferences on the subject of
translation, where the papers presented dealt mainly with literary and general language.
4. The terms subject domain, subject field, institutional text type, text category and text
genre are used interchangeably in this study. They all refer to groups of texts considered
similar by the corpus compiler or by general consensus on the basis of their
extralinguistic features. I have deliberately chosen to avoid the words 'genre' and 'text
type' on their own, because these are used by Biber and Finegan (1986; 1991) to refer to
two different notions. According to these scholars, 'genres' are "the text categories
readily distinguished by speakers of English (e.g. novels, newspaper articles, public
speeches)" (Biber and Finegan 1991: 213). The notion of 'genre' is therefore used to
characterise texts on the basis of external criteria (Biber and Finegan 1986: 20). 'Text
types', on the other hand, are defined in terms of the linguistic characteristics of the texts
themselves. They represent sets of texts "that are similar with respect to their linguistic
form, irrespective of genre categories" (Biber and Finegan 1986: 20). Their identification
depends therefore on the analysis of the predominant linguistic features of the texts,
which, in the case of Biber's studies, is carried out through Factor Analysis. Nakamura
(1989; 1991; 1994) makes the same distinction between 'genre' and 'text type', and uses
a statistical method called "Extended HAYASHI's Quantification Method Type III" to
describe text types in large corpora (Nakamura 1994: 141).
5. Examples of published reports are those commissioned by The European Commission
(Draft Guidelines as to the Form and Content of Schemes) and those published by the
Welsh Language Board (Discussion Document: A Strategy for the Re-Introduction of
Welsh Second Language at Key Stage 4). Examples of reports which are unpublished, but
are available to the public upon request, are the reports produced by UNIDO (Regulations
on drugs, Reports on Women's conditions).
6. Examples of speeches are: From Poem to Novel, From Novel to Poem by José Saramago,
translated by Giovanni Pontiero, and Introduction: Elytis in His Own Words by the Nobel
Prize-winning Greek poet Elytis, translated by David Connolly.
7. For Customs and Folklore it was not necessary to restrict the search to publications dated
from January 1995 onwards because the initial total number of translations in print was
reduced to a manageable amount by restricting the time span to publications from January
1992 onwards and the price to less than £20.00. For General Knowledge the initial total
number was reduced simply by applying a search that identified all publications priced at
less than £20.00.
8. The layout of this list has been adapted from the taxonomy of text attributes proposed by
Atkins et al. (1992: 5-6).
9. The value 'excerpt' covers any form of incomplete text.
10. This type of information is inferred, rather than collected directly, because of the general
reluctance on the part of translators, translation agencies and publishers to reveal whether
their translations have been produced out of the mother tongue. This in turn originates
from their concern about possible negative evaluations of their work.
11. Unknown data is recorded with an "X" in the corresponding database field.
12. I am particularly grateful to Carol Maier, Peter Bush and Luise von Flotow for their
insightful comments on this issue.
13. The time span for Sevenday is much narrower (5.10-23.11.1995) than intended. This is
due to the unavailability of texts in electronic form.
14. For one of The European collections (Sevenday) this criterion could not be applied
since the name of the author is not revealed.
15. For the same reason explained in note 14, this criterion could not be applied in the case of
the Sevenday collection for The European.
16. Three levels of target audience are identified in the BNC: high, medium and low. They
are established on the basis of an assessment of a text's technicality, which in turn
depends on the perceived difficulty of the text.
17. Word-count is calculated using WordSmith Tools (Scott 1996). The status of the newspa-
per articles has been established via direct questioning of the editors of the weekly
supplement Guardian Europe and Presswatch respectively.
References
Aijmer, Karin, Bengt Altenberg and Mats Johansson, eds. 1996. Languages in Contrast:
Papers from the Lund Symposium on Text-Based Cross-Linguistic Studies, 4-5 March
1994. Lund: Lund University Press.
Atkins, Sue, Jeremy Clear and Nicholas Ostler. 1992. "Corpus Design Criteria". Literary
and Linguistic Computing 7:1. 1-16.
Baker, Mona. 1995. "Corpora in Translation Studies: An Overview and Some Suggestions
for Future Research". Target 7:2. 223-243.
Biber, Douglas and Edward Finegan. 1986. "An Initial Typology of English Text Types".
Jan Aarts and Willem Meijs, eds. Corpus Linguistics II: New Studies in the Analysis and
Exploitation of Computer Corpora. Amsterdam: Rodopi, 1986. 19-46.
Biber, Douglas and Edward Finegan. 1991. "On the Exploitation of Computerized Corpora
in Variation Studies". Karin Aijmer and Bengt Altenberg, eds. English Corpus Linguis-
tics. London and New York: Longman, 1991. 204-220.
British National Corpus (BNC). 1995. User Reference Guide for the British National
Corpus: Version 1.0. Oxford: Oxford University Computing Services.
Bush, Peter. 1995. "Translator Activism and Translation Theory: A Cuban Case Study".
Talk given at the Conference on the Linguistic Foundations of Translation. University
of Liverpool, 15-17 September 1995.
Gellerstam, Martin. 1996. "Translations as a Source for Cross-Linguistic Studies". Aijmer
et al. 1996: 53-63.
Granger, Sylviane. 1996. "From CA to CIA and Back: An Integrated Approach to Comput-
erized Bilingual and Learner Corpora". Aijmer et al. 1996: 37-52.
Hartmann, R.R.K. 1994. "The Use of Parallel Text Corpora in the Generation of Translation
Equivalents for Bilingual Lexicography". Paper presented at the EURALEX Congress,
Amsterdam 1994.
Johansson, Stig and Knut Hofland. 1994. "Towards an English-Norwegian Parallel Cor-
pus". Udo Fries, Gunnel Tottie and Peter Schneider, eds. Creating and Using English
Language Corpora: Papers from the Fourteenth International Conference on English
Language Research and Computerized Corpora, Zurich 1993. Amsterdam and Atlanta:
Rodopi, 1994. 25-37.
Laviosa, Sara, forthcoming. "Core Patterns of Lexical Use in a Comparable Corpus of
English Narrative Prose". Sara Laviosa, ed. The Corpus-Based Approach: A New
Paradigm in Translation Studies.
Laviosa-Braithwaite, Sara. 1996. The English Comparable Corpus (ECC): A Resource and
a Methodology for the Empirical Study of Translation. Manchester: UMIST. [PhD
Thesis.]
Laviosa-Braithwaite, Sara, forthcoming a. "Analysing Comparable Translational and Non-
Translational Texts with Tools of Corpus Linguistics". The Proceedings of the Second
International Conference on Current Trends in Studies of Translation and Interpreting,
Budapest, 5-7 September, 1996.
Laviosa-Braithwaite, Sara, forthcoming b. "The English Comparable Corpus: A Resource
and a Methodology". The Proceedings of the Conference on Translation Studies: "Unity
in Diversity?" Dublin City University, 9-11 May, 1996.
Nakamura, Junsaku. 1989. "A Quantitative Study on the Use of Personal Pronouns in the
Brown Corpus". Jacet Bulletin 20. 51-71.
Nakamura, Junsaku. 1991. "The Relationships among Genres in the LOB Corpus Based
upon the Distribution of Grammatical Tags". Jacet Bulletin 22. 55-74.
Nakamura, Junsaku. 1994. "Extended HAYASHI's Quantification Method Type III and Its
Applications in Corpus Linguistics". Journal of Language and Literature 1. 141-192.
Peters, Carol and Eugenio Picchi. 1996. "Bilingual Reference Corpora for Translators and
Translation Studies". Paper presented at the Conference on Translation Studies: "Unity
in Diversity?" Dublin City University, 9-11 May, 1996.
Sager, Juan C. 1994. Language Engineering and Translation: Consequences of Automa-
tion. Amsterdam and Philadelphia: John Benjamins.
Scott, Michael. 1996. WordSmith Tools. Oxford: Oxford University Press.
Sinclair, John. 1991a. Corpus Concordance Collocation. Oxford: Oxford University Press.
Sinclair, John. 1991b. Council of Europe Multilingual Lexicography Project. Report
submitted to the Council of Europe under contract no. 57/89.
Sinclair, John. 1992. Lexicographers' Needs. Pisa Workshop on Text Corpora, January
1992.
Teubert, Wolfgang. 1994. "Parallel Corpora and Multilingual Lexicography". Unpublished
manuscript provided by the author.
Toury, Gideon. 1995. Descriptive Translation Studies and Beyond. Amsterdam and Phila-
delphia: John Benjamins.
van Slype, G. et al. 1983. Better Translation for Better Communication. Oxford: Oxford
University Press.