Analysing Parallel Texts with ParaConc

Michael Barlow

Department of Linguistics, Rice University, Houston TX 77005, USA

KEYWORDS: parallel corpora, concordance

AFFILIATION: Rice University, USA

E-MAIL: barlow@ruf.rice.edu
FAX NUMBER: 713-523-6543
PHONE NUMBER: 713-630-8761

Much of the current research on parallel corpora concerns the problem of automatic alignment of two texts that are translations of each other (Gale and Church 1994, Kay and Roscheisen 1994, Johansson and Hofland 1993). This paper, however, focusses on the analysis of aligned parallel corpora rather than on the aligning process itself.

In order to analyse a parallel corpus, a suitable text analysis program is needed. ParaConc is a simple parallel text concordance program, available in Macintosh and Windows versions, which was created by the author as a tool for linguistic research. This program allows the user to search for a word or phrase in the way typical of concordance programs. However, the result of the search is displayed in two windows rather than one. The topmost window displays numbered lines containing each instance of the search term in the first language, along with its context. The lower window displays numbered sentences in the second language which correspond to the text displayed in the first language in the upper window. The results of a search can be sorted, printed or saved. To obtain a list of words from each text that correspond, as illustrated below for English line and French ligne, the results of a search are first saved as a text-only file and then loaded into a word-processor for further formatting.

The use of parallel corpora presents very interesting research opportunities in a variety of disciplines including linguistics, literary studies, translation, and language teaching. While these different areas may be touched upon, the focus of the present paper is on the use of parallel corpora in linguistic analysis. This project is similar in spirit to a variety of parallel corpus projects such as Intersect, Contragram, ENPC, and TRIPTIC, among others.

Taking a language to consist of form-meaning links, what we have in parallel corpora are two sets of form-meaning linkings, one for each language. The meaning part can be assumed to be approximately the same in both texts. Thus we are able to see how two different languages encode equivalent meanings. The art of translation is undeniably complex and involves many different kinds of processes, but we can consider three main aspects of translation, namely, language-particular encodings of (i) event structure, (ii) discourse structure, and (iii) lexis. Each of these areas can be profitably analysed using parallel corpora.

1. Event Structure

Event structure simply refers to those actions occurring in the world that are of interest to humans, such as a transitive event in which one object acts on another object in some way, which is typically encoded using a transitive clause. Since we can assume that the translations are about the same events, we can use parallel corpora to examine how languages code events in general, in other words, how aspects of an event are expressed grammatically or lexically in different languages. An objection that could be raised here is that the particular choices made by a translator will introduce distortions into the data. It is true that some apparently random choices occur in translations, but the accretion of motivated translation choices allows the general patterns to be perceived using a concordance program.

For example, we can examine the coding of causative events in English and French by searching for the lemma make and examining patterns such as "X makes Y do Z" and then observing the patterns used in French to refer to these same events. A concordance search reveals that causative make in English covers a wide variety of situations, for instance, causing a change in state (make something possible), causing someone or something to perform an action (make a dog go away), and causing some kind of transformation expressed as make followed by two contiguous noun phrases (make John the president). Having searched for English causative constructions involving make, we can investigate how these different causative event structures are coded in French. And, in fact, we find a rather different set of patterns for French. For the construction of the type make John president, the equivalent occurs in French with faire in most cases. However, the corpus data shows that other causative uses are often not translated by faire in French. A variety of constructions are used instead, including verbs such as rendre, as shown in (1).

(1) a. The American blockage makes life very difficult for us.
       Le blocus rend nos conditions de vie très rudes.
On the other hand, uses of make expressing a causative event in which an agent acts on an animate causee to bring about an event are more likely to be translated with faire, as exemplified in (2).

(2) a. It is a behaviour which makes you think of France...
       Un comportement qui fait penser à la France ...
(2) b. ... their parents had made them lose their French nationality.
       ... leurs parents, ...., leur ont fait perdre la nationalité française.

This example shows how ParaConc can be used to investigate fairly subtle cross-linguistic distinctions in the expression of causative events.

[...] alors que, tandis que, pendant que, contre, and si, among others. To provide a complete analysis of these conjunctions it is necessary to examine the results of the search in some detail and also to examine the translations in English of the different expressions: tout en, alors que, etc. One result that we can identify is that French si is used to indicate a contrastive meaning. Thus the sentences equivalent to (4) are those given in (5), which use si for while.

(5) a. Si elle ne manque aucune occasion de verser au débit des socialistes la détérioration de la situation de l’emploi, la droite paraît tout aussi désarmée devant la montée du chômage.
(5) b. Si les vendeuses sont toujours aussi peu amènes, les vitrines, en revanche, se font plus alléchantes.
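
The searches discussed above share one pattern: find a form in the first language, then inspect the aligned sentences in the second. The following Python sketch illustrates that pattern under stated assumptions and is not ParaConc's own implementation. It assumes the corpus is already aligned as sentence pairs, uses examples (1) and (2) as toy data, and replaces true lemmatisation with small hand-listed sets of word forms. All names in it (ALIGNED, MAKE_FORMS, FRENCH_VERBS, parallel_concordance, tally_equivalents) are hypothetical.

```python
import re
from collections import Counter

# Toy sentence-aligned corpus: the pairs are examples (1) and (2) from the text.
ALIGNED = [
    ("The American blockage makes life very difficult for us.",
     "Le blocus rend nos conditions de vie très rudes."),
    ("It is a behaviour which makes you think of France...",
     "Un comportement qui fait penser à la France ..."),
    ("... their parents had made them lose their French nationality.",
     "... leurs parents, ...., leur ont fait perdre la nationalité française."),
]

# Assumption: hand-listed word forms stand in for real lemmatisation.
MAKE_FORMS = {"make", "makes", "made", "making"}
FRENCH_VERBS = {
    "faire": {"faire", "fait", "fais", "font", "faisait"},
    "rendre": {"rendre", "rend", "rendu", "rendait"},
}


def tokens(sentence):
    """Return lower-cased word tokens, keeping accented letters."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def parallel_concordance(pairs, forms):
    """Return (number, source sentence, aligned sentence) for each hit."""
    return [(number, src, tgt)
            for number, (src, tgt) in enumerate(pairs, 1)
            if forms & set(tokens(src))]


def tally_equivalents(hits, verb_forms):
    """Count which listed verbs appear in the aligned sentences of the hits."""
    counts = Counter()
    for _, _, tgt in hits:
        words = set(tokens(tgt))
        matched = [lemma for lemma, forms in verb_forms.items() if forms & words]
        counts.update(matched or ["other"])
    return counts


hits = parallel_concordance(ALIGNED, MAKE_FORMS)
for number, src, tgt in hits:
    # Source line first, aligned translation underneath, sharing one number.
    print(f"{number}  {src}\n   {tgt}")
print(tally_equivalents(hits, FRENCH_VERBS))
```

On this toy data the tally records faire twice and rendre once. A tally of this kind only points to sentences worth reading: deciding which causative sub-type each hit belongs to still requires inspecting the concordance lines themselves.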
English                 French
a democratic line       une ligne démocratique
dividing line           ligne de partage
dividing line           ligne de fracture
drain line              canalisation d’écoulement
took a firm line        apporté un soutien d’une fermeté
the following lines     cette formule
the front line          au front
front line              front
hard-liners             l’intransigeance des
in line with            à l’image de
kept in line            on encadre
not in line with        pas correspondre au
In line with            Comme l’indiquait
in line with            s’inscrit dans
into line with          en accord avec
line                    positions
our line of conduct     notre conduite
our line                notre principe
the poverty line        le seuil de pauvreté

Given these sets of data, it is possible to map out how the semantic domains of line and ligne resemble each other and how they differ in terms of semantic extensions and usages. This kind of investigation can play a part in linguistic investigations of grammaticalisation (Hopper and Traugott 1993) and in the study of general constraints on form-meaning mappings (Barlow and Kemmer 1994).

4. Conclusion

These analyses provide an illustration of how the common content of parallel corpora can be exploited to gain linguistic insights into the structure and function of languages. The technique of investigating pairs of languages is promising for a variety of research areas. One advantage is that a two-way analysis of a domain, from language A to language B and from language B to language A, provides clues to the different meanings/uses of each language form.

In sum: in this paper I describe the analysis of parallel texts using ParaConc, a parallel concordancer, and outline some fruitful areas of corpus-based research that are opened up by the use of such a program.

References

Barlow, M. To appear. Parallel Texts for Linguistic Analysis. In M. Barlow and S. Kemmer (eds) Usage-Based Models of Language.
Barlow, M. 1995. A Guide to ParaConc. Athelstan: Houston.
Barlow, M. and S. Kemmer. 1994. A Schema-based Approach to Grammatical Description. In S. Lima, R. Corrigan and G. Iverson (eds) The Reality of Linguistic Rules. Benjamins: Amsterdam.
Gale, W. and K. Church. 1994. A Program for Aligning Sentences in Bilingual Corpora. In S. Armstrong (ed) Using Large Corpora. MIT Press: Cambridge.
Hopper, P. and E. Closs Traugott. 1993. Grammaticalization. CUP: Cambridge.
Johansson, S. and K. Hofland. 1993. Towards an English-Norwegian Parallel Corpus. Paper from the Fourteenth International Conference on English Language Research on Computerized Corpora, Zürich, May 19-23, 1993. In U. Fries, G. Tottie, and P. Schneider (eds) Creating and Using English Language Corpora. Rodopi: Amsterdam.
Kay, M. and M. Roscheisen. 1994. Text-Translation Alignment. In S. Armstrong (ed) Using Large Corpora. MIT Press: Cambridge.
Moon, R. 1987. The Analysis of Meaning. In J.M. Sinclair (ed) Looking Up. Collins: London.
Noel, Jacques. 1992. Collocation and Bilingual Text. In G. Leitner (ed) New Directions in English Language Corpora. Mouton de Gruyter: Berlin.