Section 13.

Philology and linguistics

Kucheruk Liliya Vladimirovna,
Oles Honchar Dnipropetrovsk National University,
Postgraduate student

Methodology of Data Extraction from a Corpus for the
Conceptual Analysis of Metaphor in Legal English
Abstract: The present paper, entitled "Methodology of data extraction from
a corpus for the conceptual analysis of metaphor in legal English", investigates the
main approaches to the study of metaphor in legal language from the point of view
of cognitive linguistics. It also presents the most widely used methods of data
extraction from a corpus.
Key words: corpus-based approach, conceptual mapping, source domain, target
domain, corpora.
In the last 20 years, corpus-based approaches to language have achieved great
significance in linguistics, and are now regarded as an indispensable component of
language study, irrespective of theoretical affiliation. Corpus-driven studies are
required in research fields that consider different levels of linguistic structure and
focus on language use.
But studies in cognitive semantics, especially those connected with conceptual
mapping, are still in dire need of more corpus research, as many such studies are still
being carried out on randomly selected data. Such an approach to the investigation of
conceptual mappings, namely to the investigation of conceptual metaphors, may eventually
create problems, especially when the aim of the research is a systematic characterization
of a particular conceptual mapping, or of source/target domains. Thus, it is
necessary to ground the research on representative empirical data, which are
authentic and can be accessed via relevant corpora. Only empirical data will enable the
linguist to make statements which are objective and based on language as it really is.
Such statements are to be contrasted with more subjective statements based upon
the individual's own internalized cognitive perception of the language [2, 103].


The first problem that should be solved in the process of using a corpus-based
approach to the investigation of metaphors is that of extracting and identifying the
data from the corpus. Extracting relevant material from a corpus becomes extremely
difficult when applied to metaphors, as the way in which the process of conceptual
mapping takes place is not tied to any particular linguistic forms. Computer programs
can organize language data swiftly and accurately on orthographic principles,
but identifying and describing features such as grammatical patterns, meaning, and
pragmatic use can only be done by a human analyst [1, 92]. So a researcher who is
investigating particular linguistic units or patterns has to look through a considerable
amount of linguistic material, searching for definite manifestations of the patterns
in question. Relevant data in a corpus can be extracted and identified by means of
the following procedures:
1. manual searching;
2. searching for the vocabulary of a source domain;
3. searching for the vocabulary of a target domain;
4. searching for sentences containing items from both the source and target domains;
5. extraction from a corpus annotated for semantic fields;
6. extraction from a corpus annotated for conceptual mappings;
7. searching for metaphors based on markers of metaphors.
Manual search consists of starting from a small corpus, or from a small part of
an already existing corpus, and searching it manually, marking out all the metaphors
one comes across. Then, one proceeds with a larger corpus, searching for the marked
metaphors in it. This is a rather efficient method, as it offers the possibility of reading a
small corpus, or a part of a corpus, entirely and thoroughly, to identify all the existing
metaphors in it, and, by searching for them in a large corpus, to obtain more generalized
linguistic results. On the other hand, this method of retrieving relevant material
limits the potential size of a corpus to a great extent.
As for extraction from a corpus annotated for semantic fields, this procedure
implies searching for particular linguistic items in the source and target domains that
have previously been tagged in the corpus. Using this method, a researcher can specify
a particular source domain, and analyze all the lexical items related to it, instead of
manually searching for lists of lexical expressions, a somewhat tedious and often
frustrating process that usually yields incomplete lists. The analysis of target domains, and
the search for sentences containing both the potential source and target domains, can
be carried out as well. The main drawback of this strategy is
the rare availability of annotated corpora. Another disadvantage is that a
researcher must then rely almost exclusively on previously existing
annotations. Also, semantically annotated corpora might not include the relevant
semantic fields for a particular piece of research.
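The procedure can be illustrated with a minimal sketch. It assumes a toy corpus in which every token already carries a semantic-field tag; the sentences, the tag labels ("WAR", "LAW") and the function names are hypothetical and do not reproduce any existing tagset or tool.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (token, semantic_field) pairs.
# The field labels "WAR" and "LAW" are invented for illustration only.
TAGGED = [
    [("the", None), ("defence", "WAR"), ("attacked", "WAR"),
     ("the", None), ("contract", "LAW")],
    [("the", None), ("court", "LAW"), ("heard", None),
     ("the", None), ("appeal", "LAW")],
]

def items_in_field(sentences, field):
    """Count every token that is annotated with the given semantic field."""
    return Counter(
        tok.lower() for sent in sentences for tok, tag in sent if tag == field
    )

def sentences_with_fields(sentences, source, target):
    """Return sentences containing at least one token from each of the two fields."""
    hits = []
    for sent in sentences:
        tags = {tag for _, tag in sent}
        if source in tags and target in tags:
            hits.append(" ".join(tok for tok, _ in sent))
    return hits
```

Under these assumptions, items_in_field lists the lexical items of one domain, while sentences_with_fields narrows the material to sentences where a source field and a target field meet; the results still depend entirely on the quality of the existing annotation.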

Extraction from a corpus annotated for conceptual mappings consists of using
corpora annotated for conceptual mappings. Unfortunately, there is no such corpus,
as its creation is fraught with difficulties and complications. The conceptual annotation
of a corpus raises the following issues:
1. definition of a reliable procedure for discovering instances of the phenomenon
in question;
2. definition of the attributes that are considered relevant for each instance and
the set of values that each of these attributes can take, as well as guidelines as to how
these values are to be assigned;
3. definition of an annotation format [5, 10].
In the present circumstances, the first two requirements are hard to fulfill because
there is no general approach for the identification of metaphors in a text. Therefore
a researcher must rely, to a great extent, on intuition, knowledge, and general
experience. The identification of metaphors might be an easy task in some exceptionally clear
cases. Even then, it would still be a challenge to provide complete and accurate lists of
metaphoric expressions. Whether the identification is simple or complex, the annotator
must justify on theoretical grounds the criteria applied during the annotation
process. In general, the annotation is tied to a specific research project, which means that
there is still a long way to go before multi-purpose annotated corpora are designed in
the field of metaphor research. The problem of choosing the relevant attributes for a
specific linguistic phenomenon also arises during the process of conceptual annotation.
For example, in the case of metaphor and metonymy, there exist some general
attributes for annotation, such as source and target domains. Further attributes include
metaphoricity, metonymicity, degree of conventionality, the reason for using metaphor, etc.
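To make the second and third issues concrete, the following is a minimal sketch of what an annotation record might look like. The attribute names follow those listed above, but the permitted values, the example domain labels and the JSON output format are assumptions made for illustration, not an established annotation scheme.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical value sets; a real project would fix these in its annotation guidelines.
FIGURE_TYPES = {"metaphor", "metonymy"}
CONVENTIONALITY = {"conventional", "novel"}

@dataclass
class Annotation:
    expression: str        # the linguistic item found in the corpus
    source_domain: str
    target_domain: str
    figure_type: str       # one of FIGURE_TYPES
    conventionality: str   # one of CONVENTIONALITY

    def __post_init__(self):
        # Reject values outside the agreed sets, so annotators stay consistent.
        if self.figure_type not in FIGURE_TYPES:
            raise ValueError(f"unknown figure type: {self.figure_type}")
        if self.conventionality not in CONVENTIONALITY:
            raise ValueError(f"unknown conventionality: {self.conventionality}")

def to_json(annotation):
    """Serialize one annotation record to a JSON string (the assumed format)."""
    return json.dumps(asdict(annotation), sort_keys=True)
```

Restricting each attribute to a closed set of values is one simple way of enforcing the guidelines on how values are to be assigned; it does not solve the harder problem of deciding which instances count as metaphors in the first place.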
Searching for metaphors based on markers of metaphors. Another method
for extracting metaphoric material from a corpus was advocated by Andrew Goatly
[4]. Some linguistic markers, he claimed, point to the existence of metaphors in
discourse. Goatly [4] thus defined markers of metaphors as the words and phrases
occurring in the environment of a metaphor's vehicle term, or a unit of discourse that
unconventionally refers to, or colligates with, the topic of a metaphor, on the basis of
similarity, matching, or analogy. Explicit markers, intensifiers, hedges and downtoners,
or symbolisms are determined on a functional basis; semantic metalanguage,
mimetic terms, perceptual processes, misperception terms, or cognitive processes
are connected with semantics; and modals, conditionals, or copular similes represent
grammatical categories. Unfortunately, initial evaluations of the method by Wallington,
Barnden, Ferguson, and Glasbey have established that Goatly's list of metaphorical
markers is not a sure indicator of the presence of metaphorical expressions in
a text. So other strategies have to be adopted, based on searching a corpus for items
belonging to the source or target domains present in conceptual metaphors.
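A marker-based search can be sketched as follows. The marker list is a small illustrative sample chosen for this example, not Goatly's actual inventory, and the function name is hypothetical; the output is a set of candidate sentences that still have to be checked by a human analyst.

```python
import re

# Illustrative sample of possible markers; an assumption, not Goatly's full list.
MARKERS = ["metaphorically", "so to speak", "as it were", "literally", "a kind of"]

# One case-insensitive pattern that matches any marker as a whole word or phrase.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in MARKERS) + r")\b", re.IGNORECASE
)

def marker_candidates(sentences):
    """Return (sentence index, markers found) for every sentence with a marker."""
    out = []
    for i, sentence in enumerate(sentences):
        found = [m.group(1).lower() for m in PATTERN.finditer(sentence)]
        if found:
            out.append((i, found))
    return out
```

For the sentences "The statute is, so to speak, a shield." and "He literally signed the deed.", both are returned as candidates, although only the first is metaphorical. This mirrors the weakness noted above: a marker signals a possible metaphor but does not guarantee one.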


Searching for the vocabulary of a source domain. The first step for this type of
investigation involves finding the existing linguistic items (metaphors), or whole sets of
such items, which represent a particular conceptual metaphor. A list is made, then a
search is carried out with a concordance program to see if the items are present in the
corpus, as the computer cannot work from a list of conceptual metaphors to identify
their linguistic realizations [1, 93]. The selection of these linguistic items can be based
on hypothetical decisions, on already existing lists of such items, or on a preceding
analysis of the keywords of the texts connected with the target-domain topics. Once
metaphors have been retrieved from a corpus, they can be further classified into
sub-groups and sub-types.
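The concordance step can be approximated with a short keyword-in-context routine. This is a minimal sketch that stands in for a dedicated concordance program; the function name, the context width and the sample word list are assumptions introduced for the example.

```python
import re

def kwic(text, items, width=30):
    """Return (left context, match, right context) for each listed item in the text."""
    if not items:
        return []
    # Match any listed item as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in items) + r")\b", re.IGNORECASE
    )
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines
```

Called with a hypothetical source-domain list such as ["attacked", "defend"], the function returns every occurrence together with its immediate context, so that the analyst can separate metaphorical from literal uses by reading the concordance lines.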
Searching for the vocabulary of a target domain. Different scholars have suggested
different ways of working with target domains for retrieving relevant data. In his
method, which is based on searching keywords in the target domain, Partington [3] suggests
creating lists of terms characteristic of particular genres of discourse, analyzing them,
then running a concordance for items that appear in more than one key list, or which
seem to belong to the same semantic set. According to Partington, such an analysis
would help to reveal some systematic metaphors for experimentation, and to distinguish
the particular cases of their use. This method has its strong and weak points. Its main
weakness lies in the fact that, for such an analysis, one needs a huge amount of
homogeneous monothematic texts connected to the target domain. Another weak point is
that, in order to become a keyword, a word should be widely represented in
the target domain, and, thus, this type of analysis will reveal only those source domains
that are widely represented by the keywords in the target domain. Thus, the method
will not provide fully reliable results.
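The idea of building key lists and looking for items shared by several of them can be sketched as follows. The keyness measure used here, a simple smoothed frequency ratio against a reference corpus, is a simplifying assumption and not the statistic used by Partington; the function names and the threshold are likewise hypothetical.

```python
from collections import Counter

def key_list(target_tokens, reference_tokens, min_ratio=2.0):
    """Words whose relative frequency in the target text is at least min_ratio
    times their (add-one smoothed) relative frequency in the reference text."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    keys = set()
    for word, count in target.items():
        rel_target = count / n_target
        rel_reference = (reference[word] + 1) / (n_reference + 1)
        if rel_target / rel_reference >= min_ratio:
            keys.add(word)
    return keys

def shared_keys(key_lists):
    """Items that appear in more than one key list."""
    counts = Counter(word for keys in key_lists for word in set(keys))
    return {word for word, n in counts.items() if n > 1}
```

The items returned by shared_keys are the ones for which a concordance would then be run. The sketch also makes the second weakness visible: a word that is too rare in the target texts never passes the threshold, so the source domain it belongs to goes unnoticed.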
Searching for sentences containing items from both the source and target
domains. Closely related to the previous two strategies (searching for vocabulary from
the source or target domains) is the method that involves searching for sentences
containing items from both the source and target domains. This method requires
exhaustive lists of source and target domain
expressions. But to eliminate errors caused by the literal use of linguistic items, both in
the source and target domains, a fair amount of manual processing and editing is
required, which can be burdensome. Also, it is rarely possible to find complete lists of
target and source domain expressions, so some units are likely to be missing and the
results of the investigation cannot be foolproof. Furthermore, this method can only
be used to reveal conceptual mappings whose source and target domains have been
specified, and is thus restricted to the metaphorical
expressions known beforehand. The main advantage of this method is that it can be
used for the analysis of large numbers of texts, and that, as it is based on an annotated
corpus, it can be processed automatically.
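The co-occurrence search can be sketched in a few lines. The example assumes plain, untagged sentences and two lower-case word lists supplied by the researcher; the vocabulary, the crude tokenizer and the function names are hypothetical.

```python
import re

def tokenize(sentence):
    """Lower-case the sentence and return its set of alphabetic word forms."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def cooccurrence(sentences, source_vocab, target_vocab):
    """Return sentences containing at least one source item and one target item.

    Both vocabularies are assumed to be given in lower case.
    """
    source, target = set(source_vocab), set(target_vocab)
    hits = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        source_hits = sorted(tokens & source)
        target_hits = sorted(tokens & target)
        if source_hits and target_hits:
            hits.append((sentence, source_hits, target_hits))
    return hits
```

With the source list ["shield", "weapon"] and the target list ["law", "statute"], the sentence "The law is a shield for the weak." is retrieved, while "The soldier raised his shield." and "The law was amended." are not. Sentences in which both items are used literally would still pass the filter, which is why the manual editing mentioned above remains necessary.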

All the strategies used to extract relevant material from a corpus while investigating
conceptual mappings have their advantages and drawbacks. None can give complete
and reliable results, since all depend on the quality of the software used and on
the experience or intuition of the researcher. Thus, in order to obtain reliable results, it is
necessary to use a combination of the methods mentioned above.
