Professional Documents
Culture Documents
There are broadly three basic MT strategies. The earliest historically is the 'direct
approach', adopted by most MT systems of what has come to be known as the first
generation of MT systems. In response to the apparent failure of this strategy, two
types of 'indirect approach' were developed: the 'transfer method', and the use of
an 'interlingua'. Systems of this nature are sometimes referred to as second
generation systems.
The direct method
The first (and historically oldest) type is generally referred to as the 'direct
translation' approach: the MT system is designed specifically for one particular pair
of languages, e.g. Russian as the language of the original texts, the source
language, and English as the language of the translated texts, the target language.
The direct approach lacks any kinds of intermediate stages in translation processes:
the processing of the source language input text leads 'directly' to the target language
output text.
First generation direct MT systems began with a morphological analysis phase. In
this phase the system identified word endings and reduced inflected forms to their
uninflected basic forms. Then it input the results into a large bilingual dictionary look-
up program.
In other words, when the system would find the canonical form of a word, it would
look up in the bilingual dictionary to find an equivalent in the target language. There
would follow some local reordering rules to give more acceptable target language
output, and then the target language text would be produced.
Why then is the transfer approach so often preferred to the interlingua method?
The first reason is that it is far too difficult to devise language-independent
representations (interlingua).
The second is that in the transfer approach the analysis and generation grammars
work between two languages and are not so difficult to write. In contrast to that in the
interlingua approach these grammars are language-independent and must work for
any language of the system.
The differences between direct and indirect, transfer and interlingua, rule-based,
knowledge-based and corpus-based are becoming less useful for the categorization
of systems.
Transfer systems consist typically of three types of dictionaries (SL dictionaries
containing detailed morphological, grammatical and semantic information, similar TL
dictionaries, and a bilingual dictionary relating base SL forms and base TL forms)
and various grammars (for SL analysis, TL synthesis and for transformation of SL
structures into TL forms).
levels of linguistic description: morphology, syntax, semantics.
Hence, analysis may be divided into morphological analysis (identification of word
endings, word compounds), syntactic analysis (identification of phrase structures,
dependency, subordination, etc.), semantic analysis (resolution of lexical and
structural ambiguities); synthesis may likewise pass through
- semantic synthesis (selection of appropriate compatible lexical and structural
forms),
- syntactic synthesis (generation of required phrase and sentence structures), and
- morphological synthesis (generation of correct word forms).
In transfer systems, the transfer component may also have separate programs
dealing with lexical transfer (selection of vocabulary equivalents) and with structural
transfer (transformation into TL appropriate structures). In some earlier forms of
transfer systems analysis did not involve a semantic stage, transfer was restricted to
the conversion of syntactic structures, i.e. syntactic transfer alone.
The direct translation approach was typical of the "first generation" of MT systems.
The indirect approach of interlingua and transfer based systems is often seen to
characterise the "second generation" of MT system types. Both are based
essentially on the specification of rules (for morphology, syntax, lexical selection,
semantic analysis, and generation). Most recently, corpus-based methods have
changed the traditional picture (see below). During the last five years, there is
beginning to emerge a "third generation" of hybrid systems combining the rule-
based approaches of the earlier types and the more recent corpus-based methods.