
An Analogical Learner for Morphological Analysis

Nicolas Stroppa & François Yvon


GET/ENST & LTCI, UMR 5141
46 rue Barrault, 75013 Paris, France
{stroppa,yvon}@enst.fr

Abstract

Analogical learning is based on a two-step inference process: (i) computation of a structural mapping between a new and a memorized situation; (ii) transfer of knowledge from the known to the unknown situation. This approach requires the ability to search for and exploit such mappings, hence the need to properly define analogical relationships, and to efficiently implement their computation.

In this paper, we propose a unified definition for the notion of (formal) analogical proportion, which applies to a wide range of algebraic structures. We show that this definition is suitable for learning in domains involving large databases of structured data, as is especially the case in Natural Language Processing (NLP). We then present experimental results obtained on two morphological analysis tasks which demonstrate the flexibility and accuracy of this approach.

1 Introduction

Analogical learning (Gentner et al., 2001) is based on a two-step inductive process. The first step consists in the construction of a structural mapping between a new instance of a problem and solved instances of the same problem. Once this mapping is established, solutions for the new instance can be induced, based on one or several analogs. The implementation of this kind of inference process requires techniques for searching for, and reasoning with, structural mappings, hence the need to properly define the notion of analogical relationships and to efficiently implement their computation.

In Natural Language Processing (NLP), the typical dimensionality of databases, which are made up of hundreds of thousands of instances, makes the search for complex structural mappings a very challenging task. It is however possible to take advantage of the specific nature of linguistic data to work around this problem. Formal (surface) analogical relationships between linguistic representations are often a good sign of deeper analogies: a surface similarity between the word strings write and writer denotes a deeper (semantic) similarity between the related concepts. Surface similarities can of course be misleading. In order to minimize such confusions, one can take advantage of other specificities of linguistic data: (i) their systemic organization in (pseudo-)paradigms, and (ii) their high level of redundancy. In a large lexicon, we can indeed expect to find many instances of pairs like write-writer: for instance read-reader, review-reviewer, etc.

Complementing surface analogies with statistical information thus has the potential to make the search problem tractable, while still providing many good analogs. Various attempts have been made to use surface analogies in various contexts: automatic word pronunciation (Yvon, 1999), morphological analysis (Lepage, 1999a; Pirrelli and Yvon, 1999) and syntactic analysis (Lepage, 1999b). These experiments have mainly focused on linear representations of linguistic data, taking the form of finite sequences of symbols, using a restrictive and sometimes ad-hoc definition of the notion of an analogy.
The first contribution of this paper is to propose a general definition of formal analogical proportions for algebraic structures commonly used in NLP: attribute-value vectors, words on finite alphabets and labeled trees. The second contribution is to show how these formal definitions can be used within an instance-based learning framework to learn morphological regularities.

This paper is organized as follows. In Section 2, our interpretation of analogical learning is introduced and related to other models of analogical learning and reasoning. Section 3 presents a general algebraic framework for defining analogical proportions, as well as its instantiation to the case of words and labeled trees. This section also discusses the algorithmic complexity of the inference procedure. Section 4 reports the results of experiments aimed at demonstrating the flexibility of this model and at assessing its generalization performance. We conclude by discussing current limitations of this model and by suggesting possible extensions.

2 Principles of analogical learning

2.1 Analogical reasoning

The ability to identify analogical relationships between what look like unrelated situations, and to use these relationships to solve complex problems, lies at the core of human cognition (Gentner et al., 2001). A number of models of this ability have been proposed, based on symbolic (e.g. (Falkenheimer and Gentner, 1986; Thagard et al., 1990; Hofstadter and the Fluid Analogies Research group, 1995)) or subsymbolic (e.g. (Plate, 2000; Holyoak and Hummel, 2001)) approaches. The main focus of these models is the dynamic process of analogy making, which involves the identification of a structural mapping between a memorized and a new situation. Structural mapping relates situations which, while being apparently very different, share a set of common high-level relationships. The building of a structural mapping between two situations utilizes several subparts of their descriptions and the relationships between them.

Analogy-making seems to play a central role in our reasoning ability; it is also invoked to explain some human skills which do not involve any sort of conscious reasoning. This is the case for many tasks related to the perception and production of language: lexical access, morphological parsing, word pronunciation, etc. In this context, analogical models have been proposed as a viable alternative to rule-based models, and many implementations of these low-level analogical processes have been proposed, such as decision trees, neural networks or instance-based learning methods (see e.g. (Skousen, 1989; Daelemans et al., 1999)). These models share an acceptation of analogy which mainly relies on surface similarities between instances.

Our learner tries to bridge the gap between these approaches: it attempts to remain faithful to the idea of structural analogies, which prevails in the AI literature, while also exploiting the intuitions of large-scale, instance-based learning models.

2.2 Analogical learning

We consider the following supervised learning task: a learner is given a set S of training instances {X1, . . . , Xn} independently drawn from some unknown distribution. Each instance Xi is a vector containing m features: ⟨Xi1, . . . , Xim⟩. Given S, the task is to predict the missing features of partially informed new instances. Put in more standard terms, the set of known (resp. unknown) features for a new value X forms the input space (resp. output space): the projections of X onto the input (resp. output) space will be denoted I(X) (resp. O(X)). This setting is more general than the simpler classification task, in which only one feature (the class label) is unknown, and covers many other interesting tasks.

The inference procedure can be sketched as follows: training examples are simply stored for future use; no generalization (abstraction) of the data is performed, which is characteristic of lazy learning (Aha, 1997). Given a new instance X, we identify formal analogical proportions involving X in the input space; known objects involved in these proportions are then used to infer the missing features.

An analogical proportion is a relation involving four objects A, B, C and D, denoted by A : B :: C : D, which reads "A is to B as C is to D". The definition and computation of these proportions are studied in Section 3. For the moment, we contend that it is possible to construct analogical proportions between (possibly partially informed) objects in S.
Let I(X) be a partially described object not seen during training. The analogical inference process is formalized as:

1. Construct the set T(X) ⊂ S³ defined as:

   T(X) = {(A, B, C) ∈ S³ | I(A) : I(B) :: I(C) : I(X)}

2. For each (A, B, C) ∈ T(X), compute hypotheses Ô(X) by solving the equation:

   Ô(X) = O(A) : O(B) :: O(C) : ?

This inference procedure shows lots of similarities with the k-nearest neighbors classifier (k-NN) which, given a new instance, (i) searches the training set for close neighbors and (ii) computes the unknown class label according to the neighbors' labels. Our model, however, does not use any metric between objects: we only rely on the definition of analogical proportions, which reveal systemic, rather than superficial, similarities. Moreover, inputs and outputs are regarded in a symmetrical way: outputs are not restricted to a set of labels, and can also be structured objects such as sequences. The implementation of the model still has to address two specific issues.

• When exploring S³, an exhaustive search evaluates |S|³ triples, which can prove intractable. Moreover, objects in S may be unequally relevant, and we might expect the search procedure to treat them accordingly.

• Whenever several competing hypotheses are proposed for Ô(X), a ranking must be performed. In our current implementation, hypotheses are ranked based on frequency counts.

These issues are well-known problems for k-NN classifiers. The second one does not appear to be critical and is usually solved based on a majority rule. In contrast, a considerable amount of effort has been devoted to reducing and optimizing the search process, via editing and condensing methods, as studied e.g. in (Dasarathy, 1990; Wilson and Martinez, 2000). Proposals for solving this problem are discussed in Section 3.4.
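To make this procedure concrete, here is a minimal Python sketch of the two-step inference. The helper names are assumptions: analogy_holds and solve_analogy stand for the proportion checker and equation solver developed in Section 3, and the exhaustive enumeration of S³ shown here is precisely what the heuristics of Section 3.4 replace.

    from collections import Counter
    from itertools import product

    def analogical_inference(S, I, O, x_input, analogy_holds, solve_analogy):
        """Lazy analogical inference (an illustrative sketch).
        S: training instances; I, O: projections onto the input and
        output spaces; x_input: the input projection I(X) of the new,
        partially informed instance X."""
        hypotheses = Counter()
        # Step 1: T(X) = {(A, B, C) in S^3 | I(A) : I(B) :: I(C) : I(X)}
        for A, B, C in product(S, repeat=3):
            if analogy_holds(I(A), I(B), I(C), x_input):
                # Step 2: solve O(A) : O(B) :: O(C) : ? in the output space
                for t in solve_analogy(O(A), O(B), O(C)):
                    hypotheses[t] += 1
        # Competing hypotheses are ranked by frequency counts
        return [h for h, _ in hypotheses.most_common()]
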
3 An algebraic framework for analogical proportions

Our inductive model requires the availability of a device for computing analogical proportions on feature vectors. We consider that an analogical proportion holds between four feature vectors when the proportion holds for all components. In this section, we propose a unified algebraic framework for defining analogical proportions between individual features. After giving the general definition, we present its instantiation for two types of features: words over a finite alphabet and sets of labelled trees.

3.1 Analogical proportions

Our starting point will be analogical proportions in a set U, which we define as follows: ∀x, y, z, t ∈ U, x : y :: z : t if and only if either x = y and z = t, or x = z and y = t. In the sequel, we assume that U is additionally provided with an associative internal composition law ⊕, which makes (U, ⊕) a semigroup. The generalization of proportions to semigroups involves two key ideas: the decomposition of objects into smaller parts, and alternation constraints on these parts. To formalize the idea of decomposition, we define the factorization of an element u in U as:

Definition 1 (Factorization)
A factorization of u ∈ U is a sequence u1 . . . un, with ∀i, ui ∈ U, such that u1 ⊕ . . . ⊕ un = u. Each term ui is a factor of u.

The alternation constraint expresses the fact that analogically related objects should be made of alternating factors: for x : y :: z : t to hold, each factor in x should be found alternatively in y and in z. This yields a first definition of analogical proportions:

Definition 2 (Analogical proportion)
(x, y, z, t) ∈ U form an analogical proportion, denoted by x : y :: z : t, if and only if there exist some factorizations x1 ⊕ . . . ⊕ xd = x, y1 ⊕ . . . ⊕ yd = y, z1 ⊕ . . . ⊕ zd = z, t1 ⊕ . . . ⊕ td = t such that ∀i, (yi, zi) ∈ {(xi, ti), (ti, xi)}. The smallest d for which such factorizations exist is termed the degree of the analogical proportion.

This definition is valid for any semigroup, and a fortiori for any richer algebraic structure. Thus, it readily applies to the case of groups, vector spaces, free monoids, sets and attribute-value structures.

3.2 Words over Finite Alphabets

3.2.1 Analogical Proportions between Words

Let Σ be a finite alphabet. Σ* denotes the set of finite sequences of elements of Σ, called words over Σ. Σ*, provided with the concatenation operation ., is a free monoid whose identity element is the empty word ε. For w ∈ Σ*, w(i) denotes the ith symbol in w. In this context, Definition 2 can be re-stated as:

Definition 3 (Analogical proportion in (Σ*, .))
(x, y, z, t) ∈ Σ* form an analogical proportion, denoted by x : y :: z : t, if and only if there exist some integer d and some factorizations x1 . . . xd = x, y1 . . . yd = y, z1 . . . zd = z, t1 . . . td = t such that ∀i, (yi, zi) ∈ {(xi, ti), (ti, xi)}.

An example of analogy between words is:

viewing : reviewer :: searching : researcher

with x1 = ε, x2 = view, x3 = ing and t1 = re, t2 = search, t3 = er. This definition generalizes the proposal of (Lepage, 1998). It does not ensure the existence of a solution to an analogical equation, nor its uniqueness when it exists. (Lepage, 1998) gives a set of necessary conditions for a solution to exist. These conditions also apply here. In particular, if t is a solution of x : y :: z : ?, then t contains, in the same relative order, all the symbols in y and z that are not in x. As a consequence, all solutions of an equation have the same length.
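Before turning to the finite-state machinery, Definition 3 can be operationalized directly as a memoized recursive search for alternating factorizations, matching the symbols of x and t, in order, against those of y and z (this anticipates Proposition 1 below). A purely illustrative sketch, not the solver used in the paper:

    from functools import lru_cache

    def is_analogy(x, y, z, t):
        """Check whether x : y :: z : t holds for words (Definition 3)."""
        if len(x) + len(t) != len(y) + len(z):
            return False  # the four terms must balance in length
        @lru_cache(maxsize=None)
        def match(i, j, k, l):
            # Can some interleaving of x[i:] with t[j:] equal some
            # interleaving of y[k:] with z[l:]?
            if i == len(x) and j == len(t):
                return k == len(y) and l == len(z)
            for c, ni, nj in ((x[i] if i < len(x) else None, i + 1, j),
                              (t[j] if j < len(t) else None, i, j + 1)):
                if c is None:
                    continue
                if k < len(y) and y[k] == c and match(ni, nj, k + 1, l):
                    return True
                if l < len(z) and z[l] == c and match(ni, nj, k, l + 1):
                    return True
            return False
        return match(0, 0, 0, 0)

    # e.g. is_analogy("viewing", "reviewer", "searching", "researcher") is True
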
3.2.2 A Finite-state Solver

Definition 3 yields an efficient procedure for solving analogical equations, based on finite-state transducers. The main steps of the procedure are sketched here; a full description can be found in (Yvon, 2003). To start with, let us introduce the notions of complementary set and shuffle product.

Complementary set. If v is a subword of w, the complementary set of v with respect to w, denoted by w\v, is the set of subwords of w obtained by removing from w, in a left-to-right fashion, the symbols in v. For example, eea is a complementary subword of xmplr with respect to exemplar. When v is not a subword of w, w\v is empty. This notion can be generalized to any regular language.

The complementary set of v with respect to w is a regular set: it is the output language of the finite-state transducer Tw (see Figure 1) for the input v.

[Figure 1: The transducer Tw computing complementary sets wrt w: a linear transducer with states 0, 1, . . . , k, where the transition from state i-1 to state i carries either w(i) : ε or ε : w(i).]
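For intuition, here is a minimal Python rendering of the complementary set for the word-to-word case; the transducer Tw of Figure 1 performs the same computation over regular languages. A sketch, assuming both arguments are plain words:

    def complement_set(w, v):
        """The complementary set w\\v: all subwords of w obtained by
        removing the symbols of v in a left-to-right fashion
        (empty when v is not a subword of w)."""
        results = set()
        def walk(i, j, kept):
            if i == len(w):
                if j == len(v):            # all of v was matched and removed
                    results.add(kept)
                return
            if j < len(v) and w[i] == v[j]:
                walk(i + 1, j + 1, kept)   # remove w[i], matching v[j]
            walk(i + 1, j, kept + w[i])    # keep w[i] in the complement
        walk(0, 0, "")
        return results

    # e.g. complement_set("exemplar", "xmplr") == {"eea"}
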
Shuffle. The shuffle u • v of two words u and v is introduced e.g. in (Sakarovitch, 2003) as follows:

u • v = {u1 v1 u2 v2 . . . un vn | ui, vi ∈ Σ*, u1 . . . un = u, v1 . . . vn = v}

The shuffle of two words u and v contains all the words w which can be composed using all the symbols in u and v, subject to the condition that if a precedes b in u (or in v), then it precedes b in w. Taking, for instance, u = abc and v = def, the words abcdef, abdefc, adbecf are in u • v; this is not the case for abefcd. This operation generalizes straightforwardly to languages. The shuffle of two regular languages is regular (Sakarovitch, 2003); the automaton A computing K • L is derived from the automata AK = (Σ, QK, q0K, FK, δK) and AL = (Σ, QL, q0L, FL, δL) recognizing respectively K and L as the product automaton A = (Σ, QK × QL, (q0K, q0L), FK × FL, δ), where δ is defined as: δ((qK, qL), a) = (rK, rL) if and only if either δK(qK, a) = rK and qL = rL, or δL(qL, a) = rL and qK = rK.

The notions of complementary set and shuffle are related through the following property, which is a direct consequence of the definitions:

w ∈ u • v ⇔ u ∈ w\v
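The shuffle of two plain words also has a compact recursive rendering; a sketch only, since the set grows combinatorially with the word lengths (which is why the paper works with automata instead):

    def shuffle(u, v):
        """All interleavings of u and v that preserve the relative
        order of the symbols within each word (the shuffle u • v)."""
        if not u or not v:
            return {u + v}
        return ({u[0] + w for w in shuffle(u[1:], v)} |
                {v[0] + w for w in shuffle(u, v[1:])})

    # e.g. "adbecf" in shuffle("abc", "def") holds, but "abefcd" does not;
    # the property above can be tested as:
    #   w in shuffle(u, v)  iff  u in complement_set(w, v)
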
Solving analogical equations. The notions of shuffle and complementary sets yield another characterization of analogical proportions between words, based on the following proposition:

Proposition 1. ∀x, y, z, t ∈ Σ*, x : y :: z : t ⇔ x • t ∩ y • z ≠ ∅

An analogical proportion is thus established if the symbols in x and t are also found in y and z, and appear in the same relative order. A corollary follows:

Proposition 2. t is a solution of x : y :: z : ? ⇔ t ∈ (y • z)\x

The set of solutions of an analogical equation x : y :: z : ? is a regular set, which can be computed with a finite-state transducer. It can also be shown that this analogical solver generalizes the approach based on edit distance proposed in (Lepage, 1998).
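Combining the two sketches above yields a naive solver in the spirit of Proposition 2; the finite-state implementation achieves the same result without materializing the shuffle set:

    def solve_equation(x, y, z):
        """Solutions of x : y :: z : ?, computed as (y • z)\\x
        (Proposition 2). Purely illustrative: it enumerates the
        shuffle set instead of intersecting automata."""
        solutions = set()
        for w in shuffle(y, z):
            solutions |= complement_set(w, x)
        return solutions

    # e.g. "researcher" in solve_equation("viewing", "reviewer", "searching")
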
3.3 Trees

Labelled trees are very common structures in NLP tasks: they can represent syntactic structures, or terms in a logical representation of a sentence. To express the definition of analogical proportions between trees, we introduce the notion of substitution.

Definition 4 (Substitution)
A (single) substitution is a pair (variable ← tree). The application of the substitution (v ← t′) to a tree t consists in replacing each leaf of t labelled by v by the tree t′. The result of this operation is denoted t(v ← t′). For each variable v, we define the binary operator /v as t /v t′ = t(v ← t′).

Definition 2 can then be extended as:

Definition 5 (Analogical proportion (trees))
(x, y, z, t) ∈ U form an analogical proportion, denoted by x : y :: z : t, iff there exist some variables (v1, . . . , vn−1) and some factorizations x1 /v1 . . . /vn−1 xn = x, y1 /v1 . . . /vn−1 yn = y, z1 /v1 . . . /vn−1 zn = z, t1 /v1 . . . /vn−1 tn = t such that ∀i, (yi, zi) ∈ {(xi, ti), (ti, xi)}.

An example of such a proportion is illustrated in Figure 2 with syntactic parse trees.

[Figure 2: Analogical proportion between trees. The four parse trees read: the police have impounded his car : his car has been impounded by the police :: the cat has eaten the mouse : the mouse has been eaten by the cat.]

This definition yields an effective algorithm computing analogical proportions between trees (Stroppa and Yvon, 2005). We consider here a simpler heuristic approach, consisting in (i) linearizing labelled trees into parenthesized sequences of symbols and (ii) using the analogical solver for words introduced above. This approach yields a faster, albeit approximative, algorithm, which makes analogical inference tractable even for large tree databases.
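The linearization step of this heuristic is straightforward; a sketch, assuming trees are represented as (label, children) pairs, each label and parenthesis becoming one symbol of the extended alphabet over which the word solver then operates:

    def linearize(tree):
        """Flatten a labelled tree into a parenthesized sequence of
        symbols, so that word-level proportions can approximate
        tree-level ones. A tree is a (label, children) pair; leaves
        have an empty children list."""
        label, children = tree
        if not children:
            return [label]
        out = ["(", label]
        for child in children:
            out.extend(linearize(child))
        out.append(")")
        return out

    # e.g. linearize(("S", [("NP", []), ("VP", [])])) == ["(", "S", "NP", "VP", ")"]
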
3.4 Algorithmic issues

We have seen how to compute analogical relationships for features whose values are words and trees. If we use, for trees, the solver based on tree linearizations, the resolution of an equation amounts, in both cases, to solving analogies on words.

The learning algorithm introduced in Section 2.2 is a two-step procedure: a search step and a transfer step. The latter step only involves the resolution of (a restricted number of) analogical equations. When x, y and z are known, solving x : y :: z : ? amounts to computing the output language of the transducer representing (y • z)\x: the automaton for this language has a number of states bounded by |x| × |y| × |z|. Given the typical length of words in our experiments, and given that the worst-case exponential bound for determinizing this automaton is hardly met, the solving procedure is quite efficient.

The problem faced during the search procedure is more challenging: given x, we need to retrieve all possible triples (y, z, t) in a finite set L such that x : y :: z : t. An exhaustive search requires the computation of the intersection of the finite-state automaton representing the output language of (L • L)\x with the automaton for L. Given the size of L in our experiments (several hundreds of thousands of words), a complete search is intractable and we resort to the following heuristic approach.

L is first split into K bins {L1, . . . , LK}, with |Li| small with respect to |L|. We then randomly select k bins and compute, for each bin Li, the output language of (Li • Li)\x, which is then intersected with L: we thus only consider triples containing at least two words from the same bin.
It has to be noted that the bins are not randomly constructed: training examples are grouped into inflectional or derivational families. To further speed up the search, we also impose an upper bound on the degree of proportions. All triples retrieved during these k partial searches are then merged and considered for the transfer step.
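Putting the pieces together, the search step can be sketched as follows (illustrative Python reusing solve_equation above; the actual system performs these operations on automata and additionally enforces the degree bound):

    import random

    def heuristic_search(x, bins, lexicon, k):
        """Approximate search for triples (y, z, t) with x : y :: z : t:
        sample k bins (paradigm families), solve x : y :: z : ? for all
        y, z in the same bin, and keep the solutions attested in the
        lexicon, i.e. intersect (Li • Li)\\x with L."""
        triples = set()
        for bin_i in random.sample(bins, k):
            for y in bin_i:
                for z in bin_i:
                    for t in solve_equation(x, y, z):
                        if t in lexicon:
                            triples.add((y, z, t))
        return triples
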
The computation of analogical relationships has been implemented in a generic analogical solver; this solver is based on Vaucanson, an automata manipulation library using high performance generic programming (Lombardy et al., 2003).

4 Experiments

4.1 Methodology

The main purpose of these experiments is to demonstrate the flexibility of the analogical learner. We considered two different supervised learning tasks, both aimed at performing the lexical analysis of isolated word forms. Each of these tasks represents a possible instantiation of the learning procedure introduced in Section 2.2.

The first experiment consists in computing one or several vector(s) of morphosyntactic features to be associated with a form. Each vector comprises the lemma, the part-of-speech, and, based on the part-of-speech, additional features such as number, gender, case, tense, mood, etc. An (English) input/output pair for this task thus looks like: input=replying; output={reply; V-pp--}, where the placeholder '-' denotes irrelevant features. Lexical analysis is useful for many applications: a POS tagger, for instance, needs to "guess" the possible part(s)-of-speech of unknown words (Mikheev, 1997). For this task, we use the definition of analogical proportions for "flat" feature vectors (see Section 3.1) and for word strings (Section 3.2). The training data is a list of fully informed lexical entries; the test data is a list of isolated word forms not represented in the lexicon. Bins are constructed based on inflectional families.

The second experiment consists in computing a morphological parse of unknown lemmas: for each input lemma, the output of the system is one or several parse trees representing a possible hierarchical decomposition of the input into (morphologically categorized) morphemes (see Figure 3). This kind of analysis makes it possible to reconstruct the series of morphological operations deriving a lemma, to compute its root and its part-of-speech, and to identify morpheme boundaries. This information is required, for instance, to compute the pronunciation of an unknown word, or to infer the compositional meaning of a complex (derived or compound) lemma. Bins gather entries sharing a common root.

[Figure 3: Input/output pair for task 2. Input=acrobatically; output = the parse tree [[[.N acrobat][.A|N ic]]B|A. ally]. Bound morphemes have a compositional type: B|A. denotes a suffix that turns adjectives into adverbs.]

These experiments use the English, German, and Dutch morphological tables of the CELEX database (Burnage, 1990). For task 1, these tables contain respectively 89 000, 342 000 and 324 000 different word forms, and the number of features to predict is respectively 6, 12, and 10. For task 2, which was only conducted with English lemmas, the total number of different entries is 48 407.

For each experiment, we perform 10 runs, using 1 000 randomly selected entries for testing (due to lexical ambiguity, the number of tested instances is usually greater than 1 000). Generalization performance is measured as follows: the system's output is compared with the reference values (due to lexical ambiguity, a form may be associated in the database with several feature vectors or parse trees). Per-instance precision is computed as the relative number of correct hypotheses, i.e. hypotheses which exactly match the reference: for task 1, all features have to be correct; for task 2, the parse tree has to be identical to the reference tree. Per-instance recall is the relative number of reference values that were actually hypothesized. Precision and recall are averaged over the test set; numbers reported below are averaged over the 10 runs.

Various parameters affect the performance: k, the number of randomly selected bins considered during the search step (see Section 3.4), and d, the upper bound on the degree of extracted proportions.

4.2 Experimental results

Experimental results for task 1 are given in Tables 1, 2 and 3. For each main category, two recall and precision scores are computed: one for the sole lemma and POS attributes (left columns), and one for the lemma and all the morpho-syntactic features (right columns). In these experiments, parameters are set as follows: k = 150 and d = 3. As k grows, both recall and precision increase (up to a limit); k = 150 appears to be a reasonable trade-off between efficiency and accuracy. A further increase of d does not significantly improve accuracy: taking d = 3 or d = 4 yields very comparable results.

Table 1: Results on task 1 for English

                 Lemma + POS         Lemma + Features
                 Rec.     Prec.      Rec.     Prec.
    Nouns        76.66    94.64      75.26    95.37
    Verbs        94.83    97.14      94.79    97.37
    Adjectives   26.68    72.24      27.89    87.67

Table 2: Results on task 1 for Dutch

                 Lemma + POS         Lemma + Features
                 Rec.     Prec.      Rec.     Prec.
    Nouns        71.39    92.17      54.59    74.75
    Verbs        96.75    97.85      93.26    94.36
    Adjectives   91.59    96.09      90.02    95.33

Table 3: Results on task 1 for German

                 Lemma + POS         Lemma + Features
                 Rec.     Prec.      Rec.     Prec.
    Nouns        93.51    98.28      77.32    81.70
    Verbs        99.55    99.69      90.50    90.63
    Adjectives   99.14    99.28      99.01    99.15

As a general comment, one can note that high generalization performance is achieved for languages and categories involving rich inflectional paradigms: this is exemplified by the performance on all German categories. English adjectives, at the other end of this spectrum, are very difficult to analyze. A simple and effective workaround for this problem consists in increasing the size of the sub-lexicons (Li in Section 3.4) so as to incorporate in a given bin all the members of the same derivational (rather than inflectional) family. For Dutch, these results are comparable with the results reported in (van den Bosch and Daelemans, 1999): an accuracy of about 92% on the task of predicting the main syntactic category.

Table 4: Results on task 2 for English

                                 Rec.     Prec.
    Morphologically complex     46.71    70.92
    Others                      17.00    46.86

The second task is more challenging, since the exact parse tree of a lemma must be computed. For morphologically complex lemmas (involving affixation or compounding), it is nevertheless possible to obtain acceptable results (see Table 4), showing that some derivational phenomena have been captured. Further analysis is required to assess more precisely the potential of this method.

From a theoretical perspective, it is important to realize that our model does not commit us to a morpheme-based approach of morphological processes. This is obvious in task 1; and even if task 2 aims at predicting a morphematic parse of input lemmas, this goal is achieved without segmenting the input lemma into smaller units. For instance, our learner parses the lemma enigmatically as [[[.N enigma][.A|N ical]]B|A. ly], that is, without trying to decide to which morph the orthographic t should belong. In this model, input and output spaces are treated symmetrically and correspond to distinct levels of representation.

5 Discussion and future work

In this paper, we have presented a generic analogical inference procedure which applies to a wide range of actual learning tasks, and we have detailed its instantiation for common feature types. Preliminary experiments have been conducted on two morphological analysis tasks and have shown promising generalization performance.

These results suggest that our main hypotheses are valid: (i) searching for triples is tractable even with databases containing several hundreds of thousands of instances; (ii) formal analogical proportions are a reliable sign of deeper analogies between linguistic entities; they can thus be used to devise flexible and effective learners for NLP tasks.

This work is currently being developed in various directions: first, we are gathering additional experimental results on several NLP tasks, to get a deeper understanding of the generalization capabilities of our analogical learner. One interesting issue, not addressed in this paper, is the integration of various forms of linguistic knowledge in the definition of analogical proportions, or in the specification of the search procedure. We are also considering alternative heuristic search procedures, which could improve or complement the approaches presented in this paper. A possible extension would be to define and take advantage of non-uniform distributions of training instances, which could be used both during the searching and ranking steps. We finally believe that this approach might also prove useful in other application domains involving structured data and are willing to experiment with other kinds of data.

References

David W. Aha. 1997. Editorial. Artificial Intelligence Review, 11(1-5):7-10. Special Issue on Lazy Learning.

Gavin Burnage. 1990. CELEX: a guide for users. Technical report, University of Nijmegen, Center for Lexical Information, Nijmegen.

Walter Daelemans, Antal van den Bosch, and Jakub Zavrel. 1999. Forgetting exceptions is harmful in language learning. Machine Learning, 34(1-3):11-41.

B.V. Dasarathy, editor. 1990. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA.

Brian Falkenheimer and Dedre Gentner. 1986. The structure-mapping engine. In Proceedings of the meeting of the American Association for Artificial Intelligence (AAAI), pages 272-277.

Dedre Gentner, Keith J. Holyoak, and Boicho N. Konikov, editors. 2001. The Analogical Mind. The MIT Press, Cambridge, MA.

Douglas Hofstadter and the Fluid Analogies Research group, editors. 1995. Fluid Concepts and Creative Analogies. Basic Books.

Keith J. Holyoak and John E. Hummel. 2001. Understanding analogy within a biological symbol system. In Dedre Gentner, Keith J. Holyoak, and Boicho N. Konikov, editors, The Analogical Mind, pages 161-195. The MIT Press, Cambridge, MA.

Yves Lepage. 1998. Solving analogies on words: An algorithm. In Proceedings of COLING-ACL '98, volume 2, pages 728-735, Montréal, Canada.

Yves Lepage. 1999a. Analogy+tables=conjugation. In G. Friedl and H.G. Mayr, editors, Proceedings of NLDB '99, pages 197-201, Klagenfurt, Germany.

Yves Lepage. 1999b. Open set experiments with direct analysis by analogy. In Proceedings of NLPRS '99, volume 2, pages 363-368, Beijing, China.

Sylvain Lombardy, Raphaël Poss, Yann Régis-Gianas, and Jacques Sakarovitch. 2003. Introducing Vaucanson. In Proceedings of CIAA 2003, pages 96-107.

Andrei Mikheev. 1997. Automatic rule induction for unknown word guessing. Computational Linguistics, 23(3):405-423.

Vito Pirrelli and François Yvon. 1999. Analogy in the lexicon: a probe into analogy-based machine learning of language. In Proceedings of the 6th International Symposium on Human Communication, Santiago de Cuba, Cuba.

Tony A. Plate. 2000. Analogy retrieval and processing with distributed vector representations. Expert Systems, 17(1):29-40.

Jacques Sakarovitch. 2003. Eléments de théorie des automates. Vuibert, Paris.

Royal Skousen. 1989. Analogical Modeling of Language. Kluwer, Dordrecht.

Nicolas Stroppa and François Yvon. 2005. Formal models of analogical relationships. Technical report, ENST, Paris, France.

Paul Thagard, Keith J. Holyoak, Greg Nelson, and David Gochfeld. 1990. Analog retrieval by constraint satisfaction. Artificial Intelligence, 46(3):259-310.

Antal van den Bosch and Walter Daelemans. 1999. Memory-based morphological processing. In Proceedings of ACL, pages 285-292, Maryland.

D. Randall Wilson and Tony R. Martinez. 2000. Reduction techniques for instance-based learning algorithms. Machine Learning, 38(3):257-286.

François Yvon. 1999. Pronouncing unknown words using multi-dimensional analogies. In Proc. Eurospeech, volume 1, pages 199-202, Budapest, Hungary.

François Yvon. 2003. Finite-state machines solving analogies on words. Technical report, ENST.