Project Note Template

Text Simplification for Reading Assistance: A Project Note
Kentaro Inui Atsushi Fujita Tetsuro Takahashi Ryu Iida

Nara Advanced Institute of Science and Technology
Takayama, Ikoma, Nara, 630-0192, Japan
{inui,atsush-f,tetsu-ta,ryu-i}@is.aist-nara.ac.jp
Tomoya Iwakura
Fujitsu Laboratories Ltd.
Kamikodanaka, Nakahara, Kawasaki, Kanagawa, 211-8588, Japan
iwakura.tomoya@jp.fujitsu.com
Abstract

This paper describes our ongoing research project on text simplification for congenitally deaf people. The text simplification we are aiming at is the task of offering a deaf reader a syntactic and lexical paraphrase of a given text to help her/him understand what it means. In this paper, we discuss the issues we should address to realize text simplification and report on the present results in three different aspects of this task: readability assessment, paraphrase representation and post-transfer error detection.

1 Introduction

This paper reports on our ongoing research into text simplification for reading assistance. Potential users targeted in this research are congenitally deaf people (more specifically, students at (junior-)high schools for the deaf), who tend to have difficulties in reading and writing text. We are aiming at the development of text simplification technology with which a reading assistance system lexically and structurally paraphrases a given text into a simpler and plainer one that is thus more comprehensible.

The idea of using paraphrases for reading assistance is not necessarily novel. For example, Carroll et al. (1998) and Canning and Taito (1999) report on their project in which they address syntactic transforms aiming at making newspaper text accessible to aphasics. Following this trend of research, in this project, we address the four unexplored issues described below, besides the user- and task-oriented evaluation of the overall system.

Before going into detail, we first clarify the four issues we have addressed in the next section. We then report on the present results on three of the four (readability assessment, paraphrase representation and post-transfer error detection) in the subsequent sections.

2 Research issues and our approach

2.1 Readability assessment

The process of text simplification for reading assistance can be decomposed into the following three subprocesses:

a. Problem identification: identify which portions of a given text will be difficult for a given user to read,
b. Paraphrase generation: generate possible candidate paraphrases from the identified portions, and
c. Evaluation: re-assess the resultant texts to choose the one in which the problems have been resolved.

Given this decomposition, it is clear that one of the key issues in reading assistance is the problem of assessing the readability or comprehensibility¹ of text because it is involved in subprocesses (a) and (c).

Readability assessment is doubtlessly a tough issue (Williams et al., 2003). In this project, however, we argue that, if one targets only a particular population segment and if an adequate collection of data is available, then corpus-based empirical approaches may well be feasible. We have already shown that one can collect such readability assessment data by conducting survey questionnaires targeting teachers at schools for the deaf.

¹ In this paper, we use the terms readability and comprehensibility interchangeably, while strictly distinguishing them from legibility of each fragment (typically, a sentence or paragraph) of a given text.
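The decomposition into subprocesses (a) to (c) can be sketched as a small pipeline. The following Python sketch is illustrative only: the names `simplify`, `toy_is_difficult`, `toy_decleft` and `toy_readability`, as well as the toy heuristics behind them, are assumptions made for this example and are not part of the system described in this paper. The example sentence pair is taken from example (2) in Section 4.1.

```python
from typing import Callable, Iterable, List

Sentence = str


def simplify(
    sentence: Sentence,
    is_difficult: Callable[[Sentence], bool],
    paraphrasers: Iterable[Callable[[Sentence], List[Sentence]]],
    readability: Callable[[Sentence], float],
) -> Sentence:
    """Run the three subprocesses on a single sentence (hypothetical sketch)."""
    # (a) Problem identification: leave sentences judged easy untouched.
    if not is_difficult(sentence):
        return sentence
    # (b) Paraphrase generation: collect candidates from every paraphraser.
    candidates = [sentence]
    for paraphrase in paraphrasers:
        candidates.extend(paraphrase(sentence))
    # (c) Evaluation: re-assess all candidates and keep the most readable one.
    return max(candidates, key=readability)


# Toy stand-ins (assumptions for illustration, not the paper's models).
def toy_is_difficult(s: Sentence) -> bool:
    # Treat "It was ..." cleft sentences as the only difficult construction.
    return s.startswith("It was ")


def toy_decleft(s: Sentence) -> List[Sentence]:
    # A one-entry lookup table standing in for a real transfer rule.
    table = {
        "It was a Honda that John sold to Tom.": ["John sold a Honda to Tom."],
    }
    return table.get(s, [])


def toy_readability(s: Sentence) -> float:
    # Fewer words counts as more readable in this toy scorer.
    return -float(len(s.split()))


if __name__ == "__main__":
    print(simplify("It was a Honda that John sold to Tom.",
                   toy_is_difficult, [toy_decleft], toy_readability))
```

Note that the readability scorer is used twice in spirit: `is_difficult` plays its role in step (a) and `readability` in step (c), which is why readability assessment is singled out as a key issue.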
2.2 Paraphrase acquisition

One of the useful findings that we obtained through the aforementioned surveys is that there is a broad range of paraphrases that can improve the readability of text. A reading assistance system should, therefore, be able to generate a sufficient variety of paraphrases of a given input. To create such a system, one needs to feed it with a large collection of paraphrase patterns. Fortunately, the acquisition of paraphrase patterns has been actively studied in recent years:

• Manual collection of paraphrases in the context of language generation, e.g. (Robin and McKeown, 1996),
• Derivation of paraphrases through existing lexical resources, e.g. (Kurohashi and Sakai, 1999),
• Corpus-based statistical methods inspired by the work on information extraction, e.g. (Jacquemin, 1999; Lin and Pantel, 2001), and
• Alignment-based acquisition of paraphrases from comparable corpora, e.g. (Barzilay and McKeown, 2001; Shinyama et al., 2002; Barzilay and Lee, 2003).

One remaining issue is how effectively these methods contribute to the generation of paraphrases in our application-oriented context.

2.3 Paraphrase representation

One of the findings obtained in the previous studies on paraphrase acquisition is that the automatic acquisition of paraphrase candidates is quite feasible for various types of source data, but the acquired collections tend to be rather noisy and need manual cleaning, as reported in, for example, (Lin and Pantel, 2001). Given that, it turns out to be important to devise an effective way of facilitating manual correction and a standardized scheme for representing and storing paraphrase patterns as shared resources.

Our approach is (a) to define first a fully expressible formalism for representing paraphrases at the level of tree-to-tree transformation and (b) to devise an additional layer of representation on top of it that is designed to facilitate handcoding transformation rules.

2.4 Post-transfer text revision

In paraphrasing, the morpho-syntactic information of a source sentence should be accessible throughout the transfer process since a morpho-syntactic transformation in itself can often be a motivation or goal of paraphrasing. Therefore, such an approach as semantic transfer, where morpho-syntactic information is highly abstracted away as in (Dorna et al., 1998; Richardson et al., 2001), does not suit this task. Even granting that the morpho-syntactic stratum is an optimal level of abstraction for representing paraphrasing/transfer patterns, one must recall that semantic-transfer approaches such as those cited above were motivated mainly by the need to reduce the complexity of transfer knowledge, which could be unmanageable in morpho-syntactic transfer.

Our approach to this problem is to (a) leave the description of each transfer pattern underspecified and (b) implement the knowledge about linguistic constraints that are independent of a particular transfer pattern separately from the transfer knowledge. There is a wide range of such transfer-independent linguistic constraints. Constraints on morpheme connectivity, verb conjugation, word collocation, and tense and aspect forms in relative clauses are typical examples of such constraints.

These four issues can be considered as different aspects of the overall question of how one can make the development and maintenance of a gigantic resource for paraphrasing tractable. (1) The introduction of readability assessment would free us from worrying about the purposiveness of each paraphrasing rule in paraphrase acquisition. (2) Paraphrase acquisition is obviously indispensable for scaling up the resource. (3) A good formalism for representing paraphrasing rules would facilitate their manual refinement and maintenance. (4) Post-transfer error detection and revision would make the system tolerant of flaws in paraphrasing rules.

While many researchers have addressed the issue of paraphrase acquisition, reporting promising results as cited above, the other three issues have been left relatively unexplored in spite of their significance in the above sense. Motivated by this context, in the rest of this paper, we address these remaining three.

3 Readability assessment

To the best of our knowledge, there have been no reports on research to build a computational model of the language proficiency of deaf people, except for the remarkable reports by Michaud and McCoy (2001). As a subpart of their research aimed at developing the ICICLE system (McCoy and Masterman, 1997), a language-tutoring application for deaf learners of written English, Michaud and McCoy developed an architecture called SLALOM for modeling the writing proficiency of a user. SLALOM is designed to capture the stereotypic linear order of acquisition within certain categories of morphological and/or syntactic features of language. Unfortunately, the modeling method used in SLALOM cannot be directly applied to our domain for three reasons.

• Unlike writing tutoring, in reading assistance, target sentences are in principle unlimited. We therefore need to take a wider range of morpho-syntactic features into account.
• SLALOM is not designed to capture the difficulty of any combination of morpho-syntactic features, which it is essential to take into account in reading assistance.
• Given the need to consider feature combinations, a simple linear order model such as that assumed in SLALOM is unsuitable.

3.1 Our approach: We ask teachers

To overcome these deficiencies, we took yet another approach, in which we designed a survey questionnaire targeting teachers at schools for the deaf, and have been collecting readability assessment data. In this questionnaire, we ask the teachers to compare the readability of a given sentence with paraphrases of it. The use of paraphrases is of critical importance in our questionnaire since it makes manual readability assessment significantly easier and more reliable.

3.1.1 Targets

We targeted teachers of Japanese or English literacy at schools for the deaf for the following reasons.

Ideally, this sort of survey would be carried out by targeting the population segment in question, i.e., deaf students in our study. In fact, pedagogists and psycholinguists have made tremendous efforts to examine the language proficiency of deaf students by giving them proficiency tests. Such efforts are very important, but they have had difficulty in capturing enough of the picture to develop a comprehensive and implementable reading proficiency model of the population due to the expense of extensive language proficiency testing.

In contrast, our approach is an attempt to model the knowledge of experts in this field (i.e., teaching deaf students). The targeted teachers not only have rich experiential knowledge about the language proficiency of their students but are also highly skilled in paraphrasing to help their students' comprehension. Since such knowledge gleaned from individual experiences already has some generality, extracting it through a survey should be less costly and thus more comprehensive than investigation based on language proficiency testing.

3.1.2 Questionnaire

In the questionnaire, each question consists of several paraphrases, as shown in Figure 1 (a), where (A) is a source sentence, and (B) and (C) are paraphrases of (A). Each respondent was asked to assess the relative readability of the paraphrases given for each source sentence, as shown in Figure 1 (b). The respondent judged sentence (A) to be the most difficult and judged (B) and (C) to be comparable. A judgment that sentence $s_i$ is easier than sentence $s_j$ means that $s_i$ is judged likely to be understood by a larger subset of students than $s_j$. We asked the respondents to annotate the paraphrases with format-free comments, giving the reasons for their judgments, alternative paraphrases, etc., as shown in Figure 1 (b).

To make our questionnaire efficient for model acquisition, we had to carefully control the variation in paraphrases. To do that, we first selected around 50 morpho-syntactic features that are considered influential in sentence readability for deaf people. For each of those features, we collected several simple example sentences from various sources (literacy textbooks, grammar references, etc.). We then manually produced several paraphrases from each of the collected sentences so as to remove the feature that characterized the source sentence from each paraphrase. For example, in Figure 1, the feature characterizing sentence (A) is a non-restrictive relative clause (i.e., sentence (A) was selected as an example of this feature). Neither (B) nor (C) has this feature. We also controlled the lexical variety to minimize the effect of lexical factors on readability by restricting the vocabulary to a top-2000 basic word set (NIJL, 1991).

3.1.3 Administration

We administered a preliminary survey targeting three teachers. Through the survey, we observed that (a) the teachers largely agreed in their assessments of relative readability, (b) their format-free comments indicated that the observed differences in readability were largely explainable in terms of the morpho-syntactic features we had prepared, and (c) a larger-scale survey was needed to obtain a statistically reliable model. Based on these observations, we conducted a more comprehensive survey, in which we prepared 770 questions and sent questionnaires with a random set of 240 of them to teachers of Japanese or English literacy at 50 schools for the deaf. We
asked them to evaluate as many as possible anonymously. We obtained 4080 responses in total (8.0 responses per question).

[Figure 1: Sample question and response]

3.2 Readability ranking model

The task of ranking a set of paraphrases can be decomposed into comparisons between two elements combinatorially selected from the set. We consider the problem of judging which of a given pair of paraphrase sentences is more readable/comprehensible for deaf students. More specifically, given paraphrase pair $(s_i, s_j)$, our problem is to classify it into either left ($s_i$ is easier), right ($s_j$ is easier), or comparable ($s_i$ and $s_j$ are comparable).

Once the problem is formulated this way, we can use various existing techniques for classifier learning. So far, we have examined a method of using the support vector machine (SVM) classification technique.

A training/testing example is paraphrase pair $(s_i, s_j)$ coupled with its quantified class label $D(s_i, s_j) \in [-1, 1]$. Each sentence $s_i$ is characterized by a binary feature vector $F_{s_i}$, and each pair $(s_i, s_j)$ is characterized by a triple of feature vectors $\langle F^{C}_{s_i s_j}, F^{L}_{s_i s_j}, F^{R}_{s_i s_j} \rangle$, where

$$F^{C}_{s_i s_j} = F_{s_i} \wedge F_{s_j} \quad \text{(features shared by } s_i \text{ and } s_j\text{)},$$
$$F^{L}_{s_i s_j} = F_{s_i} \wedge \overline{F_{s_j}} \quad \text{(features belonging only to } s_i\text{)},$$
$$F^{R}_{s_i s_j} = \overline{F_{s_i}} \wedge F_{s_j} \quad \text{(features belonging only to } s_j\text{)}.$$

$D(s_i, s_j)$ represents the difference in readability between $s_i$ and $s_j$; it is computed in the following way.

1. Let $T_{s_i s_j}$ be the set of respondents who assessed $(s_i, s_j)$.
2. Given the degree of readability respondent $t$ assigned to $s_i$ ($s_j$), map it to real value $\mathit{dor}(t, s) \in [0, 1]$ so that the lowest degree maps to 0 and the highest degree maps to 1. For example, the degree of readability assigned to (A) in Figure 1 (b) maps to around 0.1, whereas that assigned to (B) maps to around 0.9.
3. $D(s_i, s_j) = \frac{1}{|T_{s_i s_j}|} \sum_{t \in T_{s_i s_j}} \bigl( \mathit{dor}(t, s_i) - \mathit{dor}(t, s_j) \bigr)$.

Output score $\mathit{Sc}_M(s_i, s_j) \in [-1, 1]$ for input $(s_i, s_j)$ was given by the normalized distance between $(s_i, s_j)$ and the hyperplane.

3.3 Evaluation and discussion

To evaluate the two modeling methods, we conducted a ten-fold cross validation on the set of 4055 paraphrase pairs derived from the 770 questions used in the survey. To create a feature vector space, we used 355 morpho-syntactic features. Feature annotation was done semi-automatically with the help of a morphological analyzer and dependency parser.

The task was to classify a given paraphrase pair into either left, right, or comparable. Model $M$'s output class for $(s_i, s_j)$ was given by

$$\mathit{Cls}_M(s_i, s_j) = \begin{cases} \text{left} & (\mathit{Sc}_M(s_i, s_j) \geq \theta_m) \\ \text{right} & (\mathit{Sc}_M(s_i, s_j) \leq -\theta_m) \\ \text{comparable} & (\text{otherwise}) \end{cases}$$

where $\theta_m \in [-1, 1]$ is a variable threshold used to balance precision with recall.

We used the 473 paraphrase pairs that satisfied the following conditions:

• $|D(s_i, s_j)|$ was not less than threshold $\theta_a$ ($\theta_a = 0.5$). The answer of $(s_i, s_j)$ is given by
$$\mathit{Cls}_{Ans}(s_i, s_j) = \begin{cases} \text{left} & (D(s_i, s_j) \geq \theta_a) \\ \text{right} & (D(s_i, s_j) \leq -\theta_a) \end{cases}$$
• $(s_i, s_j)$ must have been assessed by more than one respondent, i.e., $|T_{s_i s_j}| > 1$.
• Agreement ratio $\mathit{Agr}(s_i, s_j)$ must be sufficiently high, i.e., $\mathit{Agr}(s_i, s_j) \geq 0.9$, where $\mathit{Agr}(s_i, s_j) = (\mathit{for}(s_i, s_j) - \mathit{agst}(s_i, s_j)) / |T_{s_i s_j}|$, and $\mathit{for}(s_i, s_j)$ and $\mathit{agst}(s_i, s_j)$ are the number of respondents who agreed and disagreed with $\mathit{Cls}_{Ans}(s_i, s_j)$, respectively.

We judged output class $\mathit{Cls}_M(s_i, s_j)$ correct if and only if $\mathit{Cls}_M(s_i, s_j) = \mathit{Cls}_{Ans}(s_i, s_j)$. The overall performance was evaluated based on recall $Rc$ and precision $Pr$:

$$Rc = \frac{|\{(s_i, s_j) \mid \mathit{Cls}_M(s_i, s_j) \text{ is correct}\}|}{|\{(s_i, s_j) \mid \mathit{Cls}_{Ans}(s_i, s_j) \in \{\text{left}, \text{right}\}\}|}$$

$$Pr = \frac{|\{(s_i, s_j) \mid \mathit{Cls}_M(s_i, s_j) \text{ is correct}\}|}{|\{(s_i, s_j) \mid \mathit{Cls}_M(s_i, s_j) \in \{\text{left}, \text{right}\}\}|}$$

The model achieved 95% precision with 89% recall. This result confirmed that the data we collected through the questionnaires were reasonably noiseless and thus generalizable. Furthermore, both models exhibited a clear trade-off between recall and precision, indicating that their output scores can be used as a confidence measure.

4 Paraphrase representation

We represent paraphrases as transfer patterns between dependency trees. In this section, we propose a three-layered formalism for representing transfer patterns.

4.1 Types of paraphrases of concern

There are various levels of paraphrases as the following examples demonstrate:

(1) a. She burst into tears, and he tried to comfort her.
    b. She cried, and he tried to console her.

(2) a. It was a Honda that John sold to Tom.
    b. John sold a Honda to Tom.
    c. Tom bought a Honda from John.

(3) a. They got married three years ago.
    b. They got married in 2000.

Lexical vs. structural paraphrases: Example (1) includes paraphrases of the single word "comfort" and the canned phrase "burst into tears". The sentences in (2), on the other hand, exhibit structural and thus more general patterns of paraphrasing. Both types of paraphrases, lexical and structural, are considered useful for many applications including reading assistance and thus should be within the scope of our discussion.

Atomic vs. compositional paraphrases: The process of paraphrasing (2a) into (2c) is compositional because it can be decomposed into two subprocesses, (2a) to (2b) and (2b) to (2c). In developing a resource for paraphrasing, we have only to cover non-compositional (i.e., atomic) paraphrases. Compositional paraphrases can be handled if an additional computational mechanism for combining atomic paraphrases is devised.

Meaning-preserving vs. reference-preserving paraphrases: It is also useful to distinguish reference-preserving paraphrases from meaning-preserving ones. The above example in (3) is of the reference-preserving type. This type of paraphrasing requires the computation of reference to objects outside the discourse and thus should be excluded from our scope for the present purpose.

4.2 Dependency trees (MDSs)

Previous work on transfer-based machine translation (MT) suggests that the dependency-based representation has the advantage of facilitating syntactic transforming operations (Meyers et al., 1996; Lavoie et al., 2000). Following this, we adopt dependency trees as the internal representations of target texts. We suppose that a dependency tree consists of a set of nodes, each of which corresponds to a lexeme or compound, and a set of edges, each of which represents the dependency relation between its ends. We call such a dependency tree a morpheme-based dependency structure (MDS). Each node in an MDS is supposed to be annotated with an open set of typed features that indicate morpho-syntactic and semantic information. We also assume a type hierarchy in dependency relations that consists of an open set of dependency classes including dependency, compound, parallel, appositive and insertion.

4.3 Three-layered representation

Previous work on transfer-based MT systems (Lavoie et al., 2000; Dorna et al., 1998) and alignment-based transfer knowledge acquisition (Meyers et al., 1996; Richardson et al., 2001) has shown that transfer knowledge can be best represented by declarative structure mapping (transforming) rules, each of which typically consists of a pair of source and target partial structures as in the middle of Figure 2.

Adopting such a tree-to-tree style of representation, however, one has to address the issue of the trade-off between expressibility and comprehensibility. One may want a formalism of structural
transformation patterns that is powerful enough to represent a sufficiently broad range of paraphrase patterns. However, highly expressible formalisms would make it difficult to create and maintain rules manually.

[Figure 2: Three-layered rule representation]

Top layer (rule editing), simplified MDS transfer rule:
    N shika V-nai -> V no wa N dake da.
    (someone does not V to nothing but N) -> (it is only to N that someone does V)

Middle layer (obtained by translation), MDS transfer rule:
    Source pattern nodes:
        X1  pos: aux_verb, lex: nai (not)
        X2  aux_verb*
        X3  pos: verb
        X4  pos: postp, lex: shika (except)
        X5  pos: noun
    Target pattern nodes:
        X6  pos: aux_verb, lex: da (copula)
        X7  pos: postp, lex: wa (TOP)
        X8  pos: noun, lex: no (thing)
        X9  vws (=X2)
        X10 pos: verb (=X3)
        X11 pos: postp, lex: dake (only)
        X12 pos: noun (=X5)

Bottom layer (obtained by compilation), MDS processing operators:
    sp_rule(108, negation, RefNode) :-
        match(RefNode, X4=[pos:postp,lex: shika]),
        depend(X3=[pos:verb], empty, X4),
        depend(X1=[pos:aux_verb,lex: nai],
               X2=[pos:aux_verb*], X3),
        depend(X4, empty, X5=[pos:noun]),
        replace(X1, X6=[pos:aux_verb,lex: da]),
        substitute(X5, X12=[pos:noun]),
        move_dtrs(X5, X12),
        substitute(X3, X10=[pos:verb]),
        :

To mediate this trade-off, we devised a new layer of representation to add on top of the layer of tree-to-tree pattern representation as illustrated in Figure 2. At this new layer, we use an extended natural language to specify transformation patterns. The language is designed to facilitate the task of handcoding transformation rules. For example, to define the tree-to-tree transformation pattern given in the middle of Figure 2, a rule editor needs only to specify its simplified form:

(4) N shika V-nai → V no wa N dake da.
    (Someone does V to nothing but N → It is only to N that someone does V)

A rule of this form is then automatically translated into a fully-specified tree-to-tree transformation rule. We call a rule of the latter form an MDS rewriting rule (SR rule), and a rule of the former form a simplified SR rule (SSR rule).

The idea is that most of the specifications of an SR rule can usually be abbreviated if a means to automatically complement it is provided. We use a parser and macros to do so; namely, the rule translator complements an SSR rule by macro expansion and parsing to produce the corresponding SR rule specifications. The advantages of introducing the SSR rule layer are the following:

• The SSR rule formalism allows a rule writer to edit rules with an ordinary text editor, which makes the task of rule editing much more efficient than providing her/him with a complex GUI-based tool for editing SR rules directly.
• The use of the extended natural language also has the advantage of improving the readability of rules for rule writers, which is particularly important in group work.
• To parse SSR rules, one can use the same parser as that used to parse input texts. This also improves the efficiency of rule development because it significantly reduces the burden of maintaining the consistency between the POS-tag set used for parsing input and that used for rule specifications.

The SSR rule layer shares underlying motivations with the formalism reported by Hermjakob et al. (2002). Our formalism is, however, considerably extended so as to be licensed by the expressibility of the SR rule representation and to be annotated with various types of rule applicability conditions including constraints on arbitrary features of nodes, structural constraints, logical specifications such as disjunction and negation, closures of dependency relations, optional constituents, etc.

The two layers for paraphrase representation are fully implemented on our paraphrasing engine KURA (Takahashi et al., 2001), coupled with another layer for processing MDSs (the bottom layer illustrated in Figure 2). The whole system of KURA and part of the transfer rules implemented on it (see Section 5 below) are available at http://cl.aist-nara.ac.jp/lab/kura/doc/.

5 Post-transfer error detection

What kinds of transfer errors tend to occur in lexical and structural paraphrasing? To find out, we conducted a preliminary investigation. This section reports a summary of the results. See (Fujita and Inui, 2002) for further details.

We implemented over 28,000 transfer rules for Japanese paraphrases on the KURA paraphrasing engine based on the rules previously reported in (Sato,
1999; Kondo et al., 1999; Kondo et al., 2001; Iida et al., 2001) and existing lexical resources such as thesauri and case frame dictionaries. The implemented rules ranged from such lexical paraphrases as those that replace a word with its synonym to such syntactic/structural paraphrases as those that remove a cleft construction from a sentence, divide a sentence, etc. We then fed KURA with a set of 1,220 sentences randomly sampled from newspaper articles and obtained 630 transferred output sentences.

The following are the tendencies we observed:

• The transfer errors observed in the experiment exhibited a wide variety, from morphological errors to semantic and discourse-related ones.
• Most types of errors tended to occur regardless of the types of transfer. This suggests that if one creates an error detection module specialized for a particular error type, it works across different types of transfer.
• The most frequent error type involved inappropriate conjugation forms of verbs. It is, however, a matter of morphological generation and can be easily resolved.
• Errors in regard to verb valency and selectional restriction also tended to be frequent and fatal, and thus should be given priority as a research topic.
• The next most frequent error type was related to the difference of meaning between near synonyms. However, this type of error could often be detected by a model that could detect errors of verb valency and selectional restriction.

Based on these observations, we concluded that the detection of incorrect verb valences and verb-complement cooccurrence was one of the most serious problems and should be given priority as a research topic. We are now conducting experiments on empirical methods for detecting this type of error (Fujita et al., 2003).

6 Conclusion

This paper reported on the present results of our ongoing research on text simplification for reading assistance targeting congenitally deaf people. We raised four interrelated issues that we needed to address to realize this application and presented our previous activities focusing on three of them: readability assessment, paraphrase representation and post-transfer error detection.

Regarding readability assessment, we proposed a novel approach in which we conducted questionnaire surveys to collect readability assessment data and took a corpus-based empirical method to obtain a readability ranking model. The results of the surveys show the potential impact of text simplification on reading assistance. We conducted experiments on the task of comparing the readability of a given paraphrase pair and obtained promising results by SVM-based classifier induction (95% precision with 89% recall). Our approach should be equally applicable to other population segments such as aphasic readers and second-language learners. Our next steps include the investigation of the drawbacks of the present bag-of-features modeling approach. We also need to consider a method to introduce the notion of user classes (e.g. beginner, intermediate and advanced). Textual aspects of readability will also need to be considered, as discussed in (Inui and Nogami, 2001; Siddharthan, 2003).

Regarding paraphrase representation, we presented our revision-based lexico-structural paraphrasing engine. It provides a fully expressible scheme for representing paraphrases, while preserving the ease of handcrafting paraphrasing rules by providing an extended natural language as a means of pattern editing. We have handcrafted over a thousand transfer rules that implement a broad range of lexical and structural paraphrasing.

The problem of error detection is also critical. When we find an effective solution to it, we will be ready to integrate the technologies into an application system of text simplification and conduct user- and task-oriented evaluations.

Acknowledgments

The research presented in this paper was partly funded by PREST, Japan Science and Technology Corporation. We thank all the teachers at the schools for the deaf who cooperated in our questionnaire survey and Toshihiro Agatsuma (Joetsu University of Education) for his generous and valuable cooperation in the survey. We also thank Yuji Matsumoto and his colleagues (Nara Advanced Institute of Science and Technology) for allowing us to use their NLP tools ChaSen and CaboCha, Taku Kudo (Nara Advanced Institute of Science and Technology) for allowing us to use his SVM tool, and Takaki Makino and his colleagues (Tokyo University) for allowing us to use LiLFeS, with which we implemented KURA. We also thank the anonymous reviewers for their suggestive and encouraging comments.
References

Barzilay, R. and McKeown, K. 2001. Extracting paraphrases from a parallel corpus. In Proc. of the 39th Annual Meeting and the 10th Conference of the European Chapter of Association for Computational Linguistics (EACL), pages 50–57.

Barzilay, R. and Lee, L. 2003. Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In Proc. of HLT-NAACL.

Canning, Y. and Taito, J. 1999. Syntactic simplification of newspaper text for aphasic readers. In Proc. of the 22nd Annual International ACM SIGIR Conference (SIGIR).

Carroll, J., Minnen, G., Canning, Y., Devlin, S. and Tait, J. 1998. Practical simplification of English newspaper text to assist aphasic readers. In Proc. of AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology.

Dorna, M., Frank, A., Genabith, J. and Emele, M. 1998. Syntactic and semantic transfer with F-structures. In Proc. of COLING-ACL, pages 341–347.

Fujita, A. and Inui, K. 2002. Decomposing linguistic knowledge for lexical paraphrasing. In Information Processing Society of Japan SIG Technical Reports, NL-149, pages 31–38. (in Japanese)

Fujita, A., Inui, K. and Matsumoto, Y. 2003. Automatic detection of verb valency errors in paraphrasing. In Information Processing Society of Japan SIG Technical Reports, NL-156. (in Japanese)

Hermjakob, U., Echihabi, A. and Marcu, D. 2002. Natural language based reformulation resource and Web exploitation for question answering. In Proc. of the TREC-2002 Conference.

Iida, R., Tokunaga, Y., Inui, K. and Eto, J. 2001. Exploration of clause-structural and function-expressional paraphrasing using KURA. In Proc. of the 63rd Annual Meeting of Information Processing Society of Japan, pages 5–6. (in Japanese)

Inui, K. and Nogami, M. 2001. A paraphrase-based exploration of cohesiveness criteria. In Proc. of the Eighth European Workshop on Natural Language Generation, pages 101–110.

Jacquemin, C. 1999. Syntagmatic and paradigmatic representations of term variations. In Proc. of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 341–349.

Kondo, K., Sato, S. and Okumura, M. 1999. Paraphrasing of "sahen-noun + suru". Journal of Information Processing Society of Japan, 40(11):4064–4074. (in Japanese)

Kondo, K., Sato, S. and Okumura, M. 2001. Paraphrasing by case alternation. Journal of Information Processing Society of Japan, 42(3):465–477. (in Japanese)

Kurohashi, S. and Sakai, Y. 1999. Semantic analysis of Japanese noun phrases: a new approach to dictionary-based understanding. In Proc. of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 481–488.

Lavoie, B., Kittredge, R., Korelsky, T. and Rambow, O. 2000. A framework for MT and multilingual NLG systems based on uniform lexico-structural processing. In Proc. of ANLP-NAACL.

Lin, D. and Pantel, P. 2001. Discovery of inference rules for question-answering. Natural Language Engineering, 7(4):343–360.

McCoy, K. F. and Masterman (Michaud), L. N. 1997. A tutor for teaching English as a second language for deaf users of American Sign Language. In Proc. of ACL/EACL '97 Workshop on Natural Language Processing for Communication Aids.

Meyers, A., Yangarber, R. and Grishman, R. 1996. Alignment of shared forests for bilingual corpora. In Proc. of the 16th International Conference on Computational Linguistics (COLING), pages 460–465.

Michaud, L. N. and McCoy, K. F. 2001. Error profiling: toward a model of English acquisition for deaf learners. In Proc. of the 39th Annual Meeting and the 10th Conference of the European Chapter of Association for Computational Linguistics (EACL), pages 386–393.

NIJL, the National Institute for Japanese Language. 1991. Nihongo Kyôiku-no tame-no Kihon-Goi Chôsa (The basic lexicon for the education of Japanese). Shuei Shuppan, Japan. (in Japanese)

Richardson, S., Dolan, W., Menezes, A. and Corston-Oliver, M. 2001. Overcoming the customization bottleneck using example-based MT. In Proc. of the 39th Annual Meeting and the 10th Conference of the European Chapter of Association for Computational Linguistics (EACL), pages 9–16.

Robin, J. and McKeown, K. 1996. Empirically designing and evaluating a new revision-based model for summary generation. Artificial Intelligence, 85(1–2):135–179.

Sato, S. 1999. Automatic paraphrase of technical papers' titles. Journal of Information Processing Society of Japan, 40(7):2937–2945. (in Japanese)

Shinyama, Y., Sekine, S., Sudo, K. and Grishman, R. 2002. Automatic paraphrase acquisition from news articles. In Proc. of HLT, pages 40–46.

Siddharthan, A. 2003. Preserving discourse structure when simplifying text. In Proc. of European Workshop on Natural Language Generation, pages 103–110.

Takahashi, T., Iwakura, T., Iida, R., Fujita, A. and Inui, K. 2001. KURA: a transfer-based lexico-structural paraphrasing engine. In Proc. of the 6th Natural Language Processing Pacific Rim Symposium (NLPRS) Workshop on Automatic Paraphrasing: Theories and Applications, pages 37–46.

Williams, S., Reiter, E. and Osman, L. 2003. Experiments with discourse-level choices and readability. In Proc. of European Workshop on Natural Language Generation, pages 127–134.
