Professional Documents
Culture Documents
Language Learning
Jian-Cheng Wu , Kevin C. Yeh Thomas C. Chuang Wen-Chi Shei , Jason S. Chang
Department of Computer Science Department of Computer Science Department of Computer Science
National Tsing Hua University Van Nung Institute of Technology National Tsing Hua University
101, Kuangfu Road, Hsinchu, No. 1 Van-Nung Road 101, Kuangfu Road, Hsinchu, 300,
300, Taiwan, ROC Chung-Li Tao-Yuan, Taiwan, ROC Taiwan, ROC
g904307@cs.nthu.edu.tw tomchuang@cc.vit.edu.tw jschang@cs.nthu.edu.tw
analyses to extract bilingual phrases/collocations pressions. For this purpose, the system opens up
from a parallel corpus. The preferred syntactic pat- two text boxes for the user to enter queries in any
terns are obtained from idioms and collocations in one of the languages involved or both. We offer
the machine readable English-Chinese version of some special expressions for users to specify the
Longman Dictionary of Contemporary of English. following queries:
Phrases matching the patterns are extract from • Exact single word query - W. For instance,
aligned sentences in a parallel corpus. Those enter “work” to find citations that contain
phrases are subsequently matched up via cross lin- “work,” but not “worked”, “working”,
guistic statistical association. Statistical association “works.”
between the whole phrase as well as words in
phrases are used jointly to link a collocation and its • Exact single lemma query – W+. For in-
counterpart collocation in the other language. See stance, enter “work+” to find citations that
Table 1 for an example of extracting bilingual col- contain “work”, “worked”, “working”,
locations. The word and phrase level information is “works.”
kept in relational database for use in processing • Exact string query. For instance, enter “in
queries, hightlighting translation counterparts, and the work” to find citations that contain the
ranking citations. Sections 3 and 4 will give more three words, “in,” “the,” “work” in a row,
details about that. but not citations that contain the three words
in any other way.
3 The Queries
• Conjunctive and disjunctive query. For in-
The goal of the TotalRecall System is to allow a stance, enter “give+ advice+” to find cita-
user to look for instances of specific words or ex- tions that contain “give” and “advice.” It is
also possible to specify the distance between language learning tool. Currently, TotalRecall
“give” and “advice,” so they are from a VO uses Sinorama Magazine corpus as the translation
construction. Similarly, enter “hard | diffi- memory and will be continuously updated as new
cult | tough” to find citations that involve issues of the magazine becomes available. We
difficulty to do, understand or bear some- have already put a beta version on line and ex-
thing, using any of the three words. perimented with a focus group of second language
learners. Novel features of TotalRecall include
Once a query is submitted, TotalRecall dis-
highlighting of query and corresponding transla-
plays the results on Web pages. Each result ap-
tions, clustering and ranking of search results ac-
pears as a pair of segments, usually one sentence
cording translation and frequency.
each in English and Chinese, in side-by-side for-
mat. The words matching the query are high-
TotalRecall enable the non-native speaker who
lighted, and a “context” hypertext link is included
is looking for a way to express an idea in English
in each row. If this link is selected, a new page ap-
or Chinese. We are also adding on the basic func-
pears displaying the original document of the pair.
tions to include a log of user activities, which will
If the user so wishes, she can scroll through the
record the users’ query behavior and their back-
following or preceding pages of context in the
ground. We could then analyze the data and find
original document.
useful information for future research.
4 Ranking
Acknowledgement
It is well known that the typical user usual has no We acknowledge the support for this study through
patient to go beyond the first or second pages re- grants from National Science Council and Ministry
turned by a search engine. Therefore, ranking and of Education, Taiwan (NSC 90-2411-H-007-033-
putting the most useful information in the first one MC and MOE EX-91-E-FA06-4-4) and a special
or two is of paramount importance for search en- grant for preparing the Sinorama Corpus for distri-
gines. This is also true for a concordance. bution by the Association for Computational Lin-
guistics and Chinese Language Processing.
Experiments with a focus group indicate that
the following ranking strategies are important: References
Chuang, T.C. and J.S. Chang (2002), Adaptive Sentence
• Citations with a translation counterpart Alignment Based on Length and Lexical Information, ACL
2002, Companion Vol. P. 91-2.
should be ranked first. Gale, W. & K. W. Church, "A Program for Aligning Sen-
• Citations with a frequent translation coun- tences in Bilingual Corpora" Proceedings of the 29th An-
terpart appear before ones with less frequent nual Meeting of the Association for Computational
translation Linguistics, Berkeley, CA, 1991.
Macklovitch, E., Simard, M., Langlais, P.: TransSearch: A
• Citations with same translation counterpart Free Translation Memory on the World Wide Web. Proc.
should be shown in clusters by default. The LREC 2000 III, 1201--1208 (2000).
cluster can be called out entirely on demand. Nie, J.-Y., Simard, M., Isabelle, P. and Durand, R.(1999)
• Ranking by nonlinguistic features should Cross-Language Information Retrieval based on Parallel
Texts and Automatic Mining of Parallel Texts in the Web.
also be provided, including date, sentence Proceedings of SIGIR ’99, Berkeley, CA.
length, query position in citations, etc. Simard, M., G. Foster & P. Isabelle (1992), Using cognates to
align sentences in bilingual corpora. In Proceedings of
With various ranking options available, the users TMI92, Montreal, Canada, pp. 67-81.
Wu, Dekai (1994), Aligning a parallel English-Chinese corpus
can choose one that is most convenient and statistically with lexical criteria. In The Proceedings of the
productive for the work at hand. 32nd Annual Meeting of the Association for Computational
Linguistics, New Mexico, USA, pp. 80-87.
Wu, J.C. and J.S. Chang (2003), Bilingual Collocation Extrac-
5 Conclusion tion Based on Syntactic and Statistical Analyses, ms.
Yeh, K.C., T.C. Chuang, J.S. Chang (2003), Using Punctua-
In this paper, we describe a bilingual concordance tions for Bilingual Sentence Alignment- Preparing Parallel
designed as a computer assisted translation and Corpus for Distribution by the ACLCLP, ms.