A novel English distance learning system was developed through multimedia database and Internet technologies. The main function of this system is to query the English sentence pattern through keywords from the English multimedia corpus. The system not only teaches English grammar, but also, due to its database, allows teachers to understand the most frequent mistakes.
A novel English distance learning system was developed through multimedia database and Internet technologies. The main function of this system is to query the English sentence pattern through keywords from the English multimedia corpus. The system not only teaches English grammar, but also, due to its database, allows teachers to understand the most frequent mistakes.
A novel English distance learning system was developed through multimedia database and Internet technologies. The main function of this system is to query the English sentence pattern through keywords from the English multimedia corpus. The system not only teaches English grammar, but also, due to its database, allows teachers to understand the most frequent mistakes.
A Multimedia Database Supports Intelligent English Distance Learning
Ying-Hong Wang, Hui-Hua Huang, Chih-Hao Lin
(Department of Computer Science and Information Engineering, TamKang University, Taipei Hsien, Taiwan, Email: inhon@mail.tku.edu.tw, HHHuang@cs.tku.edu.tw, CHLin@cs.tku.edu.tw Abstract This work presents a novel English distance learning system that was developed through multimedia database and Internet technologies called English multimedia corpus. The system includes English articles, dialogs, and videos. A student can study English writing and reading as well as view Web browser listings to connect the Corpus server. In the system, semantic query and Link grammar" are applied to construct the English multimedia corpus system. Furthermore, it promotes the query level from keyword- base and content-based query to a semantic level. The main function of this system is to query the English sentence pattern through keywords from the English multimedia corpus. The other function is to detect grammatical errors in written English. Thus, the system not only teaches English grammar, but also, due to its database, allows teachers to understand the most frequent mistakes. Keywords: Multimedia database, Link grammar, Distance learning, Linguistic corpus 1. Introduction Traditional databases always store the character and numerical data. A primary function manages this basic data, for example, Students, School data, or Companies' employee and financial data. Following computer-based technological advancements, multiple types of data, such as image, audio, video, and hypermedia documents, are applied to represent computer information. Thus, database technologies must support multimedia data, which is known as a multimedia database. It includes many facets, such as content-based retrieval [6,15], shape detection and object recognition [8,10]. Meanwhile, these technical aspects promote its numerous application domains. Many examples of distance learning, digital television, and distance medicine are implemented based on those techniques. This study presents an Internet-based English learning environment that applies multimedia database technologies. Learning English is always a substantial difficulty for Chinese students. Some experts recommend that the best way to learn English is to establish a good study environment and practice it through several various approaches. This should make it a satisfying learning experience. Based on this viewpoint, teachers should provide students with a better environment to learn English. Distance learning based on Internet is one of the best solutions to create an English learning environment. Currently, the highly distributed computing environment has been supported and applied popularly. Networking services should perform as tutoring tools. Numerous innovative approaches in this field have been investigated. Through implementing computer-mediated education, many advocates emphasize its positive aspects and English learning tutoring systems, which are computer-based, have been developed by numerous academic research groups [19-31]. Incidentally, students learning a foreign language through computing devices might encounter many difficulties. Some articles have discussed whether students frustrations inhibit their educational opportunity [20, 21, 23]. These frustrations were based on three interrelated sources: lack of prompt feedback, ambiguous Web instructions, and technical problems. Hence, an easy-to-use English learning system was developed. The proposed system contains several preparations, which are relevant to English learning. While treating natural language processing, the system is constructed based on Link Grammar. However, in order to separate it from conventional approaches, modifications were included to improve its ability. The proposed English-learning corpus stores mistakes that students make. That is, all errors that students will probably make on a particular question are determined in advance. Each type of mistake has correcting suggestions and error descriptions. Corpus can not only provide suggestions but it also records the mistakes that Chinese students might make in English. Another function of this Internet-based multimedia corpus is that a user can locate sample sentences patterns from movie scripts, live dialogs and textbook reading. Each sentence from live dialog and movie captions is parsed and analyzed by link grammar, which includes special tags. The end product is then stored within this proposed database system. Consequently, based on semantic retrieval, the system subscribes to query methodology. In this paper, English sentence construction will be analyzed first. Notably, the sentence pattern is divided into nine classes. Secondly, each sentence pattern is parsed by link grammar, in which every pattern has a particular tag set. Consequently, the system constructs the query engine with these essential tag sets. In general, similar learning systems store single sentences or vocabularies and facilitate query through keywords [19- 22,24,25,27-31]. Semantic-based retrieval technologies are rarely used. This work discusses how to use the appended information, the special tag sets, which Link grammar generated to advance the query function from keyword-based to semantic- based. Several web sites regarding learning corpus [19,20,21,22], and study several papers in addition to those that pertain to Link grammar [4,9,13,16] were surveyed. Furthermore, Schulenburg proposed realistic feedback processing [9], Brill developed machine learning and automatic linguistic analysis [12], and Chang presented automatic linguistic resolution: framework and applications [3]. Other similar systems are referenced [4,5,14,16, 17]. In section 2, related works regarding Link grammar and XML are discussed. The system analysis for the proposed learning functions is introduced in section 3. Section 4 presents the architecture of the proposed system. Finally, section 5 contains the conclusion and discussions for future research. 2. Theoretical Backgrounds Link grammar is an English grammar parser system that was proposed by The School of Computer Science of Carnegie Mellon University (CMU). Link grammar is a context-free formula to describe natural language [32]. Link grammar consists of a set of words, which are the terminal symbols of the grammar, and each has a linking requirement. The linking requirements of each word are gathered in a dictionary. Figure 1 illustrates the linking requirements through a simple dictionary for the words, a/the, cat/mouse, John, ran, chased. a the cat nouse John ran chased 5 5 O D D O 5 5 O Fig. 1: words and connectors in a dictionary Each of the intricately shaped boxes is labeled connector. A pair of compatible connectors will join, and only one of those attached to a given black dot must be satisfied. Figure 2 depicts how the linking requirements are satisfied in the sentence, "The cat chased a mouse." the cat nouse a chased 5 5 D D O O D D Fig. 2: All linking requirements are satisfied to form a linkage The linkage can be perceived as a graph and the words can be treated as vertices, which are connected by labeled arcs. Thus, the graph is connected and planar. The labeled arcs that connect words to others on either their left or right are links. A valid parse is called a linkage. Thus, the following is a simplified form of the diagram indicating that the cat chased a mouse is part of this language. the cat chased a nouse D D O 5 Fig. 3: the simplified form of Fig. 2 Table 1 presents an abridged dictionary, which encodes linking requirements of the above example. Table 1: the words and linking requirements in a dictionary words formula a / the D+ cat / mouse D- & (O- or S+) John O- or S+ ran S- chased S- & O+ The linking requirement for each word is expressed as a formula involving that includes the operators &, and o r, parentheses, and connector names. The + or suffix on a connector indicates the direction in which the matching connector must lie be placed. Thus, the farther left a connector is in the within an expression phphrase, the nearer the word to which it connects must be. A sequence of words is a sentence defined by grammar, if links can be established among the words so as to satisfy the formula of each word. It includes the following meta-rules: Planarity: Links are drawn above the sentence and do not cross. Connectivity: Links connect all the words in the sequence. Ordering: When the connectors of a formula are traversed from left to right, the words to which they connect proceed from near to far. Namely, consider a word, and consider two links connecting that word to the word on its left. The link connecting the closest word (the shorter link) must satisfy a connector that appears to the left (in the formula) of that connector in the other word. The same process occurs for a link to the right. Exclusion: No two links may connect the same pair of words. The use of a formula to specify a link grammar dictionary is convenient for creating natural language grammars, however it is cumbersome for mathematical analysis there of, and as well as in describing algorithms to parse link grammar. An alternate method of expressing link grammar is known as disjunctive form, in which each word has an associated set of disjuncts. A disjunct is denoted as: ((L 1 , L 2 , ,L m )(R n , R n-1 , , R 1 )) Where L 1 , L 2 , , L m and R n , R n- 1 , , R 1 are the connectors that must connect to the left and right, respectively. Thus, simply rewriting each disjunct and combining them all with the or operator to create an appropriate formula translates a link grammar in from a disjunctive to a standard form can be completed by as follows: (L 1 &L 2 &&L m &R 1 & R 2 &&R n ) Enumerating all ways that the formula cab be satisfied translates a formula into a set of disjuncts. For example, the formula (A- or ( )) & D- & (B+ or ( )) & (O- or S+) corresponds to the following eight disjuncts, may be used in some linkages: ( (A,D) (S,B) ) ( (A,D,O) (B) ) ( (A,D) (S) ) ( (A,D,O) ( ) ) ( (D) (S,B) ) ( (D,O) (B) ) ( (D) (S) ) ( (D,O) ( ) ) 3. System analysis The proposed system contains two varieties of learning functions, or two sub-systems. These sub-systems are English pattern query system and English sentence error-detect system. The detail will be discussed in the following. 3.1 English pattern query system To develop this system, English sentence patterns were analyzed. Based on general analysis, there are five English sentence patterns, as follows: (1) Simple sentence pattern (2) Negative sentence pattern (3) Interrogative sentence pattern (4) WH question sentence pattern (What, When, Where,) (5) Imperative sentence pattern They can then be divided by tense, as follows: Simple pattern The simple present tense: He writes a letter everyday. The simple past tense: He wrote a letter yesterday. The simple future tense: He will write a letter tomorrow. Perfective pattern The present perfective tense: He has written a letter. The past perfective tense: He had written a letter when I arrived. The future perfective tense: He will have written a letter before I arrive. Continuous pattern The present continuous tense: He is writing a letter now. The past continuous tense: He was writing a letter when I arrived. The future continuous tense: He will be writing a letter when I arrive. Perfective proceed pattern The present perfective continuous tense: He is writing a letter now. The past perfective continuous tense: He was writing a letter when I arrived. Link grammar parses these sentences to determine the label correlation of their sentence patterns (Table 2~5). Table2: Simple sentence pattern and link grammar tags Active voice Passive voice Simple pattern present past future Ss or Sp Ss or Sp Ss or Sp + I Ss or Sp + Pv Ss or Sp + Pv Ss or Sp + Ix + Pv Continuou s pattern present past future Ss or Sp + PP Ss or Sp + PP Ss + If + PP Ss or Sp + ppf + Pv Ss or Sp + ppf + Pv Ss or Sp + If + ppf + Pv Proceed pattern present past future Ss or Sp + Pg Ss or Sp + Pg Ss or Sp + Ix + Pg Ss or Sp + Pg + Pv Ss or Sp + Pg + Pv none Perfective continuou s pattern present past Ss or Sp + ppf + Pg Ss or Sp + ppf + Pg Illustrate: Ss and Sp: connects subject nouns to finite verbs I: connects infinitive verb forms to certain words such as modal verbs and "to" PP: connects forms of "have" with past participles. PV: connects forms of the verb "be" to past participles. Pg: connects forms of the verb "be" to present participles. ppf: connects forms of the verb "be" to past participles been. Subsequently, each pattern receives a label constituent type. This characteristic parses each sentence in the system, which stores the resulting label set. For example, (Ss, ppf, pg) is a label set that was achieved and stored following link grammar parsing. Then, a user can employ query language to arrive at the English sentence pattern they desire. Initially, to ensure that a pattern is accurate, analysis of only one type of pattern is performed. The system can then be upgraded to handle multi- patterns. The system can also process multiple queries in several types of patterns. However, this has some difficulties, which are discussed in section 3.2. 3.2 English sentence error detect system In this section, the Enhanced Link Grammar, which is based on a dictionary with modified disjuncts, is illustrated. Herein, word linkage and the applications of Enhanced Link Grammar are discusses, respectively. Modified Dictionary In general, the formula of each word is complicated. Initially, the word that requires connector changes is analyzed. Various formulas impose distinct constraints on the proposed system. Thus, although changes may improve the system, many may not. To illustrate, simple words are tested prior to further explanation. For example, some verbs must be followed by an infinitive to, a gerund or a participle. However, several verbs can accept several varying types. For example, We enjoy walking in the rain. is accurate, but We enjoy to walk in the rain. Is not. However, the link of the word enjoy can be altered to accept the infinitive to. a portion of the formula is changed from (Pg) to (Pg o r TO***e9+). Hence, the word enjoy can be linked to not only a gerund but also an infinitive to with the additional connector TO***e9+. Notably, this connector contains a subscript ***e9, which indicates there is a type 9 error. The type 9 error indicates the mistake is happened when a transitive verb is used as intransitive verb. Subsequently, to the Link grammar with original dictionary (Fig. 4) and modified dictionary (Fig. 5), respectively, there are another two sentences We enjoy swimming. and We enjoy to swim. Figure 4presents that the link grammar that has the original dictionary can not accept an inaccurate sentence. That is, an error message No complete linkage found. will appear. Fig.4: the original link grammar parser However, Fig. 5 presents the experimental result when the same two sentences are parsed by the link grammar that contains the modified dictionary. Fig. 5: the modified link grammar parser Thus, enjoy can link to either a gerund or an infinitive to. Incidentally, there are ten types of catalogued error tags for error detection. In the next section, the error tag, including the links that can and cannot be modified, is discussed. 4. The System Architecture and Examples 4.1 System architecture Corpus is a multimedia database that provides a dictionary as well as record errors, comments, and system feedback. Similar to English essays, dialog, sentences, words, terms, and attributes, the proposed English-learning corpus also has positive study data. It promotes language teaching and research. In addition to the original data, it also stores multimedia data, including instructional films, and audio clips, movies, music videos. Therefore, it provides a vivid, attractive learning environment. This system, presented in Fig. 9 and 10, is a multimedia corpus that provides four main functions of basic language learning; they are listen, speech, read, and write. That is a user can read an essay, listen to Standard English pronunciation, write compositions, practice dialog and recite text. The database is a vital tool for linguistic researchers. Philologists can analysis mistakes that English as second language students typically make. This system can be used over the Internet to access the data from the multimedia learning corpus. Figure 6 illustrates how a user can access this system from any location. Fig. 6 System architecture overview As stated, the proposed system contains two sub-systems, which are English pattern query system and English sentence error detect system. Notably, they share the same database. Figure 7 depicts the architecture of the English pattern query system. Link grammar with the original dictionary parses the data, which produces link labels. The system then analyses and filters the link labels, which are stored into the database. When a user queries a sentence pattern, they use the interface of the system to obtain the desired pattern. Subsequently, the system will transform the query into readable data-type and, if possible, it will locate the relevant link label within the database. The system will retrieve the data and send it to the multimedia player on the Web. The advantage of this is that a user can replay messages several times until they clearly understand the correct pronunciation of the sentence, as well as the context in which the sentence should be used. Figure 8 depicts the architecture of the other second sub- system, English sentence error detect system. Lnglish sentence input Link grannar parser (enhanced) Label analysis & Iiliter ls there error tag appear access the data in the learner corpus DataBase out put Yes No put the error sentence into th database out put Learner Corpus The user can input a sentence into the system. Following analysis, the link grammar with the modified dictionary will parse the sentence, and filter the link labels if error tags are detected. The system will locate the description and error examples within the learner corpus and then display that information on the web page. However, if the system fails to understand an error, it may not have been defined or is too complex. Those sentences will be stored in the database for philologist analysis and further subsequent system enhancement. 4.2 System examples In this section, operation examples of the system, which is a web-based user interface, are presented. Figure 9 depicts the interface of the English pattern query system. This provides users with an easy-to-use interface; that is they simply need to drop down the menu and select the pattern that they want to read. The system will then return the query result with the multimedia data on the web browser (Fig. 10). Lnglish pattern Query Function data transIorn nultinedia player on the web Output DataBase Label analysis & Iilter Link grannar Parser (Original) data input Data/5entence Fig. 7 English pattern query system architecture Fig. 8 English sentence error detect system architecture Fig. 9 English pattern query system interface User(teacher,student..) multimedia web page Essay Database Movies Database Dialog Database SQL Server,IIS Internet This system also supports keyword-based queries. Figure 11 presents the interface of the English sentence error detect system. you can also use this kind of word detect system. After a sentence is input, link grammar with modified dictionary will determine errors. Figure 12 illustrates an example of a correctly input English sentence. However, Fig. 13 presents the systems response when an error is made When the linkage of the users input reveals one error tag, the system searches the Learner Corpus and provides both correct and incorrect examples as well as a description thereof (Fig. 14). 5. Conclusions and Future Work 5.1 The Contribution and Remarks After following analysis and experimentation, the system can easily use the label set, which is stored in the corpus and parsed with link grammar. Primarily, it assists junior or senior high school students to locate proper English sentence patterns. Numerous linguistic experts recommend that the best way to improve English speaking ability is to establish a realistic learning environment and practice English through several distinct approaches. The proposed interactive multimedia system promotes this type of learning. Sleator and Temperley developed Link Grammar, which is a word based parsing mechanism. Although the system was though to be an accurate grammar checker, unlike the proposed system, it fails to focus on fault tolerance ability, which is particularly useful for non-native English speaking students. The proposed system can parse sentences that the original Link Grammar cannot. Notably, our modified dictionary has prog ressed steadily. It appears that the idea proposed herein can be applied to other natural language processing and its corresponding applications. The system not only queries English sentence patterns, but also parses input sentences on-line. It is important to philologists to analysis sentences generated by students, so they can easily locate common or special mistakes. Subsequently, they can alter the focus of their course for a complete learning environment. This system provides a better and more interactive environment for both teachers and students. Furthermore, it combines the humanistic education with technology and Fig. 10 query result Fig. 12 correct sentence result Fig. 13 error sentence result Fig. 14 error suggestion Fig. 11 English sentence error detect system interface advances these two sciences for superior achievement. 5.2 Future Works Generally, words are a basic unit of communication that are common to all natural language texts and speech-processing activities. When teaching English, teachers always want to know the types of mistakes that students may make. Thus, according to various functional necessities, the proposed system can be extended to cover encompass more scalable ranges. The notions presented herein can aid in the development of other similar applications. XML has self-descriptive characteristics, which has resulted in increasing data interchange applications. These applications and research areas may become a popular subject in the future. Reference: [1] Wasfi Al-Khatib, Y. Francis Day, Arif Ghafoor, P. Bruce Berra, Semantic Modeling and Knowledge Representation in Multimedia Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 11 NO. 1 January/February 1999. [2] D. Beeferman, A. Berger and J. Lafferty, Text segmentation using exponential models, Proceedings of the Second Conference On Empirical Methods in NLP, Providence, RI, 1997. [3] Brill, E., Machine learning and automatic linguistic analysis: the next step, Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on Volume: 2 , 1998 , Page(s): 1033 -1036 vol.2 [4] C.M. Chung, Yuan-Kai Wang, Ying-Hong Wang and Chih- Hao Lin, The multimeaia corpus based on link grammmar, National Computer Symposium 99, Vol.1 A- 300~304. [5] C.M. Chung, Yuan-Kai Wang, Ying-Hong Wang and Chih- Hao Lin, The Design of Multimedia Database for English Learning Application proceeding of 2000 International Conference on Chinese Language Computing (ICCLC2000), pp. 77~81. [6] Shih-Fu Chang, Chen, W., Meng, H.J., Sundaram, H., Di Zhong Circuits and Systems for Video Technology, A Fully Automated Content-Based Video Search Engine Supporting Spatiotemporal Queries, IEEE Transactions on Volume: 8 5, Sept. 1998, Page(s): 602 615. [7] Chan, S.W.K., Automatic linguistic resolution: framework and applications, Systems, Man and Cybernetics, 1996., IEEE Internation Conference on Volume: 1 , 1996 , Page(s): 625 -630 vol.1. [8] Chang, S.-K.; Hsu, A., Image Information systems: where do we go from here? Knowledge and Data Engineering, IEEE Transactions on Volume: 4 5 , Oct. 1992 , Page(s): 431 442. [9] D. Grinberg, J. Lafferty and D. Sleator, A robust parsing algorithm for link grammars, Proceedings of the Fourth International Workshop on Parsing Technologies, Prague and Karlovy Vary, September 1995, pp.111-125. [10] F. Myron, S. Harpreet, N. Wayne, A. Jonathan, H. Qian, D. Byron, G. Monika, H. Jim, L. Denis and P. Dragutin, Query by Image and Video Content: The QBIC System, Computer v28 n9 Sept 1995 IEEE Los Alamitos CA USA p 23-32. [11] Yuichi Nakamura, Takeo Kanade, Semantic Analysis for Video Contents Extraction Spotting by Association in New Video, ACM Multimedia Conference, 1997. [12] Schulenburg, D., Sentence processing with realistic feedback, Neural Networks, 1992. IJCNN., International Joint Conference on Volume: 4 , 1992 , Page(s): 661 -666 vol.4. [13] D. Sleator, and D. Temperley, Parsing English with a link grammar, technical report CMU-CS-91-196, Department of Computer Science, Carnegie Mellon University, 1991. [14] chiro Yamada, Hideki Sumiyoshi, Yeun-Bea Kim and Masahiro Shibata, TV program skimming method using similarity of scene descriptions, SPIE Conference on multimedia Storage and Archiving Systems III, November 1998. [15] Yoshitaka, A.; Ichikawa, T., A survey on content-based retrieval for multimedia databases, Knowledge and Data Engineering, IEEE Transactions on Volume: 11 1, Jan.-Feb. 1999, Page(s): 81 93. [16] Yuan-Kai Wang, Ying-Hong Wang, Hsiu-Ping Lin and Chih-Feng Chao, An English-Learning Application based on extended link grammar formalism, National Computer Symposium 99, Vol. 2, B-306~310. [17] IHarry W. Agius, Marios C. Angelides, Developing Knowledge-Based Intelligent Multimedia Tutoring Systems Using Semantic Content- Based Modeling, Artificial Intelligence Review 13: 55-83, 1999. [18] Jian-Kang Wu, Content-Based Indexing of Multimedia Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, NO. 6, November/December 1997. [19] EFLWEB Home Page, http://www.eflweb.com/ [20] English grammar clinic, http://forums.educhat.com:8080/~grammar [21] Foreign Languages for Travelers, http://www.travlang.com/languages/ [22] G.R. Sampson: SUSANNE Scheme, http://www.cogs.susx.ac.uk/users/geoffs/RSue.html [23] Interactive Javascript Quizzes for ESL Students: http://www.aitech.ac.jp/~iteslj/quizzes/js/ [24] Language Dictionaries and Translators: http://rivendel.com/~ric/resources/dictionary.html [25] Linguistic Data Consortium LDC, http://www.ldc.upenn.edu/ [26] Link Grammar, http://bobo.link.cs.cmu.edu/link/ [27] Longman Dictionaries Home, http://www.awl- elt.com/dictionaries/ [28] Merriam-Webster Online, http://www.m-w.com/home.htm [29] On line English Grammar, http://www.edunet.com/english/grammar/ [30] Project Gutenberg Official Home Site, http://promo.net/pg/ [31] YourDictionary.COM, http://www.yourdictionary.com/ [32] John Lafferty, Daniel Sleator, and Davy Temperley Grammatical Trigrams: A Probabilistic Model of Link Grammar Presented to the 1992 AAAI Fall Symposium on Probabilistic Approaches to Natural Language.