Muhammad Aatif
Department of Computer Science
University of Peshawar
MS Computer Science (2017-18)
Prof. Dr. Mohammad Abid Khan
Free SlideSalad PowerPoint Template Copyright (C) SlideSalad.com All rights reserved.
Introduction
Literature Review
Research Objectives
Methodology
Publication
Introduction
Discourse / Discourse Unit
A piece of text consisting of at least one sentence, where the sentences (if there are several) are linked to each other.
[Ali is the son of Jamila. He is ten years old. Kashif is his elder brother.] [Kashif is a student of MS Computer Science. He studies at the University of Peshawar.] (Inconsistent Annotation)
Therefore, an algorithm for the identification of Discourse Boundaries (DBs) has been developed. It needs preprocessed (anaphorically annotated) text as input.
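The slides do not spell out the algorithm's rules, so the following is only a minimal sketch of one plausible heuristic, assuming each sentence comes with the sentence indices of the antecedents of its pronominal anaphors (a hypothetical input format). A sentence continues the current discourse unit if at least one of its anaphors points back into that unit; otherwise a DB is placed before it. The names `find_boundaries` and `split_units` are illustrative and not taken from the original work.

```python
from typing import List, Sequence


def find_boundaries(antecedents: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of sentences that open a new discourse unit.

    antecedents[i] lists, for every pronominal anaphor in sentence i, the
    index of the sentence holding its antecedent (hypothetical input format).
    """
    if not antecedents:
        return []
    boundaries = [0]  # the first sentence always opens a unit
    unit_start = 0
    for i in range(1, len(antecedents)):
        # Continue the unit only if some anaphor points back into it;
        # antecedents in the same sentence (j == i) do not count.
        if not any(unit_start <= j < i for j in antecedents[i]):
            boundaries.append(i)
            unit_start = i
    return boundaries


def split_units(sentences: Sequence[str],
                antecedents: Sequence[Sequence[int]]) -> List[List[str]]:
    """Group sentences into discourse units at the detected boundaries."""
    starts = find_boundaries(antecedents) + [len(sentences)]
    return [list(sentences[a:b]) for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    sentences = [
        "Ali is the son of Jamila.",
        "He is ten years old.",
        "Kashif is his elder brother.",
        "Kashif is a student of MS Computer Science.",
        "He studies at the University of Peshawar.",
    ]
    # "He" -> S0, "his" -> "He" in S1, repeated name not treated as anaphor, "He" -> S3
    antecedents = [[], [0], [1], [], [3]]
    for unit in split_units(sentences, antecedents):
        print(unit)
```

With these antecedent lists the sketch returns boundaries `[0, 3]`, which reproduces the two bracketed units of the example above. Treating only pronouns as anaphors is a choice of this sketch; counting repeated names as links would merge the two units.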
However, the majority of NLP systems work at the word level or sentence level, which is a poor way of processing natural text because natural text is highly coherent.
The relevant NLP systems need a unit of processing that is semantically and referentially complete and describes a (sub-)idea entirely. This is the motivation for this work.
Hence, with such a unit of processing, the machine can process the input text unit by unit.
Impact: Great impact on many NLP systems, such as text understanding, simplification, translation, summarization, and question-answering systems.
Figure: Example of a Question-Answering System (Text Database → Selected Doc → Selected Page(s) → Selected DU)
Therefore, once the problem under focus is solved, the relevant NLP systems should improve in accuracy and processing efficiency and produce more effective and useful results.
Objective #1
To create an anaphorically annotated corpus of English text.
Objective #2
To design an innovative algorithm based on the knowledge of
the anaphorically annotated text.
Objective #3
To implement the algorithm for the identification of DBs.
Main Goal: Identification of DBs.
2 possible approaches: 1) use or 2) don't use a corpus.
Used a corpus-based approach. Why? 1) Standard way in NLP; 2) not used before.
Applicable to dialogue.
Python 3.6 & Spyder IDE.
Performed experiments on 24 docs from the corpus.
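The slides report experiments on 24 documents but not how the output was scored. One common option, assumed here rather than taken from the work, is to compare predicted boundary positions with gold ones and micro-average precision, recall and F1 over all documents. `micro_prf` is a hypothetical helper illustrating that, and the toy data below is made up.

```python
from typing import Iterable, Sequence, Tuple


def micro_prf(docs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 of predicted discourse boundaries.

    Each item of docs is (predicted, gold): the sentence indices at which a
    new discourse unit starts, excluding the trivial boundary at index 0.
    """
    tp = n_pred = n_gold = 0
    for predicted, gold in docs:
        pred_set, gold_set = set(predicted), set(gold)
        tp += len(pred_set & gold_set)  # boundaries placed at the right position
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Two toy documents (made-up data, not results from the corpus).
    docs = [([3], [3]), ([2, 5], [2])]
    print(micro_prf(docs))
```

Micro-averaging pools the counts before dividing, so documents with many boundaries weigh more than short ones; a macro-average over the documents would be the alternative if each document should count equally.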
Creating an anaphorically annotated corpus was a difficult stage. Therefore, multiple options were considered; however, we were lucky to find the Phrase Detectives Corpus 2.1.4 (PD2).
PD2 is an anaphorically annotated corpus: 1st version in 2016 [27], 2nd version in 2019 [40]. Its two subsets are Silver and Gold.
542 docs, 408,153 tokens and 49,990 markables (a markable is any linguistic expression of interest).
On average, there are 12.6 annotations per markable. The Gold subset has fewer errors than the Silver subset because ...
2. Incorrect Annotation
Markables are annotated, but incorrectly.
4. Algorithm Errors
Errors produced by the developed algorithm
Input text needs to be anaphorically annotated and in XML form.
Making the system wholly automatic, and incorporating the algorithm into other NLP systems to check its usefulness.
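Since the input must be anaphorically annotated and in XML form, a preprocessing step has to turn the annotation into per-sentence anaphoric links. The sketch below assumes a deliberately simplified, hypothetical schema (`<s>` elements containing `<markable>` elements with `id`, `category` and `antecedent` attributes); it is not the actual PD2 format, which would need its own reader. It uses only the standard-library `xml.etree.ElementTree`, and `antecedent_sentences` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified annotation of the five example sentences.
SAMPLE = """<doc>
  <s><markable id="m1" text="Ali"/><markable id="m2" text="Jamila"/></s>
  <s><markable id="m3" text="He" category="pronoun" antecedent="m1"/></s>
  <s><markable id="m4" text="Kashif"/>
     <markable id="m5" text="his" category="pronoun" antecedent="m3"/></s>
  <s><markable id="m6" text="Kashif" category="name" antecedent="m4"/>
     <markable id="m7" text="MS Computer Science"/></s>
  <s><markable id="m8" text="He" category="pronoun" antecedent="m6"/>
     <markable id="m9" text="University of Peshawar"/></s>
</doc>"""


def antecedent_sentences(xml_text: str) -> List[List[int]]:
    """For each <s>, list the sentence indices of its pronouns' antecedents."""
    root = ET.fromstring(xml_text)
    sentences = root.findall("s")
    # First pass: record which sentence every markable id belongs to.
    sentence_of: Dict[str, int] = {}
    for idx, s in enumerate(sentences):
        for m in s.iter("markable"):
            sentence_of[m.get("id")] = idx
    # Second pass: keep links from pronominal anaphors whose antecedent exists.
    result = []
    for s in sentences:
        links = [sentence_of[m.get("antecedent")]
                 for m in s.iter("markable")
                 if m.get("category") == "pronoun"
                 and m.get("antecedent") in sentence_of]
        result.append(links)
    return result


if __name__ == "__main__":
    print(antecedent_sentences(SAMPLE))
```

On the sample this yields `[[], [0], [1], [], [3]]`: the repeated name "Kashif" is coreferent but filtered out, because only pronouns are treated as anaphors in this sketch.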
Submitted for publication on 26.01.2020 to the Journal of Information Communication Technologies & Robotic
Applications (JICTRA), an ‘X’ category journal recognized by HEC.
[2] A. R. Tayar, S. R. Tandan, and M. A. Tayal, “A Research on Discourse Access,” IJRTE, vol. 8, no. 2S11, pp. 827–830, Nov. 2019, doi:
10.35940/ijrte.B1135.0982S1119.
[3] R. Ali, M. A. Khan, M. Bilal, and I. Rabbi, “Reciprocal anaphora resolution in Pashto discourse,” in 2008 4th International Conference on Emerging
Technologies, Oct. 2008, pp. 1–5, doi: 10.1109/ICET.2008.4777464.
[4] P. A. Heeman, D. Byron, and J. F. Allen, “Identifying Discourse Markers in Spoken Dialog,” presented at the AAAI 1998 Spring Symposium on Applying
Machine Learning to Discourse Processing, Menlo Park, California, Mar. 1998, pp. 44–51.
[5] K. Tomiyama, F. Nihei, Y. I. Nakano, and Y. Takase, “Identifying Discourse Boundaries in Group Discussions using a Multimodal Embedding Space,” in IUI
Workshops, 2018, vol. 2068.
[6] P. Furkó, “The Boundaries of Discourse Markers – Drawing Lines through Manual and Automatic Annotation,” The Journal of Sapientia Hungarian
University of Transylvania, vol. 10, no. 2, pp. 155–170, Nov. 2018, doi: 10.2478/ausp-2018-0020.
[7] M. Palomar et al., “An Algorithm for Anaphora Resolution in Spanish Texts,” Comput. Linguist., vol. 27, no. 4, pp. 545–567, Dec. 2001.
[8] S. Singh, P. Lakhmani, P. Mathur, and S. Morwal, “Analysis of Anaphora Resolution System for English Language,” IJIT, vol. 3, no. 2, pp. 51–57, Apr. 2014,
doi: 10.5121/ijit.2014.3205.
[9] R. Bunescu, “Associative Anaphora Resolution: A Web-Based Approach,” in Proceedings of the 2003 EACL Workshop on The Computational Treatment of
Anaphora, 2003.
[10] R. J. Evans and C. Orasan, “NP Animacy Identification for Anaphora Resolution,” JAIR, vol. 29, pp. 79–103, Jun. 2007, doi: 10.1613/jair.2179.
[11] P. Lakhmani, S. Singh, and S. Morwal, “Performance Analysis of two Anaphora Resolution System for Hindi Language,” vol. 3, no. 3, pp. 576–580, 2014.
[12] M. A. Khan and F. T. Zuhra, “Role of Corpus in Anaphora Resolution,” presented at the Corpus Linguistics, ICC Birmingham, Jul. 2011.
[13] R. Ali, M. A. Khan, R. Ahmad, and I. Rabbi, “Rule based personal references resolution in Pashto discourse for better machine translation,” in 2008 Second
International Conference on Electrical Engineering, Mar. 2008, pp. 1–6, doi: 10.1109/ICEE.2008.4553941.
[14] R. Iida, K. Inui, and Y. Matsumoto, “Anaphora resolution by antecedent identification followed by anaphoricity determination,” ACM Trans. Asian Lang.
Inf. Process., vol. 4, no. 4, pp. 417–434, 2005, doi: 10.1145/1113308.1113312.
[15] M. Sen, N. Shah, and L. Kurup, “An algorithm for resolution of Anaphora in English text,” in 2017 International Conference on Innovations in Information,
Embedded and Communication Systems (ICIIECS), Mar. 2017, pp. 1–5, doi: 10.1109/ICIIECS.2017.8276078.
[16] J. van Kuppevelt, “Discourse structure, topicality and questioning,” JL, vol. 31, no. 1, pp. 109–147, Mar. 1995, doi: 10.1017/S002222670000058X.
[18] S. Ullah, M. A. Hussain, and K. S. Kwak, “Resolution of Unidentified Words in Machine Translation,” CoRR, Nov. 2009.
[19] T. A. van Dijk, “Principles of Critical Discourse Analysis,” Discourse & Society, vol. 4, no. 2, pp. 249–283, Apr. 1993, doi: 10.1177/0957926593004002006.
[20] M. Jørgensen and L. Phillips, Discourse analysis as theory and method. London; Thousand Oaks, Calif.: Sage Publications, 2002.
[21] M. Patel, A. Chokshi, S. Vyas, and K. Maurya, “Machine Learning Approach for Automatic Text Summarization Using Neural Networks,” International
Journal of Advanced Research in Computer and Communication Engineering, vol. 7, no. 1, 2018.
[22] M.-Y. Day and C.-Y. Chen, “Artificial Intelligence for Automatic Text Summarization,” in 2018 IEEE International Conference on Information Reuse and
Integration (IRI), Jul. 2018, pp. 478–484, doi: 10.1109/IRI.2018.00076.
[23] M. A. Khan, Text Based Machine Translation, 1st ed. Department of Computer Science, University of Peshawar, Peshawar, 1995.
[25] M. Poesio and R. Artstein, “Anaphoric Annotation in the ARRAU Corpus,” in Proceedings of the International Conference on Language
Resources and Evaluation, LREC 2008, Marrakech, Morocco, Jan. 2008.
[27] J. Chamberlain, M. Poesio, and U. Kruschwitz, “Phrase Detectives Corpus 1.0 Crowdsourced Anaphoric Coreference,” in Proceedings of the Tenth
International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 2016, pp. 2039–2046.
[28] L. von Ahn, “Games with a purpose,” Computer, vol. 39, no. 6, pp. 92–94, Jun. 2006, doi: 10.1109/MC.2006.196.
[30] K. M. Seddik and A. Farghaly, “Anaphora Resolution,” in Natural Language Processing of Semitic Languages, I. Zitouni, Ed. Berlin, Heidelberg: Springer
Berlin Heidelberg, 2014, pp. 247–277.
[31] R. Sonbol, G. Rebdawi, and N. Ghneim, “Anaphora Resolution in Business Process Requirement Engineering,” IJECE, vol. 8, no. 3, p. 1766, Jun. 2018, doi:
10.11591/ijece.v8i3.pp1766-1773.
[32] A. Kozlova, A. Svischev, O. Gureenkova, and T. Batura, “A hybrid approach for anaphora resolution in the Russian language,” in 2017 Siberian Symposium
on Data Science and Engineering (SSDSE), Apr. 2017, pp. 36–40, doi: 10.1109/SSDSE.2017.8071960.
[33] Y. Zhu, W. Song, X. Liu, L. Liu, and X. Zhao, “Improving Anaphora Resolution by Animacy Identification,” in 2019 IEEE International Conference on
Artificial Intelligence and Computer Applications (ICAICA), Mar. 2019, pp. 48–51, doi: 10.1109/ICAICA.2019.8873499.
[35] J. T. Dutka, “Anaphoric relations, comprehension and readability,” in Processing of Visible Language, P. A. Kolers, M. E. Wrolstad, and H. Bouma, Eds.
Boston, MA: Springer US, 1980, pp. 537–549.
[36] M. Kameyama, “Recognizing Referential Links: An Information Extraction Perspective,” arXiv:cmp-lg/9707009, Jul. 1997.
[37] R. Ali and M. A. Khan, “Computational Treatment of Zero Anaphora in Pashto Language,” ResearchGate.
[38] R. Mitkov, “Robust Pronoun Resolution with Limited Knowledge,” in Proceedings of the 36th Annual Meeting of the Association for Computational
Linguistics and 17th International Conference on Computational Linguistics - Volume 2, Stroudsburg, PA, USA, 1998, pp. 869–875, doi: 10.3115/980691.980712.
[39] S. Ullah, M. A. Khan, and K. S. Kwak, “A discourse based approach in text-based machine translation,” in ITC-CSCC: International Technical Conference
on Circuits Systems, Computers and Communications, The Institute of Electronics Engineers of Korea, Jul. 2007, pp. 1128–1129.
[40] M. Poesio, J. Chamberlain, S. Paun, J. Yu, A. Uma, and U. Kruschwitz, “A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric
Interpretation,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, Jun. 2019, pp. 1778–1789.