Presentation: MS Computer Science Thesis Defence

IDENTIFICATION OF DISCOURSE BOUNDARIES USING ANAPHORICALLY ANNOTATED TEXT
Muhammad Aatif
Department of Computer Science
University of Peshawar
MS Computer Science (2017-18)
Prof. Dr. Mohammad Abid Khan
Free SlideSalad PowerPoint Template Copyright (C) SlideSalad.com All rights reserved.
Presentation Outline
Introduction
Literature Review
The Problem We Solve
Research Objectives
Methodology
Phrase Detectives Corpus 2.1.4
Results & Discussion
Summary
Publication

Introduction
Discourse / Discourse Unit
A piece of text consisting of at least one sentence, where the sentences (if there are several) are linked to each other.
[Ali is the son of Jamila. He is ten years old. Kashif is his elder brother.] [Kashif is a student of MS Computer Science. He studies at the University of Peshawar.] (Inconsistent Annotation)
Marked in the example: an antecedent and an anaphoric device.
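The definition above can be illustrated with a minimal Python sketch. It assumes (the slides do not say this, and it is not presented as the thesis algorithm) that each anaphoric link is reduced to a pair of sentence indices (anaphor sentence, antecedent sentence), and that a boundary lies between two adjacent sentences whenever no link spans the gap between them. The function name `discourse_units` and the links are hypothetical; the second "Kashif" is deliberately left unlinked so that the output reproduces the bracketing of the example.

```python
from typing import List, Tuple


def discourse_units(n_sentences: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Group 0-based sentence indices into discourse units (hypothetical rule).

    links holds (anaphor_sentence, antecedent_sentence) pairs. A boundary is
    assumed between sentences i and i + 1 unless some link spans that gap.
    """
    if n_sentences == 0:
        return []
    # spanned[i] is True when a link joins a sentence <= i to a sentence > i.
    spanned = [False] * (n_sentences - 1)
    for anaphor, antecedent in links:
        lo, hi = sorted((anaphor, antecedent))
        for gap in range(lo, hi):
            spanned[gap] = True
    units = [[0]]
    for i in range(1, n_sentences):
        if spanned[i - 1]:
            units[-1].append(i)  # a link crosses the gap: stay in the same unit
        else:
            units.append([i])    # no link crosses the gap: start a new unit
    return units


sentences = [
    "Ali is the son of Jamila.",
    "He is ten years old.",
    "Kashif is his elder brother.",
    "Kashif is a student of MS Computer Science.",
    "He studies at the University of Peshawar.",
]
# Hypothetical links: He -> Ali, his -> Ali, He -> Kashif (second unit).
links = [(1, 0), (2, 0), (4, 3)]

for unit in discourse_units(len(sentences), links):
    print("[" + " ".join(sentences[i] for i in unit) + "]")
```

Under this rule, adding a link from the second "Kashif" back to the first would merge all five sentences into one unit, which shows how directly annotation quality can affect where boundaries fall.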
Identification of Discourse Boundaries is Important

It helps in text understanding, simplification, translation, summarization, paraphrasing, question-answering systems, etc.
Therefore, an algorithm for the identification of Discourse Boundaries (DBs) has been developed. It requires preprocessed text.
Literature Review
The literature focuses mainly on anaphora resolution, text understanding, text simplification, discourse markers, improving the performance of existing models, text summarization, and comparing the performance of NLP systems, among other topics.
However, most of this work operates at the word or sentence level, which is a poor way of processing natural text because natural text is highly coherent: meaning and reference routinely cross sentence boundaries.
The motivation: the relevant NLP systems need a unit of processing that is semantically and referentially complete and describes a (sub-)idea entirely.
With such a unit, the machine can process the input text unit by unit.

The Problem We Solve
The problem: identification of DBs in English text.
Once it is solved, the discourse unit (DU) can serve as the unit of processing in the relevant NLP systems.
Impact: potentially large for many NLP systems, such as text understanding, simplification, translation, summarization and question-answering systems.
Example of a question-answering system: text database → selected doc → selected page(s) → selected DU
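The narrowing pipeline above can be sketched in Python. The three-level dictionary layout, the function names and the word-overlap scoring are all hypothetical illustrations; a real question-answering system would use proper retrieval and ranking rather than bag-of-words overlap.

```python
import re
from typing import Dict, List, Set

# Hypothetical layout: document name -> pages -> discourse units (strings).
Database = Dict[str, List[List[str]]]


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(question: str, text: str) -> int:
    """Number of distinct lower-cased words shared by question and text."""
    return len(words(question) & words(text))


def select_du(question: str, db: Database) -> str:
    # Stage 1: selected doc = document sharing the most words with the question.
    doc = max(db, key=lambda name: overlap(
        question, " ".join(du for page in db[name] for du in page)))
    # Stage 2: selected page = best page within that document.
    page = max(db[doc], key=lambda p: overlap(question, " ".join(p)))
    # Stage 3: selected DU = best discourse unit on that page.
    return max(page, key=lambda du: overlap(question, du))


db = {
    "family.txt": [[
        "Ali is the son of Jamila. He is ten years old. Kashif is his elder brother.",
        "Kashif is a student of MS Computer Science. He studies at the University of Peshawar.",
    ]],
    "weather.txt": [["It rained heavily in Peshawar yesterday. The roads were flooded."]],
}

print(select_du("Which university does Kashif study at?", db))
```

Returning a whole DU rather than a single sentence keeps "He" together with its antecedent "Kashif", so the answer stays interpretable on its own.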

The Problem We Solve
Example of a summarization system: input text doc → division at discourse level (DU1, DU2, DU3, …, DUn) → fetch ideas to summarize → summarization level. More abstract summaries could be created by combining the DUs at successive levels.
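A minimal sketch of the summarization flow above, assuming the text has already been divided into DUs. Choosing one sentence per DU by document-wide word frequency is a swapped-in, classic extractive heuristic, not something the slides specify; the stop-word list and function names are hypothetical. `merge_adjacent` illustrates "combining the DUs" into a coarser level, which yields a shorter, more abstract summary.

```python
import re
from collections import Counter
from typing import List

# A deliberately tiny, hypothetical stop-word list for the example.
STOPWORDS = {"a", "an", "the", "is", "of", "at", "in", "he", "his", "she", "her"}


def content_words(sentence: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize_by_du(dus: List[List[str]]) -> str:
    """Keep the highest-scoring sentence of each DU (frequency-based scoring)."""
    freq = Counter(w for du in dus for s in du for w in content_words(s))
    picked = [max(du, key=lambda s: sum(freq[w] for w in content_words(s))) for du in dus]
    return " ".join(picked)


def merge_adjacent(dus: List[List[str]]) -> List[List[str]]:
    """Form a coarser level by combining neighbouring DUs pairwise."""
    return [sum(dus[i:i + 2], []) for i in range(0, len(dus), 2)]


dus = [
    ["Ali is the son of Jamila.", "He is ten years old.", "Kashif is his elder brother."],
    ["Kashif is a student of MS Computer Science.", "He studies at the University of Peshawar."],
]
print(summarize_by_du(dus))                  # one sentence per DU
print(summarize_by_du(merge_adjacent(dus)))  # coarser level: one sentence overall
```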
Therefore, solving this problem should help the relevant NLP systems improve in accuracy and processing efficiency, and produce more effective and useful results.

Research Objectives
Objective #1
To create an anaphorically annotated corpus of English text.
Objective #2
To design an innovative algorithm based on the knowledge in anaphorically annotated text.
Objective #3
To implement the algorithm for the identification of DBs.

Methodology
Main goal: identification of DBs.
Two possible approaches: 1) use a corpus, or 2) don't use a corpus.
Approach used: corpus-based. Why? 1) It is the standard way in NLP; 2) it has not been used before.
Implementation: Python 3.6 and the Spyder IDE.
Experiments were performed on 24 documents from the corpus.
Applicable to dialogue.

Phrase Detectives Corpus 2.1.4
Creating an anaphorically annotated corpus was a difficult stage. Several options were therefore considered, and we were fortunate to find the Phrase Detectives Corpus 2.1.4 (PD2).
PD2 is an anaphorically annotated corpus: 1st version 2016, 2nd version 2019. It has two subsets, Silver and Gold.
542 docs, 408,153 tokens and 49,990 markables (a markable is any linguistic expression of interest).
On average, 12.6 annotations per markable. The Gold subset has fewer errors than the Silver subset because ...
PD2 has been under development for more than 11 years and is still ongoing.
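Because every markable in PD2 carries several crowd judgments (12.6 on average), some aggregation step is needed before the annotations can drive an algorithm. The sketch below uses plain majority voting as a deliberately simple stand-in; it is not the method used to produce PD2's Silver or Gold labels, and the markable ids and label strings (`"new"`, `"old:m1"`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def aggregate(judgments: Dict[str, List[str]]) -> Dict[str, Tuple[str, float]]:
    """Majority label per markable, plus the share of judgments agreeing with it."""
    result = {}
    for markable, labels in judgments.items():
        label, votes = Counter(labels).most_common(1)[0]
        result[markable] = (label, votes / len(labels))
    return result


# Hypothetical judgments: "new" = discourse-new, "old:<id>" = refers back to <id>.
judgments = {
    "m3": ["old:m1", "old:m1", "new", "old:m1"],
    "m6": ["new", "new", "old:m4"],
}
for markable, (label, agreement) in aggregate(judgments).items():
    print(markable, label, f"{agreement:.0%}")
```

The agreement share offers a cheap way to flag low-agreement markables for manual review.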

Results & Discussion
Accuracy rate before enhancement: 88.72%
Accuracy rate after enhancement: 97.66%
Error sources:
1. Missing annotation: markables that are not annotated at all, so the algorithm is completely unaware of their existence.
2. Incorrect annotation: markables that are annotated, but incorrectly.
3. Inconsistent annotation: annotations that violate the annotation rules of the corpus.
4. Algorithm errors: errors produced by the developed algorithm itself.
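The slides do not say how the accuracy rate was computed. One plausible reading, assumed here purely for illustration, treats every gap between adjacent sentences as a boundary/no-boundary decision and counts how many decisions match a reference segmentation, pooled over all test documents. The function names and the toy numbers are hypothetical; they are not the thesis data.

```python
from typing import Iterable, Set, Tuple

# (predicted boundary gaps, reference boundary gaps, number of sentence gaps)
Doc = Tuple[Set[int], Set[int], int]


def correct_decisions(predicted: Set[int], gold: Set[int], n_gaps: int) -> int:
    """Gaps where 'boundary here?' gets the same answer in prediction and reference."""
    return sum((gap in predicted) == (gap in gold) for gap in range(n_gaps))


def accuracy_rate(docs: Iterable[Doc]) -> float:
    """Micro-averaged accuracy over all gaps in all documents."""
    correct = total = 0
    for predicted, gold, n_gaps in docs:
        correct += correct_decisions(predicted, gold, n_gaps)
        total += n_gaps
    return correct / total if total else 0.0


toy_docs = [
    ({2, 3}, {2}, 4),  # one spurious boundary at gap 3
    ({1}, {1}, 3),     # perfect segmentation
]
print(f"{accuracy_rate(toy_docs):.2%}")
```

Pooling gaps across documents (micro-averaging) weights long documents more heavily; averaging per-document rates instead is an equally valid choice.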

Summary
Contributed an algorithm for the identification of DBs using anaphorically annotated text.
Achieved a good accuracy rate (97.66% after enhancement).
The work is unique and useful, and can benefit a group of other NLP systems.
Input text needs to be: anaphorically annotated, and in XML form.
Future work: making the system wholly automatic, and incorporating the algorithm into other NLP systems to check its usefulness.
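Since the input must be anaphorically annotated XML, a first processing step is to read sentences and anaphoric links out of that XML. PD2's actual export formats are not described in the slides, so the sketch below assumes a hypothetical, simplified schema: `<s>` elements for sentences and `<m>` elements for markables, with an optional `ante` attribute naming the antecedent markable. It uses only the standard-library `xml.etree.ElementTree` module and returns links as (anaphor sentence, antecedent sentence) index pairs, the form a sentence-level boundary step would consume.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical schema: <s> = sentence, <m id=".." ante=".."> = markable with an
# optional link to its antecedent markable.
SAMPLE = """<doc>
  <s><m id="m1">Ali</m> is the son of <m id="m2">Jamila</m>.</s>
  <s><m id="m3" ante="m1">He</m> is ten years old.</s>
  <s><m id="m4">Kashif</m> is <m id="m5" ante="m3">his</m> elder brother.</s>
  <s><m id="m6">Kashif</m> is a student of MS Computer Science.</s>
  <s><m id="m7" ante="m6">He</m> studies at the University of Peshawar.</s>
</doc>"""


def sentence_texts(xml_text: str) -> List[str]:
    root = ET.fromstring(xml_text)
    return ["".join(s.itertext()) for s in root.findall("s")]


def sentence_links(xml_text: str) -> List[Tuple[int, int]]:
    """(anaphor sentence, antecedent sentence) pairs, 0-based."""
    sentences = ET.fromstring(xml_text).findall("s")
    # First pass: remember which sentence each markable id occurs in.
    where = {m.get("id"): i for i, s in enumerate(sentences) for m in s.iter("m")}
    # Second pass: turn every markable-level link into a sentence-level link.
    links = []
    for i, s in enumerate(sentences):
        for m in s.iter("m"):
            ante = m.get("ante")
            if ante in where:  # skips markables without a resolvable antecedent
                links.append((i, where[ante]))
    return links


print(sentence_texts(SAMPLE)[1])
print(sentence_links(SAMPLE))
```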

Publication
Identification of Discourse Boundaries Using Anaphorically Annotated Text
Submitted for publication on 26.01.2020 to the Journal of Information Communication Technologies & Robotic Applications (JICTRA), an ‘X’ category journal recognized by HEC.
Status: critical review phase started.


THANK YOU!
QUESTIONS
