
Admas University

School of Postgraduate Studies


Selected Topics in Computer Science
Natural Language Processing
Article review of: Unsupervised corpus-based approach
for word sense disambiguation to Afaan Oromo
words
Group 4, sec. 2
No.  Group Name             ID No.
1.   Tsion Legesse          PGMGS/4032/22
2.   Elshaday Zegeye        PGMGS/4033/22
3.   Abrham Mekbib          PGMGS/4047/22
4.   Samson Yemanebirhan    PGMGS/4028/22
5.   Erste Eshete           PGMGS/4028/22
6.   Alemu Hayle            PGMGS/3060/21

Submitted to: Dr. Haylie


Submission date: 3/11/23
Title

The title of this thesis is concise, to the point, and descriptive. It also
includes relevant keywords that are likely to be used in searches.

Abstract

The abstract of this thesis is written in plain, concise language and is easily
understandable. It represents the content and findings of the full paper and
contains relevant keywords and phrases. It provides a comprehensive summary of
the entire research paper, including the methodology, major findings, and
conclusions. However, the research question is not mentioned in this part.

Introduction

The thesis reviewed here was submitted in 2015 to the Graduate Studies of Addis
Ababa University in partial fulfillment of the requirements for the degree of
Master of Science in Information Science. It is titled “Unsupervised corpus-based
approach for word sense disambiguation to Afaan Oromo words”. In it, the
researcher Fayisa Gemechu Shoga studies word sense disambiguation for the Afaan
Oromo language, which is widely spoken in Ethiopia, using a corpus-based approach
and unsupervised techniques.
In the introduction, the background of the study is clearly stated, and the
central concepts and their definitions are given. The aim of the thesis is to
develop a dataset and, by applying different clustering algorithms to it, to
build a prototype of an unsupervised word sense disambiguation system for the
Afaan Oromo language. The problem statement is clear, concise, easy to
understand, and based on sound research. The problem is also specific enough,
relevant, real, and significant; it is worth solving, feasible to address, and
has practical implications.

Research Questions

In this study, four research questions are raised to assess the performance
level of the proposed approach. The research questions fit within the scope of
the research project and cover various angles of the research problem. Where
possible, research questions should also consider hypothetical and causal
relationships between variables. Overall, the research questions are well
designed, relevant, and feasible for the study.

Literature review

The literature review of this thesis covers not only preliminary work in the
area of natural language processing but also the different approaches and
algorithms by which word sense disambiguation (WSD) has been implemented for
different languages. WSD work for four languages is reviewed, namely Amharic,
Arabic, Hindi, and Afaan Oromo. The research gap identified is that WSD remains
an open problem: many ambiguous words exist in each language, and they remain
unresolved because of the knowledge acquisition bottleneck. Afaan Oromo is no
different from other languages in this respect. The research gap was therefore
identified only in fairly general terms.

The literature review of this thesis is relevant to the research topic and the
research questions. It is comprehensive enough to expose significant gaps in the
existing coverage. The review flows smoothly, discusses the research methods of
the cited studies, such as the approaches and techniques they used, and explains
the relationships between studies well. Overall, the literature review is well
structured, comprehensive, and supports the research effectively.

Research design / methodology

The methodology section of a study is critical to the validity and reliability
of the research. The methodology used in this study was as follows. First, seven
ambiguous Afaan Oromo words were selected, namely sanyii, karaa, horii, sirna,
qoqhii, ulfina, and ifa; five of these were taken from previously existing data.
For these ambiguous words, sentences (texts) were collected into a corpus, and a
dataset was created and developed using a corpus-based approach. Neighboring
information and collocation words were used to capture the context of each
ambiguous word. An unstructured source of information and an unsupervised method
were used, and preprocessing steps such as tokenization, stop word removal, and
stemming were applied to the documents. Finally, different clustering
algorithms, namely simple k-means, hierarchical agglomerative clustering
(single, average, and complete link), and Expectation Maximization, were
applied, and their accuracy and performance were analyzed and evaluated.
Performance measures such as precision (P), recall (R), F-measure, and accuracy
were used to evaluate the clustering results. The Weka 3.6.11 tool was used for
clustering and classification in this research.
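
The evaluation measures named above can be illustrated with a minimal Python
sketch. It is not the thesis's Weka procedure: it assumes each cluster is
labeled with the majority gold sense of its members, which is one common way to
score unsupervised WSD output, and the thesis may have mapped clusters to senses
differently. The function names and the sense labels "sense_a" and "sense_b" are
hypothetical.

```python
from collections import Counter


def majority_mapping(cluster_ids, gold_senses):
    """Label each cluster with the most frequent gold sense among its members."""
    counts = {}
    for cluster, sense in zip(cluster_ids, gold_senses):
        counts.setdefault(cluster, Counter())[sense] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


def evaluate(cluster_ids, gold_senses, target_sense):
    """Precision, recall and F-measure for one sense, plus overall accuracy."""
    mapping = majority_mapping(cluster_ids, gold_senses)
    predicted = [mapping[c] for c in cluster_ids]
    pairs = list(zip(predicted, gold_senses))

    tp = sum(1 for p, g in pairs if p == target_sense and g == target_sense)
    fp = sum(1 for p, g in pairs if p == target_sense and g != target_sense)
    fn = sum(1 for p, g in pairs if p != target_sense and g == target_sense)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = sum(1 for p, g in pairs if p == g) / len(pairs) if pairs else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy input: six occurrences of one ambiguous word, two clusters,
# hypothetical gold sense labels.
clusters = [0, 0, 0, 1, 1, 1]
senses = ["sense_a", "sense_a", "sense_b", "sense_b", "sense_b", "sense_a"]
scores = evaluate(clusters, senses, "sense_a")
```

Precision and recall are computed per sense, while accuracy is computed over all
occurrences, so reporting all four measures gives both a per-sense and an
overall view of the clustering.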

In general, two main experiments were conducted in this thesis, covering both
lemmatization and stemming, to evaluate the effect of stemmed and unstemmed
datasets.

The chosen methodology is appropriate for the research questions and objectives.
The study is quantitative in nature, and the research method is aligned with
that nature. The sampling method is convenience sampling, which is justified by
the fact that 1,000 corpus entries were taken from an already prepared corpus
and 501 were created manually by the researcher for the study. The data
collection methods for this research are valid and reliable. The results of the
study are presented in tables, and they are clear and meaningful.

Conclusion and contribution

Finally, the thesis reaches several conclusions on the problem of WSD for the
Afaan Oromo language: stemming performs better as a preprocessing step; the
simple k-means algorithm performs better in terms of accuracy; the supervised
approach achieves better results for WSD; and window sizes of four-to-four and
three-to-three are effective for the unsupervised and supervised methods,
respectively, for Afaan Oromo WSD.
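
To make the window sizes concrete, the following Python sketch extracts the
context around an ambiguous word. It assumes that "four-to-four" means four
words to the left and four to the right of the target, which is our reading and
not something the review states explicitly. The tokenizer, the function names,
the placeholder tokens w1 to w8, and the stop word "w2" are all hypothetical;
a real stop word list for Afaan Oromo and a stemmer are not included.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[\w']+", text.lower())


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the given stop word set."""
    return [tok for tok in tokens if tok not in stop_words]


def context_windows(tokens, target, size):
    """Return up to `size` tokens on each side of every occurrence of target."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append(left + right)
    return windows


# Placeholder sentence: w1..w8 stand in for real context words around "karaa".
sentence = "w1 w2 w3 w4 karaa w5 w6 w7 w8"
stop_words = {"w2"}  # hypothetical stop word, for illustration only

tokens = remove_stop_words(tokenize(sentence), stop_words)
window_3 = context_windows(tokens, "karaa", 3)  # three-to-three window
window_4 = context_windows(tokens, "karaa", 4)  # four-to-four window
```

Removing stop words before cutting the window, as done here, lets the window
reach further into the sentence; applying the two steps in the opposite order is
an equally plausible design and would give a narrower context.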

In the conclusion section, the key findings and results of the study are
synthesized. However, the limitations of the study are not acknowledged. The
findings are clear and significant, and they contribute great potential to the
field. Finally, recommendations for future research are offered, which is
encouraging for the natural language processing area.

Strong side of work:

Because the research data is a corpus, appropriate methods and techniques were used to analyze it and extract information.
The algorithm parameters used were impactful on the WSD accuracy.
Evaluation metrics such as precision, recall, F1-score, and accuracy, used to measure the performance of the WSD method, were also impactful.
Qualitative analysis was used to examine specific examples where the method succeeded.

Weak side of work:

In this thesis the system deals with semantic-level analysis without grammar and spelling correction, but for the Afaan Oromo language spelling in particular is critical and can alter the semantic analysis.
Based on this, the depth of analysis can be criticized: the research did not delve into the research questions or provide a substantial discussion of the topic, so it could be considered superficial.
The document is not well structured; for example, the literature review and the methodology each appear as a topic twice in the paper.

Citation and references

In this thesis a diverse range of resources is cited, representing different
viewpoints and perspectives on the topic. The appropriate APA citation style is
used, each reference is properly cited within the text, and paraphrased content
is appropriately attributed to its sources.
