
ARBA MINCH INSTITUTE OF TECHNOLOGY (AMIT)

SCHOOL OF COMPUTING AND SOFTWARE ENGINEERING


DEPARTMENT OF COMPUTER SCIENCE
DESIGNING AND DEVELOPING WORD LEVEL STEMMER
MODEL FOR HADIYYISA LANGUAGE USING DEEP-LEARNING
TECHNIQUES
A thesis Proposal submitted to the School of Graduate Studies of Arba Minch
Institute of Technology (AMIT) in Partial Fulfillment of the Requirement for the
Degree of Master of Science in Information Technology

By: Feleke Asefa Anjulo

Under the supervision of

Amin Tuni (Asst. Professor)

APRIL 1, 2023
Arba Minch, Ethiopia
SCHOOL OF GRADUATE STUDIES
ARBA MINCH UNIVERSITY

ADVISORS’ PROPOSAL APPROVAL SHEET

This is to certify that the thesis proposal entitled “Designing and developing word
level stemmer model for Hadiyyisa language using deep-learning techniques” has been
carried out by Feleke Assefa Anjullo, ID. Number PRAMIT/103/14, under my/our
supervision. Therefore I/we recommend that the student’s proposal can be presented
for review and open oral presentation.

Name of Principal Advisor Signature Date

Amin Tuni (Asst. Prof) ______________ ___________

Name of co-advisor Signature Date


Mr. ___________________ ______________ ______ _____
SCHOOL OF GRADUATE STUDIES

ARBA MINCH UNIVERSITY

APPROVAL SHEET OF REVIEWED PROPOSAL

Name of the candidate: Feleke Assefa Anjullo

Faculty of Computing and Software Engineering

Thesis Title ‘‘Designing and developing word level stemmer model for

Hadiyyisa language using deep-learning techniques’’

Date of open oral presentation: ---------------------------------------------------------------------------

The present version of the thesis proposal entitled above has incorporated all comments and
suggestions given during the proposal review. Therefore, we recommend that this final version
be considered as the reviewed thesis proposal in partial fulfillment of the requirements for the
degree of Master of Science in the respective graduate program.

Name of the Reviewer 1 Signature Date


____________________ ______________ ______

Name of the Reviewer 2 Signature Date

_________________ ______________ ______

Checked By:
Faculty Dean Name Signature and Stamp Date
____________________ ______________ ______

AMIT SGS Dean Name Signature and Stamp Date


____________________ ______________ ______
List of abbreviations
NLP - Natural Language Processing
WLS - Word Level Stemmer
TVC - Technical and Vocational College
RQ - Research Question
ETB - Ethiopian Birr
DBM - Dictionary-Based Method
RNN - Recurrent Neural Network
CNN - Convolutional Neural Network
LSTM - Long Short-Term Memory
NLTK - Natural Language Toolkit
Lists of tables
Table 1 Literature Review
Table 2 Cost Breakdown
Table 3 Work plan schedule
Lists of Figures
Figure 1 Research Process
Figure 2 Data Collection and Analysis
Figure 3 Tool Selection Method
Contents

Abstract
1. Introduction
1.1. Background
1.2. Motivation for the Study
1.3. Statement of the problem
1.4. Research Questions
1.5. Objective
1.5.1. General Objectives
1.5.2. Specific Objectives
1.6 Scope of the Study
1.7 Literature review
1.7.1 Conceptual review
1.7.2 Review of Related Work
1.8 Significance of the Study
1.9 Methodology
1.9.1 Research Design
1.9.2 Research approach
1.9.3 Research methodology
1.9.4 Data Collection
1.9.5 Tool selection
1.10 Budget and Time Breakdown
1.10.1 Budget
1.10.2 Time Breakdown
1.11 References
Abstract
The proposed research aims to design and develop a word-level stemmer for the Hadiyyisa
language using deep learning techniques. The stemmer will identify the root of a given word by
analyzing its linguistic features. Existing stemmer models are based on rule-based, statistical,
and hybrid methods. The proposed stemmer will be trained on a large corpus of text data using
deep learning, evaluated on several datasets, and compared with existing stemmers to
demonstrate its effectiveness. It has significant implications for natural language processing
and can be applied in domains such as search engines, text classification, and information
retrieval systems. The study uses an exploratory and experimental research design to achieve
the research objectives and answer the research questions. A mixed approach will be applied
because the study requires both quantitative and qualitative data for analysis and for designing
a model that addresses the problem. The study will involve several stages: data collection,
preprocessing, feature extraction, model training, and evaluation. The data collection stage will
involve gathering dictionaries of the language and a large corpus of text data from various
sources. The preprocessing stage will involve cleaning the data and preparing it for feature
extraction. The model training stage will involve using deep learning algorithms to train the
stemmer on the extracted features. Finally, the evaluation stage will involve testing the stemmer
on various datasets and comparing its performance with existing stemmers. Overall, the proposed
research has the potential to advance natural language processing for Hadiyyisa and to improve
the accuracy of text-based systems.

Keywords: Hadiyyisa, word-level stemmer, deep learning, corpus, dictionary-based method

1. Introduction
1.1. Background

Hadiyyisa is the language of the Hadiyya people, who live in the Southern Nations, Nationalities, and Peoples' Region of Ethiopia. It is
a Cushitic language with around two million speakers. Hadiyyisa language and literature is taught as a field of study at Wachemo
University and at teachers' training colleges in the Hadiyya zone, and Hadiyyisa is the language of instruction in the zone's primary
schools [1]. Like many other languages, Hadiyyisa has a complex morphology with a rich system of affixation, which makes natural
language processing tasks challenging.

Emerging technologies increasingly support language development and ease communication. Artificial intelligence is one of the most
active research areas, and Natural Language Processing (NLP) is one of its subfields. Within NLP, stemming is a basic step for other
tasks such as information extraction, information retrieval, summarization, machine translation, sentiment analysis, and text
classification. Stemming is the process of reducing a word to its base or root form; it helps to normalize text and improve the
performance of NLP tasks. Several stemming algorithms exist, such as the Porter Stemming Algorithm, the Snowball Stemmer, and
the Lancaster Stemmer, but they may not be suitable for the Hadiyyisa language because of its unique morphology and syntax [2].

A word-level stemmer model operates on individual words. For example, "waaroolla" means "coming", and its root or stem is "waare",
meaning "to come". The stemmer uses a dictionary to identify and remove affixes from a word. These affixes are comparable to the
English -ing, -ed, -s, and -es; in Hadiyyisa they include -oolla, -icho, -uwwa, -eewwa, -ooma, -ane, -at, -ancha, -cha, and -imma,
among others. The reduced stem is then used as a basis for further analysis or processing.
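The dictionary lookup and affix removal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed model: the suffix list is the one quoted in the text, the one-entry dictionary comes from the waaroolla/waare example, and the function name `stem`, the `min_stem_len` threshold, and the longest-suffix-first fallback are hypothetical choices made for this sketch.

```python
# Suffixes quoted in the text above; the list is illustrative, not exhaustive.
SUFFIXES = ["oolla", "icho", "uwwa", "eewwa", "ooma",
            "ane", "at", "ancha", "cha", "imma"]

# Hypothetical one-entry stem dictionary built from the example in the text.
STEM_DICTIONARY = {"waaroolla": "waare"}


def stem(word, dictionary=STEM_DICTIONARY, suffixes=SUFFIXES, min_stem_len=3):
    """Return the dictionary stem if known, else strip the longest matching suffix."""
    word = word.strip().lower()
    if word in dictionary:
        return dictionary[word]
    # Assumed fallback: try longer suffixes first so "-ancha" wins over "-cha".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no known suffix: leave the word unchanged


print(stem("waaroolla"))
```

Note that plain suffix stripping alone would turn "waaroolla" into "waar" rather than "waare", which is one reason a stem dictionary (or a learned model) is needed in addition to a suffix list.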

Deep learning techniques have shown significant progress in NLP tasks, including stemming. Deep learning models can learn features
automatically from data, which can make them more accurate and effective than traditional rule-based methods [3]. Therefore,
designing and developing a stemmer model for the Hadiyyisa language using deep learning techniques can significantly improve the
accuracy and efficiency of NLP tasks on Hadiyyisa text. The study requires a thorough understanding of the language's morphology
and syntax and an appropriate selection of algorithms and neural network architectures.

The proposed research aims to design and develop an accurate and efficient stemmer model for the Hadiyyisa language, using deep
learning techniques, that can improve the performance of NLP tasks.

Overall, a word stemmer model for Hadiyyisa text would benefit speakers of the language and the development of natural language
processing applications for it. The proposed model can help in building tools such as grammar checkers, document summarizers,
thesauri, spell checkers, indexers, and word frequency counters. It also addresses the problems caused by the large number of word
variants in Hadiyyisa text that serves as input to machine learning and deep learning models.

1.2. Motivation for the Study

The motivation behind the research for developing a stemmer model for the Hadiyyisa language is to enable natural language
processing applications. Stemming is an essential step in many NLP applications, such as text classification, information retrieval, and
machine translation. However, Hadiyyisa, like many other low-resource languages, lacks adequate linguistic resources, including
stemmers. Therefore, developing a stemmer model for Hadiyyisa will facilitate the development of NLP applications for this
language, which will enable its use in various fields, including education, healthcare, and business.

Additionally, this research will contribute to the preservation and development of the Hadiyyisa language by making it more
accessible and usable in modern technologies. The researcher is also motivated to contribute to the development of the language and
its linguistic tasks with the support of current artificial intelligence techniques.

1.3. Statement of the problem
Every natural language has its own features and characteristics, so it is difficult to apply the same stemming rules and steps to all
languages. There is a large body of literature on stemming models for English text. Prototypes and algorithms have also been
developed for stemming several Ethiopian languages, including Afaan Oromo, Amharic, Silt'e, Wolaita, and Tigrigna. Hadiyyisa is
widely spoken in the Hadiyya zone of the Southern Nations, Nationalities, and Peoples' Region of Ethiopia, with more than 1.5
million speakers. However, differences between languages make it difficult to apply the existing stemmers to Hadiyyisa, and the
language lacks basic natural language processing tools such as a stemming model. Hadiyyisa is morphologically very complex and
highly inflected, and a single word can have several variants. These variants create redundancy in a text corpus, so NLP or machine
learning models built on such a corpus may be ineffective.

Developing tools that handle the complexity of word processing is challenging because of the various inflections and derivations in
the language [3]. For a morphologically rich language like Hadiyyisa, stemming is an important early step in information retrieval
and NLP tasks. An affix is called inflectional when its task is to expand or change the grammatical function of the word it attaches
to. To build a robust model, it is essential to normalize text through stemming, which removes redundant variants by transforming
words to their base form.

Stemming plays an important role in the identification of a word stem from a full word by removing inflectional and derivational
affixes, and there has been much interest in developing applications for this purpose [4]. Therefore, the main aim of the proposed
research is to design and develop a word stemmer model using deep learning techniques, so that the model can identify the base or
root form of words in the Hadiyyisa language, thereby reducing the number of unique words that need to be processed and grouping
words with similar meanings.

1.4. Research Questions

RQ 1. What are the current status and challenges of word stemming in the Hadiyyisa language?

RQ 2. How can deep learning techniques be applied to a word-level stemmer for the Hadiyyisa language?

RQ 3. Which deep learning technique is the most suitable for a Hadiyyisa language stemmer?

1.5. Objective

1.5.1. General Objectives

The general objective of this research study is to design and develop a word-level stemmer model for the Hadiyyisa language using
deep learning techniques.

1.5.2. Specific Objectives

The specific objectives of this research study are:

To assess and identify the current status and challenges in developing a word-level stemmer for the Hadiyyisa language.
To identify and apply deep learning techniques for the Hadiyyisa language stemmer.
To select the most appropriate deep learning technique for the Hadiyyisa language word-level stemmer.

1.6 Scope of the Study


There are different approaches to developing a stemmer, namely the dictionary-based, rule-based, and statistical approaches [2]. The
stemmer for the Hadiyyisa language in the proposed research uses a Dictionary-Based Method (DBM) and conflates only the
inflectional and derivational variants of a word whose suffixes occur in a regular pattern. A dictionary technique depends mainly on
creating a very large dictionary that stores words found in natural texts together with their morphological parts, such as stems,
roots, and affixes. The model does not check whether a word entered by the user is a Hadiyyisa word.

The word-level stemmer model for Hadiyyisa is designed and developed using deep learning techniques. The stemmer uses the
dictionary-based method and does not conflate compound words, because compounds are formed irregularly in the language and
time is too short to investigate their formation further. A detailed analysis of the morphology of the language, including the
formation of compounds, is therefore necessary to improve the performance of the stemmer.

1.7 Literature review

1.7.1 Conceptual review


Models for stemming have been studied in computer science since the 1960s. A computer program or subroutine that stems words
may be called a stemming program, stemming model, or stemmer [1]. Relevant prior research and key concepts from books, articles,
and journals are reviewed to support the proposed study and achieve its purpose. Stemming models are widely used in natural
language processing to reduce words to their root form, or stem, in order to improve the accuracy of text analysis and information
retrieval. For example, a stemmer for English such as the Porter stemmer [5] operating on the stem cat should identify such strings
as cats, catlike, and catty. Many stemming algorithms have been developed for different languages.
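To make the idea of rule-based suffix stripping concrete, the sketch below implements only step 1a of the original Porter algorithm, which handles plural endings (SSES -> SS, IES -> I, SS -> SS, S -> nothing). It is a partial illustration: the full algorithm has several further steps with conditions on the remaining stem, and the function name `porter_step_1a` is chosen for this sketch.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter algorithm (plural suffixes)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The rules are checked from the longest suffix to the shortest, so that "caresses" is matched by SSES before the plain S rule can apply. Rules of this kind are specific to English spelling, which is why they cannot be reused directly for Hadiyyisa.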

A stemming algorithm might also reduce the words fishing, fished, and fisher to the stem fish. The stem need not be a word; for
example, the Porter algorithm reduces argue, argued, argues, arguing, and argus to the stem argu. In the context of Ethiopian
languages, the literature on stemming algorithms is limited. However, some studies have developed rule-based and hybrid
algorithms for Amharic, Oromo, and other languages, using different approaches depending on the linguistic knowledge and
computational techniques available. For example, the rule-based algorithm developed by Tsegaye and Mekonnen [6] for Amharic
achieved an accuracy of 95.4% in stem identification. Similarly, a hybrid stemming algorithm has been developed for Oromo [7].

The development of an effective stemming model for Ethiopian languages, including Hadiyyisa, has the potential to improve text
analysis and information retrieval in these languages. The goal of stemming is to identify the base form of a word, regardless of its
inflectional suffixes or prefixes. As more digital content becomes available in these languages, stemming algorithms can help facilitate
the processing and analysis of this content for various applications such as sentiment analysis, topic modeling, and information
retrieval.

In general, stemming algorithms are an important tool in natural language processing for improving the accuracy of text analysis and
information retrieval. While there is limited literature on stemming algorithms for Ethiopian languages, the existing studies
demonstrate promising results and highlight the need for further research in this area.

1.7.2 Review of Related Work


Many stemming algorithms have been developed for different languages. The approaches and techniques used in the Porter
algorithm [5] in particular have been studied from different sources. Stemming research has also been conducted for several
languages both internationally and locally. Locally, stemming algorithms have been designed and developed for Amharic
(Atelach Alemu and Lars Asker [8]), Afaan Oromo, Kambaata, Tigrigna, Wolaita, Silt’e, and a few others.

Some other related works on local languages and on English are listed in the table below.

Table 1 Literature review

Columns: reference (number, authors and year, title, and journal or conference name); major findings, contributions, and conclusion;
critical remarks identifying the research gap for the proposed research.

[9] Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach. AAU Institutional Repository.
Major findings: This system takes a word as input and removes its affixes according to a rule-based algorithm. The algorithm follows
the well-known Porter algorithm for the English language and is developed according to the grammatical rules of Afaan Oromo. The
result of the study is a prototype context-sensitive iterative stemmer for Afaan Oromo. An evaluation of the system shows that the
algorithm performs better than earlier stemming algorithms for Afaan Oromo, giving 95.73 percent correct results.
Critical remarks: This work is somewhat related to the proposed study. Compared to the rule-based stemmer, the evaluation of the
hybrid stemmer shows an accuracy increase of 0.of9% but is less efficient.

[10] Development of Longest-Match Based Stemmer for Texts of Wolaita Language. International Journal on Data Science and
Technology.
Major findings: The researcher used four suffixes that are iteratively stripped from the word; after the necessary conditions are
applied, the final word is considered the stem. The C# programming language is used for data preprocessing and implementation.
The results show that the rule-based longest-match approach is promising for stemming Wolaita language texts. The output on the
test dataset showed 91.84% accuracy against manually stemmed words.
Critical remarks: This work is related to the proposed study, but the sample size and attributes are small. For further improvement of
the stemmer, deeper analyses of compounding and irregular words should be made. The stemmer has to be tested with a large
amount of text to prove its real performance.

[11] Designing A Rule-Based Stemming Algorithm for Kambaata Language Text. International Journal of Computational Linguistics
(IJCL), Volume (9).
Major findings: The output of this study is a context-sensitive, longest-match stemming algorithm for Kambaata words. The research
explores the word conflation technique for Kambaata words. The algorithm was tested and reported to be effective and very fast,
stemming 330 words per second. The stemming algorithm can also reduce the size of documents by decreasing word variations.
Critical remarks: This work is related to the proposed study, but the sample size, language, and attributes are small. For further
improvement of the stemmer, deeper analyses of compounding and irregular words should be made. The researchers suggest that
future work could study how to stem more complex words in Kambaata, such as words made up of multiple parts.

[12] Designing A Stemming Algorithm For Silt’e Language.
Major findings: In this experiment, the stripping procedure was applied in the order prefix, suffix, and finally letter reduplication. The
stemmer was tested on sample data of 1486 words selected randomly from the sample texts. The result of the experiment shows
that the stemmer performs at an accuracy of 85.71% and brings a dictionary reduction of 34.99% for stem words.
Critical remarks: This stemming algorithm was developed based on Unicode data; it is recommended to use transliterated data to
make it more efficient. More context-sensitive and recoding rules can be added to increase the accuracy of this stemmer.

[5] Porter Stemmer. The University of Cambridge.
Major findings: This paper presents the Porter Stemming Algorithm, which is one of the most widely used stemming algorithms. The
paper discusses the algorithm's rules and its effectiveness in reducing words to their base form.
Critical remarks: Porter's algorithm uses a dictionary of about 60 suffixes and has only a few context-sensitive and recoding rules,
and is therefore economical in storage and computing time. Not all suffixes are covered; it sometimes produces invalid stems and
has poor recall and precision.

1.8 Significance of the Study


The development of a language is closely linked with the development of technology (Toffler, 1980) [13]. Designing and developing
a stemmer model for the Hadiyyisa language is significant for several reasons. Firstly, it will enable the development of natural
language processing applications for the Hadiyyisa language, which will facilitate communication and information exchange among
Hadiyyisa speakers. This is particularly important in fields such as education, healthcare, and business, where effective
communication is crucial. Secondly, the development of a stemmer model for Hadiyyisa will contribute to the preservation and
development of the language. With the increasing use of modern technologies, many low-resource languages are at risk of being
marginalized and eventually disappearing. By making Hadiyyisa more accessible and usable in NLP applications, this research will
help to ensure that the language remains relevant and alive.

Finally, the development of a stemmer model for Hadiyyisa will have broader implications for the field of NLP. As more low-resource
languages are studied and analyzed, researchers can gain a better understanding of the linguistic structures and patterns that underlie
these languages. This, in turn, can lead to the development of more effective and accurate NLP models, which can benefit all
languages, including those with more resources.

1.9 Methodology
1.9.1 Research Design
The proposed study uses an exploratory and experimental research design to achieve the research objectives and answer the
research questions. As a preliminary step, the researcher needs to explore and understand the morphology and linguistic features of
the language. The researcher then designs and develops the model, applying computational linguistic tools to build a stemmer for
the language.

1.9.2 Research approach

A mixed approach will be applied in the proposed study because it requires both quantitative and qualitative data for analysis and
for designing a model that addresses the problem. The approach includes collecting, analyzing, and interpreting data. The data
needed for this research is a corpus of words, which is analyzed and expressed in both numerical and non-numerical form.

1.9.3 Research methodology


To develop a word-level stemmer using deep learning techniques, the following steps will be taken:

Literature Review: Books, journal articles, theses, and Internet sources are reviewed extensively to gain a solid understanding of the
principles, techniques, and tools of stemmers, with special emphasis on dictionary-based methods. Research on stemming
algorithms for Ethiopian languages and other related work is also reviewed.

Problem Identification: The problem is to develop a word-level stemmer that can accurately and efficiently identify the root form of
a given word. This involves defining the scope of the problem, identifying relevant data sources, and specifying performance metrics
for evaluating the effectiveness of the algorithm.

Data Collection and Preparation: The next step is to collect and prepare the data that will be used to develop and test the model.
This involves selecting representative datasets, cleaning and preprocessing the data, and creating a standard dataset (dictionary/corpus)
for evaluating the accuracy of the algorithm.

Model Development: The next step is to develop a deep learning model for word-level stemming. This involves selecting appropriate
deep learning techniques (e.g., RNN, BLSTM, and LSTM), designing the input and output layers, and training the model on the
prepared data.

Evaluation and Validation: The final step is to evaluate and validate the model's performance using the dataset. This involves
measuring its accuracy, precision, recall, and other relevant metrics, and comparing its performance to other existing algorithms.

Overall, developing a word-level stemmer using deep learning techniques requires a similar research design and approach as
traditional stemming algorithms, but with a focus on designing and training a deep learning model for the task.
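One way to picture the "input and output layers" mentioned in the model development step is to treat stemming as a character-level sequence task, where the inflected word is the input sequence and the stem is the target sequence. The proposal does not fix this framing, so the sketch below is an assumption: the special symbols, the helper names `build_vocab` and `encode`, and the padding scheme are hypothetical, and only the data encoding is shown, not the network itself.

```python
PAD, START, END = "<pad>", "<s>", "</s>"


def build_vocab(pairs):
    """Map every character seen in the (word, stem) pairs to an integer id."""
    chars = sorted({ch for word, stem in pairs for ch in word + stem})
    return {symbol: idx for idx, symbol in enumerate([PAD, START, END] + chars)}


def encode(text, vocab, max_len):
    """Encode text as START + characters + END, padded with PAD to max_len."""
    ids = [vocab[START]] + [vocab[ch] for ch in text] + [vocab[END]]
    if len(ids) > max_len:
        raise ValueError("max_len is too small for this sequence")
    return ids + [vocab[PAD]] * (max_len - len(ids))


pairs = [("waaroolla", "waare")]              # example pair taken from the text
vocab = build_vocab(pairs)
max_len = max(len(w) for w, _ in pairs) + 2   # room for START and END
x = encode("waaroolla", vocab, max_len)       # model input ids
y = encode("waare", vocab, max_len)           # model target ids
print(x)
print(y)
```

Fixed-length integer sequences of this kind are what recurrent models such as an LSTM typically consume after an embedding layer; in practice the vocabulary would be built from the full word/stem dictionary rather than a single pair.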

[Figure: flowchart with the stages Literature review, Problem identification, Data collection, Data analysis, and Discussion and
conclusion]

Figure 1 Research process

1.9.4 Data Collection


The study will collect data from both primary and secondary sources. Primary data will be collected through interviews with experts,
public elders, and scholars (to gather frequently used everyday text), questionnaires, and technical observation of the language
structure. Secondary data will be collected from the Hadiyyisa dictionary, the Bible, academic textbooks, and analyses of academic
literature, documents, and scriptures.

The detailed procedure for collecting and analyzing the data for this study is summarized in the figure below.

Figure 2 Data collection and analysis
The first step is to collect a corpus of Hadiyyisa texts from various sources, including academic papers, books, and online resources.
The corpus will be pre-processed to remove noise and irrelevant information, such as punctuation marks and stop words. Next, the
pre-processed corpus will be used to create a comprehensive dictionary of Hadiyyisa words and their stems. This will be achieved by
identifying the most common morphological variants of each word in the corpus and manually determining their corresponding
stems. The dictionary will also include the part of speech of each word and its definition. The feature extraction stage will involve
identifying the linguistic features that are relevant to word stemming.
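The pre-processing described above (removing punctuation and stop words, then finding the most common word forms) can be sketched with the Python standard library. This is an illustrative sketch only: the function names are hypothetical, the stop-word set is left as a parameter because no Hadiyyisa stop-word list is given in this proposal, and the sample strings simply repeat the two word forms used earlier rather than real corpus sentences.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by apostrophes (as in "Silt'e");
# punctuation, digits, and underscores are dropped.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def preprocess(text, stop_words=frozenset()):
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [tok for tok in tokens if tok not in stop_words]


def word_frequencies(texts, stop_words=frozenset()):
    """Count how often each remaining word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text, stop_words))
    return counts


sample = ["Waaroolla, waaroolla!", "waare."]   # placeholder strings, not real sentences
freq = word_frequencies(sample)
print(freq.most_common())
```

The resulting frequency table is a natural starting point for choosing which word forms to stem manually first when the dictionary is built.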

The dictionary will be used to develop a model that can automatically stem Hadiyyisa words. The model will look up each word in the
dictionary and retrieve its corresponding stem. If the word is not found in the dictionary, the model will match the input against
known stems based on the word's morphology. The model will be tested on a test corpus of Hadiyyisa texts to evaluate its accuracy
and effectiveness. The test corpus will be manually stemmed by experts in the Hadiyyisa language, and the results will be compared
with those produced by the developed model. The proposed research approach is intended to provide a reliable and accurate
stemming model for the Hadiyyisa language using deep learning techniques.
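The comparison between the model's output and the expert-stemmed test corpus can be expressed as exact-match accuracy. The sketch below assumes that each test word has exactly one expert stem and that a prediction counts as correct only when it matches that stem exactly; the function names and the placeholder strings in the usage example are hypothetical.

```python
def stemming_accuracy(predicted, gold):
    """Fraction of words whose predicted stem exactly matches the expert stem."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def errors(words, predicted, gold):
    """List (word, predicted stem, expert stem) for every mismatch."""
    return [(w, p, g) for w, p, g in zip(words, predicted, gold) if p != g]


# Placeholder data: four test words, one of which is stemmed incorrectly.
words = ["w1", "w2", "w3", "w4"]
predicted = ["a", "b", "c", "d"]
gold = ["a", "b", "x", "d"]
print(stemming_accuracy(predicted, gold))
print(errors(words, predicted, gold))
```

Listing the mismatches alongside the score makes it easier for the language experts to see whether errors come from missing dictionary entries or from wrongly removed affixes.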

1.9.5 Tool selection


The researcher gives priority to free and open-source tools. The software and design tools for the proposed study will be selected
with a best-fit strategy based on parametric analysis. The focus is on open-source (offline) and cloud-based (online) tools that will
be used for data collection, analysis, model design, and validation. The details are shown in the figure below.

Figure 3 Tool selection method
1.10 Budget and Time Breakdown

1.10.1 Budget
The budget is the money required to complete the activities of the proposed research. The proposed research requires twenty-five
thousand birr (25,000 ETB) in total; the budget needed for each activity is given in the table below.

No   Items to be budgeted          No of items     Cost per unit      Total cost (in Birr)   Reason
1    Data collector and analyzer   20 days         300 birr per day   6000                   For data gathering
2    External hard disk            1 unit (8 Gb)   1per 200           5000                   For data backup
3    Transport                     4 days          500                2000                   For data collection
4    Paper                         1 packet        1000               6000                   For printing
5    Mobile card                   15              100                2500                   For communication
6    Binding                       3               2000               1000                   For binding thesis final work
7    Contingency                   10% of total                       2500                   Contingency

Total estimated budget = 25000 ETB only

Table 2 Cost Breakdown
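As a quick arithmetic check on the cost breakdown, the short sketch below adds up the "Total cost" column exactly as it appears in the table and confirms that the contingency equals 10% of the grand total. The variable names are chosen for this sketch; the figures are taken from the table without adjustment.

```python
# "Total cost (in Birr)" values as listed in the cost breakdown table.
line_items = {
    "Data collector and analyzer": 6000,
    "External hard disk": 5000,
    "Transport": 2000,
    "Paper": 6000,
    "Mobile card": 2500,
    "Binding": 1000,
}
contingency = 2500

subtotal = sum(line_items.values())   # activities without contingency
total = subtotal + contingency        # grand total requested

print(subtotal, total, contingency / total)
```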

1.10.2 Time Breakdown

Columns: weeks; chapters; weekly objectives (goal of the week); types of class period.

Week 1: This week I must finish reading at least 2 chapters.
Week 2: Complete the chapters that follow on from the first week.
Mathematics
Week 3: Finish the remaining part and catch up.
Week 4
Week 5
Week 6
Physics
Week 7
Week 8
Week 9
Week 10
Chemistry
Week 11
Week 12
5 Biology
1.11 References

[1] Wikipedia, "Stemming algorithm," Wikipedia.

[2] T. S. Garkebo, Documentation and Description of Hadiyya, Addis Ababa: AAU Institutional Repository, 2015.

[3] D. W. Tumebo, "Design and Development of Hadiyyisa Text Retrieval," Haramaya University, Haramaya, Oct. 2016, p. 57.

[4] D. K. Gurmessa, Afaan Oromo Automatic Word Stemmer, Addis Ababa: college University, 2017.

[5] M. F. Porter, "An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130-137, 1980.

[6] S. &. M. Tsegaye, "Rule-Based Stemming Algorithm for Amharic Language," International Journal of Computer Applications, 2016.

[7] A. &. J. Lelissa, "Hybrid Stemming Algorithm for Oromo Language," International Journal of Computer Science and Mobile
Computing, vol. 7, no. 2, Addis Ababa, 2018.

[8] A. A. Argaw and L. Asker, "An Amharic Stemmer: Reducing Words to their Citation Forms," in Proceedings of the 5th Workshop on
Important Unresolved Matters, Stockholm, June 2007.

[9] D. Tesfaye, "Designing Stemmer for Afan Oromo Text: Hybrid Approach," AAU Institutional Repository, 2017.

[10] G. Y. B. and H. Seid, "Development of Longest-Match Based Stemmer for Texts of Wolaita Language," Addis Ababa: Science
Publishing Group, July 2018.

[11] J. S. and S. Tefera, "Designing A Rule Based Stemming Algorithm for Kambaata Language Text," International Journal of
Computational Linguistics (IJCL), Addis Ababa, 2018.

[12] M. K. Abedo, "Designing a Stemming Algorithm for Silt’e," Addis Ababa: AAU Institutional Repository, 2020.

[13] A. Toffler, Knowledge, Technology, and Change in Future Society, Jun 2012, 1980.

[14] N. Milosevic, "Stemmer for the Serbian language," arXiv preprint arXiv:1209.4471, 2012.

[15] M. K. Abedo, Designing a Stemming Algorithm for Silt’e, Addis Ababa University, June 2012.

[16] P. Haile, "Text Independent Speaker Identification for Hadiyyisa Language," Addis Ababa, https://www.academia.edu/, 2019.

[17] N. Vyatkina, "Corpus-Informed Pedagogy in a Language Course: Design, Implementation, and Evaluation," in New Technological
Applications for Foreign and Second Language Learning and Teaching, Kansas, USA: IGI Global, 2020, p. 30.

[18] T. F. W. Ben Lutkevich, "Natural language processing (NLP)," 2022.

[19] Census Report, "Hadiyya zone’s population now," CSA of Ethiopia, Addis Ababa, 2022.
