Then, we introduce a pre-trained language model for the Tigrinya language, which is a RoBERTa-based [18] model. The language model was trained exclusively on a Tigrinya corpus using the Masked Language Modeling (MLM) task and has the same size as the RoBERTa-base model. We named the language model TigRoBERTa (Tig refers to Tigrinya, and RoBERTa refers to the Transformer model used). We then apply TigRoBERTa to two different downstream sequence labeling tasks: NER and POS tagging. For this purpose, we fine-tune TigRoBERTa on the respective labeled datasets. Furthermore, we propose a semi-supervised self-training approach for Tigrinya that augments the training data and achieves better performance. We further explore CNN-BiLSTM-CRF as a baseline model for the NER and POS tagging tasks. A pre-trained word2vec [22] embedding trained purely on a Tigrinya corpus was used to initialize the CNN-BiLSTM-CRF model.

The experimental results show that TigRoBERTa achieves an F1-score of 84% for NER and 92% accuracy for POS tagging. Similarly, the CNN-BiLSTM-CRF baseline model initialized with the Tigrinya word2vec embedding achieves an F1-score of 68.86% on the NER task and 94% accuracy on POS tagging.

This paper is an extended version of a paper published in the Knowledge and Natural Language Processing track of the ACM SAC 2022 conference [40]. Our contributions can be summarized as follows:

• We develop the first dataset tagged with named entities for Tigrinya and release it publicly.

• We develop and release a language model pre-trained exclusively on a Tigrinya language corpus.

• We introduce supervised, semi-supervised, and transfer learning techniques for the Tigrinya language on the NER and POS tagging tasks.

• We propose a semi-supervised self-training approach that yields performance comparable to supervised learning for Tigrinya (a minimal sketch of such a loop follows this list).
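As a rough illustration of the self-training idea behind this last contribution, the following is a minimal sketch of a generic self-training loop. It uses a scikit-learn classifier on synthetic toy data purely for illustration; it is not the sequence labeling setup or the exact procedure used in this work.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for a small labeled set and a large unlabeled pool.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(20, 5))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_unlabeled = rng.normal(size=(200, 5))

threshold = 0.9  # keep only confidently pseudo-labeled examples
for _ in range(3):
    # 1) Train on the currently labeled data.
    model = LogisticRegression().fit(X_labeled, y_labeled)
    # 2) Predict labels and confidences for the unlabeled pool.
    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= threshold
    if not confident.any():
        break
    # 3) Move high-confidence pseudo-labeled examples into the training set.
    y_pseudo = model.classes_[proba[confident].argmax(axis=1)]
    X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
    y_labeled = np.concatenate([y_labeled, y_pseudo])
    X_unlabeled = X_unlabeled[~confident]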
The rest of this paper is organized as follows: Section 2 provides a brief literature review of previous NLP work on Tigrinya and other low-resource languages. Section 3 presents the development and annotation process of the Named Entity Recognition dataset. Section 4 describes the proposed language model and the semi-supervised self-training method. Section 5 discusses the experimental results on the NER and POS tagging datasets using the TigRoBERTa and CNN-BiLSTM-CRF models. Section 6 presents the error analysis of the supervised and semi-supervised self-training study. Finally, Section 7 presents the conclusions and directions for future work.

2. RELATED WORK

Unlike rich-resource languages such as English, there is almost no available Tigrinya corpus, which makes it difficult for researchers to develop tools. To the best of our knowledge, the only publicly available labeled corpus for the Tigrinya language is the Nagaoka POS tagging corpus [33], which contains gold POS labels for 72,080 tokens and 4,656 sentences.

The work introducing the Nagaoka corpus proposed a POS tagging method based on traditional supervised machine learning approaches [33]. The authors evaluated two traditional machine learning methods: Conditional Random Fields (CRF) and Support Vector Machines (SVM). The original Nagaoka Tigrinya POS tagging corpus contained 73 labels, which were reduced to 20, achieving accuracies of 89.92% and 90.89% for SVM and CRF, respectively.

Another study on Tigrinya POS tagging using the Nagaoka corpus was conducted by [34]. The authors evaluated Deep Neural Network (DNN) classifiers: a Feed-Forward Neural Network (FFNN), Long Short-Term Memory (LSTM), Bidirectional LSTM, and a Convolutional Neural Network (CNN), using word2vec neural word embeddings. They reported that the BiLSTM approach was well suited for POS tagging and achieved 91.1% accuracy.

Moreover, a work by [32] investigated the effects of morphological segmentation on the performance of statistical machine translation from English to Tigrinya. They performed segmentation to achieve better word alignment and to reduce vocabulary dropouts, thereby improving the language model in both languages. Furthermore, they explored two segmentation schemes, i.e., one based on longest-affix segmentation and another based on fine-grained morphological segmentation.

Another study by [10] investigated CNN-BiLSTM-based text classification for Tigrinya. They created a manually annotated dataset of 30,000 documents from Tigrinya news covering the six categories "sports", "agriculture", "politics", "religion", "education", and "health", as well as an unannotated corpus of more than six million words. However, they did not make their corpus publicly available. They evaluated word2vec and fastText word embeddings in classification models by applying CNNs to Tigrinya news articles.

A work in [1] investigated the NER task for ten African languages and created and published a NER dataset for each language. They also investigated cross-domain transfer with experiments on five languages using the WikiAnn dataset, as well as cross-lingual transfer for low-resource named entity recognition.

2.2 Semi-supervised Learning
The authors in [28] investigated semi-supervised NER using a graph-based label propagation algorithm for the Amharic named entity recognition problem. Their experiment uses
3. TIGRINYA NER CORPUS

3.1 Tigrinya Script
Tigrinya uses the Ge'ez script. Ge'ez is a script used as an abugida (alphasyllabary) for several Afro-Asiatic and Nilo-Saharan languages in Ethiopia and Eritrea in the Horn of Africa. In Amharic and Tigrinya, the script is often called fidäl.

3 https://blog.amara.org/2021/08/04/new-to-amara-tigrinya/amp/

The corpus contains sentences from 2015-2021 on various topics. Figure 1 shows the distribution of the different text topics in the corpus. We annotated five entity types: person name (PER), location (LOC), organization (ORG), date and time (DATE), and miscellaneous (MISC), using the BIO standard. The annotation tags were inspired by the English CoNLL-2003 corpus [27]. In addition, we follow the MUC6 [30] annotation guidelines.

In the following, we summarize the annotation guidelines for the five classes.

PER personal names, including first, middle, and last names. Personal names that refer to an organization, location, event, law, or prize were not tagged with the PER tag.

LOC includes all country, region, state, and city names, as well as non-GPE locations such as mountains, rivers, and bodies of water.

ORG proper names that include all kinds of organizations, sports teams, multinational organizations, political parties, and unions, as well as proper names referring to facilities.

DATE absolute date expressions that denote a particular segment of time, i.e., a particular day, season, final quarter, year, decade, or specific century.

MISC includes other types of entities, e.g., events, specific disease names, etc.

O is used for non-entity tokens.

The annotation process was carried out by three paid and four volunteer human annotators who have a linguistic background and are native speakers of Tigrinya. Table 1 shows the frequency of each entity tag. The corpus was annotated according to the established Beginning, Inside, and Outside (BIO) scheme, where "B" indicates the first word of an entity, "I" indicates the remaining words of the same entity, and "O" indicates that the tagged word is not a named entity. Our corpus will be publicly available on GitHub 4 for research purposes.

4 https://github.com/mehari-eng/Tigrinya-NER
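To make the BIO scheme concrete, the short sketch below prints a toy sentence with its BIO tags. The example uses English words purely for readability; the corpus itself is in Tigrinya, and the tokens and tags shown are not taken from the dataset.

# Toy illustration of BIO tagging (English words used only for readability).
tokens = ["Haile", "Selassie", "visited", "Addis", "Ababa", "in", "1958", "."]
tags   = ["B-PER", "I-PER",    "O",       "B-LOC", "I-LOC", "O",  "B-DATE", "O"]

# "B-" marks the first token of an entity, "I-" continues the same entity,
# and "O" marks tokens outside any entity.
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")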
To validate the annotation quality, we report inter-annotator agreement scores in Table 2 using Cohen's kappa [21] for all entity tags. We calculated the inter-annotator agreement between two annotation sets. Table 2 shows the results of the inter-annotator agreement analysis. The agreement for the PER and LOC annotations is relatively high, whereas the kappa agreement for MISC is low compared to the other entities; thus, MISC was the most difficult tag for our annotators. The goal of our annotation procedure is to produce a high-quality corpus by ensuring high inter-annotator agreement.
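As a minimal sketch of how such agreement scores can be computed, the snippet below applies scikit-learn's Cohen's kappa to two annotators' token-level tag sequences. The tag sequences shown are illustrative placeholders, not the actual annotations behind Table 2.

from sklearn.metrics import cohen_kappa_score

# Token-level BIO tags assigned by two annotators to the same tokens (illustrative).
annotator_1 = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-MISC", "O"]
annotator_2 = ["B-PER", "I-PER", "O", "B-LOC", "O", "O",      "O"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")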
4. PROPOSED METHOD

4.1 Overview
In this work, we propose a new pre-trained RoBERTa-based language model for the Tigrinya language.

Language models are trained on an extensive unlabeled text corpus to predict words in a sentence, for example the next word or a masked word. The pre-trained language models can then be further trained on a small supervised dataset by slightly altering the behavior of the model. Training a neural network from scratch on a small amount of data may result in over-fitting or under-fitting. Pre-trained language models, however, can transfer their knowledge by further training on more specific downstream tasks with a small dataset, such as named entity recognition, POS tagging, question answering, and text classification.

Figure 3 shows an overview of our proposed language model, the source data used, and the architecture used to generate our model. Finally, it shows further training of the model via the fine-tuning technique for different downstream tasks, such as NER and POS tagging.
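The following is a minimal sketch, using the Hugging Face transformers library, of how a pre-trained masked language model can be fine-tuned for a token classification task such as NER or POS tagging. The checkpoint path, label set, and training arguments are placeholders for illustration, not the released TigRoBERTa artifacts or the exact hyper-parameters used in this work.

from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]   # illustrative subset of the tag set
model_name = "path/to/pretrained-tigrinya-roberta"    # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name,
                                                        num_labels=len(labels))

args = TrainingArguments(output_dir="tigrinya-ner-finetuned",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

# train_dataset and eval_dataset would hold tokenized sentences with
# word-aligned label ids; they are omitted in this sketch.
trainer = Trainer(model=model, args=args,
                  train_dataset=None, eval_dataset=None)
# trainer.train()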
4.2 Transformer-based Architectures
Our model builds on the well-studied Transformer [35]. The Transformer uses an encoder-decoder architecture for converting one sequence into another. The encoder takes a sequence as input and converts it into an embedding, a vector representation of the input; the decoder takes an embedding as input and converts it back into a sequence. The encoder and decoder each consist of several multi-headed attention layers stacked on top of each other. Recent approaches such as BERT, ALBERT, RoBERTa, GPT-3, and XLNet [7, 15, 18, 39, 26] use Transformers to create embeddings that can be used for other tasks.

In this study, we use RoBERTa [18], a Transformer-based model that replicates BERT (Bidirectional Encoder Representations from Transformers) and, like BERT, is built from stacked Transformer encoders.

RoBERTa was developed by Facebook [18] with the aim of optimizing the training of BERT. It was introduced to improve BERT's training method and shares a similar architecture with BERT. RoBERTa modifies key hyper-parameters of BERT, including removing BERT's next sentence prediction objective, training on longer sequences, and introducing dynamic masking. RoBERTa also changes BERT's training by using much larger mini-batches and learning rates. This allows RoBERTa to improve on the MLM objective compared to BERT and leads to better performance on downstream tasks. Moreover, RoBERTa comes in two sizes: the base model and the large model. The RoBERTa base model consists of 12 layers with a hidden size of 768 and about 125M parameters, while the RoBERTa large model has 24 layers with a hidden size of 1024 and about 355M parameters.
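As a minimal sketch, again using the Hugging Face transformers library, the snippet below configures a RoBERTa-base-sized masked language model and a data collator that performs dynamic masking. The vocabulary size and tokenizer are placeholders (in practice they would come from a tokenizer trained on the Tigrinya corpus); this is not the exact pre-training script used for TigRoBERTa.

from transformers import (RobertaConfig, RobertaForMaskedLM,
                          RobertaTokenizerFast, DataCollatorForLanguageModeling)

# RoBERTa-base-sized configuration: 12 layers, hidden size 768, 12 attention heads.
config = RobertaConfig(vocab_size=50265,        # placeholder; set by the trained tokenizer
                       num_hidden_layers=12,
                       hidden_size=768,
                       num_attention_heads=12,
                       intermediate_size=3072)
model = RobertaForMaskedLM(config)

# Dynamic masking: masked positions are re-sampled every time a batch is built,
# rather than being fixed once during preprocessing.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")  # placeholder tokenizer
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)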