Abstract
1. Introduction
2. Amharic Language
3. The Proposed Semantic Network Model
4. Experiment
5. Conclusion and Future Works
References
3. Objective: The main objective of the article is to automatically build an Amharic semantic
network using Amharic WordNet.
Amharic WordNet
▪ Composed of 890 single-word terms (all nouns) grouped into 296 synsets
(synonym groups). Each synset therefore groups several terms, about three on
average (890 / 296 ≈ 3.0), to represent a single concept.
▪ Nouns are chosen for concept extraction because most relation types are
detected between nouns. Verbs and adverbs, in contrast, are used to show relations
between nouns, since they act as relation indicators.
▪ In addition, synsets are related to each other through part-of, type-of
and antonym relations.
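The WordNet described above amounts to synsets (groups of noun terms expressing one concept) plus typed links between synsets. The sketch below is a hypothetical in-memory representation: the class, method and synset names are invented for illustration and are not taken from the paper, and English placeholder terms stand in for Amharic entries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Relation(Enum):
    PART_OF = "part-of"
    TYPE_OF = "type-of"
    ANTONYM = "antonym"


@dataclass
class Synset:
    synset_id: str
    terms: List[str]  # single-word noun terms expressing one concept


@dataclass
class WordNet:
    synsets: Dict[str, Synset] = field(default_factory=dict)
    # Each relation is stored as a (source synset, relation, target synset) triple.
    relations: Set[Tuple[str, Relation, str]] = field(default_factory=set)

    def add_synset(self, synset_id: str, terms: List[str]) -> None:
        self.synsets[synset_id] = Synset(synset_id, list(terms))

    def relate(self, source: str, relation: Relation, target: str) -> None:
        self.relations.add((source, relation, target))
        if relation is Relation.ANTONYM:  # antonymy holds in both directions
            self.relations.add((target, relation, source))

    def related(self, source: str, relation: Relation) -> Set[str]:
        return {t for s, r, t in self.relations if s == source and r is relation}

    def synsets_of(self, term: str) -> List[str]:
        return [s.synset_id for s in self.synsets.values() if term in s.terms]
```

Part-of and type-of are stored as directed links, while antonym is mirrored automatically because it is symmetric.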
Text Analysis and Indexing
▪ Removal of non-letter tokens and stop words from the corpus.
▪ Words are stemmed with a stemming algorithm, so that words derived from the same
morpheme are treated as the same token in later steps.
▪ Stems are used as index terms and are weighted with the term frequency-inverse
document frequency (TF-IDF) scheme.
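The preprocessing and indexing steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stemmer is passed in as a function because the paper's Amharic stemmer is not specified (the usage below simply uses an identity function), the stop-word list is supplied by the caller, and TF-IDF is taken in its basic form, raw term count times log(N / document frequency).

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Set


def preprocess(text: str, stop_words: Set[str], stem: Callable[[str], str]) -> List[str]:
    # \w+ splits on whitespace and on punctuation such as the Ethiopic word
    # separator "፡" and full stop "።"; tokens that are not purely letters
    # (for example numbers) are then dropped.
    tokens = [t for t in re.findall(r"\w+", text) if t.isalpha()]
    # Remove stop words, then reduce each remaining token to its stem.
    return [stem(t) for t in tokens if t not in stop_words]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    n_docs = len(docs)
    # Document frequency: in how many documents each stem occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return weighted


identity = lambda w: w  # placeholder for a real Amharic stemmer
docs = [preprocess("the cat sat 42", {"the"}, identity),
        preprocess("the dog sat", {"the"}, identity)]
weights = tf_idf(docs)
```

A stem that occurs in every document (here "sat") receives weight 0, while stems specific to one document get a positive weight.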
Computing Term Vectors
▪ A term vector is a sequence of (term, weight) pairs, where the weight is the
co-occurrence frequency of the term.
▪ The WordSpace model uses a random projection algorithm to build semantic term
vectors; the resulting model contains the list of term vectors found in the corpus
along with the co-occurrence frequencies of each term.
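One common way to build term vectors by random projection is random indexing, sketched below as an assumption, since the paper's exact WordSpace variant is not described here. Every term receives a sparse random "index vector" with a few +1/-1 entries, and a term's context vector is the sum of the index vectors of the terms that co-occur with it inside a sliding window. The dimension, number of non-zero entries and window size are hypothetical defaults.

```python
import random
from typing import Dict, List


def index_vector(dim: int, nonzeros: int, rng: random.Random) -> List[float]:
    """Sparse ternary random vector: `nonzeros` positions set to +1 or -1."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec


def build_word_space(docs: List[List[str]], dim: int = 512, nonzeros: int = 8,
                     window: int = 2, seed: int = 0) -> Dict[str, List[float]]:
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: index_vector(dim, nonzeros, rng) for t in vocab}
    context = {t: [0.0] * dim for t in vocab}
    for doc in docs:
        for i, term in enumerate(doc):
            ctx = context[term]
            # Add the index vector of every neighbour within the window.
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j == i:
                    continue
                for k, v in enumerate(index[doc[j]]):
                    if v:
                        ctx[k] += v
    return context
```

Because the random index vectors are nearly orthogonal in high dimensions, terms that appear in similar contexts end up with similar context vectors without ever building the full term-by-term co-occurrence matrix.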
Concept Extraction
▪ Concepts are extracted by computing the cosine similarity between term vectors from the
WordSpace model and WordNet terms. The top-k related concepts are then ranked.
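The matching step above can be sketched as cosine similarity followed by top-k selection. In this sketch a synset is represented by the sum of its member terms' vectors; that choice, along with the function names, is an assumption made for illustration rather than the paper's stated method.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def synset_vector(terms: List[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    # Assumption: a synset is represented by the sum of its member term vectors;
    # members without a vector in the word space are skipped.
    vec = [0.0] * dim
    for term in terms:
        for k, value in enumerate(vectors.get(term, ())):
            vec[k] += value
    return vec


def top_k_concepts(term: str, vectors: Dict[str, List[float]],
                   synsets: Dict[str, List[str]], k: int = 3) -> List[Tuple[float, str]]:
    target = vectors[term]
    scored = [(cosine(target, synset_vector(members, vectors, len(target))), sid)
              for sid, members in synsets.items()]
    # Highest-similarity synsets first.
    return heapq.nlargest(k, scored)


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
synsets = {"S1": ["b"], "S2": ["c"]}
ranking = top_k_concepts("a", vectors, synsets, k=2)
```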
Relation Extraction:
▪ The relations between concepts considered in this work are “part-of” and “type-of”;
a semi-supervised approach is used to identify them.
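The paper's semi-supervised method is not detailed here, so the following is only one possible instance of the idea: pattern bootstrapping. Starting from a few seed concept pairs known to be related (for example by part-of), it learns the word sequences that appear between them (typically verb-based relation indicators) and applies those patterns to find new pairs, repeating for a few rounds. The function names, the gap limit and the English toy sentences are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def learn_patterns(sentences: List[List[str]], pairs: Set[Pair],
                   max_gap: int = 3) -> Set[Tuple[str, ...]]:
    """Collect the non-empty token sequences found between known related pairs."""
    patterns = set()
    for toks in sentences:
        for i, left in enumerate(toks):
            for j in range(i + 2, min(len(toks), i + max_gap + 2)):
                if (left, toks[j]) in pairs:
                    patterns.add(tuple(toks[i + 1:j]))
    return patterns


def apply_patterns(sentences: List[List[str]], patterns: Set[Tuple[str, ...]],
                   concepts: Set[str]) -> Set[Pair]:
    """Extract concept pairs whose in-between tokens match a learned pattern."""
    found = set()
    for toks in sentences:
        for i, left in enumerate(toks):
            if left not in concepts:
                continue
            for pattern in patterns:
                j = i + len(pattern) + 1
                if j < len(toks) and tuple(toks[i + 1:j]) == pattern and toks[j] in concepts:
                    found.add((left, toks[j]))
    return found


def bootstrap(sentences: List[List[str]], seeds: Set[Pair],
              concepts: Set[str], rounds: int = 2) -> Set[Pair]:
    pairs = set(seeds)
    for _ in range(rounds):
        pairs |= apply_patterns(sentences, learn_patterns(sentences, pairs), concepts)
    return pairs


sentences = [["wheel", "is", "part", "of", "car"],
             ["engine", "is", "part", "of", "car"],
             ["door", "is", "part", "of", "house"]]
concepts = {"wheel", "car", "engine", "door", "house"}
part_of = bootstrap(sentences, {("wheel", "car")}, concepts)
```

Restricting matches to known concepts (here, terms that map to synsets) keeps learned patterns from pairing arbitrary words; a real system would also score and filter patterns to limit noise.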
Implementation:
▪ Concept and relation extraction processes were also implemented using Java.
Source of Information (Data):
▪ Walta Information Center and Ethiopian National News Agency
Test Result
▪ The system's average accuracy in extracting “type-of” and “part-of” relations between
concepts (synsets) from the free-text corpus is 68.5% and 71.7%, respectively.
[Figure: Part of the Amharic semantic network automatically constructed by the proposed system]
5. My view of strengths:
▪ To build the Amharic WordNet, the authors group three to four terms into each synset to
represent a single concept, which is a great part of the work.
6. My view of gaps:
▪ The Amharic WordNet used is composed of only 890 single-word terms (all nouns)
grouped into 296 synsets. Given the complex nature of the language, this number of
terms is small.
▪ Using Java to implement concept and relation extraction is very difficult.