
1. About the Paper


Author(s) of the Article: Alelgn Tefera (Jigjiga University), Yaregal Assabie (Addis Ababa University)
Year of Article: 2014
Article Title: Automatic Construction of Amharic Semantic Networks From Unstructured Text Using Amharic WordNet
Publisher: University of Tartu Press

2. Paper Organization

Abstract

1. Introduction
2. Amharic Language
3. The Proposed Semantic Network Model
4. Experiment
5. Conclusion and Future Works
References

3. Objective: The main objective of the article is to automatically construct an Amharic semantic network using the Amharic WordNet.

4. Main Concepts in Each Section:


▪ Introduction: the authors define the key terms that frame their research.
A semantic network is a network that represents semantic relations among concepts; these concept relations can be used to represent knowledge.
Concepts are the abstract representations of the meaning of terms.
The relations between concepts in semantic networks include:
▪ synonymy (similar concepts),
▪ antonymy (opposite concepts),
▪ meronymy/holonymy (“part-of” relation between concepts), and
▪ hyponymy/hypernymy (“type-of” relation between concepts).
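To make the four relation types concrete, here is a minimal Python sketch of a semantic network as a labelled graph. The class name `SemanticNetwork` and the English concept names are hypothetical illustrations, not taken from the paper; the sketch only assumes that synonym and antonym links are symmetric while part-of and type-of links are directional.

```python
from collections import defaultdict

# Relation labels from the list above; concept names below are hypothetical
# English placeholders, not entries from the Amharic WordNet.
RELATIONS = {"synonym", "antonym", "part-of", "type-of"}


class SemanticNetwork:
    """A tiny labelled graph: concepts are nodes, relations are typed edges."""

    def __init__(self):
        # concept -> list of (relation, other concept)
        self.edges = defaultdict(list)

    def add(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[source].append((relation, target))
        # Synonymy and antonymy are symmetric, so the reverse edge is stored too.
        if relation in ("synonym", "antonym"):
            self.edges[target].append((relation, source))

    def related(self, concept, relation):
        """Return the concepts linked to `concept` by `relation`."""
        return [t for r, t in self.edges[concept] if r == relation]


net = SemanticNetwork()
net.add("car", "synonym", "automobile")
net.add("wheel", "part-of", "car")   # meronym: a wheel is part of a car
net.add("car", "type-of", "vehicle")  # hyponym: a car is a type of vehicle
net.add("hot", "antonym", "cold")

print(net.related("car", "type-of"))   # ['vehicle']
print(net.related("cold", "antonym"))  # ['hot']
```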
Semantic networks are used in:
▪ Search engines, to search not only for the keywords given by the user but also for related concepts, and to show how those concepts are related.
▪ Document summarization, by compressing the data semantically, and document classification, using the knowledge stored in the network.
Common approaches to constructing semantic networks automatically are:
▪ Knowledge-based: concepts are extracted from a thesaurus (a reference work that lists words with similar meanings) in a supervised manner.
▪ Corpus-based: concepts are extracted from a large amount of text in a semi-supervised manner.
▪ Hybrid: combines both approaches.

The Proposed Semantic Network Model:

The model has the following major components:
▪ Amharic WordNet
▪ Text analysis and indexing
▪ Computing term vectors
▪ Concept extraction, and
▪ Relation extraction
How does it work?
1. Index terms representing the text corpus are extracted.
2. Term vectors are then computed from the index file and stored using the WordSpace model.
3. By searching the WordSpace, semantically related concepts are extracted for a given synset in the Amharic WordNet.
4. Finally, relations between those concepts are extracted from the corpus by examining the intervening word patterns between pairs of concepts taken from the Amharic WordNet.
The paper presents an architectural model that summarizes the whole process.

Amharic WordNet
▪ Composed of 890 single-word terms (all nouns) grouped into 296 synsets (synonym groups). This means each synset contains about three terms on average (890 / 296 ≈ 3.0), all representing a single concept.
▪ Nouns are chosen for concept extraction because most relation types are detected between nouns, whereas verbs and adverbs serve as relation indicators that show how nouns are related.
▪ Additionally, synsets are linked to each other by part-of, type-of, and antonym relations.
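A synset of this kind can be sketched as a small record holding its synonymous nouns and its links to other synsets. The `Synset` class, the ids, and the English placeholder terms are hypothetical; only the counts 890 and 296 come from the paper. The last lines check the average-size arithmetic.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept: a group of synonymous nouns plus links to other synsets."""
    synset_id: str
    terms: list
    # relation name ("part-of", "type-of", "antonym") -> ids of related synsets
    relations: dict = field(default_factory=dict)


# Hypothetical entries with English placeholder terms (not real Amharic WordNet data).
vehicle = Synset("s1", ["vehicle", "conveyance", "transport"])
car = Synset("s2", ["car", "automobile", "motorcar"], {"type-of": ["s1"]})
wheel = Synset("s3", ["wheel", "rim"], {"part-of": ["s2"]})

# Size figures reported for the Amharic WordNet used in the paper.
TOTAL_TERMS = 890
TOTAL_SYNSETS = 296
average_size = TOTAL_TERMS / TOTAL_SYNSETS
print(f"average terms per synset: {average_size:.2f}")  # average terms per synset: 3.01
```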
Text Analysis and Indexing
▪ Non-letter tokens and stop words are removed from the corpus.
▪ Words are stemmed using a stemmer algorithm, so that words derived from the same morpheme are treated as the same token in later steps.
▪ The stem is used as the index term, weighted with the term frequency-inverse document frequency (TF-IDF) scheme.
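The three preprocessing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the stop-word list and the suffix-stripping `stem` function are hypothetical English stand-ins for the Amharic stop-word list and stemmer, and the weighting is one common TF-IDF variant (relative term frequency times log(N / document frequency)).

```python
import math
import re
from collections import Counter

# Hypothetical English stand-ins; the paper uses Amharic resources instead.
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Toy suffix stripping: drop the first matching suffix if >= 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Keep letter-only tokens (Unicode-aware), drop stop words, then stem."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {stem: weight} dict per document."""
    bags = [Counter(preprocess(d)) for d in documents]
    n_docs = len(bags)
    df = Counter(term for bag in bags for term in bag)  # document frequency
    index = []
    for bag in bags:
        total = sum(bag.values()) or 1  # guard against empty documents
        index.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in bag.items()})
    return index


documents = [
    "The farmers harvest coffee in the highlands.",
    "Coffee is exported and coffee markets grow.",
]
index = tf_idf(documents)
print(preprocess(documents[0]))  # ['farmer', 'harvest', 'coffee', 'highland']
```

Because "coffee" occurs in every document, its IDF is log(1) = 0, so it receives zero weight; terms unique to one document are weighted up.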
Computing Term Vectors
▪ A term vector is a sequence of term-weight pairs, where the weight is the co-occurrence frequency of the term.
▪ The WordSpace model uses a random projection algorithm to build semantic term vectors; the resulting model contains the term vectors found in the corpus along with the co-occurrence frequencies of each term.
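The review does not detail the random projection step, so the sketch below assumes random indexing, one common form of random projection for word spaces: every term gets a sparse random "index vector", and a term's context vector is the sum of the index vectors of the terms that co-occur with it inside a window. The dimension, sparsity, window size, and the toy corpus are all hypothetical choices.

```python
import random
from collections import defaultdict

DIM = 300      # hypothetical dimensionality of the reduced space
NONZERO = 6    # hypothetical number of +1/-1 entries per random index vector
WINDOW = 2     # hypothetical context window (tokens on each side)


def random_index_vector(rng):
    """Sparse ternary random vector stored as {position: +1.0 or -1.0}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {pos: rng.choice((1.0, -1.0)) for pos in positions}


def build_term_vectors(sentences, seed=0):
    """Random indexing: a term's vector is the sum of its neighbours' index vectors."""
    rng = random.Random(seed)
    index = {}
    for tokens in sentences:
        for term in tokens:
            if term not in index:
                index[term] = random_index_vector(rng)

    vectors = defaultdict(lambda: [0.0] * DIM)
    cooccurrence = defaultdict(int)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            start, end = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(start, end):
                if j == i:
                    continue
                neighbour = tokens[j]
                cooccurrence[(term, neighbour)] += 1
                for pos, sign in index[neighbour].items():
                    vectors[term][pos] += sign
    return dict(vectors), dict(cooccurrence)


corpus = [
    ["coffee", "farmer", "harvest"],
    ["coffee", "export", "market"],
]
term_vectors, cooc = build_term_vectors(corpus)
print(cooc[("coffee", "farmer")])   # 1
print(len(term_vectors["coffee"]))  # 300
```

Random indexing keeps the vector length fixed at `DIM` no matter how large the vocabulary grows, which is the usual reason for choosing it over a full term-by-term co-occurrence matrix.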
Concept Extraction
▪ Concepts are extracted by computing the cosine similarity between term vectors from the WordSpace model and synsets in the Amharic WordNet. The top-k most similar concepts are then ranked.
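Cosine similarity and top-k ranking can be sketched as below. How the paper turns a synset into a query vector is not stated in the review; this sketch assumes the synset vector is the sum of its member terms' vectors, and the three-dimensional toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def synset_vector(synset_terms, vectors):
    """Assumption: represent a synset by the sum of its member terms' vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for term in synset_terms:
        if term in vectors:
            total = [a + b for a, b in zip(total, vectors[term])]
    return total


def top_k_concepts(synset_terms, vectors, k=3):
    """Rank all non-member terms by cosine similarity to the synset and keep the top k."""
    query = synset_vector(synset_terms, vectors)
    scores = [(t, cosine(query, v)) for t, v in vectors.items() if t not in synset_terms]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical term vectors (in practice these would come from the WordSpace model).
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.0],
    "vehicle": [0.7, 0.3, 0.1],
    "wheel": [0.5, 0.5, 0.0],
    "coffee": [0.0, 0.1, 0.9],
}
result = top_k_concepts(["car", "automobile"], vectors, k=2)
print([term for term, _ in result])  # ['vehicle', 'wheel']
```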
Relation Extraction:
▪ The relations among concepts considered in this work are “part-of” and “type-of”, which are identified using a semi-supervised approach.
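The review does not spell out the semi-supervised procedure, so the following is a generic bootstrapping sketch under that assumption, not the authors' algorithm. Seed concept pairs (standing in for pairs taken from the Amharic WordNet) are used to learn the intervening word patterns that link them in the corpus, and those patterns are then applied to find new pairs. The function names and sentences are hypothetical English placeholders; Amharic is verb-final, so real patterns would look different.

```python
from collections import Counter


def intervening(tokens, a, b):
    """Words strictly between the first occurrences of a and b (a before b), else None."""
    if a in tokens and b in tokens:
        i, j = tokens.index(a), tokens.index(b)
        if i < j:
            return tuple(tokens[i + 1 : j])
    return None


def learn_patterns(sentences, seed_pairs, min_count=1):
    """Collect intervening-word patterns seen between known (seed) concept pairs."""
    counts = Counter()
    for tokens in sentences:
        for a, b in seed_pairs:
            pattern = intervening(tokens, a, b)
            if pattern:
                counts[pattern] += 1
    return {p for p, c in counts.items() if c >= min_count}


def apply_patterns(sentences, concepts, patterns):
    """Propose new concept pairs whose intervening words match a learned pattern."""
    found = set()
    for tokens in sentences:
        for a in concepts:
            for b in concepts:
                if a != b and intervening(tokens, a, b) in patterns:
                    found.add((a, b))
    return found


sentences = [s.split() for s in (
    "wheel is part of car",
    "engine is part of truck",
    "coffee grows in highland",
)]
seeds = {("wheel", "car")}  # hypothetical "part-of" seed pair
patterns = learn_patterns(sentences, seeds)
concepts = {"wheel", "car", "engine", "truck", "coffee", "highland"}
found = apply_patterns(sentences, concepts, patterns)
print(sorted(found))  # [('engine', 'truck'), ('wheel', 'car')]
```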
Implementation:
▪ The concept and relation extraction processes were implemented in Java.
Source of Information (Data):
▪ Walta Information Center and Ethiopian National News Agency
Test Results
▪ The average accuracy of the system in extracting “type-of” and “part-of” relations between concepts (synsets) from the free-text corpus is 68.5% and 71.7%, respectively.

(Figure: Part of the Amharic semantic network automatically constructed by the proposed system.)

5. My View on Strengths:
▪ To build the Amharic WordNet, the authors group about three terms on average into each synset to represent a single concept, which is a strong part of the work.

6. My View on Gaps:
▪ The Amharic WordNet used is composed of only 890 single-word terms (all nouns) grouped into 296 synsets. Given the complex nature of the language, this number of terms is small.
▪ Implementing concept and relation extraction in Java is very difficult.
