
10/15/2022

HiLCoE School of Computer Science and Technology

Predicting Ethiopian Forcibly Displaced and Stateless Persons Based on Data Collected by UNHCR

Group Members
1 Abdi Kitesa
2 Abrham Getachew
3 Admas Teshome
4 Bruk Zerihun
5 Habtamu Workineh

Submitted to Dr. Eyob N. Alemu (PhD)


Table of Contents

Background
Problem Statement
Objective of the project
Methodology
Proposed Solution
Evaluation
Conclusion and Recommendation
References
Annex

Background

Many Ethiopians are forcibly displaced or stateless and are seeking asylum in different
countries. According to the Norwegian Refugee Council's Internal Displacement Monitoring
Centre (IDMC), more than 5.1 million new displacements in Ethiopia were triggered by conflict
and violence. Ethiopia set a world record for displacement in a single year, with three times as
many displacements as were recorded in 2020. UNHCR, the UN Refugee Agency, alongside
Ethiopian authorities, is providing emergency aid to thousands of Eritrean refugees who fled
after an attack on a camp in the Afar region.

Problem Statement

The number of Ethiopians fleeing to other countries fluctuates depending on the internal
political, economic, and social situation. The model would help to zoom in on the
destinations with the highest reporting of Ethiopian asylum seekers and serve as input for
the humanitarian, security, and political affairs of the country.

Objective of the project

The objective of this project is to design a predictive model for Ethiopian forcibly displaced and
stateless persons in different countries.

Methodology

This project applies data mining to design a model that predicts the reporting of asylum by
people of Ethiopian origin across different countries.
The dataset was cut down to the years 2012-2021.
The reporting was classified as Less reporting (100 – 17096), Medium reporting (17097 – 34194),
and High reporting (above 34195). These ranges split the scale into roughly three equal parts,
because the highest reporting recorded within the 10 years is 51288.
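The three reporting ranges above can be sketched as a small labeling function. This is a minimal illustration, not the project's Weka workflow: the function name `reporting_class` is hypothetical, and it assumes that a count of exactly 34195 belongs to the High class and that counts below 100 fall into the Less class, neither of which the report states.

```python
def reporting_class(count):
    """Map a yearly asylum count to one of the report's three reporting classes.

    Assumptions (not stated in the report): counts below 100 are folded into
    "Less", and a count of exactly 34195 is treated as "High".
    """
    if count <= 17096:
        return "Less"
    if count <= 34194:
        return "Medium"
    return "High"


# Example: label a few counts, including the highest value reported (51288).
labels = [reporting_class(c) for c in (100, 17097, 51288)]
```

Applying the function to each record's count would produce the nominal class attribute that the classifiers predict.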
We performed data cleaning by removing attributes not relevant to our model from the original
data (country of origin, country code, asylum report station, etc.).
Based on the attribute selection output from Weka, we selected attributes ranked above 0.2 (with
the exception of Year and Country of Asylum). The resulting dataset is described as follows:
Dataset
This dataset consists of several features (fields) and includes a large number of
records. The features in the dataset are described below:
• Year
• Country of Origin
• Country of Asylum (nominal)
• Gender
• Population
• Age (numeric)
• Location
• Accommodation Type
• Urban or rural

Data Exploration and preprocessing


The dataset contains 16,590 records and 9 features. The project aims to propose a model that
uses classification techniques in data mining to assess the dataset.
The data were collected, prepared, and pre-processed, and then the J48 decision tree, K-nearest
neighbors, and neural network algorithms were applied to the dataset. Data preprocessing is one
of the most important activities in data mining.

The preprocessing operations performed here are as follows: removing some records with
missing values, and converting the values of some fields, such as age, from categorical to
numerical. The features used for the model are described in the table below:

Attribute          Age       Type

Female             12-17     Numerical
Female             18-59     Numerical
Female             Unknown   Numerical
Male               12-17     Numerical
Male               18-59     Numerical
Male               Unknown   Numerical
Country of Asylum            Nominal
Year                         Numeric

Proposed Solution

• The proposed model was developed after analyzing our dataset. We used the Weka software
to preprocess the data, and then trained and evaluated three algorithms (K-nearest
neighbors, Decision Tree, and Neural Network) for their ability to predict.

Forcibly Displaced and Stateless Persons Dataset -> Data Exploration -> Data Preprocessing -> Model using data mining techniques -> Ethiopian Forcibly Displaced and Stateless Persons Model

Fig 1. The Proposed Model

Evaluation

K-nearest neighbors algorithm


The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised
learning classifier that uses proximity to make classifications or predictions about the
grouping of an individual data point. While it can be used for either regression or classification
problems, it is typically used as a classification algorithm. The evaluation of the algorithm on
the Ethiopian forcibly displaced and stateless persons dataset is as follows.

Algorithm             Accuracy   Precision   Recall

K-nearest neighbors   0.999      0.99        0.998

[Bar chart: K-nearest Algorithm. Accuracy 99.9, Precision 99, Recall 99.8 (percent).]
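To make the proximity idea behind k-nearest neighbors concrete, the following is a minimal sketch in plain Python, not the Weka implementation used in the project. The function name `knn_predict`, the choice of Euclidean distance, the value `k=3`, and the toy points are all assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["Less", "Less", "Less", "High", "High", "High"]
prediction = knn_predict(points, labels, query=(1, 1), k=3)
```

Because the algorithm stores the training data and defers all work to prediction time, there is no separate training step in this sketch.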

J48 Algorithm
A decision tree is a non-parametric supervised learning algorithm that is used for both
classification and regression tasks. It has a hierarchical tree structure, which consists of a root
node, branches, internal nodes, and leaf nodes. The evaluation of the algorithm on the Ethiopian
forcibly displaced and stateless persons dataset is as follows.

Algorithm Accuracy Precision Recall

J48 100% 0.99 0.99

[Bar chart: J48 Decision Tree. Accuracy 100, Precision 99, Recall 99.99 (percent).]

Neural Network

Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks
(SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their
name and structure are inspired by the human brain, mimicking the way that biological neurons
signal to one another. Artificial neural networks are composed of node layers: an input layer,
one or more hidden layers, and an output layer.
Algorithm        Accuracy   Precision   Recall

Neural Network   99.5       0.996       0.995

[Bar chart: Neural Network. Accuracy 99.5, Precision 99.6, Recall 99.5 (percent).]
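The layered structure described for neural networks (input layer, hidden layer, output layer) can be sketched as a single forward pass. This is only an illustration under stated assumptions: the sigmoid activation, the layer sizes, and the all-zero weights in the example are hypothetical and are not taken from the project's trained network.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass inputs through one hidden layer and then the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical network: 2 inputs, 2 hidden nodes, 3 output nodes (one per class).
scores = forward(
    [1.0, 2.0],
    hidden_w=[[0.0, 0.0], [0.0, 0.0]], hidden_b=[0.0, 0.0],
    output_w=[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], output_b=[0.0, 0.0, 0.0],
)
```

In a trained network the weights would be learned from the data, and the class with the highest output score would be the prediction.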

Because the reporting classes in our dataset are imbalanced, we need to make sure we are using
appropriate evaluation metrics for our case. For this reason, we look beyond the common
accuracy metric. Accuracy is the ratio of correctly predicted values to the total number of input
samples, so a model could reach a high accuracy by always predicting the majority class while
failing to capture the minority class, which is not good. This is why the evaluation metrics that
we focus on to assess the classification performance of our models are precision and recall.
Firstly, precision gives us the ratio of true positives to the total positives predicted by a
classifier, where positives denote instances of the class being evaluated, such as the High
reporting class. The precision values reported above indicate that our models do a good job of
correctly predicting those minority instances.
Moreover, recall, or the true positive rate, gives us the number of true positives divided by the
total number of elements that actually belong to the positive class.
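The definitions of accuracy, precision, and recall given above can be written out directly. The sketch below uses hypothetical helper names and made-up toy labels (not the project's data) to show how a classifier that always predicts the majority class scores well on accuracy while its recall on the minority class is zero.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels: 8 "Less" records, 2 "High" records.
y_true = ["Less"] * 8 + ["High"] * 2
y_pred = ["Less"] * 10  # a classifier that always predicts the majority class

acc = accuracy(y_true, y_pred)
high_precision, high_recall = precision_recall(y_true, y_pred, positive="High")
```

Here the accuracy is 0.8 even though no "High" record is found, which is the failure mode that precision and recall expose.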

We selected nonlinear algorithms, specifically Neural Network, K-Nearest Neighbors, and
Decision Tree (J48).
Based on the experiment result (a 10-fold cross-validation), Decision Tree (J48) was found
to be the most effective algorithm (see the screenshot titled Fig.
Experiment result of Selected Algorithms).
We then removed some data from the dataset to see whether the model remained effective.
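The 10-fold cross-validation mentioned above splits the records into ten parts and uses each part once as the test set while training on the other nine. The sketch below shows only the index splitting, with a hypothetical function name; it omits the shuffling and class stratification that a tool such as Weka may apply, so it is an approximation of the procedure rather than a reproduction of it.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# With the 16,590 records of this dataset, each fold holds out 1,659 records.
folds = list(k_fold_indices(16590, k=10))
```

A model would be trained and evaluated once per fold, and the reported metrics would be aggregated over the ten held-out parts.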

[Bar chart: Comparison of Algorithms. Accuracy, Precision, and Recall for Neural Network, k-Nearest, and Decision Tree; vertical axis from 99.2 to 100.1.]

Conclusion and Recommendation

Conclusion
• To sum up, we analyzed and pre-processed our data, and trained and evaluated three
algorithms. This paper described a decision tree model for the prediction of forcibly
displaced populations originating from Ethiopia.
• The model was created based on UNHCR reports collected between 2012 and 2021. The
countries of reporting were classified as Less reporting (100 – 17096), Medium reporting
(17097 – 34194), and High reporting (above 34195) based on the annual report.
• The dataset was first tested on three different algorithms (namely K-nearest neighbors,
Decision Tree, and Neural Network) to find the most suitable one.
• Based on the analysis, the Decision Tree was found to be the most effective (Relative
Absolute Error 3.155%, Root Relative Squared Error 19.6884%).
• Evaluation on the test set showed a high Relative Absolute Error (140.5%) and Root Relative
Squared Error (240.941%), indicating that the preparation of the test set needs further
investigation.
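For readers unfamiliar with the two error measures quoted above, the sketch below gives their usual numeric definitions: each compares the model's error with the error of a baseline that always predicts the mean of the actual values, so values above 100% mean the model did worse than that baseline. The function names are hypothetical, and this simplified numeric form is an assumption; for nominal classes Weka derives these figures from predicted class probabilities, which is not reproduced here.

```python
import math


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to always predicting the mean, in percent."""
    mean = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - mean) for a in actual)
    return 100.0 * model_error / baseline_error


def root_relative_squared_error(actual, predicted):
    """Square root of squared error relative to the mean baseline, in percent."""
    mean = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean) ** 2 for a in actual)
    return 100.0 * math.sqrt(model_error / baseline_error)


# Hypothetical values: predicting the mean everywhere scores exactly 100%.
actual = [1.0, 2.0, 3.0, 4.0]
baseline_rae = relative_absolute_error(actual, [2.5] * 4)
baseline_rrse = root_relative_squared_error(actual, [2.5] * 4)
```

Read this way, the test-set figures of 140.5% and 240.941% indicate predictions worse than the simple baseline, which is consistent with the concern raised about how the test set was prepared.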

Recommendation
• Even though the results from this project are encouraging, further classification experiments
using different data mining tools are needed.

References
• https://data.humdata.org/dataset/unhcr-population-data-for-eth

Annex

Attribute Selection

Performance of Decision Tree

Performance of K-Nearest

Performance of Neural Network
