
Bahir Dar University

Bahir Dar Institute of Technology

School of Research and Postgraduate Studies

Faculty of Computing

Department of Computer Science

A Research Proposal on:


Predicting Wealth Status of Households in Ethiopia: Using
Ensemble Machine Learning Techniques
By: Group Members
Name                      ID Number
1. Abraham Alamirew       BDU1500322
2. Yeamlaksra Degu        BDU1500290
3. Lijalem Gezahegn       BDU1500273
4. Hailemariyam Mulualem  BDU1500305

Submitted to: Tesfa T (Ph.D.)

February, 2023
Bahir Dar, Ethiopia
Acronyms
UI- User Interface

RF- Random Forest

ML- Machine learning

AI- Artificial Intelligence

US$- United States Dollar

PL- Programming Language

AdaBoost- Adaptive boosting

XGBoost- Extreme Gradient Boosting

GDP- Gross Domestic Product

PPP- Purchasing Power Parity

EDHS- Ethiopian Demographic and Health Survey

FDRE- Federal Democratic Republic of Ethiopia

Contents
Acronyms
1. Introduction
1.1. Background of the study
1.2. Statement of the Problem
1.3. Objective of the study
1.3.1. General objective
1.3.2. Specific objectives
1.4. Research methodology
1.4.1. Literature review
1.4.2. Research design
1.4.3. Data collection and preprocessing
1.4.4. Algorithm selection
1.4.5. System development tools
1.4.6. Design and development
1.4.7. Demonstration
1.4.8. Evaluation
1.4.9. Communication
1.5. Scope and limitation of the study
1.6. Significance of the study
1.7. Work Detail (work breakdown & timeline, Budget)
1.7.1. Time Schedule
1.7.2. Budget plan
Summary
References

Tables
Table 1 Time schedule
Table 2 Budget plan

1. Introduction
1.1. Background of the study
Wealth is an accumulation of valuable economic resources that can be measured in terms of
either real goods or money value [1]. Specific people, organizations, and nations are said to be
wealthy when they can accumulate many valuable resources or goods. Wealth is the measure of a
person, community, company, or country's total value of assets, including both physical and
intangible assets [1]. Wealth can be contrasted with income in that wealth is a stock and income
is a flow, and it can be seen in either absolute or relative terms [1]. Today, evidence reveals that
there is a wide gap between the richest and the poorest in most economies [2].

According to the Federal Democratic Republic of Ethiopia (FDRE) poverty assessment, in 2000
Ethiopia had one of the highest poverty rates in the world, with 56% of the population living on
less than US$1.25 PPP a day [3]. Ethiopian households then experienced a decade of remarkable
progress in wellbeing: by 2011, less than 30% of the population lived below the national poverty
line and 31% lived on less than US$1.25 PPP a day [3].

We believe that the average household in Ethiopia also has better health, education and living
standards today than in 2000, but this assumption needs to be supported by evidence. Nowadays,
owing to political instability in Ethiopia, many households near conflict areas cannot even meet
their basic needs. In addition to political problems, Ethiopia has experienced natural weather
shocks such as drought in many places. Such natural shocks and political instability reduce the
economic standing of individual households and the Gross Domestic Product (GDP) of the country
as a whole.

Additionally, life expectancy is directly related to the wealth status of a society. The FDRE
poverty assessment states that in 2000 life expectancy in Ethiopia was 52 years. Owing to
various socio-economic improvements, by 2011 it had increased by 11 years, to 63 years [3].

These findings change over time and call for further research. Knowing the wealth status of
households will help the government update its poverty reduction strategies for targeted
populations.

Machine learning (ML) is a field of inquiry devoted to understanding and building methods that
'learn', that is, methods that leverage data to improve performance on some set of tasks [4].
Machine learning algorithms build a model based on sample data, known as training data, in
order to make predictions or decisions without being explicitly programmed to do so [4]. In
machine learning, combining two or more algorithms often provides better classification accuracy
than a single model, because the errors of individual base learners can partly cancel out when
their predictions are combined.
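
As a rough illustration of why combining base learners can help, consider a hypothetical case (an assumption for illustration, not a result from the EDHS data) in which three base classifiers each classify a household correctly with probability p = 0.7 and make their errors independently. A majority vote is correct when all three, or exactly two of the three, are correct:

```latex
P(\text{majority correct}) = p^{3} + 3p^{2}(1-p)
                           = 0.7^{3} + 3(0.7)^{2}(0.3)
                           = 0.343 + 0.441
                           = 0.784
```

Under these assumptions the vote is more accurate than any single learner (0.784 versus 0.7). In practice base learners are rarely fully independent, so the actual gain is usually smaller.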

The researchers plan to build a predictive model that classifies households’ wealth status based
on the Ethiopian Demographic and Health Survey (EDHS) dataset. We expect that applying ensemble
Machine Learning algorithms to this dataset will yield good predictive performance.

The dataset covers the nine regions and two city administrations of Ethiopia, including both
rural and urban areas. Hence, it is representative at the country level. The research
methodology described in Section 1.4 will be applied to this dataset to obtain the required
results.

1.2. Statement of the Problem


Governments and humanitarian agencies devote considerable resources to poverty and malnutrition
reduction efforts. One key factor in the effectiveness of such efforts is the accuracy with
which poor populations can be identified. Accurate identification of poor or malnourished
populations in space and time supports the planning of subsequent assistance.

In the Ethiopian context, there is a large wealth gap between very poor and very rich households.
Some live in comfort and can afford many expenditures because of their considerable wealth. On
the other hand, many people cannot even cover their basic needs. Eating twice a day is a
challenge for many citizens. Access to nearby health services, clean water, schools and
transportation is out of reach for many of them, especially in rural areas.

Usually, people are classified as poor or rich based on a few wealth status indicators. However,
such judgments are subjective and differ from person to person.

Governments also try to distinguish the poorest citizens from the richest in order to provide
aid, guidance and economic support. Their classification is mainly made by oral categorization of
households with respect to the determinant factors of wealth status. This leads to
misclassification and to unfair treatment of poor and rich citizens in the allocation of economic
aid and benefits. As a result, supporting vulnerable households becomes very difficult. Knowing
the real wealth status of households will therefore help governments support low-income citizens
in any sector. This will play an important role in improving the living conditions of individual
households and in reducing poverty in the country as a whole.

For example, governmental and non-governmental organizations provide funds for food-insecure
households regardless of time and place of residence, especially at-risk families in the
war-affected areas of northern Ethiopia. To deliver such aid, accurate knowledge of wealth status
is required to ensure a fair distribution of resources.

Knowing their wealth status is also important for individuals, because it can help households
identify areas where they spend too much money. Being able to afford something does not mean
that one has to buy it. Finally, it can motivate households to work hard and move to the next
level of development.

Over time, technology has become more robust and able to solve many real-world problems. Years
ago, many repetitive and error-prone tasks were handled by applications developed in various
programming languages (PL). Those applications performed their tasks by following pre-programmed
instructions or algorithms.

Nowadays, Artificial Intelligence (AI) has become popular, building models inspired by the human
brain. This gives machines human-like capabilities such as classification, prediction and
decision-making by learning from previous experience without human intervention. Machine
Learning is a subfield of Artificial Intelligence that provides these capabilities.

To determine the wealth status of households, developing a predictive model using ensemble
Machine Learning (ML) algorithms is very important. In addition to predicting wealth status,
knowing the determinant factors of economic status will help to focus on the gaps that lead to
poverty and underdevelopment.

To this end, the researchers will answer the following research questions:

• What are the determinant attributes for predicting the wealth status of households in Ethiopia?
• Which ensemble Machine Learning algorithm performs best in predicting households’ wealth
status in Ethiopia?
• How accurately does the predictive model identify the wealth status of households?

Research hypotheses

• The wealth status of a household is related to life expectancy
• Ownership of a car determines the wealth status of a household
• The type of dwelling determines the wealth status of a household
• Religion does not determine the wealth status of a household

1.3. Objective of the study


1.3.1. General objective

The general objective of this research proposal is to predict the wealth status of households in
Ethiopia using Ensemble Machine Learning techniques.

1.3.2. Specific objectives

In order to achieve the general objective, the following specific objectives are identified:
• To review relevant literature in the area of the proposed study
• To identify determinant attributes from the dataset
• To select the best predictive ensemble Machine Learning models
• To build, train and evaluate the selected predictive models
• To design a user interface (UI) for the predictive model and deploy the best model

1.4. Research methodology


Research methodology explains how you plan to put your research into practice and explains
why you think this is the best approach. It's a plan you intend to follow to complete your research
and come to a conclusion [5].

1.4.1. Literature review

In order to gain a good understanding of predictive ensemble Machine Learning techniques,
relevant documents will be reviewed: books, journal articles, previous related research works
and electronic publications on the Internet will be consulted to obtain a detailed understanding
of the state of the art (current techniques and technologies with respect to the proposed system).
An in-depth literature review will provide more insight into the concept of wealth status
prediction, especially as it relates to individual households.

1.4.2. Research design

This research proposal will follow the Design Science (DS) Approach of problem-solving
mechanism in order to accomplish its objective.
Design science is a research paradigm focusing on the development and validation of
prescriptive knowledge [6].

1.4.3. Data collection and preprocessing

Data collection is one of the basic steps of research work, so a dataset must be collected for
experimentation. This research will be based on the EDHS dataset. The initial dataset taken from
EDHS must be converted to a format suitable for Machine Learning, which will require various
data preprocessing techniques.
Since the researchers intend to use ensemble Machine Learning algorithms, the collected data must
be labeled, and the determinant attributes used to predict wealth status must be selected from
the whole dataset.
The original SPSS (.sav) file will be converted to a comma-separated values (CSV) file for
easier preprocessing and model training.
Other data preprocessing activities will include data cleaning, filling in missing values, data
transformation, handling class imbalance and selecting important features.
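
The cleaning steps named above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the column names (`has_car`, `dwelling`, `wealth`) and the four sample rows are hypothetical and are not EDHS variables, and missing values are filled with the most frequent value of each column, which is only one of several possible strategies.

```python
import csv
import io
from collections import Counter

# Hypothetical extract: column names and values are illustrative, not EDHS variables.
RAW_CSV = """region,residence,has_car,dwelling,wealth
Amhara,rural,no,hut,poor
Oromia,urban,yes,,rich
Amhara,rural,,hut,poor
Tigray,urban,no,house,middle
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, mapping empty strings to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]

def fill_missing_with_mode(rows, column):
    """Replace missing values in one column with that column's most frequent value."""
    observed = [r[column] for r in rows if r[column] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    for r in rows:
        if r[column] is None:
            r[column] = mode
    return mode

def encode_column(rows, column):
    """Map each category in a column to an integer code (sorted for stable codes)."""
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in rows}))}
    for r in rows:
        r[column] = codes[r[column]]
    return codes

rows = load_rows(RAW_CSV)
for col in ("has_car", "dwelling"):
    fill_missing_with_mode(rows, col)          # data cleaning: fill missing values
car_codes = encode_column(rows, "has_car")     # data transformation: category -> integer
class_counts = Counter(r["wealth"] for r in rows)  # class distribution, to spot imbalance
```

The class counts give a first indication of whether class imbalance needs to be handled before training; on real survey data the choice of imputation and encoding would need to be justified per attribute.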

1.4.4. Algorithm selection

An algorithm is a set of instructions used to solve a problem, chosen from among the available
alternatives. Algorithms are specifications for performing calculations, data processing,
automated reasoning, or decision-making [4].

Over the last couple of decades, multiple classifier systems, also called ensemble systems, have
enjoyed growing attention within the computational intelligence and machine learning community.
This attention has been well deserved, as ensemble systems have proven themselves to be very
effective and extremely versatile in a broad spectrum of problem domains and real-world
applications.
Originally developed to reduce variance, and thereby improve the accuracy of automated
decision-making systems, ensemble methods have since been successfully used to address a variety
of ML problems, such as feature selection, confidence estimation, missing features, incremental
learning, error correction, class-imbalanced data and learning concept drift from non-stationary
distributions, among others.
Many everyday practices illustrate the idea and benefit of ensemble ML algorithms: the essence of
democracy, where a group of people vote to make a decision; consulting several doctors before
agreeing to a major medical operation; reading user reviews before purchasing an item; calling
references before hiring a job applicant; and even the peer review of this research proposal
prior to approval. All of these show the power of combination.
For this research, three ensemble Machine Learning algorithms, Random Forest (RF), Adaptive
Boosting (AdaBoost) and Extreme Gradient Boosting (XGBoost), will be applied to the preprocessed
dataset. Various research works show that these algorithms perform well for classification and
prediction because of their bagging and boosting capabilities. Combining multiple base learners
is expected to enhance the performance of the model, with errors minimized by reducing variance
(mainly through bagging) and bias (mainly through boosting) during experimentation.
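
The bagging idea behind Random Forest can be sketched with the Python standard library alone. The sketch below is a deliberately simplified illustration, not the RF, AdaBoost or XGBoost implementations the study will use: each base learner is a one-feature threshold rule (a decision stump) trained on a bootstrap sample, and the ensemble predicts by majority vote. The toy features (number of rooms, car ownership) and the function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit a one-feature threshold rule with the fewest training errors."""
    labels = sorted(set(y))
    best = None  # (errors, feature index, threshold, label at or below, label above)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for lo in labels:
                for hi in labels:
                    errors = sum(
                        (lo if row[f] <= t else hi) != target
                        for row, target in zip(X, y)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, f, t, lo, hi)
    _, f, t, lo, hi = best
    return lambda row: lo if row[f] <= t else hi

def train_bagging(X, y, n_models=15, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict(models, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(m(row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (number of rooms, owns a car) -> wealth label.
X = [(1, 0), (2, 0), (1, 0), (2, 0), (5, 1), (6, 1), (5, 1), (7, 1)]
y = ["poor"] * 4 + ["rich"] * 4

models = train_bagging(X, y)
print(predict(models, (1, 0)), predict(models, (7, 1)))
```

Random Forest additionally samples a random subset of features at each split, and boosting methods such as AdaBoost and XGBoost train base learners sequentially on the errors of earlier ones; neither refinement is shown here.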

1.4.5. System development tools

From the beginning to the completion of this research, various main and supporting tools will be
used for data collection, data analysis and experimentation. Among these, Microsoft applications
will be used: MS Word for documentation, and MS Excel for working with the dataset as a CSV file
and for filtering out the determinant attributes for predicting the wealth status of households.
IBM SPSS Statistics will be used to read the original dataset obtained from EDHS and to perform
quantitative analysis of the data.

Python (Anaconda 3 distribution) will be used to convert SPSS files into their equivalent CSV
files, as well as for many other purposes. After the dataset is preprocessed and ready for
experimentation, Anaconda, with its packages, will be used for model development, training and
testing.

1.4.6. Design and development

The design and development process will involve designing and developing a model that predicts
the wealth status of households. The aim is an efficient model that solves the defined problem
and fulfills the objectives. We will use data gathered in different years of the EDHS and consult
experts to gain a general understanding. To develop the model, we will apply different tools and
techniques to the preprocessed data in order to achieve the objective of this research. After the
data have been compiled and prepared using data preprocessing techniques, the next step is
building the predictive model.

1.4.7. Demonstration

We will demonstrate how to use the design prototype, which integrates the developed model, to
solve an unseen instance of the problem. This involves using the model for prediction or other
appropriate activities. The design prototype accepts user input via a graphical user interface
(GUI) and then displays the wealth status of the household predicted by the model.

1.4.8. Evaluation

We will evaluate the performance of our model using both objective and subjective evaluation
methods. For objective evaluation, we will use metrics such as accuracy. For subjective
evaluation, we will prepare user acceptance questionnaires to be completed by domain experts to
determine the usefulness of the model.
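
The objective part of the evaluation can be sketched as follows. This is a minimal illustration, assuming three hypothetical wealth classes and made-up predictions; precision and recall are shown alongside accuracy as an assumption about which additional metrics might be reported, since accuracy alone can be misleading when classes are imbalanced.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))

def accuracy(y_true, y_pred):
    """Fraction of households whose predicted class equals the actual class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

# Hypothetical labels for six households (not study results).
y_true = ["poor", "poor", "middle", "rich", "rich", "poor"]
y_pred = ["poor", "middle", "middle", "rich", "poor", "poor"]

confusion = confusion_counts(y_true, y_pred)
acc = accuracy(y_true, y_pred)
poor_precision, poor_recall = precision_recall(y_true, y_pred, "poor")
print(f"accuracy={acc:.3f} precision(poor)={poor_precision:.3f} recall(poor)={poor_recall:.3f}")
```

For a poverty-targeting application, recall on the poorest class is of particular interest, because a missed poor household is one that would not receive support.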

1.4.9. Communication

We will summarize the test results and present conclusions and recommendations based on them.
The end user will interact with the model through the designed interface. We plan to communicate
the study in the form of a journal article or conference paper for publication.

1.5. Scope and limitation of the study
The scope of this research is limited to investigating the possibility of designing a predictive
model of the wealth status of households in Ethiopia using ensemble machine learning approaches.
The study will not predict the wealth status of households based on daily income. We will obtain
the dataset from the Ethiopian Demographic and Health Survey (EDHS) repository, collected by the
Ethiopian Statistical Agency (ESA) from 2011 to 2019. Only ensemble Machine Learning algorithms
will be used to train, test and analyze the results.

1.6. Significance of the study


This research is expected to yield good results in predicting households’ wealth status using
ensemble Machine Learning techniques. To the best of our knowledge, no previous work on Ethiopian
households has used the selected methods. The results of the research can therefore serve as
input for future researchers in the area of wealth status prediction, both for specific areas and
for the country as a whole.

Wealth status prediction can play an important role for governmental and non-governmental
organizations (NGOs). Knowing the real wealth status of households will help governments decide
on economic strategies.

Additionally, owing to the current political instability (war) in northern Ethiopia and
drought-affected communities in different areas, low-income citizens need to be accurately
identified within the whole population in order to provide aid and economic support.

NGOs such as humanitarian agencies need precise information on households’ economic status in
order to fill gaps by providing available services at the right time and place. Finally, this
research will identify the critical determinant factors affecting the wealth status of households
in Ethiopia, which will highlight the main focal points for poverty reduction.

1.7. Work Detail (work breakdown & timeline, Budget)
1.7.1. Time Schedule

The time schedule estimates how long the study will take and shows that it can be completed in
the given period. It also serves as the communication plan of the study: it indicates when each
activity will be performed and how long each will last.
Table 1 Time schedule

Activities                        December  January  February  March  April
Writing proposal
Literature review
Data collection
Data preprocessing
Data analysis
Model development and Evaluation
Result & Conclusion
Final submission

1.7.2. Budget plan

Table 2 Budget plan

No  Tasks                         Items        Unit    Quantity  Unit Price (ETB)  Total Price (Estimated, ETB)  Remark
1   Proposal Development          Paper        Pack    1         500               500
                                  Printing     Pages   90        3                 270
                                  Pen          Number  5         10                50
                                  Computer     Number  1         20,000            20,000
                                  Bindings     Number  1         30                30
                                                                                   Total = 20,850
2   Literature Review             Paper        Pack    1/2       250               250
                                  Pen          Number  5         10                50
                                                                                   Total = 300
3   Data Analysis & Organization  Paper        Pack    1         500               500
                                  Pen          Number  10        5                 50
                                  Printing     Pages   200       3                 600
                                  Flash 16 GB  Number  1         250               250
                                                                                   Total = 1,400
                                                                                   Grand total = 22,550

Summary
Wealth is an accumulation of valuable economic resources that can be measured in terms of
either real goods or money value. Wealth is the measure of a person, community, company, or
country's total value of assets, including both physical and intangible assets. In the Ethiopian
context, there is a large wealth gap between very poor and very rich households. Some live in
comfort and can afford many expenditures because of their considerable wealth. On the other hand,
many people cannot even cover their basic needs. Eating twice a day is a challenge for many
citizens. Access to nearby health services, clean water, schools and transportation is out of
reach for many of them, especially in rural areas. Governments also try to distinguish the
poorest citizens from the richest in order to provide aid, guidance and economic support. Their
classification is mainly made by oral categorization of households with respect to the
determinant factors of wealth status. This leads to misclassification and to unfair treatment of
poor and rich citizens in the allocation of economic aid and benefits. As a result, supporting
vulnerable households becomes very difficult. Knowing the real wealth status of households will
therefore help governments support low-income citizens in any sector. To determine the wealth
status of households, developing a predictive model using ensemble Machine Learning algorithms is
very important. In addition to predicting wealth status, knowing the determinant factors of
economic status will help to focus on the gaps that lead to poverty and underdevelopment.

Keywords: wealth, income, life expectancy, accumulation

References
[1] “Understanding Wealth: How Is It Defined and Measured?,” Investopedia.
https://www.investopedia.com/terms/w/wealth.asp#toc-how-to-measure-wealth (accessed Feb. 05, 2023).
[2] E. O. Oyedepo, O. R. Lasabi, and ..., “Determinants of wealth status among rural and
urban households in Nigeria,” J. Stud. …, no. December, 2019, [Online]. Available:
https://www.researchgate.net/profile/Elizabeth-Oyedepo/publication/357093247_Determinants_of_wealth_Status_among_Rural_and_Urban_Households_in_Nigeria/links/61bb551963bbd932429877b4/Determinants-of-wealth-Status-among-Rural-and-Urban-Households-in-Nigeria
[3] C. Mugnier, “Federal democratic Republic of Ethiopia,” Photogramm. Eng. Remote
Sensing, vol. 69, no. 3, p. 213, 2003, doi: 10.1007/978-3-030-42088-8_11.
[4] “what is Machine learning - Yahoo Search Results,” search.yahoo.com.
https://search.yahoo.com/search?fr=mcafee&type=E211US91213G0&p=what+is+Machine+learning
(accessed Feb. 05, 2023).
[5] “what is Research methodology - Yahoo Search Results,” search.yahoo.com.
https://search.yahoo.com/search?fr=mcafee&type=E211US91213G0&p=what+is+Research+methodology
(accessed Feb. 06, 2023).
[6] “Design science (methodology),” Wikipedia, Mar. 04, 2020.
https://en.wikipedia.org/wiki/Design_science_(methodology)
