
Assosa University

College of Computing and Informatics


Department of Computer Science

Research Proposal on:


Land Change Prediction using Machine Learning Technique in the case of Dangur Woreda

Prepared by:

Principal-Investigator:

Name: Seiyfu Yesuf (MSc. in Computer Science)


Address: E-mail: seiyfallah@gmail.com
Tel: +251920173376
Co-Investigator:
Name: Eshetu Gusare (MSc. in Computer Science)
Address: E-mail: harmeekoo2011@gmail.com
Tel: +251944211626

October 14, 2019


Assosa, Ethiopia
Table of Contents
List of Tables
List of Figures
Acronyms
1. INTRODUCTION
1.1 Background
1.2 Statement of the Problem
1.3 Objective
1.4 Scope and Limitation
1.5 Significance
2. Literature Review
2.1 Overview
3. Method and Material
3.1 Overview
3.2 Study Area
3.3 Data/Sample Collection
3.4 Data Preprocessing
3.5 Development Tools
4. Project Schedule
4.1 Timeline
5. Estimated Project Costs
5.1 Cost Breakdown
References

List of Tables
Table 5.1 Per-diem Cost
Table 5.2 Data Collection Cost
Table 5.3 Mobile Cost

List of Figures
Figure 3.1 Map of Study Area

Acronyms
BGRS    Benishangul-Gumuz Regional State
USGS    United States Geological Survey
GIS     Geographical Information Systems
SPSS    Statistical Product and Service Solutions
RS      Remote Sensing
CBC     Center for Biodiversity and Conservation
WEKA    Waikato Environment for Knowledge Analysis
GPL     General Public License

1. INTRODUCTION

1.1 Background

At a global scale, extensive conversion of native vegetation (forests and grasslands) to
agriculture to produce food for society has occurred over the last ten millennia, although rates of
change in the last century have been unprecedented (Anderson, 2009). Recently, the human
species crossed an important threshold, as more than half of us now live in cities. Studies have
shown that deforestation has a significant impact on local weather, atmospheric greenhouse gas
(carbon dioxide) emissions, cloudiness, and rainfall. Quantifiable knowledge about changes in
land cover and land use at a global scale is important for effective planning for the conservation
and sustainable use of natural resources such as forest cover and agricultural land. Land use and
land cover change can be complex, and understanding the drivers of these changes at multiple
spatial and temporal scales is among the most pressing needs in current environmental science
research. Climate change is one of the most pressing global environmental issues, but its causes
are not solely the burning of fossil fuels: a significant share of climate change, by some estimates
up to half, is attributed to land use change. Because of its importance, land cover and land use
change detection has been an active research topic. The land cover change detection problem is
to detect when the land cover of a particular location has been converted from one type to
another, e.g., conversion of forested land to barren land due to agriculture, fires, droughts, or
insect damage.

Environmental management, and land-use planning specifically, takes place at different spatial
and organizational levels in Ethiopia, often corresponding to either eco-regional or
administrative units, such as the national or provincial level. The information needed and the
management decisions made differ from one location to another. At the national level, it is often
sufficient to identify regions that qualify as “hot-spots” of land use change, i.e., areas that are
likely to face rapid land use conversion. Land use changes and their impact on forest resources
can be analyzed using conventional methods such as change detection studies of deforestation.
Once these hot-spots are identified, a more detailed analysis of the change and its impact is often
needed. Conventional statistical analysis can yield good solutions, but it is a tedious and
time-consuming process. To handle complex spatial data and derive strategic decisions from the
knowledge obtained, machine learning techniques for remote sensing data processing can be
used. The effect of land-use changes on natural resources can be determined by finding the
interrelationships among various factors using remote sensing. Remote sensing data processing
deals with real-life applications of great societal value. For instance, urban monitoring, fire
detection, and flood prediction from remotely sensed multispectral or radar images have a great
impact on economic and environmental issues. To process the acquired data efficiently and
provide accurate products, remote sensing has evolved into a multidisciplinary field in which
machine learning algorithms now play an important role.

This proposal presents an approach that incorporates spatial predicates describing the spatial
relationships between land use patterns and the surrounding factors that may cause deforestation.
A machine learning algorithm will be implemented to discover knowledge for predicting the
change. We will describe the remote sensing image processing chain and review different
strategies for feature extraction, classification, retrieval, and pattern analysis of remote sensing
data. We will also present supervised classification methodologies for extracting knowledge from
data, including classifiers that encode prior knowledge.

1.2 Statement of the Problem

Forest plantations and forest reserves are usually established for one or more purposes. They
usually exhibit luxuriant growth, so that the trees intercept raindrops and prevent them from
striking the soil surface directly. Apart from preventing erosion and stream flooding, this
reduces evaporation and temperature. Forests located in catchment areas regulate stream flow.
This regulation ensures that lands on lower slopes are protected from erosion and flooding, and
that the silting of canals and rivers is minimized. Unfortunately, forest reserves constituted for
such purposes are now being removed indiscriminately to satisfy the quest for urbanization and
farming. They are being destroyed at an alarming rate that could lead to many different types of
environmental catastrophe, not only in the local forest zones but globally. The greatest threat
comes from deforestation. Deforestation, clearance, clearcutting, or clearing is the removal of
a forest or stand of trees from land that is then converted to a non-forest use. It can occur for
several reasons: trees can be cut down for building or sold as fuel (sometimes in the form of
charcoal or timber), while the cleared land can be used for agriculture, as pasture for livestock,
or for plantations. The removal of trees without sufficient reforestation has resulted in habitat
damage, biodiversity loss, and aridity. It has adverse impacts on the environment and
contributes to global warming; it is often cited as one of the major causes of the enhanced
greenhouse effect.

Remote sensing can provide the basis for fast data collection, and the analytical capabilities of
machine learning techniques can be used to analyze the types, locations, and rates of
deforestation. Machine learning brings out the power of data in a new way: it concerns the
development of computer programs that can access data and perform tasks automatically through
prediction and detection, enabling computer systems to learn and improve continuously from
experience. By classifying the forest and non-forest areas in the 1990, 2000, and 2010 satellite
images and overlaying them, the changes will be identified. To control and reduce forest
degradation, the government should know where, when, why, and how such deforestation occurs
and what measures can be taken to address the problem. Technological advances in remote
sensing, especially in the form of earth-observing satellites, have made it easier for the
scientific community to analyze human impact on the environment, as well as naturally
occurring changes, using machine learning algorithms. Machine learning techniques are
therefore a promising method for addressing the above problem.

Therefore, this research aims to present a general-purpose, machine-learning-based framework
for predicting land change in the case of Dangur Woreda. In particular, we focus on developing
a set of attributes that serve as input to the model and that could be reused for a broad variety of
problems. Specifically, the research helps to answer the following questions.

a) What changes have occurred?
b) What is the nature of the change?
c) What are the spatial patterns of the change?

1.3 Objective

1.3.1 General objective


The main objective of the study is to predict land change and its driving factors in Dangur
Woreda, western Ethiopia, using machine learning techniques, in order to monitor the changes
and then model future land classes.

1.3.2 Specific objectives

• To gather satellite imagery and supporting data for the study area.
• To identify the forest plantation changes in Dangur from 1990 to 2000 using Landsat
datasets.
• To examine the specific types of human activity responsible for the changes.
• To demonstrate the capabilities of machine learning for image processing and
classification in the study.

1.4 Scope and Limitation

This study aims to predict deforestation-related land cover change using machine learning
techniques in the study area only; other areas are not included. The work relies solely on
satellite images, and results will be validated using satellite images from two different dates
after 1990. Data from before 1990 are not considered.

1.5 Significance

In general, the goal of prediction and analysis in a problem area is to extract useful
information and find a way toward solutions. Dangur, in Metekel Zone, is one of the areas of
Ethiopia affected by deforestation. Deforestation as practiced in this area presents multiple
societal and environmental problems, and its long-term effects and consequences are almost
certain to jeopardize life. Some of the consequences may include exposure of the catchment
area, which can lead to dryness and heat. Conducting a study in this area therefore brings the
problem to light; it is one means of formulating recommendations and assisting the regional
land administrators in making informed decisions about measures to address the current
problem. To understand why deforestation is such a dangerous practice and should be
discontinued forthwith, forest plantations must first be given credit for the role they play in,
and their impact on, the ecosystem. There have been no previous studies of the research area;
this study will therefore provide a baseline and reference for further research. It will also
provide the information necessary for managing and monitoring land change, which is of
benefit to the country.

2. Literature Review

2.1 Overview

Change detection is the process of identifying differences in the state of an object or
phenomenon by observing it at different times (Salami, 2004). Change detection is an important
process in monitoring and managing natural resources and urban development because it
provides quantitative analysis of the spatial distribution of the population of interest. Macleod
and Congalton (as cited in Kokolwin, 2005) list four aspects of change detection that are
important when monitoring natural resources:

a) Detecting the changes that have occurred
b) Identifying the nature of the change
c) Measuring the areal extent of the change
d) Assessing the spatial pattern of the change
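The first three aspects can be illustrated with a post-classification comparison, in which two classified maps of the same area are cross-tabulated pixel by pixel. The following is only a minimal sketch under stated assumptions: the class codes, the tiny synthetic maps, and the function name `change_matrix` are hypothetical, and a 30 m pixel size is assumed for Landsat imagery.

```python
import numpy as np

# Hypothetical class codes (0..4), used for illustration only.
CLASS_NAMES = ["Settlement", "Cropland", "Forest", "Water body", "Barren land"]
PIXEL_AREA_HA = 30 * 30 / 10_000  # assumes 30 m pixels, i.e. 0.09 ha each


def change_matrix(before, after, n_classes):
    """Cross-tabulate two classified maps: rows = earlier class, columns = later class."""
    before = np.asarray(before).ravel()
    after = np.asarray(after).ravel()
    # Encode each (from, to) pair as one integer, then count occurrences.
    pairs = before * n_classes + after
    counts = np.bincount(pairs, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)


# Tiny synthetic 3x3 classified maps (not real data).
map_1990 = np.array([[2, 2, 2],
                     [2, 2, 1],
                     [3, 1, 1]])
map_2000 = np.array([[2, 2, 1],
                     [2, 1, 1],
                     [3, 1, 0]])

matrix = change_matrix(map_1990, map_2000, len(CLASS_NAMES))
changed = map_1990 != map_2000                    # (a) where change occurred
forest_to_cropland = matrix[2, 1]                 # (b) nature of the change
changed_area_ha = changed.sum() * PIXEL_AREA_HA   # (c) areal extent of the change
```

The boolean `changed` mask keeps the row and column position of every changed pixel, so it could also serve as a starting point for examining the spatial pattern of change (aspect d).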

A remote sensing device records a response that is based on many characteristics of the land
surface, including natural and artificial cover. An interpreter uses the elements of tone, texture,
pattern, shape, size, shadow, site, and association to derive information about land cover. Salami
(2004) noted that proper forest monitoring and management can only be achieved by using
remote sensing techniques and creating spatial representations such as maps to know the exact
locations and extent of deforestation.

The Center for Biodiversity and Conservation (CBC) has established Remote Sensing and
Geographic Information System (RS/GIS) facilities. These technologies have helped to identify
potential survey sites, analyze deforestation rates in focal study areas, integrate spatial and
non-spatial databases, and create persuasive visual aids to enhance reports and proposals.

3. Method and Material

3.1 Overview

This chapter describes the research design, sample size, sampling techniques, variables,
instrumentation, and procedures for data collection, analysis, and interpretation. It also
highlights the ethical considerations that will be adhered to in the research.

3.2 Study Area

The BGRS has an estimated area of 51,000 square kilometers and shares common borders with
the State of Amhara in the east, the Sudan in the west, and the State of Oromia in the south.
It is divided into 3 administrative zones, 19 Weredas and 33 Kebeles (Aynalem, 2008). Metekel
is the largest zone, with an area of 26,272 square kilometers, followed by Assosa and Kamashi.
The state has diverse topography and climate. The latter includes the familiar traditional zones:
"kola", "dega", and "woyna dega". About 75% of the State is classified as "kola" (lowlands),
which lies below 1,500 meters above sea level. The altitude ranges from 550 to 2,500 meters
above sea level. The average annual temperature ranges from 20 to 25 °C. During the hottest
months (January to May) it reaches 28 to 34 °C. Dangur is one of the woredas in the Metekel
Zone of BGRS and lies at approximately longitude 36°0'0"E and latitude 12°0'0"N. It has a total
population of 44,187, and deforestation is implicated as one of the major contributors to
climate change problems in the area.

Figure 3.1 Map of Study Area (Aynalem, 2008)

3.3 Data/sample Collection

Data collection plays a crucial role in statistical analysis. In research, the different methods
used to gather information fall into two categories: primary data and secondary data. As the
name suggests, primary data is collected for the first time by the researcher, while secondary
data has already been collected or produced by others.

a) Primary Data Collection

Primary data is data originated for the first time by the researcher through direct effort and
experience, specifically to address the research problem. Primary data collection is quite
expensive, as the research is conducted by the organization or agency itself and requires
resources such as investment and manpower. The data collection is under the direct control and
supervision of the investigator.

The instruments used for primary data collection will be interview guides and questionnaires.
According to Rasmussen and Erik (2002), interview guides and questionnaires are useful for
gaining an in-depth understanding of the issues under investigation rather than measuring those
issues. Primary data can be collected through various methods such as surveys, observations,
physical testing, mailed questionnaires, questionnaires filled in and sent by enumerators,
personal interviews, telephone interviews, focus groups, and case studies.

b) Secondary Data Collection

Secondary data is second-hand information that has already been collected and recorded by
someone other than the user, for a purpose not related to the current research problem. It is
readily available data collected from various sources such as censuses, government
publications, internal records of organizations, reports, books, journal articles, and websites.

Data can be said to be the lifeblood of any study, and this is especially true of remote sensing
datasets. Remote Sensing and Geographical Information Systems have become effective tools for
detecting changes in objects and phenomena. Land cover monitoring requires images from
different time periods, and change detection analysis is carried out most effectively with no
fewer than three images. For this study, three Landsat satellite images, from 1990, 2000, and
2010, will be employed as the main data for digital image processing. The Landsat images will
be downloaded from the USGS EarthExplorer website. The satellite collects images of the earth
on a 16-day repeat cycle, referenced to the Worldwide Reference System. The image data are
radiometrically and geometrically corrected and are available in TIFF format. In addition, to
capture the dynamics of the spatial pattern of land use and land cover types, both bio-physical
(soil) and socio-economic (population density) parameters are considered as important potential
drivers of change in the land use and land cover pattern.

3.4 Data Preprocessing

Pre-processing of the Landsat images involves applying various digital image processing
techniques, such as geometric rectification, radiometric calibration, dark subtraction, and cloud
masking. In this study, the selected images from 1990, 2000, and 2010 will be geometrically
corrected and projected to the standard projection for the study area. Radiometric calibration
will then be performed using ENVI software. This process converts at-sensor spectral radiance
to top-of-atmosphere reflectance.
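The radiometric calibration step can be sketched in two stages: digital numbers (DN) are first rescaled to at-sensor radiance, and radiance is then converted to top-of-atmosphere reflectance using the standard relation rho = pi * L * d^2 / (ESUN * cos(solar zenith)). The sketch below is illustrative only and is not the ENVI workflow itself; the function names are hypothetical, and the gain, bias, ESUN, and sun elevation values are placeholders that would in practice be read from the metadata of each scene.

```python
import math

import numpy as np


def dn_to_radiance(dn, gain, bias):
    """Rescale digital numbers to at-sensor spectral radiance (L = gain * DN + bias)."""
    return gain * np.asarray(dn, dtype=float) + bias


def radiance_to_toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au):
    """Convert at-sensor radiance to top-of-atmosphere reflectance.

    rho = pi * L * d^2 / (ESUN * cos(theta_s)), with theta_s = 90 - sun elevation.
    """
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2) / (esun * math.cos(solar_zenith))


# Placeholder values for illustration; real coefficients come from the scene metadata.
dn_band = np.array([[60, 80],
                    [100, 120]])
radiance = dn_to_radiance(dn_band, gain=0.76, bias=-1.5)
reflectance = radiance_to_toa_reflectance(
    radiance, esun=1957.0, sun_elevation_deg=55.0, earth_sun_dist_au=1.0
)
```

Dark subtraction and cloud masking would be applied as separate steps and are not covered by this sketch.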

3.4.1 Classification
The intent of the classification process is to categorize all pixels in a digital image into one of
several land cover classes, or "themes". The categorized data may then be used to produce
thematic maps of the land cover present in an image. Normally, multispectral data are used to
perform the classification, and the spectral pattern present within the data for each pixel is used
as the numerical basis for categorization. Unsupervised and supervised image classification are
the two most common approaches. However, object-based classification has been used
increasingly in recent years because it is well suited to high-resolution data.

With supervised classification, we identify examples of the information classes (i.e., land cover
types) of interest in the image. These are called "training sites". The image processing software
is then used to develop a statistical characterization of the reflectance for each information
class. This stage is often called "signature analysis" and may involve developing a
characterization as simple as the mean or the range of reflectance in each band, or as complex
as detailed analyses of the means, variances, and covariances over all bands. Once a statistical
characterization has been achieved for each information class, the image is classified by
examining the reflectance of each pixel and deciding which of the signatures it resembles most.
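The two stages described above, signature analysis followed by assignment of each pixel to the most similar signature, can be sketched as a Gaussian maximum likelihood classifier. This is a minimal illustration and not the implementation used by any particular software package: it assumes equal prior probabilities for all classes, the function names are hypothetical, and the two-band training pixels are synthetic.

```python
import numpy as np


def train_signatures(pixels, labels):
    """Signature analysis: mean vector and covariance matrix per class."""
    signatures = {}
    for cls in np.unique(labels):
        samples = pixels[labels == cls]
        signatures[int(cls)] = (samples.mean(axis=0), np.cov(samples, rowvar=False))
    return signatures


def classify_max_likelihood(pixels, signatures):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    classes = sorted(signatures)
    scores = np.empty((pixels.shape[0], len(classes)))
    for j, cls in enumerate(classes):
        mean, cov = signatures[cls]
        inv_cov = np.linalg.inv(cov)
        _, log_det = np.linalg.slogdet(cov)
        diff = pixels - mean
        # Squared Mahalanobis distance of every pixel to the class mean.
        mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores[:, j] = -0.5 * (log_det + mahalanobis)  # equal priors assumed
    return np.array(classes)[scores.argmax(axis=1)]


# Synthetic two-band reflectance samples (illustrative, not real training sites).
rng = np.random.default_rng(0)
cropland = rng.normal([0.15, 0.25], 0.02, size=(50, 2))  # hypothetical class code 1
forest = rng.normal([0.05, 0.40], 0.02, size=(50, 2))    # hypothetical class code 2
training_pixels = np.vstack([cropland, forest])
training_labels = np.array([1] * 50 + [2] * 50)

signatures = train_signatures(training_pixels, training_labels)
new_pixels = np.array([[0.05, 0.40], [0.15, 0.25]])
predicted = classify_max_likelihood(new_pixels, signatures)
```

Using the full covariance matrix corresponds to the more complex end of signature analysis described above; a simpler characterization would keep only the per-band means or ranges.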

The image datasets will be imported into the TerrSet image processing tools for classification,
i.e., the extraction of differentiated classes or themes from raw remotely sensed digital satellite
data. Each cluster of observations is a class. A class occupies its own area in the feature space,
i.e., a specific part of the feature space corresponds to a specific class value. Once the classes
have been defined in the feature space, each image pixel observation can be compared with these
classes and assigned to the corresponding class. Classes to be distinguished in an image
classification need to have different spectral characteristics, which can be analyzed by
comparing spectral reflectance curves. The main limitation of image classification is that it does
not give reliable results if the classes do not form distinct clusters in the feature space.
Training sites will be generated on the images by on-screen digitizing for each land cover class,
derived from images of different band combinations. A supervised maximum likelihood
algorithm is proposed for the classification, because the operator will be familiar with the study
area through dedicated field observation, whereby the spectral characteristics of the classes in
the sampled area are identified. Ground truth information will be used to assess the accuracy of
the classification. Table 3.1 shows the selected training classes.

Table 3.1 Training sites

S/N   Training Sample   Description
1     Settlement        Area occupied by people for habitation
2     Cropland          Area occupied by farming activities
3     Forest            Area of open forest devoid of forest plantation
4     Water body        Surface area occupied by stream, pond, river, or dam
5     Barren land       Area covered by road
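Accuracy assessment against ground truth is commonly summarized with a confusion matrix, the overall accuracy, and Cohen's kappa coefficient. The sketch below shows one way these could be computed; the function names are hypothetical, the class codes 0 to 4 stand for the five classes of Table 3.1 in order, and the reference and predicted labels are made-up values rather than results of this study.

```python
import numpy as np


def confusion_matrix(reference, predicted, n_classes):
    """Rows = ground truth class, columns = class assigned by the classifier."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    counts = np.bincount(reference * n_classes + predicted,
                         minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)


def overall_accuracy(cm):
    """Share of samples on the diagonal, i.e. correctly classified."""
    return np.trace(cm) / cm.sum()


def cohen_kappa(cm):
    """Agreement between map and ground truth, corrected for chance agreement."""
    total = cm.sum()
    observed = np.trace(cm) / total
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    return (observed - expected) / (1 - expected)


# Made-up labels for ten ground truth samples (illustrative only).
reference = [0, 0, 1, 1, 2, 2, 2, 3, 4, 4]
predicted = [0, 1, 1, 1, 2, 2, 1, 3, 4, 4]

cm = confusion_matrix(reference, predicted, n_classes=5)
accuracy = overall_accuracy(cm)
kappa = cohen_kappa(cm)
```

The off-diagonal cells of the confusion matrix show which classes are confused with each other, which helps to decide whether additional training sites are needed for a class.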

3.5 Development Tools

This study will use TerrSet, SPSS, ERDAS IMAGINE, and Weka to process the data, relate the
driving factors to the observed change, and predict the expected result.

3.5.1 ERDAS IMAGINE

ERDAS IMAGINE consolidates remote sensing, photogrammetry, basic vector analysis, and radar
processing into a single product. It offers many solutions in one, incorporating the following
standards, enterprise capabilities, and products:

• Image analysis and remote sensing
• Support for optical panchromatic, multispectral, and hyperspectral imagery
• User-friendly ribbon interface
• Multi-core and distributed processing
• Spatial modeling with raster, vector, and point cloud operators, as well as real-time
results preview
• High-performance terrain preparation and mosaicking
• A variety of change detection tools

3.5.2 TerrSet
TerrSet (formerly IDRISI) is an integrated geographic information system (GIS) and remote
sensing software package developed by Clark Labs at Clark University for the analysis and
display of digital geospatial information. TerrSet is a PC grid-based system that offers tools for
researchers and scientists engaged in analyzing earth system dynamics for effective and
responsible decision making in environmental management, sustainable resource development,
and equitable resource allocation.

Key features of TerrSet include:

• GIS analytical tools for basic and advanced spatial analysis, including tools for surface and
statistical analysis, decision support, land change and prediction, and image time series
analysis.
• An image processing system with multiple hard and soft classifiers, including machine
learning classifiers such as neural networks and classification tree analysis, as well as image
segmentation for classification.
• Land Change Modeler, a land planning and decision support toolset that addresses the
complexities of land change analysis and land change prediction.
• Earth Trends Modeler, an integrated suite of tools for the analysis of image time series to
assess climate trends and impacts.
• Climate Change Adaptation Modeler, a facility for modeling future climate and its impacts.

3.5.3 Weka
Waikato Environment for Knowledge Analysis (Weka) is a collection of visualization tools and
algorithms for data analysis and predictive modeling, together with graphical user interfaces for
easy access to these functions. Weka supports several standard data mining tasks, specifically
data preprocessing, clustering, classification, regression, visualization, and feature selection.
All of Weka's techniques are predicated on the assumption that the data is available as one flat
file or relation.
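Because the techniques expect a single flat table, raster data has to be reshaped so that each pixel becomes one row with one column per band plus a class column. The following is a minimal sketch of that reshaping, written out as CSV text; the function names, band names, and array values are hypothetical and serve only to show the layout.

```python
import csv
import io

import numpy as np


def pixels_to_rows(bands, class_map):
    """Flatten a (n_bands, rows, cols) stack and a (rows, cols) class map
    into one record per pixel: [band values..., class code]."""
    n_bands = bands.shape[0]
    features = bands.reshape(n_bands, -1).T.tolist()  # one list of band values per pixel
    labels = class_map.ravel().tolist()
    return [values + [label] for values, label in zip(features, labels)]


def write_flat_csv(rows, band_names, handle):
    """Write a header line followed by one line per pixel."""
    writer = csv.writer(handle)
    writer.writerow(list(band_names) + ["class"])
    writer.writerows(rows)


# Tiny synthetic 2-band, 2x2 image and class map (not real data).
bands = np.arange(8, dtype=float).reshape(2, 2, 2)
class_map = np.array([[2, 2],
                      [1, 0]])

rows = pixels_to_rows(bands, class_map)
buffer = io.StringIO()
write_flat_csv(rows, ["b1", "b2"], buffer)
csv_text = buffer.getvalue()
```

In practice the text would be written to a file on disk instead of an in-memory buffer, and driver variables such as soil type or population density could be added as further columns of the same table.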

Advantages of Weka include:

• Free availability under the GNU General Public License.
• Portability, since it is fully implemented in the Java programming language and thus runs
on almost any modern computing platform.
• A comprehensive collection of data preprocessing and modeling techniques.
• Ease of use due to its graphical user interfaces.

4. Project Schedule

4.1 Timeline

Table 4.1 Project Schedule (2019 to 2020, monthly columns)

Task Name
1. Planning
2. Approval and finalization of research proposal
3. Submission to clearance committee
4. Collection of budget, material and other resources
5. Data collection and organization
6. Experimentation
7. Data analysis and interpretation
8. Draft write-up
9. Final report dissemination

5. Estimated Project Costs

5.1 Cost Breakdown

Carrying out this study will incur a variety of costs. The following tables show the minimum
expenditure required to complete the study.

5.1.1 Per Diem Cost for Researchers During Fieldwork

The per diem is calculated based on the total number of people participating in the data
collection.

Table 5.1 Per-diem Cost

Participants: one Principal Investigator and one Co-investigator

Role                       Number of days   Payment per day (ETB)   Total payment (ETB)   Justification
Observation                7                50                      7*50*2 = 700          To observe and communicate with all bureaus and stakeholders
Selection                  10               50                      10*50*2 = 1000        To select data collectors
Orientation                15               50                      15*50*2 = 1500        To orient data collectors
Managing data collection   30               50                      30*50*2 = 3000        To collect existing data from land administration offices
Interviewing               30               50                      29*50*2 = 2900        To interview the employer

Subtotal: 9100 ETB

Table 5.2 Data Collection Cost

Participants           Role                          Number of days   Payment per day (ETB)   Total payment (ETB)   Justification
Data collectors (2)    Data collection orientation   20               50                      50*2*20 = 2000        Orientation of data collectors for 10 days
Data collectors (12)   Collecting data               23               50                      50*12*20 = 12,000     To collect the data for 10 days, at 8 samples per day

Subtotal: 14000 ETB

Table 5.3 Mobile Cost

No.   Item or Services Required   Price (Birr)
1.    Communication (Mobile)      1000.00

Subtotal: ETB 1000.00

Grand total = 9100+14000+1000 = 24100

10 % contingency = 2410.0 birr

Therefore, net budget = 24100+2410.0 = 26510 birr

References
Aynalem Adugna (2008). Ethiopian Demography and Health. Retrieved January 13, 2019, from
http://www.ethiodemographyandhealth.org/Benishangul.html
Anderson, H. A. (2009). Use and Implementation of Urban Growth Boundaries. An analysis
prepared by the Center for Regional and Neighborhood Action.
Kokolwin, & Shibasaki, R. (2005). Monitoring and Analysis of Deforestation Process Using
Satellite Imagery and GIS (A Case Study of Myanmar).
Rasmussen, S., & Erik, S. (2002). Essentials of Social Research Methodology. Southern
Denmark: Odense University Press.
Salami, A. T., & Balogun, E. E. (2004). Validation of NigeriaSat-1 for Forest Monitoring in
South-west Nigeria.
