
Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 135 (2018) 712–718

3rd International Conference on Computer Science and Computational Intelligence 2018


Spatial Empirical Best Linear Unbiased Prediction in Small Area Estimation of Poverty
Novi Hidayat Pusponegoro a,*, Ro’fah Nur Rachmawati b
a Statistics Department, Polytechnic of Statistics STIS, Jakarta, Indonesia 13330
b Statistics Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480

Abstract
Spatial data contain both observations and region information, and can describe spatial patterns in social phenomena such as poverty. In poverty parameter estimation, one limitation is that the sample is often too small to deliver reliable direct estimates, and Small Area Estimation (SAE) was developed to handle this. Because small area estimation techniques “borrow strength” across neighboring areas, SAE was further developed by integrating spatial information into the model; the result is known as Spatial SAE. The purpose of this paper is to compare the SAE and Spatial SAE models in estimating, at sub-district level, the mean per capita income of each area, using the 2017 poverty survey data for Bangka Belitung province collected by the Polytechnic of Statistics STIS. The finding of the paper is that spatial information does not influence the parameter estimation in SAE.
© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the 3rd International Conference on Computer Science and Computational Intelligence 2018.
Keywords: EBLUP; Poverty; SAE; SEBLUP

1. Introduction
Poverty eradication is the first of the Sustainable Development Goals adopted by the United Nations. Because poverty is a deprivation of well-being, information on the living conditions of people in a given area is of interest to policy makers and researchers. In Indonesia, several surveys have therefore been designed to estimate poverty parameters at the national scale. These surveys estimate parameters with design-based methods, and the resulting estimators are called direct estimators. Direct estimation is a method of estimation in an area based only on the sample data from that area. Problems arise when poverty parameters, or any other poverty information, must be obtained from the survey for smaller areas, for example at the provincial, district or sub-district level.

* Corresponding author. Tel.: +62-81294630586
E-mail address: novie@stis.ac.id
1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
10.1016/j.procs.2018.08.214

A sample that is too small to support direct estimation at the area level leads to a large standard error [1].
An area is regarded as “small” if the area-specific sample is not large enough to support direct estimates of adequate
precision [2]. In other words, a small area is an area whose sample is too small for direct estimates of adequate
precision to be produced. To overcome this problem, a parameter estimation method called Small
Area Estimation (SAE) has been developed.
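To illustrate why a small area-specific sample leads to a large standard error, the following minimal Python sketch computes the standard error of a sample mean for several sample sizes. It assumes simple random sampling without a finite population correction, which simplifies the actual survey design; the standard deviation of 1.0 and the sample sizes are hypothetical values chosen only for illustration.

```python
import math


def standard_error_of_mean(sample_sd: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sample_sd / math.sqrt(n)


# Hypothetical within-area standard deviation (arbitrary units).
sample_sd = 1.0

# The standard error doubles each time the sample size is divided by four.
for n in (400, 100, 25, 4):
    print(f"n = {n:3d}  standard error = {standard_error_of_mean(sample_sd, n):.3f}")
```

Under these assumptions, cutting the area sample from 400 to 4 units makes the direct estimate ten times less precise, which is the situation that SAE is designed to address.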
The SAE model is a mixed model in which the variation between sub-populations that can be explained by
auxiliary information enters as the fixed effect, and the sub-population-specific variation that
cannot be explained by auxiliary information enters as the random effect. Fay and Herriot [3] were the first
researchers to develop small area statistics based on a linear mixed model, with empirical best linear unbiased prediction
(EBLUP) as the parameter estimation method. Because small area estimation techniques “borrow strength”
across neighboring areas, the EBLUP approach was extended by integrating spatial information into the model. The
extension, which takes into account spatially correlated random area effects, is known as spatial empirical best linear unbiased
prediction (SEBLUP). The SEBLUP method improves the variance-covariance structure of small area models that have
spatial correlation between areas. It is presented as an area-level model because the spatial data
are available at the area level [4, 5, 6, 7].
Therefore, the purpose of this paper is to compare the EBLUP and SEBLUP parameter estimates in order to verify the
spatial effect in SAE modelling. The paper is organized into five sections. The first section presents the
background, motivation and purpose of this research. The second section reviews the recent literature related
to the research topic and clarifies the position of this research among similar studies.
The third section describes the data and methodology used in this paper. The empirical results obtained by
applying the methods are presented in Section 4. The final section summarizes the main findings of the
analysis and discusses possible further research.

2. Spatial Dependence in Small Area Estimation

There are two types of model in small area estimation: the area-level model and the unit-level model. The area-level
model is used when auxiliary data exist only at the area level, and it is a special
form of the linear mixed model. The unit-level model uses auxiliary data that are matched individually
to the response data, but such data are sometimes unavailable. Fay and Herriot proposed an area-level small area estimation
model in the form of a linear mixed model that assumes the random effects of the areas are
independent. The model can be written as:

$$\hat{\boldsymbol{\theta}} = \boldsymbol{X}\boldsymbol{\alpha} + \boldsymbol{u} + \boldsymbol{e} \qquad (1)$$

where $\boldsymbol{\theta}$ is the $m \times 1$ vector of the parameters of inferential interest, for which the direct estimator $\hat{\boldsymbol{\theta}}$ is assumed to be
available; $\boldsymbol{e}$ is a vector of independent sampling errors with mean vector $\boldsymbol{0}$ and known diagonal variance matrix $\boldsymbol{R} = \mathrm{diag}(\varphi_i)$,
with $\varphi_i$ representing the sampling variances of the area direct estimators; $\boldsymbol{\alpha}$ is the $p \times 1$ vector of regression
parameters; and $\boldsymbol{u}$ is the $m \times 1$ vector of independent random area effects with zero mean and $m \times m$ covariance matrix
$\boldsymbol{\Sigma}_u = \sigma_u^2 \boldsymbol{I}_m$. The Empirical Best Linear Unbiased Predictor (EBLUP) is extensively used to obtain model-based
indirect estimators of the small area parameters $\boldsymbol{\theta}$ and associated measures of variability, and can be expressed as:

$$\hat{\theta}_i^{FH} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\,\boldsymbol{x}_i^T \hat{\boldsymbol{\alpha}} \qquad (2)$$

where $\hat{\gamma}_i = \hat{\sigma}_u^2 / (\hat{\sigma}_u^2 + \varphi_i)$ and $\hat{\boldsymbol{\alpha}}$ is the weighted least squares estimate of $\boldsymbol{\alpha}$ with weights $(\hat{\sigma}_u^2 + \varphi_i)^{-1}$.
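As an illustration of Eq. (2), the following Python sketch computes the Fay-Herriot EBLUP for a given plug-in value of the random-effect variance. It is a minimal sketch, not the implementation used in this paper: the function name `fay_herriot_eblup`, the toy data, and the choice to treat the variance estimate `sigma2_u` as an input (rather than estimating it by ML or REML) are all assumptions made for the example.

```python
import numpy as np


def fay_herriot_eblup(theta_direct, X, phi, sigma2_u):
    """Evaluate Eq. (2) for a given plug-in value of sigma2_u (hypothetical helper).

    theta_direct : (m,) direct estimates
    X            : (m, p) auxiliary variables (including an intercept column)
    phi          : (m,) known sampling variances of the direct estimates
    sigma2_u     : plug-in estimate of the random-effect variance
    """
    theta_direct = np.asarray(theta_direct, dtype=float)
    X = np.asarray(X, dtype=float)
    phi = np.asarray(phi, dtype=float)

    # Weighted least squares estimate of alpha with weights 1 / (sigma2_u + phi_i).
    weights = 1.0 / (sigma2_u + phi)
    XtW = X.T * weights  # scales column i of X^T by weight i
    alpha_hat = np.linalg.solve(XtW @ X, XtW @ theta_direct)

    # Shrinkage factor: close to 1 when the direct estimate is precise.
    gamma = sigma2_u / (sigma2_u + phi)

    # Weighted combination of the direct and the regression-synthetic estimates.
    theta_eblup = gamma * theta_direct + (1.0 - gamma) * (X @ alpha_hat)
    return theta_eblup, alpha_hat, gamma


# Toy data for four areas (all values are hypothetical).
theta_direct = np.array([6.0, 6.2, 5.9, 6.4])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
phi = np.array([0.01, 0.04, 0.09, 0.16])

theta_eblup, alpha_hat, gamma = fay_herriot_eblup(theta_direct, X, phi, sigma2_u=0.04)
print("shrinkage factors:", gamma)
print("EBLUP estimates:  ", theta_eblup)
```

Areas with a large sampling variance receive a small shrinkage factor, so their estimate relies more on the regression prediction than on the direct estimate.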
According to Tobler's first law of geography, "everything is related to everything else, but near things
are more related than distant things", so the assumption of independence between areas in the Fay-Herriot model is often
violated. An SAE model was therefore developed in which spatial dependence is incorporated into the error component
of the random factor, which follows a Simultaneous Autoregressive (SAR) process [8].
Based on the SAR model, in which the random effect is a function of the spatial weight matrix and the spatial
autoregressive coefficient, the Spatial SAE model was developed in 2004 and can be expressed as:

$$\hat{\boldsymbol{\theta}} = \boldsymbol{X}\boldsymbol{\alpha} + \boldsymbol{D}\boldsymbol{v} + \boldsymbol{e} \qquad (3)$$

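where $\boldsymbol{v}$ is an $m \times 1$ vector of spatially correlated random area effects given by the following autoregressive process
with spatial autoregressive coefficient $\rho$:
$$\boldsymbol{v} = \rho\boldsymbol{W}\boldsymbol{v} + \boldsymbol{u} \;\Rightarrow\; \boldsymbol{v} = (\boldsymbol{I}_m - \rho\boldsymbol{W})^{-1}\boldsymbol{u},$$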

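and $\boldsymbol{D}$ is an $m \times m$ matrix of known positive constants, and $\boldsymbol{W}$ is the $m \times m$ spatial interaction matrix. The covariance
matrix of $\boldsymbol{v}$ is the $m \times m$ Simultaneously Autoregressive (SAR) matrix $\boldsymbol{G}(\boldsymbol{\delta}) = \hat{\sigma}_u^2\,[(\boldsymbol{I} - \rho\boldsymbol{W}^T)(\boldsymbol{I} - \rho\boldsymbol{W})]^{-1}$, and the
covariance matrix of $\hat{\boldsymbol{\theta}}$ is given by $\boldsymbol{V}(\boldsymbol{\delta}) = \boldsymbol{R} + \boldsymbol{D}\boldsymbol{G}\boldsymbol{D}^T$ with $\boldsymbol{\delta} = (\hat{\sigma}_u^2, \rho)$. The estimator of $\theta_i$, the Spatial
Empirical Best Linear Unbiased Predictor (SEBLUP), is then:

$$\hat{\theta}_i^{SFH} = \boldsymbol{x}_i^T\hat{\boldsymbol{\alpha}} + \boldsymbol{b}_i^T\hat{\boldsymbol{G}}\boldsymbol{D}^T(\boldsymbol{R} + \boldsymbol{D}\hat{\boldsymbol{G}}\boldsymbol{D}^T)^{-1}(\hat{\boldsymbol{\theta}} - \boldsymbol{X}\hat{\boldsymbol{\alpha}}) \qquad (4)$$

where $\hat{\boldsymbol{\alpha}} = (\boldsymbol{X}^T\hat{\boldsymbol{V}}^{-1}\boldsymbol{X})^{-1}\boldsymbol{X}^T\hat{\boldsymbol{V}}^{-1}\hat{\boldsymbol{\theta}}$ and $\boldsymbol{b}_i^T$ is a $1 \times m$ vector $(0, 0, \ldots, 0, 1, \ldots, 0)$ with value 1 in the $i$-th position. Many
researchers have stated that adding appropriate geographic information and using geographical modeling can help to
estimate small area parameters more accurately, noting that the geographic boundaries of small areas are generally
defined according to administrative criteria without considering spatial interactions in the variables studied. Thus,
neighboring regions, and hence the correlation between their random effects, can also be defined by a contiguity criterion.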
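The SAR covariance matrix and the predictor in Eq. (4) can be sketched in Python as follows. This is a hedged illustration only: the function names, the toy weight matrix, and the default choice of the identity matrix for D are assumptions, and the variance and the autoregressive coefficient are passed in as given values instead of being estimated from the data.

```python
import numpy as np


def sar_covariance(W, rho, sigma2_u):
    """G = sigma2_u * [(I - rho W^T)(I - rho W)]^{-1} for a SAR process."""
    W = np.asarray(W, dtype=float)
    A = np.eye(W.shape[0]) - rho * W
    return sigma2_u * np.linalg.inv(A.T @ A)


def spatial_eblup(theta_direct, X, phi, W, rho, sigma2_u, D=None):
    """Evaluate Eq. (4) for all areas at once, for given rho and sigma2_u.

    D defaults to the identity matrix (an assumption of this sketch).
    """
    theta_direct = np.asarray(theta_direct, dtype=float)
    X = np.asarray(X, dtype=float)
    m = theta_direct.shape[0]
    D = np.eye(m) if D is None else np.asarray(D, dtype=float)

    R = np.diag(np.asarray(phi, dtype=float))  # known sampling variances
    G = sar_covariance(W, rho, sigma2_u)       # covariance of v
    V = R + D @ G @ D.T                        # covariance of the direct estimates
    V_inv = np.linalg.inv(V)

    # Generalized least squares estimate of alpha.
    alpha_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ theta_direct)
    residual = theta_direct - X @ alpha_hat

    # Row i equals x_i^T alpha_hat + b_i^T G D^T V^{-1} (theta_direct - X alpha_hat).
    return X @ alpha_hat + G @ D.T @ V_inv @ residual, alpha_hat


# Toy example: four areas on a line, row-standardized neighbor weights (hypothetical).
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
theta_direct = np.array([6.0, 6.2, 5.9, 6.4])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
phi = np.array([0.01, 0.04, 0.09, 0.16])

theta_spatial, alpha_hat = spatial_eblup(theta_direct, X, phi, W, rho=0.5, sigma2_u=0.04)
print("SEBLUP estimates:", theta_spatial)
```

When the autoregressive coefficient is set to zero and D is the identity, G reduces to a diagonal matrix and the sketch returns the Fay-Herriot EBLUP of Eq. (2), which is a convenient consistency check.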
Based on the literature above, this paper applies the spatial small area estimation model to official statistics,
specifically to poverty analysis. We adopt the area-level model and use Euclidean
distance, rather than contiguity, to construct the spatial weights.

3. Data and Methodology

In small area estimation of poverty, auxiliary variables are required in order to estimate the response, here
the regional mean per capita income. Infrastructure can serve as an auxiliary variable, since its effect on poverty has been
studied many times. Electricity is part of the basic infrastructure needed alongside clean water and
sanitation [9, 10, 11]. Electricity can be a proxy for technological development, and its presence can improve access to
basic services such as education and health. Moreover, the level of regional accessibility also has a significant
relationship with poverty [12]. Regional accessibility, defined as the distance from the region to the city center, represents
access to social and economic facilities. To verify the usefulness of spatial information, as opposed to assuming independence between areas,
we need a data set that contains poverty and spatial variables at the area level. The data set of this study is therefore derived from
the 2017 poverty survey in Bangka Belitung province, which covers 139 sub-districts. Bangka Belitung
province has 387 sub-districts, so we also apply the SAE models to predict poverty for the sub-districts that were not surveyed.
To fulfill this purpose, the study compares the Fay-Herriot model and the Spatial Fay-Herriot model.
Since the income distribution is skewed [13], we transform the response to its log value. We then fit
the Fay-Herriot model, in which the area information enters as an independent random effect. Furthermore, we fit
the Spatial Fay-Herriot model, replacing the independent random area effects by a function of the spatial interaction matrix,
which is the inverse Euclidean distance matrix.
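A spatial interaction matrix based on inverse Euclidean distance can be constructed as in the following Python sketch. The coordinates are hypothetical, the function name `inverse_distance_weights` is introduced only for this example, and the optional row standardization is an assumption of the sketch (a common convention in SAR models), not something stated in this paper. The sketch also assumes that no two areas share the same coordinates.

```python
import numpy as np


def inverse_distance_weights(coords, row_standardize=True):
    """Build W with w_ij = 1 / d_ij for i != j and w_ii = 0 (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)

    # Pairwise Euclidean distances between area centers.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Inverse distances off the diagonal; an area is not its own neighbor.
    W = np.zeros_like(dist)
    off_diagonal = ~np.eye(len(coords), dtype=bool)
    W[off_diagonal] = 1.0 / dist[off_diagonal]

    if row_standardize:
        # Each row sums to one, so rho can be read as an average neighbor effect.
        W = W / W.sum(axis=1, keepdims=True)
    return W


# Hypothetical sub-district centers on a plane.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(inverse_distance_weights(coords))
```

For these three points the distances are 5, 10 and 5, so the first area gives twice as much weight to its nearer neighbor as to the farther one.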

To select the best-fitting model for the poverty data in Bangka Belitung province, rather than merely the most
parsimonious one, this paper uses Akaike’s Information Criterion (AIC) and the mean squared error (MSE) of the
parameter estimates as goodness-of-fit criteria [14].
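For reference, AIC is defined as twice the number of estimated parameters minus twice the maximized log-likelihood, and the model with the smaller value is preferred. The Python sketch below applies this definition to two candidate models; the model names, log-likelihoods and parameter counts are hypothetical and are not taken from this study. The MSE of an EBLUP requires its own estimator and is not covered by this sketch.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_by_aic(models):
    """Return the name of the model with the smallest AIC and all scores.

    models maps a model name to a (log_likelihood, n_params) pair.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical candidates: model_B fits slightly better but uses one more parameter.
candidates = {"model_A": (128.0, 4), "model_B": (128.5, 5)}
best, scores = best_by_aic(candidates)
print(best, scores)
```

In this toy case the extra parameter of the second model costs 2 AIC units while the improvement in log-likelihood only recovers 1, so the simpler model is selected.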

4. Results and Discussion

The response variable of this study is the mean per capita income of 139 sub-districts, and its distribution is illustrated
in Fig. 1. Fig. 1 shows that the distribution of mean per capita income is positively skewed; in other words,
most people in Bangka Belitung have a low income. This result is comparable to the BPS report for 2016, in which the province's head
count index is 5.22%. Based on these data, the poverty problem in Bangka Belitung needs to be explored at smaller
area levels such as the sub-district.
To apply the linear mixed model, the distribution of mean per capita income needs to be close to a normal
distribution, so we apply a log transformation [15]. The distribution of the log mean per capita income is
shown in Fig. 2. It is closer to a normal distribution than that of the
untransformed values.
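The effect of the log transformation on skewness can be illustrated with the Python sketch below. The data are synthetic: 139 values are generated as evenly spaced quantiles of a log-normal distribution with hypothetical parameters, so they only mimic a positively skewed income variable and are not the survey data of this study.

```python
import math
from statistics import NormalDist, fmean


def skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic, positively skewed "income" values (hypothetical location and scale).
n = 139
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
income = [math.exp(14.0 + 0.8 * zi) for zi in z]
log_income = [math.log(v) for v in income]

print("skewness of income:    ", round(skewness(income), 3))
print("skewness of log income:", round(skewness(log_income), 3))
```

The raw values have a clearly positive skewness, while the log values are symmetric by construction, which is the behavior the transformation is intended to achieve on real income data.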

Fig. 1. Distribution of the mean per capita income
Fig. 2. Distribution of the log mean per capita income

In this research, we use two auxiliary variables: the total number of households in each sub-district that have access to
electricity from the Indonesia Electricity Company, and the distance from the sub-district center to the municipality office. They serve as proxies
for infrastructure and for access to social and economic facilities, and their relationships with the log mean per capita income are
illustrated in Fig. 3 and Fig. 4.
Fig. 3 shows that the total number of households with electricity access is positively correlated with the log mean per capita income,
and Fig. 4 shows that distance and the log mean per capita income are only weakly correlated. Furthermore, we
estimate the fixed effects of both variables simultaneously on the log mean per capita income by applying the
Fay-Herriot model, which assumes that poverty conditions in different sub-districts are unrelated.

Fig. 3. Scatter plot of household total that have access to electricity from Indonesia Electricity Company and the log mean per capita income
Fig. 4. Scatter plot of distance and the log mean per capita income
Since this paper seeks to explain the relationship among the variables while taking into account
the spatial effects in small area estimation, we also apply the Spatial Fay-Herriot model. The spatial correlation is
represented by the inverse Euclidean distance among the 139 observed sub-districts in Bangka Belitung province.
To identify the better-fitting model of poverty among sub-districts in Bangka Belitung province, Table 1 presents the AIC and
MSE of the Fay-Herriot (EBLUP) and Spatial Fay-Herriot (SEBLUP) models, together with the MSE of the direct
estimation. Based on Table 1, both model-based estimators have a smaller MSE than the direct estimator. EBLUP has a slightly
smaller MSE than SEBLUP, while SEBLUP has a marginally smaller AIC, so the spatial model offers no clear improvement over EBLUP.

Table 1. AIC and MSE by Estimation Method

Estimation Method    AIC         MSE
Direct Estimation    -           1.079052
EBLUP                -250.714    1.002438
SEBLUP               -251.462    1.005639
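The size of the differences in Table 1 can be made explicit with a short calculation. The Python sketch below takes the values reported in Table 1 and computes the relative MSE reduction of each model-based estimator with respect to the direct estimator, as well as the gap between the two AIC values; the helper name `relative_reduction` is introduced only for this illustration.

```python
# Values copied from Table 1.
mse = {"Direct": 1.079052, "EBLUP": 1.002438, "SEBLUP": 1.005639}
aic_values = {"EBLUP": -250.714, "SEBLUP": -251.462}


def relative_reduction(baseline: float, value: float) -> float:
    """Share of the baseline that is removed when moving to `value`."""
    return (baseline - value) / baseline


for name in ("EBLUP", "SEBLUP"):
    reduction = relative_reduction(mse["Direct"], mse[name])
    print(f"{name}: MSE reduction relative to direct estimation = {100 * reduction:.2f}%")

aic_gap = abs(aic_values["EBLUP"] - aic_values["SEBLUP"])
print(f"Absolute AIC gap between the two models = {aic_gap:.3f}")
```

The two model-based estimators reduce the MSE of the direct estimator by roughly 7.1% and 6.8% respectively, and their AIC values differ by 0.748, so the two models are close on both criteria.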

Furthermore, another advantage of the linear mixed model is that the parameters can be estimated more
accurately. The estimation results for the Fay-Herriot and Spatial Fay-Herriot models are shown in Table 2. Based on Table
2, the parameter estimates of the Fay-Herriot and Spatial Fay-Herriot models have similar values and the same levels of significance. The same
result was reported in [16]. These values indicate that spatial information does not affect the log mean per capita
income, since EBLUP and SEBLUP perform similarly in poverty parameter estimation.

Table 2. Parameter Estimate Results

Parameter      FH               SFH
Intercept      6.02333100       6.02541700
Electricity    0.00002396**     0.00002318**
Distance       -0.00086498**    -0.00089630**
Note:
** : significant at α = 5%
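Because the response is modelled on a log scale, the coefficients in Table 2 can be read as approximate percentage changes in mean per capita income. The Python sketch below converts a coefficient into such a percentage for a given change in the covariate. The base of the logarithm and the measurement units of the covariates are not stated in this paper, so the base is left as a parameter and the covariate changes used in the example (1000 households, 10 distance units) are hypothetical.

```python
import math

# Coefficients copied from Table 2.
coefficients = {
    "Electricity": {"FH": 0.00002396, "SFH": 0.00002318},
    "Distance": {"FH": -0.00086498, "SFH": -0.00089630},
}


def percent_change(beta: float, delta_x: float, base: float = math.e) -> float:
    """Percentage change in the untransformed response for a change delta_x
    in a covariate, when the response was modelled as log_base(response)."""
    return 100.0 * (base ** (beta * delta_x) - 1.0)


# Hypothetical covariate changes, evaluated for both possible log bases.
for base_name, base in (("natural log", math.e), ("log base 10", 10.0)):
    for model in ("FH", "SFH"):
        elec = percent_change(coefficients["Electricity"][model], 1000.0, base)
        dist = percent_change(coefficients["Distance"][model], 10.0, base)
        print(f"{base_name:12s} {model:3s}  +1000 households: {elec:+.2f}%"
              f"   +10 distance units: {dist:+.2f}%")
```

Whichever base applies, the FH and SFH coefficients translate into nearly identical percentage effects, which agrees with the similarity of the two columns in Table 2.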

To emphasize that taking spatial information into account in the SAE model gives no additional advantage,
we predict the log mean per capita income for all 387 sub-districts in Bangka Belitung province, including those
not surveyed. The predicted values of mean per capita income are obtained from the Fay-Herriot and Spatial Fay-Herriot
models, and their distributions are shown in Fig. 5. Fig. 5 illustrates that the two models give the same
sub-district grouping. The number of sub-districts whose mean per capita income is below the poverty line is 30.
Most sub-districts, 198, fall in the third group, where mean per capita income is between 1 and 2
million rupiah. The rest fall in the second group.
Fig. 5. Sub-district’s mean per capita income distribution in Bangka Belitung Province from a) FH model b) Spatial FH model.
Map legend: less than poverty line; more than poverty line and less than 1 million; more than 1 million and less than 2 million.
Finally, based on this evidence, we can say that adding spatial information to the small area estimation model
does not give better predictions than the simpler model, which takes the area into account as a specific random effect.

5. Conclusion

This paper shows that EBLUP and Spatial EBLUP have the same performance in poverty parameter
estimation. However, the relationship patterns between variables need further clarification, since distance does not
have a strong linear relationship with the log mean per capita income. We also need to know the effect of
replacing Euclidean distance with other spatial information, such as geospatial information. The next step is therefore
to explore the spatial effect further in order to capture the actual effect, even when it is nonlinear.

References

[1] Ghosh M and Rao JNK. (1994) “Small Area Estimation: An Appraisal.” Statistical Science 9(1): 55-76
[2] Rao JNK. (2003) “Small Area Estimation.” London, Wiley
[3] Fay RE and Herriot RA. (1979) “Estimates of income for small places: an application of James-Stein procedures to census data.” Journal of the American Statistical Association 74(366): 269-277
[4] Salvati N. (2004) “Small area estimation by spatial models: the spatial empirical best linear unbiased prediction (Spatial EBLUP).” Working paper 2004/03. University of Florence: Department of Statistics “G Parenti”. Florence, Italy
[5] Petrucci A, Pratesi M, and Salvati N. (2005) “Geographic information in small area estimation: small area models and spatially correlated random area effects.” Statistics in Transition 7: 609–623
[6] Singh BB, Shukla GK, and Kundu D. (2005) “Spatio-temporal models in small area estimation.” Statistics Canada 31(2): 183-195
[7] Pratesi M and Salvati N. (2008) “Small area estimation: the EBLUP estimator based on spatially correlated random area effects.” Statistical Methods & Applications 17(1): 113–141
[8] Anselin L. (1992) “Spatial Econometrics: Method and Models.” Boston, Kluwer Academic Publishers
[9] Balisacan AM, Pernia EM, and Asra A. (2003) “Revisiting Growth and Poverty Reduction in Indonesia.” Bulletin of Indonesian Economic Studies 39(3): 329-353
[10] Fedderke JW and Bogetic Z. (2005) “Infrastructure and Growth in South Africa: Direct and Indirect Productivity Impacts of Nineteen Infrastructure Measure.” Manuscript of the workshop on Infrastructure and Growth of Economic Research Southern Africa; May 29-31, Cape Town, South Africa
[11] Nugroho SS. (2015) “The Roles of Basic Infrastructure on Poverty Alleviation in Indonesia.” Kajian Ekonomi dan Keuangan 19(1): 1-96
[11] Astuti AW and Musiyam M. (2009) “Kemiskinan dan Perkembangan Wilayah di Kabupaten Boyolali.” Forum Geografi 23(1): 71-85
[12] Rahmawati R and Djuraidah A. (2010) “Regresi Terboboti Geografis dengan Pembobot Kernel Kuadrat Ganda untuk Data Kemiskinan di Kabupaten Jember.” Forum Statistika dan Komputasi 15(2): 32-37
[13] Piketty T and Saez E. (2003) “Income Inequality in the United States, 1913-1998.” Quarterly Journal of Economics 118(1): 1-39
[14] Dziak JJ, Lanza ST, and Xu S. (2012) “Sensitivity and Specificity of Information Criteria.” The Methodology Center, the Pennsylvania State University. Retrieved from http://methodology.psu.edu
[15] Kurnia A. (2009) “The Best Predictions for Empirical Logarithm Transformation Models in Small Area Estimation with Application on SUSENAS Data.” Dissertation, Bogor: Bogor Agricultural University
[16] Zainuddin HA. (2016) “A Study of Logarithmic Transformation on Spatial Empirical Best Linear Unbiased Prediction Estimator in Small Area Estimation.” Thesis, Bogor: Bogor Agricultural University
