
Current Trends in Statistics in V6 region

© CSS 2008

MULTIVARIATE STATISTICAL METHODS
APPLIED IN POVERTY RESEARCH
Alina Măriuca Ionescu

Abstract: The phenomenon of poverty presents multiple dimensions, such as
lack of necessary income, lack of productive resources, hunger and
malnutrition, illiteracy, lack of shelter or inappropriate living conditions,
unsafe environment, discrimination and social exclusion.
Given the multiple facets of poverty, multivariate statistical methods
are appropriate tools for studying such a multidimensional phenomenon.
In this study, I apply a few multivariate statistical methods (principal
components analysis, cluster analysis and discriminant analysis) in order to
evaluate the poverty profile at the level of territorial administrative units
in Romania.
The profile of community-level poverty is outlined by presenting the
territorial distribution of the clusters of counties, establishing the
characteristics of each cluster, and identifying specific programs for
poverty reduction.
Keywords: poverty, multivariate methods, principal components analysis,
cluster analysis, discriminant analysis

1 Introduction
Poverty and the fight against poverty are at the center of recent debates
among researchers, decision makers and investors. The phenomenon exists in
all countries and societies and presents multiple dimensions comprising both
monetary and non-monetary aspects of individuals' well-being, such as:

• low income, consumption and labor force employment;

• insufficient or poor nourishment;

• precarious health;

• limited access to education;

• reduced participation in making decisions;

• limited possibility of influencing one's own living standard;

• impossibility of actively participating in community life and, as a
consequence, impossibility of integrating into society.

The study of poverty dimensions, their measurement and the permanent
monitoring of this phenomenon are of great importance for identifying poor
people, targeting the efforts for poverty alleviation, and assessing the
success or failure of governmental programs and policies to reduce poverty.
In order to describe the multidimensionality of poverty, it is necessary to
evaluate the phenomenon through several variables analyzed at the same time.
Simultaneous analysis of many variables is possible with the help of
multivariate methods of statistical analysis, which are extremely useful in
the research of a multidimensional phenomenon such as poverty.
The advantage of multivariate techniques lies in their ability to give the
researcher a description of a phenomenon that cannot be reached with
univariate statistical methods: they allow us to describe the patterns of
relationships between variables.
In poverty research, multivariate statistical methods are used mainly for
building poverty maps and constructing composite poverty indicators.

2 Material and methods


In this study, I apply a few multivariate statistical methods in order to
evaluate the poverty profile at the level of territorial administrative units
in Romania.
The variables considered in the study characterize aspects of poverty
relating to demography, economy, education, health, infrastructure and
living conditions.
Data were recorded at county level; the capital, Bucharest, was excluded
from the analysis because it presents extreme values for some of the
variables.
The statistical methods used in the paper are: principal components
analysis, to select the most relevant variables that describe poverty;
cluster analysis, to identify homogeneous clusters of Romanian counties
according to poverty characteristics in these areas; and discriminant
analysis, to validate the cluster solution.
Statistical data processing was conducted using SPSS software.

3 Preliminary analysis of data set using principal components analysis
The study of the statistical results and graphical representations obtained
with PCA allowed the selection of 14 variables from the original data set of
18 variables.
The graphical representation of the variables on the first two factorial axes
(figure 1) shows that the first axis opposes variables that describe the
percentage of rural population and employment in agriculture, on the one
hand, to variables that express economic and industrial development, on the
other. Therefore, counties with high levels of investments and GDP, as well
as with high percentages of the population employed in industry, are most
likely to be characterized by low percentages of rural population and reduced
employment in agriculture, and vice versa.

Figure 1: Position of the variables on the first two factorial axes (Source:
Output obtained in SPSS with PCA)
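
The opposition of two groups of variables on the first axis can be
illustrated with a minimal sketch. It does not use the data of this study:
the four indicator names and the synthetic values below are assumptions made
only for illustration, and the first component is extracted from the
correlation matrix by power iteration rather than with SPSS. Variables that
fall on opposite sides of the axis receive loadings of opposite sign.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

# Synthetic stand-in for 41 counties: one latent "development" factor drives
# four hypothetical indicators (illustrative names, not the paper's data).
names = ["rural_share", "agri_employment", "gdp", "investment"]
signs = [-1.0, -1.0, 1.0, 1.0]
latent = [rng.gauss(0, 1) for _ in range(41)]
columns = [[s * d + rng.gauss(0, 0.5) for d in latent] for s in signs]


def standardize(cols):
    """Return z-scored copies of each column (population standard deviation)."""
    out = []
    for col in cols:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out.append([(x - mean) / sd for x in col])
    return out


def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(corr, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of corr."""
    v = [1.0] + [0.0] * (len(corr) - 1)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(c * x for c, x in zip(row, v)) for row in corr]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return v, eigenvalue


corr = correlation_matrix(standardize(columns))
vector, eigenvalue = first_component(corr)

# Loading = correlation between a variable and the component.
loadings = {name: x * math.sqrt(eigenvalue) for name, x in zip(names, vector)}
for name, value in loadings.items():
    print(f"{name:>16}: {value:+.2f}")
```

The overall sign of a component is arbitrary; only the relative signs of the
loadings carry the interpretation given in the text.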

4 Identification of homogeneous groups of Romanian counties according to
their poverty characteristics, using cluster analysis
After selecting the most significant variables that express poverty, the
Romanian counties are grouped, on the basis of these variables and using
cluster analysis, into clusters that are as homogeneous as possible and as
different as possible from one another. Because the size of the investigated
population is relatively small (41 counties), hierarchical clustering methods
are used, and the Squared Euclidean Distance is chosen as the proximity
measure, as it is frequently employed when working with interval data.
The result of the hierarchical clustering is plotted as a dendrogram obtained
with Ward's method. There is no exact procedure for determining the number of
clusters. Some information on this issue comes from the dendrogram and from
the hockey stick plot of the agglomeration schedule coefficients (figure 2).
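
How an agglomeration schedule arises under Ward's method can be sketched in
a few lines. The sketch assumes that each merge is chosen to minimize the
increase in the within-cluster sum of squares, which for clusters A and B
equals n_A n_B / (n_A + n_B) times the squared Euclidean distance between
their centroids, and that the reported coefficient is the cumulative sum of
these increases; the exact scaling used by SPSS may differ. The six points
are invented toy data, not the county data.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_schedule(points):
    """Agglomerate points with Ward's criterion.

    Returns a list of (clusters_remaining, cumulative_within_ss) pairs,
    one per merge stage.
    """
    clusters = [{"size": 1, "centroid": list(p)} for p in points]
    schedule = []
    total = 0.0
    while len(clusters) > 1:
        best = None
        # Find the pair whose merge increases the within-cluster SS least.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                weight = a["size"] * b["size"] / (a["size"] + b["size"])
                cost = weight * squared_euclidean(a["centroid"], b["centroid"])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        centroid = [
            (a["size"] * x + b["size"] * y) / size
            for x, y in zip(a["centroid"], b["centroid"])
        ]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append({"size": size, "centroid": centroid})
        total += cost
        schedule.append((len(clusters), total))
    return schedule


# Toy data: three tight pairs of points, far apart from each other.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 10), (20, 11)]
schedule = ward_schedule(toy_points)
for remaining, coefficient in schedule:
    print(f"{remaining} clusters: coefficient = {coefficient:.2f}")
```

On these toy data the coefficient stays small while the tight pairs are being
merged and rises sharply once distinct groups are forced together, which is
the pattern the hockey stick plot is meant to reveal.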
For a good cluster solution, we look for a sudden jump in the succession of
distance coefficients. The stage before the sudden change indicates the
optimal stopping point for merging clusters. For this example, we should
consider a solution with 9, 6, 4 or 2 clusters (figure 2). Three possible
solutions resulted: with 9, 6 and 4 clusters.
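
The stopping rule can be written down as a short sketch. The coefficients
below are hypothetical values chosen for illustration, not those of
figure 2, and ranking stages by the absolute increase of the coefficient is
only one simple way of making "sudden jump" operational.

```python
def candidate_solutions(schedule, top=2):
    """Return the cluster counts that precede the largest coefficient jumps.

    schedule: list of (clusters_remaining, coefficient) pairs in merge order.
    For each merge, the jump is the increase of the coefficient, and the
    candidate solution is the number of clusters just before that merge.
    """
    jumps = []
    for (k_before, c_before), (_, c_after) in zip(schedule, schedule[1:]):
        jumps.append((c_after - c_before, k_before))
    jumps.sort(reverse=True)
    return [k for _, k in jumps[:top]]


# Hypothetical agglomeration schedule (clusters remaining, coefficient).
example_schedule = [(6, 1.0), (5, 2.1), (4, 3.0), (3, 9.5), (2, 11.0), (1, 30.0)]
candidates = candidate_solutions(example_schedule, top=2)
print(candidates)
```

In this example the largest jumps occur when 2 clusters are merged into 1
and when 4 clusters are merged into 3, so the 2-cluster and 4-cluster
solutions are the candidates. The final choice still relies on inspecting
the dendrogram and on interpretability.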

Figure 2: Hockey stick plot of agglomeration schedule coefficients versus
number of clusters

Highlighting the obtained clusters on the dendrogram for each of the 3
solutions shows the 4-cluster solution to be the optimal one, as grouping the
counties in 6 or in 9 clusters differentiates more clearly only the counties
affected by poverty to a lesser extent (figure 3).

5 Validation of cluster solution using discriminant analysis

Validation of the cluster solution is done with discriminant analysis. The
method tries to determine whether there exists a combination of variables
that objectively separates the resulting clusters of counties.

In the context of this study, the discriminating variables, or predictors,
are the 14 independent variables selected with PCA, and the grouping variable
is cluster membership, that is, a variable obtained with cluster analysis.
The discriminant function correctly classifies 100% of the cases, that is,
all 41 counties considered in the study. When cross-validation is used, 80.5%
of the cases are correctly classified, that is, 33 of the 41 counties. The
result of the discriminant analysis allows us to conclude that the selected
variables significantly differentiate the 4 clusters obtained, according to
their poverty profile. This means that the solution found is indeed the one
sought.
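
The gap between the two classification rates is typical: classifying the
same cases that were used to build the rule tends to be optimistic, whereas
cross-validation classifies each case with a rule built without it. The
sketch below illustrates this with leave-one-out validation. It swaps in a
nearest-centroid rule as a simplified stand-in for the discriminant function
and uses invented one-dimensional data, so it illustrates the procedure
only, not the analysis run in SPSS.

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def fit(samples):
    """Compute one centroid per label from (point, label) pairs."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict(x, centroids):
    """Assign x to the label of the nearest centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def resubstitution_accuracy(samples):
    """Share of cases classified correctly by a rule fitted on all cases."""
    centroids = fit(samples)
    hits = sum(predict(x, centroids) == y for x, y in samples)
    return hits / len(samples)


def leave_one_out_accuracy(samples):
    """Share of cases classified correctly by a rule fitted without them."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        hits += predict(x, fit(training)) == y
    return hits / len(samples)


# Invented toy data: two groups with one borderline case each.
samples = [
    ((0.0,), "A"), ((1.0,), "A"), ((5.0,), "A"),
    ((6.0,), "B"), ((10.0,), "B"), ((11.0,), "B"),
]
print(resubstitution_accuracy(samples))
print(leave_one_out_accuracy(samples))

# Arithmetic check of the rate reported in the text: 33 of 41 counties.
reported_rate = round(100 * 33 / 41, 1)
print(reported_rate)
```

In the toy data the two borderline cases are classified correctly only when
they take part in fitting the rule, so the leave-one-out rate is lower than
the resubstitution rate, as in the results reported above.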

Figure 3: Counties grouping in 9, 6 and, respectively, 4 clusters (Ward
Dendrogram)

6 Poverty profile of the clusters of Romanian counties


The solution found to be optimal groups the Romanian counties into four
clusters, in accordance with the intensity of their poverty. For this
solution, the territorial distribution of the clusters largely reproduces the
geographical distribution, as it usually groups neighboring counties
(figure 4). Therefore, this solution is appropriate both when preparing
programs for small areas and when intending to set up programs that address
larger areas and also take into account the territorial distribution of the
counties considered.
The darker an area is colored on the map, the poorer the county it indicates.
The following characteristics of the community poverty profile were
identified for the four clusters of counties:
• Cluster 1 comprises relatively economically developed and highly
industrialized counties, but with low urbanization and a high percentage
of rural population. The population of these counties has moderate access
to health services and to public utilities infrastructure. It is the group
of counties characterized by a moderate intensity of poverty.
• Cluster 2 is visibly the most affected in all dimensions of poverty.
Counties from this group are characterized by a loss-making economy, marked
by investor discouragement, as the group records the lowest level of gross
investments (half that of the next cluster and 5 times lower than in the
richest clusters). It comprises mainly counties located in plain or
plateau areas: those along the Danube and those from the Moldavia region,
with very low urbanization and industrialization. The population is
predominantly rural (57.4%) and a large part of it is employed in
agriculture (45.4%). This cluster has the poorest access to health
services and to public utilities, being severely affected as regards the
infrastructure of access to drinking water and sewerage, essential
elements for a decent living. According to the territorial distribution,
the counties from this cluster form a horseshoe of poverty (figure 4).

Figure 4: Territorial distribution of the clusters of counties for the
solution with 4 clusters

• Cluster 3, formed of the counties Timişoara, Cluj and Constanţa, records
the best living conditions for its inhabitants. It is characterized by the
highest levels of gross investments and GDP, an average level of
industrialization and the smallest percentage of the population employed in
agriculture (25%). Compared with the other clusters, it presents a high
degree of urbanization, having the lowest percentage of rural population
(33.1%). It also has the best access to health services, meaning the
highest number of hospital beds per 100,000 inhabitants and the smallest
number of persons per physician. Access to specialized medical staff is
almost 3 times higher than that of cluster 2. Localities with running
water installations account for a high percentage (87.7%) compared with
the other clusters, where the percentage is below 65%. It has the best
access to heating energy and sewerage.
• Cluster 4 presents moderate economic development, an average level of
industrialization and a relatively small percentage of the population
employed in agriculture. It consists mostly of counties located in
mountain areas, with increased urbanization and a small percentage of
rural population. Access to health infrastructure is above average. As
regards the infrastructure of public utilities, it has the best access of
the population to the natural gas network, four times higher than in
cluster 2, and well above average access to all the other public
utilities.

7 Conclusions
Although the clusters obtained are internally homogeneous, they differ from
one another in the dimensions of poverty affected and in the intensity of the
phenomenon. Therefore, it is not efficient to develop a single poverty
alleviation program intended to suit the entire country. Instead, specific
poverty alleviation programs could be designed and developed according to
each cluster's profile, taking into account the intensity of poverty in each
dimension considered. For the clusters that present deprivations in the
health dimension of poverty, appropriate programs should be designed and
applied to improve access to health services (encouraging medical staff to
work in the affected areas, allocating resources for the construction of new
hospitals and the extension of the existing ones in these areas). For
clusters with a low level of investments and GDP, companies may be
stimulated, by being granted incentives, to invest in problematic areas, so
as to create new jobs and new prospects for economic development. Programs
that focus on infrastructure development can target the groups of counties
characterized by low sustainable access to running water and sewerage
combined with a high percentage of rural population and of population
employed in agriculture.
Identifying the poverty profile at territorial level may be of real use in
designing poverty reduction programs and policies, as it makes it possible to
detect the areas most affected by poverty and the specific features of
poverty in these areas, and helps policy-makers target the poor for the best
allocation of resources to alleviate poverty. The results obtained in such a
study may be used to develop poverty alleviation programs specific to each
group of homogeneous counties.


Acknowledgement: I gratefully acknowledge the National Council for Scientific
Research from Romania for a grant that allowed this work to be supported by
funding from the Ministry of Education and Research – National Authority
for Scientific Research from Romania. I also thank Professor Elisabeta Jaba,
University Al. I. Cuza from Iaşi, Romania, for the support offered in my
research activity during my doctoral studies.
Address: University Al. I. Cuza from Iaşi, Romania

E-mail: alina.ionescu@yahoo.com