Current Trends in Statistics in V6 region

© CSS 2008

MULTIVARIATE STATISTICAL METHODS APPLIED IN POVERTY RESEARCH
Alina Măriuca Ionescu
Abstract: Poverty has multiple dimensions, such as lack of sufficient income, lack of productive resources, hunger and malnutrition, illiteracy, lack of shelter or inadequate living conditions, an unsafe environment, discrimination and social exclusion. Given these many facets, multivariate statistical methods are appropriate tools for studying such a multidimensional phenomenon. In this study, I apply several multivariate statistical methods (principal components analysis, cluster analysis and discriminant analysis) to evaluate the poverty profile of the territorial administrative units of Romania. The community-level poverty profile is described by presenting the territorial distribution of the clusters of counties, establishing the characteristics of each cluster, and identifying specific programs for poverty reduction.
Keywords: poverty, multivariate methods, principal components analysis, cluster analysis, discriminant analysis

1 Introduction

Poverty and the fight against poverty are at the center of recent debates among researchers, decision makers and investors. The phenomenon exists in all countries and societies and has multiple dimensions, comprising both monetary and non-monetary aspects of individual well-being, such as:
• low income, low consumption and low employment of the labor force;
• insufficient or poor nutrition;
• precarious health;
• limited access to education;
• reduced participation in decision making;
• limited ability to influence one's own living standard;
• inability to participate actively in community life and, as a consequence, inability to integrate into society.


Studying the dimensions of poverty, measuring them and continuously monitoring the phenomenon are of great importance for identifying the poor, targeting poverty alleviation efforts, and assessing the success or failure of government programs and policies to reduce poverty. Describing the multidimensionality of poverty requires evaluating the phenomenon through several variables analyzed simultaneously. Such simultaneous analysis is possible with multivariate statistical methods, which are extremely useful in researching a multidimensional phenomenon such as poverty. The advantage of multivariate techniques is that they give the researcher a way of describing a phenomenon that cannot be reached with univariate statistical methods: they reveal patterns in the relationships between variables. In poverty research, multivariate statistical methods are used mainly to produce poverty maps and to construct composite poverty indicators.

2 Material and methods

In this study, I apply several multivariate statistical methods to evaluate the poverty profile of the territorial administrative units of Romania. The variables considered characterize poverty in terms of demography, economy, education, health, infrastructure and living conditions. Data were recorded at county level; the capital, Bucharest, was excluded from the analysis because it presents extreme values for some of the variables. The statistical methods used are: principal components analysis, to select the most relevant variables describing poverty; cluster analysis, to identify homogeneous clusters of Romanian counties according to their poverty characteristics; and discriminant analysis, to validate the cluster solution. The data were processed with SPSS.
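The text does not say how Bucharest's values were judged extreme. As a minimal sketch of one possible screening rule, assuming the modified z-score of Iglewicz and Hoaglin (0.6745 × deviation from the median ÷ median absolute deviation) with its conventional cut-off of 3.5, a unit can be flagged when any of its variables exceeds the cut-off. The function name `flag_extreme_units`, the unit labels and all values are hypothetical.

```python
from statistics import median


def flag_extreme_units(names, columns, threshold=3.5):
    """Return names of units whose modified z-score exceeds `threshold` on any variable.

    columns: one list of values per variable, aligned with `names`.
    Modified z-score (Iglewicz and Hoaglin): 0.6745 * (x - median) / MAD.
    """
    flagged = set()
    for col in columns:
        med = median(col)
        mad = median([abs(x - med) for x in col])
        if mad == 0:
            continue  # no spread around the median: the rule is undefined for this variable
        for name, x in zip(names, col):
            if abs(0.6745 * (x - med) / mad) > threshold:
                flagged.add(name)
    return [name for name in names if name in flagged]  # keep the input order


# Hypothetical data: 12 units, two variables; unit_L is extreme on the first one.
units = ["unit_" + c for c in "ABCDEFGHIJKL"]
data = [
    [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 12, 60],
    [5, 6, 5, 7, 6, 5, 6, 7, 5, 6, 6, 7],
]
print(flag_extreme_units(units, data))  # ['unit_L']
```

A median-based rule is used here because, with only a few dozen units, a single extreme unit inflates the ordinary standard deviation and can partly mask itself (with n units, a lone outlier's ordinary z-score can never exceed (n − 1)/√n).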

3 Preliminary analysis of the data set using principal components analysis

The statistical results and graphical representations obtained with PCA led to the selection of 14 variables from the original set of 18. The plot of the variables on the first two factorial axes (figure 1) shows that the first axis opposes, on one side, the variables describing the percentage of rural population and employment in agriculture and, on the other side, the variables expressing economic and industrial development. Counties with high levels of investment and GDP, and with a high percentage of the population employed in industry, are therefore likely to have low percentages of rural population and of employment in agriculture, and vice versa.
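The opposition along the first axis can be read from the signs of the variables' loadings on the first principal component. The sketch below assumes PCA on the correlation matrix (that is, on standardized variables) and uses plain power iteration instead of a full eigen-decomposition; the four indicator names and their values for five counties are invented for illustration and are not the study's data.

```python
import math


def standardize(col):
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
    return [(x - mean) / sd for x in col]


def correlation_matrix(columns):
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(za, zb)) / (n - 1) for zb in z] for za in z]


def first_component(corr, iterations=1000, tol=1e-12):
    """Leading eigenvalue and loadings of a correlation matrix via power iteration."""
    p = len(corr)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary start vector (must not be orthogonal to the leading axis)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient of the unit vector v gives the eigenvalue
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    # Loadings = correlations between each variable and the component
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Hypothetical indicators for five counties (invented values).
names = ["rural_pct", "agri_employment_pct", "gdp_per_capita", "investment_per_capita"]
columns = [
    [60, 55, 40, 35, 30],
    [48, 45, 30, 25, 20],
    [5, 6, 9, 11, 12],
    [1, 2, 4, 5, 6],
]
eigenvalue, loadings = first_component(correlation_matrix(columns))
for name, loading in zip(names, loadings):
    print(f"{name:>22}: {loading:+.3f}")
```

The rural and agricultural indicators load with one sign and the GDP and investment indicators with the opposite sign, which is what "the first axis opposes" means on a variables plot. The overall sign of a component is arbitrary; only the relative signs carry meaning.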

Figure 1: Positions of the variables on the first two factorial axes (Source: SPSS output, PCA)

4 Identification of homogeneous groups of Romanian counties according to their poverty characteristics, using cluster analysis

After the most significant poverty variables were selected, the Romanian counties are grouped on the basis of these variables, using cluster analysis, into clusters that are as homogeneous as possible internally and as different as possible from one another. Because the population investigated is relatively small (41 counties), hierarchical clustering methods are used, and the squared Euclidean distance is chosen as the proximity measure, as it is frequently employed with interval data. The result of the hierarchical clustering is plotted as a Ward dendrogram. There is no exact procedure for determining the number of clusters. Some guidance comes from the dendrogram and from the hockey stick plot of the agglomeration schedule coefficients (figure 2). For a good cluster solution, we look for a sudden jump in the succession of distance coefficients; the stage before the sudden change indicates the optimal point at which to stop merging clusters. In this case, the plot suggests a solution with 9, 6, 4 or 2 clusters (figure 2). Of these, three candidate solutions were retained: 9, 6 and 4 clusters.
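Ward's method with squared Euclidean distance can be sketched directly: at each stage it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares, n_a·n_b/(n_a + n_b) times the squared Euclidean distance between their centroids. This minimal pure-Python version searches all pairs at every merge, which is fine for about 40 counties. The function name, the labelling convention (each cluster named after its smallest member index) and the choice to report the cumulative within-cluster sum of squares as the coefficient are assumptions; SPSS's agglomeration schedule may scale its coefficients differently.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_schedule(points):
    """Ward agglomeration on squared Euclidean distances.

    Returns (label_a, label_b, coefficient) per merge: clusters are labelled by their
    smallest member index, and the coefficient is the cumulative within-cluster sum
    of squares after the merge.
    """
    clusters = {i: (1, list(p)) for i, p in enumerate(points)}  # label -> (size, centroid)
    schedule, total = [], 0.0
    while len(clusters) > 1:
        labels = sorted(clusters)
        best = None
        for x, la in enumerate(labels):
            for lb in labels[x + 1:]:
                (na, ca), (nb, cb) = clusters[la], clusters[lb]
                # Increase in the within-cluster sum of squares if la and lb are merged
                cost = na * nb / (na + nb) * squared_euclidean(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, la, lb)
        cost, la, lb = best
        (na, ca), (nb, cb) = clusters.pop(la), clusters.pop(lb)
        centroid = [(na * u + nb * v) / (na + nb) for u, v in zip(ca, cb)]
        clusters[la] = (na + nb, centroid)  # la < lb, so la is the smallest member index
        total += cost
        schedule.append((la, lb, total))
    return schedule


# Hypothetical standardized scores for four objects.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
for step in ward_schedule(points):
    print(step)
# (0, 1, 0.5)
# (2, 3, 1.0)
# (0, 2, 101.0)
```

The last merge joins two well-separated pairs and produces the large jump in the coefficient that the hockey stick plot is meant to reveal.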


Figure 2: Hockey stick plot of agglomeration schedule coefficients versus number of clusters
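The "sudden jump" rule can be applied mechanically to the sequence of agglomeration coefficients. In this sketch (function name and input format hypothetical), merge t, counting from 0, is preceded by t merges, so stopping just before a large jump at merge t leaves n − t clusters. The coefficients in the example are invented.

```python
def candidate_cluster_counts(coefficients, n_objects, top=3):
    """Rank cluster counts by the jump in the coefficient of the following merge.

    coefficients[t] is the agglomeration coefficient of merge t (0-based). Before
    merge t there are n_objects - t clusters, so a large jump at merge t suggests
    stopping with n_objects - t clusters.
    """
    jumps = []
    for t in range(1, len(coefficients)):
        jumps.append((coefficients[t] - coefficients[t - 1], n_objects - t))
    jumps.sort(reverse=True)  # largest jumps first
    return [k for _, k in jumps[:top]]


# Invented schedule for 8 objects (7 merges).
coefficients = [0.5, 0.9, 1.2, 1.6, 6.0, 7.1, 30.0]
print(candidate_cluster_counts(coefficients, 8))  # [2, 4, 3]
```

Absolute differences are used here; ratios of successive coefficients are an alternative when the coefficients grow steadily. Either way the result only shortlists candidates, which is consistent with the text's remark that there is no exact procedure.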

Highlighting the resulting clusters on the dendrogram for each of the 3 solutions shows the 4-cluster solution to be optimal, since grouping the counties into 6 or 9 clusters only separates more finely the counties that are less affected by poverty (figure 3).
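Comparing the 9-, 6- and 4-cluster solutions amounts to cutting the same dendrogram at different heights. A minimal sketch, assuming the merge sequence is available as (label_a, label_b) pairs with each cluster labelled by one of its member indices (a hypothetical format), replays the first n − k merges with a union-find structure. The function name `cut_tree` is illustrative.

```python
def cut_tree(n, merges, k):
    """Assign each of n objects to one of k clusters by replaying the first n - k merges."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in merges[: n - k]:
        ra, rb = find(a), find(b)
        parent[max(ra, rb)] = min(ra, rb)
    roots = [find(i) for i in range(n)]
    numbering = {}
    # Renumber clusters 0..k-1 in order of first appearance
    return [numbering.setdefault(r, len(numbering)) for r in roots]


merges = [(0, 1), (2, 3), (0, 2)]  # hypothetical merge order for four objects
for k in (4, 3, 2, 1):
    print(k, cut_tree(4, merges, k))
# 4 [0, 1, 2, 3]
# 3 [0, 0, 1, 2]
# 2 [0, 0, 1, 1]
# 1 [0, 0, 0, 0]
```

Because the tree is hierarchical, the solutions are nested: every cluster of a coarser cut is a union of clusters of a finer cut. This is why moving from 4 to 6 or 9 clusters can only split existing groups, here the less poor counties, rather than regroup them.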

5 Validation of the cluster solution using discriminant analysis

The cluster solution is validated with discriminant analysis. The method determines whether there is a combination of variables that objectively separates the resulting clusters of counties. In this study, the discriminating variables (predictors) are the 14 independent variables selected with PCA, and the grouping variable is cluster membership, obtained from the cluster analysis. The discriminant function correctly classifies 100% of the cases, that is, all 41 counties in the study. With cross-validation, 80.5% of the cases (33 of the 41 counties) are correctly classified. These results allow us to conclude that the selected variables significantly differentiate the four clusters according to their poverty profile, confirming the four-cluster solution.
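The two accuracy figures correspond to resubstitution (classifying the same counties used to estimate the function) and leave-one-out cross-validation (classifying each county with a function estimated without it). The sketch below illustrates both under a stated assumption: with equal prior probabilities, assigning a case to the group mean that is nearest in Mahalanobis distance, using the pooled within-group covariance matrix, is equivalent to the linear discriminant classification rule. Function names and the toy data are hypothetical; the last line only checks the arithmetic 33/41.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_within_covariance(X, y):
    """Pooled within-group covariance matrix (divisor: n - number of groups)."""
    groups = sorted(set(y))
    p = len(X[0])
    S = [[0.0] * p for _ in range(p)]
    for g in groups:
        rows = [x for x, label in zip(X, y) if label == g]
        m = mean_vector(rows)
        for r in rows:
            d = [a - b for a, b in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = len(X) - len(groups)
    return [[v / dof for v in row] for row in S]


def invert(M):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def fit(X, y):
    means = {g: mean_vector([x for x, label in zip(X, y) if label == g]) for g in sorted(set(y))}
    return means, invert(pooled_within_covariance(X, y))


def classify(x, means, S_inv):
    """Assign x to the group whose mean is nearest in Mahalanobis distance (equal priors)."""
    def distance2(m):
        d = [a - b for a, b in zip(x, m)]
        return sum(d[i] * S_inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return min(means, key=lambda g: distance2(means[g]))


def resubstitution_accuracy(X, y):
    means, S_inv = fit(X, y)
    return sum(classify(x, means, S_inv) == label for x, label in zip(X, y)) / len(X)


def leave_one_out_accuracy(X, y):
    hits = 0
    for i in range(len(X)):
        means, S_inv = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without case i
        hits += classify(X[i], means, S_inv) == y[i]
    return hits / len(X)


# Hypothetical two-variable data for two clusters of four counties each.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [1, 1, 1, 1, 2, 2, 2, 2]
print(resubstitution_accuracy(X, y), leave_one_out_accuracy(X, y))  # 1.0 1.0
print(f"{33 / 41:.1%}")  # 80.5%
```

Leave-one-out accuracy is usually lower than resubstitution accuracy because each case no longer influences the function that classifies it; the gap between 100% and 80.5% in the text is of that kind.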


Figure 3: Grouping of the counties into 9, 6 and 4 clusters, respectively (Ward dendrogram)

6 Poverty profile of the clusters of Romanian counties

The optimal solution groups the Romanian counties into four clusters according to the intensity of their poverty. For this solution, the territorial distribution of the clusters largely follows geography, since it usually groups neighboring counties (figure 4). The solution is therefore appropriate both for preparing programs for small areas and for setting up programs that address larger areas, taking into account the territorial distribution of the counties. On the map, darker shading indicates poorer counties.

Figure 4: Territorial distribution of the clusters of counties for the solution with 4 clusters

The following characteristics of the community poverty profile were identified for the four clusters of counties:
• Cluster 1 comprises relatively economically developed and highly industrialized counties, but with low urbanization and a high percentage of rural population. Their population has moderate access to health services and to public utilities infrastructure. This group of counties is characterized by a moderate intensity of poverty.
• Cluster 2 is clearly the most affected in all dimensions of poverty. Its counties have an economy running at a loss and marked by investor discouragement: the cluster records the lowest level of gross investment (half that of the next cluster and one fifth that of the richest clusters). It comprises mainly counties located in plains or plateau areas, those along the Danube and those in the Moldavia region, with very low urbanization and industrialization. The population is predominantly rural (57.4%), and a large share of it (45.4%) is employed in agriculture. This cluster has the poorest access to health services and to public utilities, and is especially deprived of drinking water and sewerage infrastructure, essential elements for decent living. In the territorial distribution, the counties of this cluster form a horseshoe of poverty (figure 4).
• Cluster 3, formed by the counties of Timiș, Cluj and Constanța, offers the best living conditions for its inhabitants. It has the highest levels of gross investment and GDP, average industrialization and the smallest percentage of the population employed in agriculture (25%). Compared with the other clusters, it has a high degree of urbanization, with the lowest percentage of rural population (33.1%). It also has the best access to health services, namely the highest number of hospital beds per 100,000 inhabitants and the smallest number of persons per physician. Access to specialized medical staff is almost 3 times higher than in cluster 2. The share of localities with running water (87.7%) is high compared with the other clusters, where it stays below 65%. It also has the best access to heating energy and sewerage.
• Cluster 4 shows moderate economic development, an average level of industrialization and a relatively low percentage of the population employed in agriculture. It is formed mostly of counties in mountain areas, with higher urbanization and a small percentage of rural population. Access to health infrastructure is above average. As regards public utilities, it has the best access to the natural gas network, four times higher than cluster 2, and well-above-average access to all other public utilities.
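Comparisons such as "half that of the next cluster" or "four times higher than cluster 2" are ratios of cluster means. A minimal sketch of building the profile table and such ratios follows; the function names, variable names, cluster labels A, B and C, and all values are invented so the arithmetic is easy to check, and they do not reproduce the study's data.

```python
def cluster_profiles(labels, columns, names):
    """Mean of each variable within each cluster: {cluster: {variable: mean}}."""
    profiles = {}
    for c in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == c]
        profiles[c] = {name: sum(col[i] for i in idx) / len(idx) for name, col in zip(names, columns)}
    return profiles


def ratio(profiles, variable, a, b):
    """How many times larger the mean of `variable` is in cluster a than in cluster b."""
    return profiles[a][variable] / profiles[b][variable]


# Invented values for five counties in three hypothetical clusters.
names = ["investment", "rural_pct"]
labels = ["B", "B", "A", "A", "C"]
columns = [
    [10, 20, 25, 35, 60],
    [60, 56, 40, 44, 30],
]
profiles = cluster_profiles(labels, columns, names)
print(profiles["B"])                            # {'investment': 15.0, 'rural_pct': 58.0}
print(ratio(profiles, "investment", "A", "B"))  # 2.0
print(ratio(profiles, "investment", "C", "B"))  # 4.0
```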

7 Conclusions

Although the clusters obtained are internally homogeneous, they differ from one another in the dimensions of poverty affected and in the intensity of the phenomenon. It is therefore not efficient to design a single poverty alleviation program for the entire country. Instead, specific programs can be designed and developed for each cluster's profile, taking into account the intensity of poverty in each dimension. For clusters with deprivations in the health dimension, programs should improve access to health services (for example, by encouraging medical staff to work in the affected areas and by allocating resources to build new hospitals and extend existing ones). For clusters with low levels of investment and GDP, companies could be offered incentives to invest in problem areas, creating new jobs and new prospects for economic development. Programs focused on infrastructure development can target the groups of counties with low sustainable access to running water and sewerage combined with a high percentage of rural population and of population employed in agriculture. Identifying the poverty profile at territorial level can be genuinely useful in designing poverty reduction programs and policies: it makes it possible to detect the areas most affected by poverty and the specific features of poverty in those areas, and it helps policy makers target the poor and allocate resources efficiently. The results of such a study can be used to develop poverty alleviation programs tailored to each group of homogeneous counties.

Acknowledgement: I gratefully acknowledge the National Council for Scientific Research from Romania for a grant that allowed this work to be supported by funding from the Ministry of Education and Research, National Authority for Scientific Research, Romania. I also thank Professor Elisabeta Jaba, Al. I. Cuza University of Iași, Romania, for the support offered in my research activity during my doctoral studies.
Address: Al. I. Cuza University of Iași, Romania
E-mail: alina.ionescu@yahoo.com
