


ABSTRACT

Population and economic data are often collected or published on seemingly incomparable areal units. Even in fortunate countries where a standard geographical classification is used for all national statistical data, changes to the boundaries of the areal units have traditionally been treated as a break in the time series. Can GIS successfully solve the problem of comparing data collected on different areal units, or on a classification which has changed over time? Some simple examples will show the benefits and the limitations of the GIS solution.

INTRODUCTION

GIS is a tool widely used in many different disciplines: cartography, environment, geology, meteorology and statistics, to name but a few. It must be remembered, however, that even though they share common tools, some of these disciplines have quite different views of the world. A GIS is a powerful and intuitive tool for modelling the world around us, but it has its origin in topographic mapping, and there are some fundamental differences between the topographic and statistical models. The two diagrams below illustrate these differences.

Figure 1 Grape Vines in the SLA of Swan(S) The Statistical Model

Figure 2 Grape Vines in the SLA of Swan(S) The Mapping Model

In figure 2 the small black shapes represent grape vines. These shapes are mutually exclusive of any other land use and the only descriptive information required is "grape vine" (which would normally appear as a label on the map or in the map legend). The accuracy of the information depends entirely on the accuracy with which the areas of vine are mapped, and the distribution of the vineyards is obvious. This is a cartographer's view of the world, but statisticians rarely use this model.

Statistical organisations, such as the Australian Bureau of Statistics (ABS), gather information about various populations of people or businesses and attribute that information to an area or region. In figure 1, for example, the area is the Statistical Local Area (SLA) of Swan (S), to which has been attached the attribute "area of grape bearing vines". In an ABS dataset such as the Agricultural Census of 1997 there would of course be hundreds of other attributes attached to the SLA. The ABS uses a common geographical classification for most collections, both social and economic, so within the ABS databases there would in fact be many thousands of other attributes, or statistics, attributed to this same area.

The statistical model of the world consists of areal units which contain a population about which some information has been gathered. The difficulty arises when data are collected on different, incomparable areal units. GIS is often touted as a means of comparing such data, and this paper seeks to show, by some simple examples, the limitations of the GIS approach and to offer an imperfect but workable alternative.

GIS BOUNDARY OVERLAY

Where data are available for overlapping but different areal units, a GIS can very graphically show the relationship between areas. Sometimes this is all that is needed to give a reasonable picture of the relationship between two otherwise unrelated variables.
The example below shows the population density for part of Australia as a dot density map overlaid on recent rainfall data. The correlation between rainfall and settlement patterns is clearly visible, and there is really no need to quantify it further by transforming one or other of the two data sets to a common set of boundaries. In other words, the GIS picture speaks for itself.

Figure 3

Where it is necessary to transform data to a common set of boundaries, various techniques can be used, including area overlays in a GIS. Although some GIS can apply a level of sophistication (see "Handbook on GIS and digital mapping for population and housing census", UN Statistical Division, January 1999, Sec 14.1.2), this technique basically relies on the population being uniformly distributed within the target areal unit. Consider the example below. The boundary of a river catchment cuts through a number of local government areas. In the case of Holbrook (A) the catchment boundary cuts the area exactly in half. If the population of Holbrook (A) is apportioned according to the area ratios, then half the population would be attributed to the northern catchment area and the other half to the southern catchment area.

Figure 4

In this case, however, population data are available at a higher level of geographical disaggregation, i.e. a more detailed picture of the population distribution is available. The underlying enumeration districts reveal that the small town of Holbrook, which is the administrative centre for the local government area of Holbrook (A) comprising three enumeration districts, lies just to the south of the river catchment boundary. The population figures for the enumeration districts show that there are in fact five times as many people in the southern half of the local government area as in the northern half. This example shows that area proportion, the basis of most GIS overlay techniques, should not be used unless the population in question is distributed very smoothly. The ideal solution is of course to have point referenced data which show precisely the spatial distribution of the population, i.e. the position of individual dwellings, farms or businesses, but most countries are far from achieving this. In the meantime, area overlay as a technique for comparing data on dissimilar boundaries has very limited application.

Figure 5
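The Holbrook case can be sketched numerically. The enumeration district populations below are hypothetical, chosen only so that the southern half holds five times as many people as the northern half, as described above. The function names are illustrative, and each district is assumed to lie wholly within one catchment.

```python
# Hypothetical enumeration-district populations for Holbrook (A); NOT ABS data.
# Chosen so that the south holds five times as many people as the north.
DISTRICTS = [
    {"name": "ED north 1", "catchment": "north", "population": 150},
    {"name": "ED north 2", "catchment": "north", "population": 250},
    {"name": "ED town 1", "catchment": "south", "population": 700},
    {"name": "ED town 2", "catchment": "south", "population": 650},
    {"name": "ED town 3", "catchment": "south", "population": 450},
    {"name": "ED south rural", "catchment": "south", "population": 200},
]

# Share of the local government area's land falling in each catchment.
AREA_SHARE = {"north": 0.5, "south": 0.5}


def area_proportion(total_population, area_share):
    """Split a total by land area alone (assumes people are spread evenly)."""
    return {zone: total_population * share for zone, share in area_share.items()}


def aggregate_by_catchment(districts):
    """Sum the finer-level populations into the catchment each district lies in."""
    totals = {}
    for district in districts:
        zone = district["catchment"]
        totals[zone] = totals.get(zone, 0) + district["population"]
    return totals


total = sum(d["population"] for d in DISTRICTS)
by_area = area_proportion(total, AREA_SHARE)
by_district = aggregate_by_catchment(DISTRICTS)

for zone in ("north", "south"):
    print(f"{zone}: area proportion {by_area[zone]:.0f}, "
          f"from districts {by_district[zone]}")
```

With these illustrative figures the area proportion assigns 1,200 people to each catchment, whereas aggregating the districts gives 400 in the north and 2,000 in the south.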

BOUNDARIES WHICH CHANGE OVER TIME

A special case of incomparable boundaries is a geographical classification where the areal units change over time. The ABS uses a single classification, the Australian Standard Geographical Classification (ASGC), for most data collections. Thus it is generally possible to compare social and economic data from different censuses and surveys on a common set of boundaries. The areal units of the ASGC have a direct relationship to local government areas. By including local government in the geographical classification, the ABS is able to provide local administrations with a range of data vitally important to good governance at the local level.

Unfortunately, local government boundaries have changed considerably over time. Over the years, large areas have been split and rearranged as population growth has occurred. Conversely, in these times of economic rationalism, federal and state governments have encouraged local governments to amalgamate to form larger administrative regions capable of delivering a wider range of services to their ratepayers. This has meant that, while it is possible to integrate statistics from various collections, it is often not possible to compare data, even from the same collection, across time. Consider the following example. In the early 1990s, to allow for the expansion of the city of Townsville, the three administrative areas shown below became four areas.

Figure 6 Before the change

Figure 7 After the change

In statistical terms the new areal units are different from the old, and this constitutes a break in the statistical time series. A time series of agricultural data showing the number of cattle and calves in these areas would look as follows.

Area                                      1991      1992      1993      1994
Burdekin (S) [ASGC Ed2.3]               85,602    79,379    79,430    74,420
Dalrymple (S) [ASGC Ed2.3]             482,086   521,660   499,530   413,669
Thuringowa (C) - Pt B [ASGC Ed2.3]      26,977    28,415    26,401    32,664
Total                                  594,665   629,454   605,361   520,753

Area                                      1995      1996
Burdekin (S) [ASGC Ed2.4]               72,791    73,611
Dalrymple (S) [ASGC Ed2.4]             395,052   424,125
Thuringowa (C) - Pt B [ASGC Ed2.4]      30,575    29,134
Townsville (C) - Pt B [ASGC Ed2.4]          75     1,439
Total                                  498,493   528,309

Table 1

It is not possible to examine the growth or decline in cattle production in these areas because it is not known how much of the change is real growth and how much is a result of boundary changes. A simple GIS overlay, however, reveals the relationship between old and new boundaries. Table 2 below shows the ratio of the area of change to the area of the old unit.
Area  From                    To                      Area (square kilometres)  Ratio (of old area)
a     Dalrymple (S)           Thuringowa (C) - Pt B                       16.9                 0.00
b     Thuringowa (C) - Pt B   Thuringowa (C) - Pt B                     1676.6                 0.42
c     Thuringowa (C) - Pt B   Townsville (C) - Pt B                     1556.7                 0.39
d     Thuringowa (C) - Pt B   Burdekin (S)                               280.0                 0.07
e     Burdekin (S)            Burdekin (S)                              4840.4                 0.97
f     Burdekin (S)            Dalrymple (S)                              161.9                 0.03
g     Thuringowa (C) - Pt B   Dalrymple (S)                              490.8                 0.12
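As a minimal sketch of how area ratios like these are used, the code below reallocates the 1994 cattle and calves counts from Table 1 to the new units using the ratios in the table above. The function name `reallocate` is illustrative, and the code assumes that any share of an old unit not listed in the table stays in the unit of the same name (as is the case for most of Dalrymple).

```python
# 1994 cattle and calves on the old (ASGC Ed2.3) boundaries, from Table 1.
OLD_1994 = {
    "Burdekin (S)": 74_420,
    "Dalrymple (S)": 413_669,
    "Thuringowa (C) - Pt B": 32_664,
}

# (from, to, ratio of old area) for areas a to g, from the table above.
SHARES = [
    ("Dalrymple (S)", "Thuringowa (C) - Pt B", 0.00),          # a
    ("Thuringowa (C) - Pt B", "Thuringowa (C) - Pt B", 0.42),  # b
    ("Thuringowa (C) - Pt B", "Townsville (C) - Pt B", 0.39),  # c
    ("Thuringowa (C) - Pt B", "Burdekin (S)", 0.07),           # d
    ("Burdekin (S)", "Burdekin (S)", 0.97),                    # e
    ("Burdekin (S)", "Dalrymple (S)", 0.03),                   # f
    ("Thuringowa (C) - Pt B", "Dalrymple (S)", 0.12),          # g
]


def reallocate(old_values, shares):
    """Move each old unit's value to new units in proportion to land area."""
    new_values = {}
    accounted = {}  # share of each old unit covered by the listed pieces
    for source, target, ratio in shares:
        new_values[target] = new_values.get(target, 0.0) + ratio * old_values[source]
        accounted[source] = accounted.get(source, 0.0) + ratio
    # Assumption: any share not listed stays in the unit of the same name.
    for source, value in old_values.items():
        remainder = 1.0 - accounted.get(source, 0.0)
        new_values[source] = new_values.get(source, 0.0) + remainder * value
    return new_values


NEW_1994 = reallocate(OLD_1994, SHARES)
for area, value in sorted(NEW_1994.items()):
    print(f"{area}: {value:,.0f}")
```

The total for the region is preserved, since every old count is only redistributed among the new units.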

This ratio of areas can be applied to the pre 1995 data on cattle and calves as follows:

Thuringowa (C) - Pt B [ASGC Ed2.4], 1994 = 32,664 + (0 x 413,669) - (0.39 x 32,664) - (0.07 x 32,664) - (0.12 x 32,664) = 13,719 (rounded)

Thus a time series based on the new boundaries results in the following table.
Area                                      1991      1992      1993      1994      1995      1996
Burdekin (S) [ASGC Ed2.4]               84,922    78,987    78,895    74,474    72,791    73,611
Dalrymple (S) [ASGC Ed2.4]             487,891   527,451   505,081   419,821   395,052   424,125
Thuringowa (C) - Pt B [ASGC Ed2.4]      11,330    11,934    11,088    13,719    30,575    29,134
Townsville (C) - Pt B [ASGC Ed2.4]      10,521    11,082    10,296    12,739        75     1,439
Total                                  594,665   629,454   605,361   520,753   498,493   528,309

Table 3

Using the area proportion technique it is estimated that the Townsville (C) - Pt B area would have contained over 10,000 cattle had it existed in 1994. This area is known to have contained only 75 cattle in 1995 and just over 1,000 in 1996, so the estimate is most unlikely to be correct. The GIS technique does of course calculate the area of land moved from one unit to another very precisely. The error in this methodology lies in the assumption that the population is evenly distributed over the original area. The GIS, however, shows not only how much land is transferred but also which land. If additional information is available about that land, it is possible to improve the estimate. For example, an overlay which shows urban areas, national parks or forest could be used to weight the estimate by eliminating some land as unlikely to contain cattle.

AN ALTERNATIVE ESTIMATION

The problem of boundaries which change over time is very common in statistical analysis. Statistical agencies are continually confronted by the conflict between the desire for currency with real world areal units and the desire for stability of existing statistical boundaries. It can be seen from the above that GIS offers only a limited solution. In this case, however, there is an alternative approach. Reviewing the example above, three areas became four but the outside perimeter of the total area did not change. This is quite often the case, except where a mass redesign of administrative units has occurred (such as in the Australian State of Victoria in 1993/94). For the purpose of this discussion, this unchanged outside perimeter will be termed the bounding region. Where data are available before and after the change, a simple estimate can be made as follows.

Figure 8

The same GIS generated area proportions, discussed in the Holbrook example above, can be applied to these boundaries. Because of the necessity to assume an evenly distributed population, the same limitations also apply. Cattle may be more evenly distributed than people, but without some underlying information about topography and land use it is dangerous to make this assumption. For example, Table 1 above shows that in 1995 the number of cattle in the newly created area of Townsville (C) - Pt B is very low compared to the adjacent areas.

The GIS approach to the problem of showing this year's data on last year's boundaries, or vice versa, is to divide the old and new areas into constituent parts which are common to both. The diagram below illustrates this process. The areas of change lie in the north east corner of the total area. The GIS can determine, quite precisely, by spatial overlay the area which has moved from one unit in one year to another unit in the next.
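The idea of dividing old and new areas into common constituent parts can be illustrated with a toy overlay. This is only a sketch: axis-aligned rectangles stand in for real boundary polygons, all coordinates and unit names are hypothetical, and a real GIS would perform a general polygon intersection instead.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a real boundary polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None
    return Rect(xmin, ymin, xmax, ymax)


def overlay(old_units, new_units):
    """Map (old name, new name) to the shared piece's share of the old unit's area."""
    ratios = {}
    for old_name, old_rect in old_units.items():
        for new_name, new_rect in new_units.items():
            piece = intersect(old_rect, new_rect)
            if piece is not None:
                ratios[(old_name, new_name)] = piece.area() / old_rect.area()
    return ratios


# Hypothetical layout: two old units become three within the same outer perimeter.
OLD = {"A": Rect(0, 0, 4, 4), "B": Rect(4, 0, 8, 4)}
NEW = {"A": Rect(0, 0, 5, 4), "B": Rect(5, 0, 8, 2), "C": Rect(5, 2, 8, 4)}

RATIOS = overlay(OLD, NEW)
for (old_name, new_name), ratio in sorted(RATIOS.items()):
    print(f"{old_name} -> {new_name}: {ratio:.3f} of old area")
```

Each entry plays the role of one row in a table of areas of change: the pair of old and new units, and the share of the old unit's land that the piece represents.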

Figure 9 Areas of change

V_y = Total_y x V_(y+1) / Total_(y+1)

where
  y is the year preceding the change
  y+1 is the year immediately after the change
  V_y is the value of the data item for year y, but for an area as it exists after the change
  Total_y is the sum of the data item for year y for all the areas which make up the bounding region
  V_(y+1) is the value of the data item for year y+1 for an area as it exists after the change
  Total_(y+1) is the sum of the data item for year y+1 for all the areas which make up the bounding region

A feature of this estimation procedure is that, in apportioning this year's data to last year's boundaries or vice versa, the actual data item being studied is used in the calculation. The distribution of the population is not assumed to be even but is assumed to be the same before and after the change in boundaries. In other words, if the new area of Townsville (C) - Pt B has only 0.015% of the cattle and calves in 1995, it is assumed that this area, had it existed in 1994, would have contained 0.015% of the total cattle and calves in the bounding region.

For the area in the above example this method of estimating gives the following results for the old data on the new boundaries. It is of course equally simple to estimate the 1995 and 1996 years on the pre 1995 boundaries.
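The estimate can be sketched in a few lines, using the 1995 values and the bounding region totals from Table 1. The function name `backcast` is illustrative; following the formula above, the 1995 distribution across the new areas is applied to every earlier year's total.

```python
# Cattle and calves on the new (ASGC Ed2.4) boundaries in 1995, from Table 1.
VALUES_1995 = {
    "Burdekin (S)": 72_791,
    "Dalrymple (S)": 395_052,
    "Thuringowa (C) - Pt B": 30_575,
    "Townsville (C) - Pt B": 75,
}

# Bounding region totals for the years before the change, from Table 1.
TOTALS = {1991: 594_665, 1992: 629_454, 1993: 605_361, 1994: 520_753}


def backcast(total_y, values_after):
    """Apply V_y = Total_y x V_(y+1) / Total_(y+1) to every new area."""
    total_after = sum(values_after.values())
    return {
        area: total_y * value / total_after
        for area, value in values_after.items()
    }


ESTIMATES = {year: backcast(total, VALUES_1995) for year, total in TOTALS.items()}

for year, estimate in ESTIMATES.items():
    row = ", ".join(f"{area}: {value:,.0f}" for area, value in estimate.items())
    print(f"{year}: {row}")
```

Because each area receives a fixed share of the bounding region total, the estimates for a given year always sum back to that year's total.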
Area                                      1991      1992      1993      1994      1995      1996
Burdekin (S) [ASGC Ed2.4]               86,834    91,914    88,396    76,041    72,791    73,611
Dalrymple (S) [ASGC Ed2.4]             471,268   498,838   479,744   412,693   395,052   424,125
Thuringowa (C) - Pt B [ASGC Ed2.4]      36,474    38,607    37,130    31,940    30,575    29,134
Townsville (C) - Pt B [ASGC Ed2.4]          89        95        91        78        75     1,439
Total                                  594,665   629,454   605,361   520,753   498,493   528,309

Table 4

The estimate for the area Townsville (C) - Pt B [ASGC Ed2.4], known to be low in cattle and calves, is much more acceptable than the area ratio estimate. The limitation of this method lies in the assumption that growth, for whatever population is being studied, is even across the bounding region. This is unlikely where the bounding region is very large or contains a large number of changed units. For example, the local government redistribution in the Australian States of Victoria and Tasmania in the early 1990s caused changes to most statistical areas. In these cases the bounding region became the whole State and the assumption of uniform growth breaks down. There is also a difficulty where a change to boundaries has occurred specifically because of differential growth of some sort. For example, where a boundary is adjusted to reflect the expansion of an urban area into the surrounding countryside, the assumption of uniform growth across the bounding region would not hold for social statistics.

To take advantage of this relatively simple method of estimating, it is necessary to have access to a complete history of change for the geographical classification so that bounding regions can be identified from one edition to the next. While this is not difficult to do during the normal maintenance cycle of a geographical classification, it is quite laborious to do in retrospect.

This methodology has been implemented in the ABS statistical dissemination product on CD-ROM, the Integrated Regional Data Base (IRDB). The IRDB allows ABS clients to compare data over time despite changes to the ASGC. This product also incorporates traditional concordance files and GIS technology to improve the comparability of statistics over time and for different areal units. This combination of technologies and methodologies is considered preferable to reliance on GIS techniques alone.

CONCLUSION

GIS techniques are extremely valuable for comparing data for dissimilar areal units and for distinguishing real growth from boundary changes. Attempts to transpose data from one set of areal units to another, however, should be treated with caution and the underlying assumptions examined carefully. The danger is that GIS techniques can provide what look like very precise answers, but if the assumptions are invalid the errors can be very large.
In the future, statistical units, be they households or businesses, will be georeferenced more precisely and the limitations of the statistical model of the world will be reduced. GIS will play a vital role in both achieving that goal and delivering its benefits to decision makers.