
Computers, Environment and Urban Systems 29 (2005) 558–579
www.elsevier.com/locate/compenvurbsys

A cokriging method for estimating population density in urban areas


Changshan Wu a,*, Alan T. Murray b,1

a Department of Geography, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201-0413, USA
b Department of Geography, The Ohio State University, Columbus, OH 43210-1361, USA

Abstract

Population information is typically available for analysis in aggregate socioeconomic reporting zones, such as census blocks in the United States and enumeration districts in the United Kingdom. However, such data mask underlying individual population distributions and may be incompatible with other information sources (e.g. school districts, transportation analysis zones, metropolitan statistical areas, etc.). Moreover, it is well known that there are potential significance issues associated with scale and reporting units, the modifiable areal unit problem (MAUP), when such data are used in analysis. This may lead to biased results in spatial modeling approaches. In this study, impervious surface fraction derived from Thematic Mapper (TM) imagery was applied to derive the underlying population of an urban region. A cokriging method was developed to interpolate population density by modeling the spatial correlation and cross-correlation of population and impervious surface fraction. Results suggest that population density can be accurately estimated using cokriging applied to impervious surface fraction. In particular, the relative population estimation error is 0.3% for the entire study area and 10–15% at block group and tract levels. Moreover, unlike other interpolation methods, cokriging gives estimation variance at the TM pixel level. © 2005 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +1 414 2294860; fax: +1 414 2293981. E-mail addresses: cswu@uwm.edu (C. Wu), murray.308@osu.edu (A.T. Murray).
1 Tel.: +1 614 688 5441; fax: +1 614 292 6213.

0198-9715/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.compenvurbsys.2005.01.006

Keywords: Population interpolation; Cokriging; Remote sensing

C. Wu, A.T. Murray / Comput., Environ. and Urban Systems 29 (2005) 558–579


1. Introduction

The difficulties associated with the application of zone-based census population data in geographical analyses have been well documented in previous studies (Fotheringham & Wong, 1991; Martin, 1989, 1996). One important issue is data aggregation. In many applications, census data cannot sufficiently represent the underlying geographical distribution of population because it is reported through aggregating individual population counts in irregular areal units, which can be geographically meaningless. This aggregation tends to smooth local variability and requires an assumption of uniformly distributed population within a reporting unit (Moon & Farmer, 2001). While there are legitimate reasons for reporting census information in this way (i.e. privacy of census respondents), business and service planning benefit substantially from greater resolution population data (Longley & Clarke, 1995). For example, Martin and Williams (1992) and Beguin, Thomas, and Vandenbussche (1992) emphasized the importance of detailed population information in the location analyses of health-care centers and public libraries. Moreover, in urban sustainability studies Harris and Longley (2000) point out that census-based models tend to overestimate residential area because of its coarse resolution.

Another difficulty with zone-based population data is related to incompatible spatial information layers (Bracken, 1993; Goodchild, Anselin, & Deichmann, 1993). Different departments and agencies collect and distribute data in varying zonal arrangements (e.g. school districts, transportation analysis zones, metropolitan statistical areas, etc.). As a consequence, a significant problem arises in regional analysis and modeling, in which multiple data sources must be integrated before analysis can be implemented (Goodchild et al., 1993). Moreover, the boundaries of areal units in census data are not data derived, but rather are the result of enumeration and reporting.
The modifiable areal unit problem (MAUP) may exist when utilizing such data in geographical applications. In particular, the relationship between variables may only be valid for one particular zonal arrangement and scale, potentially biasing results obtained in statistical and spatial analyses (Martin, 1996; Openshaw, 1977).

One approach for dealing with the above problems is to transform aggregated census data to grid-based population estimates using areal interpolation (Langford, Maguire, & Unwin, 1991; Martin, 1989; Okabe & Sadahiro, 1997). Areal interpolation methods may be grouped into two categories: simple interpolation and intelligent interpolation (Okabe & Sadahiro, 1997). Simple interpolation involves transferring data from irregular polygons to regular grids without any supplementary data (Lam, 1983; Martin, 1996; Tobler, 1999). This method is preferred when fast computation is important or additional information is unavailable (Okabe & Sadahiro, 1997). In contrast, intelligent interpolation transfers data with the help


of additional information (Harris & Longley, 2000; Langford et al., 1991). This method has proven more accurate than simple interpolation, although greater computational processing is required (Fisher & Langford, 1995; Sadahiro, 1999). Regression analyses supplemented with land use and land cover data are often applied in intelligent interpolation (Langford et al., 1991; Langford & Unwin, 1994). However, detailed biophysical information is usually lost in producing land use data from remotely sensed images (Jensen, 1983). As a result, limited land use types are too coarse for estimating detailed population density. Moreover, the basic assumptions of regression analyses (e.g. spatial independence) are unlikely to be satisfied in geographical applications (Griffith & Can, 1996).

Impervious surface fraction in residential areas may be useful for supplementing the developed interpolation process. Detailed information on residential areas can thus be maintained, providing clues on population distribution (Ji & Jensen, 1999). Spatial autocorrelation in impervious surface fraction and population, and the cross-correlation between these two spatial variables, are explored and modeled in this paper using geostatistical techniques. Based on modeled spatial relationships, cokriging is applied in this paper to determine population density in Columbus, OH.

The organization of this paper is as follows. Our study area and data sources are described in Section 2. The process of deriving impervious surface fraction in residential areas from remotely sensed imagery is described in Section 3. In particular, we detail the creation of impervious surface fraction from ETM+ imagery for the entire study region and describe a procedure for delineating residential areas within this region. Population density estimation using cokriging combined with residential impervious surface fraction is reported in Section 4. Accuracy assessment of the population estimates is addressed in Section 5.
Section 6 reports an adjustment of the population estimates. Finally, conclusions and discussion are provided in Section 7.

2. Study area and data sources

A portion of the Columbus metropolitan area in Franklin County, OH, USA was chosen as our study region for this research. This region is 47.4 km² and is divided into 36 tracts, 125 block groups, and 2445 blocks in the 2000 US Census (see Fig. 1). The 2000 Census data were acquired from the ESRI website in the shapefile format (United States Census Bureau, 2002). Landsat 7 ETM+ imagery, which was utilized to derive residential impervious surface fraction, was acquired on July 8, 1999. Additional data, such as Digital Orthophoto Quarterquadrangles (DOQQs) from the Ohio Geographically Referenced Information Program (OGRIP, 1999) and National Land Cover Data (NLCD) from the Multi-Resolution Land Characteristics Consortium (Multi-Resolution Land Characteristics Consortium, 2002), were utilized to examine residential classification accuracy and select training samples. Moreover, parcel data from the Franklin County Auditor (2002) and address-based employment data from the Mid-Ohio Regional Planning Commission (MORPC,


Fig. 1. Study area as part of the Columbus metropolitan area in Franklin County, OH, USA (left) and Landsat ETM+ image acquired on July 8, 1999 for this area (right).

2002) were utilized to identify possible misclassified pixels since these data maintain detailed local information about land use and employment.

3. Estimating impervious surface fraction in residential areas

Impervious surface is any material prohibiting the infiltration of water into soil. As a major component of urban infrastructure, impervious surface has become a primary variable in urban planning and environmental management (Ji & Jensen, 1999; Ridd, 1995). Impervious surface fraction, calculated as the proportion of impervious surface over a small area, has been found to reveal more information about built-up areas than land use and land cover classification (Ji & Jensen, 1999). For population


estimation, as an example, impervious surface in residential areas generally corresponds to housing, which serves as an indicator of people.

3.1. Impervious surface fraction estimation

Methods for quantifying impervious surface from remotely sensed data are typically based on either fuzzy classification or spectral mixture analysis (Ji & Jensen, 1999; Phinn, Stanford, Scarth, Murray, & Shyy, 2002; Rashed, Weeks, Gadalla, & Hill, 2001). In this study, a spectral mixture analysis method was applied to estimate impervious surface fraction from an ETM+ image (Wu & Murray, 2003). Four endmembers (see Fig. 2), vegetation, high albedo, low albedo and soil, were selected to represent heterogeneous urban land use and land cover through the analysis of the spectral feature spaces of a transformed ETM+ image using the maximum noise fraction (MNF) transformation, the details of which are given in Green, Berman, Switzer, and Craig (1988) and Lee, Woodyatt, and Berman (1990). Consequently, a fully constrained four-endmember linear mixing model was applied to calculate each endmember fraction from the Landsat ETM+ data (see Fig. 3). Furthermore, impervious surface fraction in each ETM+ pixel was modeled by adding the fractions of low albedo and high albedo endmembers after removing the effects of water and clouds (see Fig. 4).

3.2. Residential area classification

To this point we have detailed impervious surface fraction estimation for the entire study area. However, we know that population (the major interest in this

Fig. 2. ETM+ reflectance spectra of selected endmembers. These endmembers were chosen by analyzing the spectral feature spaces of the MNF transformed ETM+ image.


Fig. 3. Endmember fraction images calculated through a fully constrained four-endmember linear mixing model: (a) vegetation fraction image; (b) high albedo fraction image; (c) low albedo fraction image (including water); (d) soil fraction image.

research) is generally restricted to residential areas. Therefore, it is necessary to identify residential land use within the study area. A maximum likelihood classification was applied to delineate residential pixels. Similar approaches have been utilized in classifying residential land uses by Lo (1995), Mesev (1998), and Chen (2002). Six classes, vegetation, soil, water, commercial and transportation, low density residential, and high density residential, were specified in selecting training samples with the help of DOQQ data, NLCD data, and the original ETM+ image. The classification (see Fig. 5) was conducted using a maximum likelihood classifier provided in ERDAS Imagine 8.4 (ERDAS Imagine, 1997). After deriving this image, we grouped the six classes into two major classes: residential and non-residential.

Since we are estimating detailed population density, residential classification accuracy is essential in this research. Therefore, we performed post-processing to identify possible misclassified pixels. In particular, pixels within zero population census


Fig. 4. Impervious surface fraction image calculated through adding low albedo and high albedo endmember fractions after removing the effects of water and clouds.

Fig. 5. A maximum likelihood classification of the ETM+ image for the Columbus metropolitan area.

blocks should obviously not be classified as residential land use. Such pixels were identified and reclassified as non-residential. Conversely, pixels within high population density census blocks were also subject to further scrutiny. If these pixels are not classified as residential, they are possibly misclassified and require further analysis. In this study, we utilized parcel and employment data to identify potentially misclassified pixels. Group-quarter populations, i.e. people in institutions, shelters, and nursing homes, and students in university dormitories (Plane & Rogerson, 1994), were typically found in these misclassified pixels. Such areas are difficult to classify using only remotely sensed data because they share similar spectral signatures with commercial land uses. With the help of parcel and employment data, we were able to identify these pixels and reclassify them as residential areas.

Fig. 6. Residential land use classification after the maximum likelihood classification and post-processing.

Table 1
Residential land use classification accuracy assessment

Classified image      Reference image                      Commission error (%)
                      Residential     Non-residential
Residential           146             15                   9.32
Non-residential       25              214                  10.46
Omission error (%)    14.62           6.55

Overall accuracy = 90.00%, overall kappa statistic = 0.7942.

The classification accuracy of residential land use after the maximum likelihood classification and post-processing (see Fig. 6) was examined using 400 stratified random samples. The DOQQ images acquired between 1994 and 1995 were used in this study for ground truthing. These DOQQs were co-registered with the ETM+ image. A 3 by 3 sampling unit was adopted to avoid geometric errors. The overall classification accuracy is 90% and the overall kappa coefficient is 0.7942 (see Table 1). With impervious surface fraction for the entire study area (Fig. 4) and the identified residential land use areas (Fig. 6), impervious surface fraction in residential areas was easily obtained (see Fig. 7).

4. Interpolating population density using cokriging

After obtaining impervious surface fraction for residential areas, it can be utilized as supplementary data to interpolate population density. Population density is


Fig. 7. Impervious surface fraction in residential areas.

usually estimated using a regression approach, which models the relationship between population and supplementary data derived from remote sensing imagery (Chen, 2002; Harvey, 2002; Lo, 1995). An implicit assumption of regression analysis is that population density is spatially independent. However, many researchers have questioned this assumption, claiming that simple regression may lead to biased results (Griffith, 1993; Griffith & Can, 1996). Therefore, a model considering spatial autocorrelation is more appropriate. Cokriging may improve the estimation precision by accounting simultaneously for spatial autocorrelation in population density and impervious surface fraction and the cross-correlation between these spatial variables. Moreover, it is suitable when the variable to be estimated (e.g. population density) is under-sampled while other supplementary variables are abundant (e.g. impervious surface fraction).

Cokriging is a geostatistical method originating from mining applications (Cressie, 1993; Journel & Huijbregts, 1978) and widely applied in soil science (Vauclin, Vieira, Vachaud, & Nielsen, 1983; Webster, 1985; Webster & Burgess, 1980). Geostatistical methods were introduced in remote sensing in the late 1980s (Curran, 1988; Woodcock, Strahler, & Jupp, 1988). Now geostatistics are commonly applied in soil science, biogeography, climatology, and environmental studies (Atkinson, Webster, & Curran, 1992, 1994; Oliver, Webster, & Gerrard, 1989a, 1989b). A review of geostatistical methods and associated applications may be found in Cressie (1993), Curran and Atkinson (1998), and Curran (2001). Although widely applied in physical geography, cokriging has rarely been utilized in estimating socio-economic conditions, such as population densities. In this paper, population density is estimated using a cokriging method in which the impervious surface fraction is taken as a secondary variable to improve estimation accuracy.


4.1. Cokriging theory

As an extension of ordinary kriging to two or more variables, cokriging is based on regionalized variable theory (Journel & Huijbregts, 1978; Oliver et al., 1989a). According to this theory, any regionalized variable $z(x)$ can be considered a realization of a random function $Z(x)$, which is a combination of a deterministic component, $m(x)$, and a random fluctuation, $\varepsilon(x)$:

$$z(x) = m(x) + \varepsilon(x) \tag{1}$$

where $x$ denotes the geographical coordinates in one, two, or three dimensions; $m(x)$ indicates a geographical trend or drift; and $\varepsilon(x)$ is a spatially dependent random error with mean zero. In most applications, the deterministic component, $m(x)$, is assumed to be locally constant,

$$m(x) = \mu \tag{2}$$

and for any given distance and direction $h$, the variance of differences between $z(x)$ and $z(x+h)$ is finite and independent of $x$:

$$\operatorname{var}[z(x) - z(x+h)] = E\left[\{z(x) - z(x+h)\}^2\right] = 2\gamma(h) \tag{3}$$

where the vector $h$, the lag, is a given separation distance and direction from $x$, and $\gamma(h)$ is the variogram. $\gamma(h)$ has been found to be an important tool in modeling spatial autocorrelation (Journel & Huijbregts, 1978). Moreover, if two or more variables are needed, a cross-variogram is defined as follows:

$$\gamma_{uv}(h) = \tfrac{1}{2} E\left[\{z_u(x) - z_u(x+h)\}\{z_v(x) - z_v(x+h)\}\right] \tag{4}$$

Based on regionalized variable theory, an under-sampled variable can be estimated using cokriging. This method ensures unbiased estimates with minimum and known variance (Curran, 2001). If we consider estimating a variable $u$ in a block $B$ with sampling points of $u$ and a second variable $v$, our estimate will be

$$\hat{z}_u(B) = \sum_{i=1}^{N_u} \lambda_{ui}\, z_u(x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\, z_v(x_{vj}) \tag{5}$$

in which $N_u$ and $N_v$ are the numbers of sampling points for variables $u$ and $v$; $x_{ui}$ and $x_{vj}$ are the locations of sampling points for variables $u$ and $v$, respectively; and $\lambda_{ui}$ and $\lambda_{vj}$ are the weights to be calculated. In order to ensure unbiasedness, the following constraints must be satisfied (Aboufirassi & Marino, 1984):

$$\sum_{i=1}^{N_u} \lambda_{ui} = 1 \tag{6}$$

$$\sum_{j=1}^{N_v} \lambda_{vj} = 0 \tag{7}$$


The first constraint indicates that at least one observation of the primary variable $u$ is necessary for cokriging. Moreover, constraint (7) ensures that the summation of the weights for the secondary variable $v$ is zero. Subject to these constraints, we minimize the estimation variance:

$$\sigma_u^2(B) = E\left[\{z_u(B) - \hat{z}_u(B)\}^2\right] \tag{8}$$

This is an optimization problem in which $\lambda_{ui}$ and $\lambda_{vj}$ are the decision variables and $\sigma_u^2(B)$ is the objective function. Standard Lagrangian techniques can be applied to solve this problem. This results in the following:

$$\sum_{i=1}^{N_u} \lambda_{ui}\, \gamma_{uu}(x_{ui}, x_{uk}) + \sum_{j=1}^{N_v} \lambda_{vj}\, \gamma_{uv}(x_{uk}, x_{vj}) + \psi_u = \bar{\gamma}_{uu}(B, x_{uk}), \qquad k = 1, \ldots, N_u \tag{9}$$

$$\sum_{i=1}^{N_u} \lambda_{ui}\, \gamma_{uv}(x_{ui}, x_{vl}) + \sum_{j=1}^{N_v} \lambda_{vj}\, \gamma_{vv}(x_{vj}, x_{vl}) + \psi_v = \bar{\gamma}_{uv}(B, x_{vl}), \qquad l = 1, \ldots, N_v \tag{10}$$

Here $\gamma_{uu}(x_{ui}, x_{uk})$ is the semi-variogram of variable $u$ between sites $i$ and $k$, $\gamma_{uv}(x_{uk}, x_{vj})$ is the cross semi-variogram between variables $u$ and $v$ at sites $k$ and $j$, and $\psi_u$ and $\psi_v$ are the Lagrange multipliers. Finally, $\bar{\gamma}_{uv}(B, x_{vl})$ is the cross semi-variogram between variables $u$ and $v$ at block $B$ and site $l$. Using this method, there are $N_u + N_v + 2$ equations and $N_u + N_v + 2$ variables, which can be easily solved by linear algebra. After obtaining the parameters $\lambda_{ui}$ and $\lambda_{vj}$, $\hat{z}_u(B)$ may be estimated using Eq. (5). The cokriging variance can be obtained as a byproduct of the cokriging process as follows:

$$\sigma_u^2(B) = \sum_{i=1}^{N_u} \lambda_{ui}\, \bar{\gamma}_{uu}(B, x_{ui}) + \sum_{j=1}^{N_v} \lambda_{vj}\, \bar{\gamma}_{uv}(B, x_{vj}) + \psi_u - \bar{\gamma}_{uu}(B, B) \tag{11}$$
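The linear system formed by Eqs. (6), (7), (9) and (10) can be sketched in a few lines of code. The following is a minimal illustration only, not the Gstat implementation used in this study. It assumes point cokriging (the block B is shrunk to a single location, so the block-averaged semi-variograms reduce to point values and the block-to-block term in Eq. (11) is zero), exponential variogram models with the coefficients reported in Table 2, and made-up sample locations and values. All function and variable names are hypothetical.

```python
import math
import random


def exp_variogram(h, c0, c1, r):
    """Exponential model: 0 at h = 0, else C0 + C1 * (1 - exp(-h / r))."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def point_cokriging(x0, xu, zu, xv, zv, g_uu, g_vv, g_uv):
    """Return (estimate, variance, lambda_u, lambda_v) at location x0."""
    nu, nv = len(xu), len(xv)
    n = nu + nv
    pts = list(xu) + list(xv)  # u sites first, then v sites

    def gamma(a, b):  # choose the model that matches the pair of sites
        h = dist(pts[a], pts[b])
        if a < nu and b < nu:
            return g_uu(h)
        if a >= nu and b >= nu:
            return g_vv(h)
        return g_uv(h)

    A = [[gamma(a, b) for b in range(n)] + [0.0, 0.0] for a in range(n)]
    for k in range(nu):
        A[k][n] = 1.0  # the psi_u term of Eq. (9)
    for l in range(nu, n):
        A[l][n + 1] = 1.0  # the psi_v term of Eq. (10)
    A.append([1.0] * nu + [0.0] * nv + [0.0, 0.0])  # constraint (6)
    A.append([0.0] * nu + [1.0] * nv + [0.0, 0.0])  # constraint (7)
    rhs = [g_uu(dist(x0, p)) for p in xu] + [g_uv(dist(x0, p)) for p in xv]
    sol = solve(A, rhs + [1.0, 0.0])
    lam_u, lam_v, psi_u = sol[:nu], sol[nu:n], sol[n]
    estimate = (sum(w * z for w, z in zip(lam_u, zu))
                + sum(w * z for w, z in zip(lam_v, zv)))  # Eq. (5)
    variance = sum(w * g for w, g in zip(sol[:n], rhs)) + psi_u  # Eq. (11)
    return estimate, variance, lam_u, lam_v


# Made-up data: 5 primary and 12 secondary samples in a 3 km square.
random.seed(0)
xu = [(random.uniform(0, 3000), random.uniform(0, 3000)) for _ in range(5)]
xv = [(random.uniform(0, 3000), random.uniform(0, 3000)) for _ in range(12)]
zu = [random.uniform(1.0, 4.0) for _ in xu]  # sqrt of population density
zv = [random.uniform(0.1, 0.8) for _ in xv]  # impervious surface fraction

estimate, variance, lam_u, lam_v = point_cokriging(
    (1500.0, 1500.0), xu, zu, xv, zv,
    g_uu=lambda h: exp_variogram(h, 0.196, 0.176, 1000.0),
    g_vv=lambda h: exp_variogram(h, 0.007, 0.0089, 1000.0),
    g_uv=lambda h: exp_variogram(h, 0.012, 0.030, 1000.0),
)
print(estimate, variance, sum(lam_u), sum(lam_v))
```

The two appended rows enforce the unbiasedness constraints, so the primary weights sum to one and the secondary weights sum to zero. A block version would replace each right-hand-side value with an average of the semi-variogram over the block.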

Matrix formulations of these equations can be found in Myers (1982), McBratney and Webster (1983), and Aboufirassi and Marino (1984). Details on solving this problem using Lagrangian techniques are given in Vauclin et al. (1983) and Atkinson et al. (1992).

4.2. Variogram estimation

From Eqs. (6), (7), (9) and (10), it is clear that parameters $\lambda_{ui}$ and $\lambda_{vj}$ are dependent on the variograms associated with variables $u$ and $v$, their cross-variogram, and block size. In this study, block size is defined to be the same as the TM image resolution (30 m by 30 m). Therefore, once the variograms and cross-variogram have been derived, cokriging is a straightforward process (Atkinson et al., 1992; Atkinson, Webster, & Curran, 1994). In practice, the variograms are typically estimated using sampling points as follows:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z(x_i) - z(x_i + h)\}^2 \tag{12}$$


where $z(x_i)$ are known values of variable $u$ or $v$ at sampling point $x_i$, and $N(h)$ is the number of sampling point pairs separated by lag $h$. Similarly, the cross-variogram can be estimated as follows:

$$\hat{\gamma}_{uv}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \{z_u(x_i) - z_u(x_i + h)\}\{z_v(x_i) - z_v(x_i + h)\} \tag{13}$$
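As a rough illustration of the estimators in Eqs. (12) and (13), the sketch below bins all point pairs by separation distance and averages the squared differences and cross-products within each bin. It assumes that both variables are observed at the same sampling points and that direction is ignored (an omnidirectional variogram). The function name, the binning rule, and the toy data are hypothetical choices made for this example.

```python
import math


def experimental_variograms(points, zu, zv, lag_width, n_lags):
    """Return (gamma_u, gamma_v, gamma_uv, pair_counts), one entry per lag bin.

    Bin b collects pairs with b * lag_width <= distance < (b + 1) * lag_width.
    Empty bins are reported as None.
    """
    su = [0.0] * n_lags   # sum of squared differences of u
    sv = [0.0] * n_lags   # sum of squared differences of v
    suv = [0.0] * n_lags  # sum of cross-products of differences
    count = [0] * n_lags  # N(h): number of pairs in each bin
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            b = int(h // lag_width)
            if b >= n_lags:
                continue
            du, dv = zu[i] - zu[j], zv[i] - zv[j]
            su[b] += du * du
            sv[b] += dv * dv
            suv[b] += du * dv
            count[b] += 1

    def half_mean(total):
        return [t / (2 * c) if c else None for t, c in zip(total, count)]

    return half_mean(su), half_mean(sv), half_mean(suv), count


# Toy data: four collinear points with linearly increasing values.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
zu = [0.0, 1.0, 2.0, 3.0]
zv = [0.0, 2.0, 4.0, 6.0]
g_u, g_v, g_uv, pairs = experimental_variograms(points, zu, zv, 1.0, 4)
print(pairs)  # pair counts per bin
print(g_u, g_v, g_uv)
```

Each pair is counted once, which matches the definition of N(h) as the number of point pairs at a given lag.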

After obtaining the variogram and cross-variogram, a theoretical model is needed to fit them. Such a model needs to be positive definite and coregionalized to ensure the cokriging variance is non-negative. More discussion about choosing theoretical functions can be found in McBratney and Webster (1986) and Curran (1988). In this study, we chose a model satisfying the positive definite and coregionalized requirements, the details of which are discussed later in this paper.

4.3. Interpolating population density using cokriging

In this study population density is considered the primary variable to be estimated. In addition, residential impervious surface fraction is considered a secondary variable used to increase estimation accuracy. One issue is that reported census statistics are not based on a sampling point, but rather on an areal unit like a block. The centroid of a census block may be used as the sampling point for the assignment of population density. However, this method is not realistic because there may not actually be people at the centroid of a block. Martin (1989) solved this problem by using a population-weighted point as the representative point of a census block. In a similar manner, in this research the central point of the pixel whose impervious surface fraction is approximately equal to the block mean is used as a population-weighted block point. In addition, we assign the impervious surface fraction of the pixel and the average population density of the block to this sampling point. After obtaining the impervious surface fraction and population density on these samples, the characteristics of the data are explored. If they are not second-order stationary, i.e. do not have the same mean and variance throughout, the accuracy of the estimated experimental variogram and associated cokriging will be degraded (Cressie, 1993). The histograms for population density (see Fig. 8a) and impervious surface fraction (see Fig. 9) were derived from the sampling points.
It is clear that population density is highly positively skewed and may be approximated by a Poisson function with its variance proportional to its mean value (Bailey & Gatrell, 1995; Harvey, 2002). A square root transformation was performed on population density to stabilize its variance. The histogram of the transformed population density (see Fig. 8b) shows that its distribution is near normal and its variance is approximately constant. The histogram of impervious surface fraction is slightly negatively skewed, but may be considered approximately normal. Thus, no transformation was conducted on impervious surface fraction. We excluded zero population density census blocks because no interpolation is necessary for these blocks. In this study, the primary variable u is the square root of population density, and the secondary variable v is impervious surface fraction. Experimental variograms


Fig. 8. Histogram of (a) population density and (b) square root of population density at sampling points. It shows that population density may be described by a Poisson distribution, while the square root transformation is a reasonable approximation of a normal distribution.

Fig. 9. Histogram of impervious surface fraction at sampling points.

and cross-variograms were calculated using Eqs. (3) and (4). Gstat software was utilized to fit these variograms to theoretical functions (Pebesma & Wesseling, 1998). The weighted least squares method and visualization were applied in modeling the experimental variograms (Cressie, 1985). Directional variograms were also computed and no obvious anisotropies were found. Therefore, the variograms were assumed to be isotropic and were fitted using an exponential model of the following form:


$$\gamma(h) = \begin{cases} C_0 + C_1\{1 - e^{-h/r}\} & \text{for } h > 0 \\ 0 & \text{for } h = 0 \end{cases} \tag{14}$$
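A short sketch may help make the exponential model concrete. It evaluates Eq. (14) with the coefficients reported in Table 2, computes how much of C1 has been reached at a lag of 3r, and checks the usual coregionalization inequality for two variables that share the same structures. This is an illustrative check under those assumptions, not the fitting procedure performed in Gstat, and the names `MODELS` and `coregionalization_ok` are hypothetical.

```python
import math


def exp_variogram(h, c0, c1, r):
    """Eq. (14): 0 at h = 0, else C0 + C1 * (1 - exp(-h / r))."""
    return 0.0 if h == 0 else c0 + c1 * (1.0 - math.exp(-h / r))


# (C0, C1, r) as reported in Table 2; u = sqrt(population density),
# v = impervious surface fraction, uv = their cross-variogram.
MODELS = {
    "uu": (0.196, 0.176, 1000.0),
    "vv": (0.007, 0.0089, 1000.0),
    "uv": (0.012, 0.030, 1000.0),
}

# Share of C1 reached at a lag of 3r for the primary variable.
c0, c1, r = MODELS["uu"]
fraction_at_3r = (exp_variogram(3 * r, c0, c1, r) - c0) / c1


def coregionalization_ok(models):
    """Check C_uv^2 <= C_uu * C_vv for the nugget (index 0) and C1 (index 1).

    With one shared range r this is the standard condition for a
    two-variable linear model of coregionalization to be admissible.
    """
    return all(
        models["uv"][k] ** 2 <= models["uu"][k] * models["vv"][k]
        for k in (0, 1)
    )


print(round(fraction_at_3r, 4), coregionalization_ok(MODELS))
```

The inequality is checked separately for the nugget and for C1 because each structure contributes its own coefficient matrix.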

In Eq. (14), $C_0$ is the nugget representing unexplained variance, $C_1$ is the partial sill, and $r$ defines the spatial scale of the variation. Because the exponential model approaches its sill $C_0 + C_1$ only asymptotically, in practice the sill is taken as reached at $h = 3r$, where the variogram equals approximately $C_0 + 0.95C_1$. In this study, the parameters were calculated for the variograms of the square root of population density and impervious surface fraction, and also for their cross-variogram (see Table 2 and Fig. 10). After obtaining the variograms of impervious surface fraction, square root of population density, and their cross-variogram, a block cokriging was performed to interpolate population density (see Fig. 11) using Gstat software embedded in Idrisi (Harmon, 2002). Fig. 11 shows a clear geographical pattern of population distribution in the study region. In particular, few people live in the CBD except

Table 2
Coefficients of the theoretical variogram and cross-variogram functions

                                          C0       C1       r
Population density                        0.196    0.176    1000
Impervious surface                        0.007    0.0089   1000
Population density–impervious surface     0.012    0.030    1000

Fig. 10. Variograms of (a) square root of population density, (b) residential impervious surface fraction, and (c) the cross-variogram between square root of population density and impervious surface fraction. Exponential functions with r = 1000 are chosen to model these variograms.


Fig. 11. Estimated population density using developed cokriging method. The height indicates the value of population density for each TM pixel. The average population density is 4.28, with a maximum of 52, and a minimum of 0.

group-quarter populations. High-density household-based populations are adjacent to the CBD in the southern and northwestern portions of the study region. Moreover, low-density household-based populations reside relatively far away from the CBD (in the eastern and southern portions).

5. Accuracy assessment

Using the cokriging variance approach defined in Eq. (11) for the square root of population density, the mean cokriging variance is 23.5% (minimum of 21.3% and maximum of 50.3%). Fig. 12 shows the distribution of cokriging variance in the study area. In particular, cokriging variance is high along the study area boundary because few samples are used in estimating population density in this portion of the region. It is possible to examine population count estimation accuracies at each census zonal level using the root mean square error ($E_{RMS}$) and coefficient of variation ($V$) to evaluate the absolute and relative error as follows:

$$E_{RMS} = \left[\frac{1}{n} \sum_{i=1}^{n} (\hat{P}_i - P_i)^2\right]^{1/2} \tag{15}$$

$$V = \frac{1}{P} \sum_{i=1}^{n} |\hat{P}_i - P_i| \tag{16}$$


Fig. 12. Cokriging variance of the square root of population density estimation. The average cokriging variance is 0.235, with a maximum of 0.503, and a minimum of 0.213.

where $n$ is the number of total census zones; $P$ is the total population in the study area; $P_i$ is the population count of census zone $i$; and $\hat{P}_i$ is the estimated population count for zone $i$. The overall regional assessment of population count estimation accuracy can be carried out using the relative estimation error ($R$):

$$R = (\hat{P} - P)/P \tag{17}$$

where $\hat{P}$ is the total population estimate for the study area.

The cokriging method contrasts with the traditional regression approach used to estimate population density. The first regression model explores the relationship between population density and the proportion of low and high density residential areas within a census block (Langford et al., 1991; Lo, 1995; Chen, 2002). Applied to our study area, the model is as follows (see Table 3):

$$\hat{P}_i^{T} = 2.25526\, R_i^{L} + 5.0612\, R_i^{H} \tag{18}$$

where $R_i^{L}$ and $R_i^{H}$ are the proportions of low and high density residential areas in a census block and $\hat{P}_i^{T}$ is the expected population density in a census block using the traditional regression approach. A valid alternative regression model would be investigating the relationship between population density and impervious surface fraction in low and high density
Table 3
Coefficients of the regression model with residential land cover classes as explanatory variables

Coefficients    Value     Std. error    t value    Pr(>|t|)
RL              2.2552    0.1727        13.0554    0.0000
RH              5.0612    0.0958        52.8349    0.0000


Table 4
Coefficients of the regression model with residential impervious surface fraction as explanatory variables

Coefficients    Value     Std. error    t value    Pr(>|t|)
IL              6.5798    0.4335        15.1793    0.0000
IH              9.4650    0.1687        56.1212    0.0000

residential areas within each census block. Applied to our study area, the model is as follows (see Table 4):

$$\hat{P}_i^{A} = 6.5798\, I_i^{L} + 9.4650\, I_i^{H} \tag{19}$$

where $I_i^{L}$ and $I_i^{H}$ are the fractions of impervious surface in low and high density residential areas in a census block and $\hat{P}_i^{A}$ is the expected population density in a census block using this alternative regression approach.

In both regression models, the area of each census block was chosen as a weighting factor to reduce the effects of zone size. Moreover, the intercepts in these regression models are not included because they are not statistically significant (furthermore, their meaning in population estimation is not clear). The explanatory variables are statistically significant (p ≤ 0.0001), which shows the strong correlation between population density and the chosen explanatory variables (see Tables 3 and 4).

Comparative results (see Table 5) show that the cokriging method is the most accurate. In particular, the coefficient of variation is relatively low at the census block level (34.7%), low at the block group and tract levels (15.2% and 10.2% respectively), and near zero for the entire study area (0.3%). The estimation accuracies of the two regression models are reported in Table 5 as well. Neither regression model performs as well as the cokriging method in terms of estimation accuracy. As an example, the coefficients of variation for the census tract level in the regression models are 22.9% and 21.0% respectively, substantially higher than the variation obtained using cokriging (10.2%). Comparing the two regression models, regression with impervious surface fraction is slightly better than with land use classes (e.g. 21.0% vs. 22.9% estimation error at the census tract level). This result is consistent with the literature showing that impervious surface fraction performs better than land use/cover in urban analysis (Ji & Jensen, 1999).
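The two regression models share one structure: a weighted least squares fit with two explanatory variables and no intercept. The sketch below shows one way such a fit could be computed from the normal equations. It is an assumption about the fitting procedure, which the text does not describe beyond the use of block area as a weight, and the function name and sample numbers are made up for illustration.

```python
def weighted_fit_through_origin(x1, x2, y, w):
    """Minimize sum_i w_i * (y_i - b1 * x1_i - b2 * x2_i)^2 and return (b1, b2)."""
    s11 = sum(wi * a * a for wi, a in zip(w, x1))
    s22 = sum(wi * b * b for wi, b in zip(w, x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, x1, x2))
    t1 = sum(wi * a * yi for wi, a, yi in zip(w, x1, y))
    t2 = sum(wi * b * yi for wi, b, yi in zip(w, x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("explanatory variables are collinear")
    # Cramer's rule on the 2 x 2 normal equations.
    b1 = (t1 * s22 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


# Made-up blocks: low and high density shares, block areas as weights.
low = [0.2, 0.5, 0.1, 0.7]
high = [0.6, 0.1, 0.3, 0.2]
area = [1.0, 2.0, 0.5, 1.5]
density = [2.0 * a + 5.0 * b for a, b in zip(low, high)]  # noise-free response

b_low, b_high = weighted_fit_through_origin(low, high, density, area)
print(b_low, b_high)
```

With a noise-free response the fit recovers the generating coefficients, which is a convenient sanity check before applying it to real block data.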

Table 5
Absolute and relative estimation errors of the cokriging and regression models

                                        Cokriging        Regression with    Regression with
                                                         land cover         impervious surface
Zones               Average population  E_RMS   V        E_RMS   V          E_RMS   V
Block (2445)        40.99               45.3    34.7%    47.9    48.8%      45.5    46.6%
Block group (125)   801.74              215.0   15.2%    325.6   27.8%      290.7   25.2%
Tract (36)          2825.84             411.0   10.2%    967.6   22.9%      846.0   21.0%
Total study area    100,200             -       0.3%     -       1.0%       -       2.6%


6. Population density adjustment

The cokriging approach gives unbiased estimates for the square root of population density with minimum variance. However, the population count estimation errors evaluated at the census block level are still somewhat large (34.7%). As discussed in previous studies (Langford & Unwin, 1994; Fisher & Langford, 1995; Martin, 1996), interpolation methods should preserve population counts in each reporting zone. One option is to add a volume-preserving constraint to the cokriging model. However, this would make the model more complex, since it would have a quadratic objective function and a quadratic regional constraint. In fact, it is not clear that the resulting model can be solved, exactly or heuristically. An alternative option is to rescale the population estimates on every pixel to satisfy this zonal constraint:

$$P'_{ij} = \hat{P}_{ij}\,\frac{P_i}{\hat{P}_i} \qquad (20)$$

Here $P'_{ij}$ is the rescaled population estimate of pixel $j$ in census block $i$, $\hat{P}_{ij}$ is the population estimate obtained through cokriging, and $P_i$ and $\hat{P}_i$ are the population counts of block $i$ (census count and cokriging estimate, respectively). This rescaled population density (see Fig. 13) generally maintains the estimates obtained using cokriging, but emphasizes local variation as well. For example, the cokriging method tends to underestimate population counts in multi-story and high-rise buildings (the middle portion of Fig. 11). In contrast, the rescaling approach corrects these inaccuracies and obtains more accurate population density estimates.
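The rescaling in Eq. (20) multiplies every pixel estimate by the ratio of the census count to the cokriging total of its block. The sketch below illustrates that step under stated assumptions: pixels carry an integer block index starting at 0, the function and variable names are hypothetical, and blocks whose cokriging total is zero are assigned a ratio of zero (a choice made here for illustration; the paper does not discuss this case).

```python
import numpy as np

def rescale_to_block_counts(pixel_est, block_id, census_count):
    """Rescale pixel estimates so each block sums to its census count.

    pixel_est    : cokriging population estimate per pixel
    block_id     : integer index (0-based) of the block containing each pixel
    census_count : census population count per block
    """
    pixel_est = np.asarray(pixel_est, dtype=float)
    block_id = np.asarray(block_id)
    census_count = np.asarray(census_count, dtype=float)

    # Cokriging total per block (the denominator in Eq. (20)).
    est_total = np.bincount(block_id, weights=pixel_est,
                            minlength=census_count.size)

    # Ratio of census count to cokriging total; blocks with a zero
    # cokriging total get ratio 0 (assumption, see lead-in).
    ratio = np.divide(census_count, est_total,
                      out=np.zeros_like(census_count),
                      where=est_total > 0)

    return pixel_est * ratio[block_id]

# Hypothetical toy example: five pixels in two census blocks.
pixel_est = np.array([2.0, 3.0, 5.0, 1.0, 1.0])
block_id = np.array([0, 0, 0, 1, 1])
census_count = np.array([20.0, 1.0])

rescaled = rescale_to_block_counts(pixel_est, block_id, census_count)
print(rescaled)  # [ 4.   6.  10.   0.5  0.5]
```

In this toy case the first block is underestimated by cokriging (10 versus a census count of 20) and the second is overestimated (2 versus 1). After rescaling, the pixel values within each block keep their relative proportions while the block totals match the census counts.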

Fig. 13. Adjusted population density that preserves zonal population counts. The height indicates the value of population density for each TM pixel. The average population density is 4.40, with a maximum of 143, and a minimum of 0.


7. Conclusion

In this paper a cokriging method was developed for interpolating residential population density using census count data and impervious surface fraction. The results are clearly better than those of regression-based interpolation approaches. In particular, the relative population estimation error for the entire study area is 0.3%, which is better than the results obtained using regression methods (1.0%–2.6% estimation error). Moreover, the estimation errors at the census block group and tract levels (15.2% and 10.2% respectively) are about 10 percentage points lower than those calculated using regression models (about 25–27% and 21–23% respectively). At the census block level, the estimation error is about 13–15 percentage points lower than those reported for the regression models (see Table 5). These results demonstrate that cokriging applied to residential impervious surface fraction is a superior alternative to traditional regression-based interpolation approaches using land use and land cover data.

One reason why cokriging performs well is that it addresses the spatial autocorrelation and cross-autocorrelation associated with the distribution of people in urban areas. Instead of ignoring spatial dependence, it models the spatial autocorrelation of population and impervious surface fraction through variograms, and applies them in population interpolation. Moreover, unlike other interpolation methods, it provides estimation variance (see Eq. (11) and Fig. 12) at the TM pixel level (30 m by 30 m). This estimation variance is an important tool for assessing population estimation error without aggregating to census reporting zones.

Another interesting aspect of this work is that residential impervious surface fraction was found to be an effective replacement for the land use and land cover data typically used in modeling population density. This makes sense intuitively given that impervious surface fraction is closely related to housing development, and thus population density.
Moreover, the cross-variogram (see Fig. 10c and Table 2) clearly shows that population density and impervious surface fraction are co-regionalized variables, with only 25% of the variance unexplained. Also, regression analyses show that the regression model with impervious surface fraction consistently performs better than the one using land use classes.

A final point is that the obtained population estimates are essential for urban planning applications. As an example, in sustainability studies, residential population density is a primary indicator of automobile dependent regions (Harris & Longley, 2000). In addition, the estimates of population density may be utilized in transportation analyses. The traffic analysis zone (TAZ) is typically used as a basic unit in traffic demand estimation and trip generation. However, there are significant problems with traditional TAZ definitions as well as difficulties with the associated travel distance calculation (Daganzo, 1980; Miller, 1999). Detailed population information may be helpful in redefining TAZs in order to achieve more homogeneous population densities and socio-economic characteristics, thus potentially eliminating the modifiable areal unit problem in a range of transportation analysis approaches.

While the developed approach is a considerable improvement for estimating population density at a fine scale, there are potential improvements that may be worth exploring. One improvement would be satisfying the volume-preserving constraint during the interpolation process, requiring that interpolated population counts in every census zone be equal to observed counts. In this study, we satisfied this constraint by rescaling population density in every pixel after interpolation. Although the population counts in every census zone are maintained, this adjustment may introduce bias and increase estimation variance. More sophisticated models might increase population density estimation accuracy and maintain population counts in every census zone simultaneously.

References
Aboufirassi, M., & Marino, M. A. (1984). Cokriging of aquifer transmissivities from field measurements of transmissivity and specific capacity. Mathematical Geology, 16(1), 19–35.
Atkinson, P. M., Webster, R., & Curran, P. J. (1992). Cokriging with ground-based radiometry. Remote Sensing of Environment, 41, 45–60.
Atkinson, P. M., Webster, R., & Curran, P. J. (1994). Cokriging with airborne MSS imagery. Remote Sensing of Environment, 50, 335–345.
Bailey, T., & Gatrell, A. C. (1995). Chapter 7: The analysis of area data. Interactive Spatial Data Analysis, Longman Group Limited.
Beguin, H., Thomas, I., & Vandenbussche, D. (1992). Weight variation with a set of demand points, and location–allocation issues: A case study of public libraries. Environment and Planning A, 24, 1769–1779.
Bracken, I. (1993). An extensive surface model database for population related information: Concept and application. Environment and Planning B, 20, 13–27.
Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal of Remote Sensing, 23, 37–48.
Cressie, N. (1985). Fitting variogram models by weighted least squares. Mathematical Geology, 17, 563–586.
Cressie, N. (1993). Statistics for spatial data (revised edition). New York: Wiley.
Curran, P. J. (1988). The semivariogram in remote sensing: An introduction. Remote Sensing of Environment, 24, 493–507.
Curran, P. J. (2001). Remote sensing: Using the spatial domain. Environmental and Ecological Statistics, 8, 331–344.
Curran, P. J., & Atkinson, P. M. (1998). Geostatistics and remote sensing. Progress in Physical Geography, 22(1), 61–78.
Daganzo, C. F. (1980). Network representation, continuum approximations and a solution to the spatial aggregation problem of traffic assignment. Transportation Research, 14B, 229–239.
ERDAS Imagine (1997). ERDAS Imagine tour guides (4th ed.). Atlanta, Georgia: ERDAS, Inc.
Fisher, P. F., & Langford, M. (1995). Modeling the errors in areal interpolation between zonal systems by Monte Carlo simulation. Environment and Planning A, 27, 211–224.
Fotheringham, A. S., & Wong, D. W. S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23, 1025–1034.
Franklin County Auditor (2002). Franklin County Auditor's interactive geographic information system. <http://209.51.193.83/search.html>.
Goodchild, M. F., Anselin, L., & Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment and Planning A, 25, 383–397.
Green, A. A., Berman, M., Switzer, P., & Craig, M. D. (1988). A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26, 65–74.
Griffith, D. A. (1993). Spatial regression analysis on the PC: Spatial statistics using SAS. Washington, DC: Association of American Geographers.
Griffith, D. A., & Can, A. (1996). Spatial statistical/econometric version of simple urban population density models. In S. L. Arlinghaus & D. A. Griffith (Eds.), Practical handbook of spatial statistics. CRC Press.
Harmon, D. (2002). Quick Take Reviews: Idrisi32 Release 2. GEOWorld, March, pp. 50–51.
Harris, R. J., & Longley, P. A. (2000). New data and approaches for urban analysis: Modeling residential densities. Transactions in GIS, 4(3), 217–234.
Harvey, J. T. (2002). Estimating census district populations from satellite imagery: Some approaches and limitations. International Journal of Remote Sensing, 23(10), 2071–2095.
Jensen, J. R. (1983). Biophysical remote sensing. Annals of the Association of American Geographers, 73, 111–132.
Ji, M., & Jensen, J. R. (1999). Effectiveness of subpixel analysis in detecting and quantifying urban imperviousness from Landsat Thematic Mapper imagery. Geocarto International, 14(4), 31–39.
Journel, A. G., & Huijbregts, C. J. (1978). Mining geostatistics. New York: Academic Press.
Lam, N. S. (1983). Spatial interpolation methods: A review. American Cartographer, 10(2), 129–149.
Langford, M., Maguire, D. J., & Unwin, D. J. (1991). The areal interpolation problem: Estimating population using remote sensing in a GIS framework. In I. Masser & M. Blakemore (Eds.), Handling geographical information: Methodology and potential applications (pp. 55–77). Harlow, Essex: Longman.
Langford, M., & Unwin, D. J. (1994). Generating and mapping population density surfaces within a geographical information system. Cartographic Journal, 31, 21–26.
Lee, J. B., Woodyatt, A. S., & Berman, M. (1990). Enhancement of high spectral resolution remote sensing data by a noise-adjusted principal components transformation. IEEE Transactions on Geoscience and Remote Sensing, 28, 295–304.
Lo, C. P. (1995). Automated population and dwelling unit estimation from high-resolution satellite images: A GIS approach. International Journal of Remote Sensing, 16(1), 17–34.
Longley, P., & Clarke, G. (1995). GIS for business and service planning. Cambridge: GeoInformation International.
Martin, D. (1989). Mapping population data from zone centroid locations. Transactions of the Institute of British Geographers, 14, 90–97.
Martin, D. (1996). An assessment of surface and zonal models of population. International Journal of Geographical Information Systems, 10(8), 973–989.
Martin, D., & Williams, H. C. W. L. (1992). Market-area analysis and accessibility to primary health-care centers. Environment and Planning A, 24, 1009–1019.
McBratney, A. B., & Webster, R. (1983). Optimal interpolation and isarithmic mapping of soil properties: 5. Co-regionalization and multiple sampling strategy. Journal of Soil Science, 34(1), 137–162.
McBratney, A. B., & Webster, R. (1986). Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. Journal of Soil Science, 37, 617–639.
Mesev, V. (1998). The use of census data in urban image classification. Photogrammetric Engineering and Remote Sensing, 64, 431–438.
Mid-Ohio Regional Planning Commission (MORPC) (2002). GIS technology. <http://www.morpcsoft.org/GIS/gis.htm>.
Miller, H. J. (1999). Potential contributions of spatial analysis to geographical information systems for transportation (GIS-T). Geographical Analysis, 31(4), 373–399.
Moon, Z. K., & Farmer, F. L. (2001). Population density surface: A new approach to an old problem. Society and Natural Resources, 14, 39–49.
Multi-Resolution Land Characteristics Consortium (MRLC) (2002). National land cover data (NLCD). <http://www.epa.gov/mrlc/nlcd.html>.
Myers, D. E. (1982). Matrix formulation of co-kriging. Mathematical Geology, 14(3), 250–257.
Ohio Geographically Referenced Information Program (OGRIP) (1999). Digital orthophoto quarter-quadrangles. <ftp.geodata.gis.state.oh.us/geodata/doqq>.
Okabe, A., & Sadahiro, Y. (1997). Variation in count data transferred from a set of irregular zones to a set of regular zones through the point-in-polygon method. International Journal of Geographical Information Science, 11(1), 93–106.
Oliver, M., Webster, R., & Gerrard, J. (1989a). Geostatistics in physical geography, Part I: Theory. Transactions of the Institute of British Geographers, 14, 259–269.
Oliver, M., Webster, R., & Gerrard, J. (1989b). Geostatistics in physical geography, Part II: Applications. Transactions of the Institute of British Geographers, 14, 270–286.
Openshaw, S. (1977). Optimal zoning systems for spatial interaction models. Environment and Planning A, 9, 169–184.
Pebesma, E. J., & Wesseling, C. G. (1998). Gstat: A program for geostatistical modeling, prediction and simulation. Computers and Geosciences, 24(1), 17–31.
Phinn, S., Stanford, M., Scarth, P., Murray, A. T., & Shyy, T. (2002). Monitoring the composition and form of urban environments based on the vegetation–impervious surface–soil (VIS) model by sub-pixel analysis techniques. International Journal of Remote Sensing, 23, 4131–4153.
Plane, D. A., & Rogerson, P. A. (1994). The geographical analysis of population with applications to business and planning. New York: Wiley.
Rashed, T., Weeks, J. R., Gadalla, M. S., & Hill, A. G. (2001). Revealing the anatomy of cities through spectral mixture analysis of multispectral satellite imagery: A case study of the Greater Cairo region, Egypt. Geocarto International, 16(4), 5–15.
Ridd, M. K. (1995). Exploring a VIS (vegetation–impervious surface–soil) model for urban ecosystem analysis through remote sensing: Comparative anatomy for cities. International Journal of Remote Sensing, 16, 2165–2185.
Sadahiro, Y. (1999). Accuracy of areal interpolation: A comparison of alternative methods. Journal of Geographical Systems, 1, 323–346.
Tobler, W. (1999). Linear pycnophylactic reallocation: Comment on a paper by D. Martin. International Journal of Geographical Information Science, 13(1), 85–90.
United States Census Bureau (2002). United States Census 2000. <http://www.census.gov/main/www/cen2000.html>.
Vauclin, M., Vieira, S. R., Vachaud, G., & Nielsen, D. R. (1983). The use of cokriging with limited field soil observations. Soil Science Society of America Journal, 47(2), 175–184.
Webster, R. (1985). Quantitative spatial analysis of soil in the field. Advances in Soil Science, 3, 1–70.
Webster, R., & Burgess, T. M. (1980). Optimal interpolation and isarithmic mapping of soil properties, III: Changing drift and universal kriging. Journal of Soil Science, 31, 505–524.
Woodcock, C. E., Strahler, A. H., & Jupp, D. L. B. (1988). The use of variograms in remote sensing: I. Scene models and simulated images. Remote Sensing of Environment, 25, 323–348.
Wu, C., & Murray, A. T. (2003). Estimating impervious surface distribution by spectral mixture analysis. Remote Sensing of Environment, 84, 493–505.