Creating Buffers on Surfaces

Xingong Li, Christopher M. Larson, and Arthur B. Rex

ABSTRACT: Creating buffers is an important function in geographic information systems (GIS) to perform spatial analysis. However, delineating buffers for setbacks in conservation and planning applications is problematical in mountainous areas. A typical vector buffer function in GIS calculates two-dimensional (2D) Euclidean distance (i.e., planimetric distance) instead of surface (or slope) distance and results in an inaccurate representation of buffers when they are verified in the field. A method of delineating buffers on surfaces in the raster data model is presented in this paper. An efficient implementation of the method is achieved through the use of a heap and a hash-table based location index. The method is tested and analyzed on both hypothetical and real surface datasets.

Introduction

GIS is often utilized in conservation and planning applications because of its power, ease of use, and rigorous and repeatable analysis of data. Buffering is a common spatial analysis process performed on GIS data, particularly for conservation and planning purposes (Chrisman 1997). For researchers working in these fields, accurate delineation of buffers around sensitive areas such as riparian habitats (Holder 1992; Phillips 1989; Xiang and Stratton 1996) or setbacks in zoning and planning applications (Xiang 1996; Chrisman 1997) is necessary to make informed decisions. While governments and regulatory agencies are creating and using planimetric buffers in mountainous areas because of the lack of appropriate GIS tools, landowners and property rights activists desire accurate buffers on surfaces to prevent "unreasonable" setbacks or development restrictions (Barnes 2002; Phillips 1989). Delineating buffers in mountainous areas is problematic.
A typical vector buffer function in GIS is based on 2D Euclidean distance instead of surface (or slope) distance, resulting in an inaccurate representation of buffers when they are verified in the field.

Xingong Li, Department of Geography, University of Kansas, Lawrence, KS 66045. Tel: 785-854-5545; Fax: 785-864.5378. Christopher M. Larson, Department of Geography and Planning, Appalachian State University, Boone, NC 28608. Tel: 828-252-6098; Fax: 828-262-3067. Arthur B. Rex, Department of Geography and Planning, Appalachian State University, Boone, NC 28608. Tel: 828-262-7067; Fax: 828-262-3087.

To illustrate the problem, 50-meter buffers for two lines are generated on a 2D plane and then projected onto the slant planes in Figure 1. The line in Figure 1a runs in the slope direction of the slant plane, while the line in Figure 1b is perpendicular to the slope direction. The slant plane has a 30° angle with the horizontal plane. The 50-meter 2D buffer in Figure 1a, when projected on the slant plane, will have the same buffer width on the surface. On the other hand, the 50-meter 2D buffer in Figure 1b, when projected on the slant plane, will have a buffer width of 57.7 meters (50/cos(30°)), which is 15.4 percent more than the true surface buffer width. As the angle of the slant plane increases, so does the buffer width when projected on the surface. The projected buffer (dark grey region on the slant plane in Figure 1b) delineates a larger area than the desired buffer on the surface (light grey region on the slant plane in Figure 1b).
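The arithmetic behind the 57.7-meter figure can be checked in a few lines. This is only an illustrative sketch: the helper name `projected_width` and its arguments are hypothetical and do not come from the article.

```python
import math

def projected_width(width_2d, slope_deg, along_slope):
    """Width measured on a slant plane for a 2D buffer of width_2d.

    along_slope=True : the buffered line runs in the slope direction, so the
                       buffer extends across the slope and keeps its width
                       (the case of Figure 1a).
    along_slope=False: the line is perpendicular to the slope direction, so
                       the buffer extends up and down the slope and is
                       stretched by 1 / cos(slope) (the case of Figure 1b).
    """
    if along_slope:
        return width_2d
    return width_2d / math.cos(math.radians(slope_deg))

width_1a = projected_width(50.0, 30.0, along_slope=True)
width_1b = projected_width(50.0, 30.0, along_slope=False)
excess_percent = (width_1b / 50.0 - 1.0) * 100.0
print(width_1a, width_1b, excess_percent)
```

For the 30° plane the second call gives a width of about 57.7 meters, roughly 15 percent more than the intended 50 meters, and the excess grows as the slope angle increases.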
This example shows how 2D buffers, when treated as surface buffers, may give false results. In areas that are either relatively flat or where the buffered features are parallel to slope direction, the traditional buffering method that relies upon 2D Euclidean distance is acceptable. Over steep and complex terrain, however, buffers generated based on 2D Euclidean distance will be larger when projected on the terrain surface.

In this research we investigate the problem of generating accurate buffers on complex surfaces. We propose a method of delineating surface buffers in the raster data model. The method is implemented with efficient data structures and algorithms and analyzed with both hypothetical and real surface datasets.

Cartography and Geographic Information Science, Vol. 32, No. 3, 200 pp. 195-210

Buffering on Surfaces

A clear definition of buffer is necessary before we start exploring the methods of delineating buffers on surfaces. We define the buffer of a feature (point, line, or polygon) as a region where the shortest distance from any location in the region to the feature is less than or equal to a specified buffer width. Essentially, buffers designate the proximity regions around features. Although proximity is still calculated based on 2D Euclidean distance in most vector GIS, it has been extended in raster GIS to handle various types of non-Euclidean distance, such as time, energy, and effort. Most raster GIS packages provide functions for proximity calculation in non-Euclidean space. These functions are usually referred to as cost distance or weighted distance depending on the GIS package. Generating buffers using the cost distance function in a raster GIS is a two-step process. A cost raster layer on which each cell stores the least cumulative cost to the buffered features is first calculated. Buffers are then delineated by selecting the cells whose values on the cost layer are less than or equal to a specified cost.

Delineating buffers on a terrain surface requires that distance is measured on the surface rather than in 2D Euclidean space. Measuring distance on a surface can be conceived as a problem of least cost calculation where surface distance is the cost. With proper parameters, the cost distance function might be used to delineate surface buffers. The key parameter to the function is the friction raster layer which stores the cost per unit distance while moving through the cells on the raster layer. In the case of delineating surface buffers, the friction layer would store the surface distance per unit (horizontal) distance when moving through a cell. This friction layer can be calculated as the secant of the surface slope layer.

An implementation of the above approach in a raster GIS would first derive the slope of a terrain surface. Next, the friction layer is obtained by calculating the secant of the slope layer. The least accumulative surface distance could then be calculated through the cost distance function with the friction layer and buffered features (sources) as input parameters. The output raster layer from the cost distance function stores the least surface distance from the features. Finally, surface buffers are delineated by marking the cells with their least surface distance less than or equal to a specified buffer width.

The above implementation, however, will not generate accurate surface buffers.

Figure 1. Line buffers in 2D and their projections on a slant plane. (a) Buffer for a line parallel to the slope direction of the slant plane.
(b) Buffer for a line perpendicular to the slope direction of the slant plane.

Actually, buffers generated in this fashion are usually smaller than their true surface buffers, as we will see in the analysis and discussion section of this article. The fundamental problem lies in the fact that the cost distance function assumes a friction layer on which the friction at a cell is invariant of movement direction. This assumption implies that no matter which direction a cell is traversed, the friction encountered (i.e., the cost per unit distance) at the cell will be the same. This may be appropriate for some applications but is not the case in calculating surface distance.

To explain this, first we have to clarify the slope and aspect layers routinely derived and used in raster GIS. Theoretically, surface slope is the maximum rate of change and aspect is the direction in which the maximum rate of change exists. In practice, the slope layer derived from a digital elevation model (DEM) represents the slope of the best-fit plane through the neighborhood cells (Hodgson 1998). The aspect layer stores the direction of the slope of the fitted plane (also called slope direction). When a movement is made at a cell, the surface slope in the movement direction lies between 0, when the movement is perpendicular to the slope direction, and the maximum rate, when the movement is in the slope direction. For surface distance calculation, the friction (i.e., surface distance per unit horizontal distance) at a cell is sec(γ), where γ is the surface slope in the movement direction. Since surface slope (γ) varies according to movement direction, friction at a cell changes between 1 and sec(θ), where θ is the maximum surface slope at the cell. When surface slope in the movement direction is 0, the smallest friction of 1 (sec(0)) is incurred. On the other hand, when movement is in the slope direction, which has the maximum surface slope, the maximum friction of sec(θ) is encountered.
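The range of friction described above, from 1 up to sec(θ), can be made concrete with a small sketch. It assumes the cell is locally a plane and uses the standard geometric relation tan(γ) = tan(θ)·cos(φ), where φ is the angle between the movement direction and the slope direction. Neither that relation nor the symbol φ appears in the article; both are introduced here for illustration, as is the function name.

```python
import math

def directional_friction(max_slope_deg, angle_from_slope_dir_deg):
    """Surface distance per unit horizontal distance, sec(gamma), on a plane.

    max_slope_deg            : theta, the maximum slope of the plane
    angle_from_slope_dir_deg : phi, the angle between the movement direction
                               and the slope direction (assumed symbol)
    """
    theta = math.radians(max_slope_deg)
    phi = math.radians(angle_from_slope_dir_deg)
    tan_gamma = math.tan(theta) * math.cos(phi)   # slope along the movement
    return math.sqrt(1.0 + tan_gamma ** 2)        # sec(gamma)

# Friction on a 30-degree plane for several movement directions.
frictions = {phi: directional_friction(30.0, phi) for phi in (0, 30, 60, 90)}
for phi, f in frictions.items():
    print(phi, f)
```

Moving in the slope direction (φ = 0) gives sec(30°), moving across the slope (φ = 90°) gives 1, and every other direction falls in between, which is exactly the variation a single friction value per cell cannot represent.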
Since the cost distance function always uses the maximum friction at a cell no matter which direction the cell is traversed, buffers thus delineated will be smaller than their true surface buffers.

A closely related topic in GIS literature is the delineation of riparian buffers to absorb or filter contaminated run-off before it enters surface waters. Instead of using a standard buffer width for the entire area under consideration, Xiang (1993) determined buffer width entirely on the basis of the surrounding physical conditions. The variable width buffers were derived from a pollutant detention time model proposed by Phillips (1989). Because the actual calculation of detention time is data intensive and computationally complex, Phillips (1989) proposed the Riparian Buffer Delineation Equation (RBDE) which relates a surface to a reference surface by surface slope and soil properties. For the reference surface (an inclined plane), an effective distance was calculated based on a hydrological analysis of surface slope and soil properties. The distance indicates the path length which pollutants flow on or through the surface before they are reduced to an acceptable level. Representing the terrain in the triangulated irregular network (TIN) data model and then overlaying the TIN with a soil property polygon layer, Xiang (1993) created a set of inclined planes in space. Each plane was related to the reference plane through the RBDE, and an effective distance was calculated for the plane. This distance would have the same detention effect as on the reference plane.
The reciprocal of the distance (i.e., detention effectiveness per unit distance) was used as the friction layer in the cost distance function to calculate accumulative detention effectiveness. Buffers were then delineated by marking all the cells with an accumulative effectiveness value of 1 or less (Xiang and Stratton 1996).

The effective distance on the reference surface assumes that the river is perpendicular to the slope direction of the surface and thus implies the maximum flow speed on or through the surface. When the river is not perpendicular to the slope direction, the flow speed toward the river will be less than the maximum speed and therefore a smaller effective distance is required. The effective distance thus calculated for each TIN plane through the RBDE is valid only in the TIN plane's slope direction. However, when the reciprocal of effective distance is used as the friction layer to calculate the accumulative detention effectiveness, the cost distance function assumed the same effective distance no matter which direction the flow moves. Variation of friction on movement direction was neglected.

As the above review illustrates, there have been some efforts to delineate variable width buffers using the cost distance function in GIS. Unfortunately, some of the efforts failed to realize, and therefore neglected, the fundamental assumption (i.e., friction at a cell is invariant of movement direction) on which the cost distance function is based. The cost distance function gives incorrect results when the friction at a cell does vary with movement directions. The next section presents a method for delineating accurate surface buffers in the raster data model.

Methodology

The cost distance function calculates for each cell over a friction layer the least accumulative cost to one or several source cells representing the interested feature(s).
The friction layer stores the cost per unit distance moving through a cell, which may vary from cell to cell. To calculate the least cost, the friction layer is viewed as a network where cell centers are the nodes and the lines which connect a cell center to its eight immediate neighbor cell centers are the links (Figure 2). With the network scheme, shortest-path algorithms on a graph, such as the Moore and Dijkstra algorithms (Cormen et al. 2001), can be adopted to calculate least cost. Link friction is computed as the arithmetic mean of the frictions at the cells the link connects. The cost of traveling through a link is obtained by multiplying the link friction by the length of the link. For cardinal links, the link length is cell size, while for diagonal links the length is √2 times cell size.

Figure 2. Network view of a raster layer.

While the cost distance function is powerful and used in many GIS applications, it only considers the friction changes in between-cell movement (as the average of two frictions making a link) but not the variation in within-cell movement. In the case of surface distance calculations, the friction (i.e., sec(γ)) at a cell varies depending on movement direction. Since movement direction for each cell is not defined when the process of finding least cost is initiated, the friction layer is also not defined. Therefore, the traditional cost distance function in raster GIS cannot be applied to generate buffers on a terrain surface.
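The conventional link cost described above (mean friction times link length) can be sketched as follows. The function name `link_cost` is hypothetical, and the example values are chosen here only to show what happens when a secant-of-slope friction layer is fed to this formula on a uniform 30° plane.

```python
import math

def link_cost(friction_a, friction_b, cell_size, diagonal):
    """Cost of one link in the conventional cost distance network.

    The link friction is the arithmetic mean of the two cell frictions, and
    the link length is cell_size for cardinal links or sqrt(2) * cell_size
    for diagonal links.
    """
    length = cell_size * (math.sqrt(2.0) if diagonal else 1.0)
    return 0.5 * (friction_a + friction_b) * length

# On a uniform 30-degree plane every cell has friction sec(30 degrees).
sec30 = 1.0 / math.cos(math.radians(30.0))
cost_across_slope = link_cost(sec30, sec30, 1.0, diagonal=False)
true_across_slope = 1.0   # no elevation change when moving across the slope
print(cost_across_slope, true_across_slope)
```

The cardinal link across the slope is charged about 1.155 units although the true surface distance is 1 unit, because the formula has no way to know in which direction the cell is being crossed.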
The method proposed here adopts the same network scheme used in the cost distance function but incorporates its own direction-dependent cost formula. It views the topographic surface as a network and, depending on movement direction, computes the link cost (i.e., surface distance) directly from the terrain surface rather than from a friction layer. With known elevation and cell size, surface distances from each cell to its eight neighbor cells can be calculated. The formulas for surface distance between adjacent cells (for example, cells d and e) and diagonal cells (for example, cell e and a diagonal neighbor) are shown in Figure 3, where E is elevation difference and CS is cell size.

Figure 3. Surface distance calculation for adjacent and diagonal cells. Adjacent cells: $\sqrt{E^2 + CS^2}$. Diagonal cells: $\sqrt{E^2 + (1.414 \cdot CS)^2}$. a, b, ..., i are elevations at cells and CS is cell size.

Dijkstra's algorithm solves the single-source shortest-paths problem on a weighted, directed graph for the case in which all edge weights are nonnegative (Cormen et al. 2001). The algorithm is adopted here to calculate least surface distance for single or multiple source cells. The modified algorithm iteratively spreads from the source cells by maintaining a list of candidate cells which are the neighbors of the source cells. Initially, the candidate list contains only original source cells. In each iteration, the algorithm extracts the cell with minimum surface distance (min-cell hereafter) from the list, writes min-cell's surface distance on the output surface distance raster layer, and updates the candidate list by calculating the surface distance of min-cell's neighbor cells and inserting the cells into the list. The process continues until all the cells have their least surface distance calculated.

Figure 4 gives an example surface and a source raster layer from which surface distance is calculated. The surface layer (Figure 4a) has six rows and six columns, and the cell size is one unit. The source layer (Figure 4b) has four source cells located at two different places. Figures 4c to 4j illustrate the results of the first few iterations of the algorithm.
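The link-cost formulas of Figure 3 and the spreading procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard-library `heapq` module with lazy deletion of stale entries in place of an explicit decrease-priority operation, it uses the exact √2 rather than 1.414, it assumes a rectangular grid without NoData cells, and the function name `surface_distance` is hypothetical.

```python
import heapq
import math

def surface_distance(elev, sources, cell_size=1.0):
    """Least surface distance from every cell to the nearest source cell.

    elev    : list of rows of elevations (rectangular grid, no NoData)
    sources : iterable of (row, col) source cells
    """
    rows, cols = len(elev), len(elev[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    done = [[False] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)          # min-cell
        if done[r][c]:
            continue                           # stale entry, already finalized
        done[r][c] = True
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if done[nr][nc]:
                    continue
                horizontal = cell_size * math.hypot(dr, dc)   # CS or sqrt(2)*CS
                e = elev[nr][nc] - elev[r][c]                 # elevation difference
                candidate = d + math.hypot(e, horizontal)     # sqrt(E^2 + length^2)
                if candidate < dist[nr][nc]:
                    dist[nr][nc] = candidate
                    heapq.heappush(heap, (candidate, nr, nc))
    return dist

# A 5 by 5 plane tilted 30 degrees, elevation changing along the columns.
tan30 = math.tan(math.radians(30.0))
plane = [[c * tan30 for c in range(5)] for _ in range(5)]
d = surface_distance(plane, [(2, 2)])
print(d[2][4], d[0][2])
```

On this plane the cell two columns away in the slope direction ends up at 2/cos(30°), about 2.31 units, while the cell two rows away across the slope stays at exactly 2 units, which is the direction dependence a single friction layer cannot express.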
The modified Dijkstra's algorithm can be informally described by the following pseudo code:

1. Open the surface and source raster layers (inputs);
2. Create an empty surface distance raster layer (output);
3. Initialize the candidate cell list by adding all the source cells on the source raster layer to the list; and
4. Repeat the following steps while the list is not empty:
   - Extract the cell (min-cell) with smallest surface distance from the list;
   - Write min-cell's surface distance on the surface distance raster layer;
   - Calculate surface distance for min-cell's eight neighbor cells; and
   - Add the neighbor cells into the list if they are not in the list, or update the neighbor cells' surface distance if they are in the list and their new surface distance is less than their existing surface distance.

The key element in the algorithm is the candidate cell list which must provide two operations needed by the algorithm. The extraction operation finds the cell with smallest surface distance and removes it from the list. The insertion operation either adds a new cell into the list if the cell does not exist in the list or updates the cell if the cell's new surface distance is less than its existing surface distance.

The candidate list expands and shrinks dynamically during the iterations. Figure 5 shows the number of cells in the list for a 101 row by 101 column surface layer with a single source cell at the center of the layer. As shown in Figure 5, the number of cells in the list increases rapidly after the first few iterations until the last few iterations. During the total of 10,201 iterations, the average number of cells in the list is 265. The maximum number of cells in the list reaches 376, which is close to the circumference of the largest circle on the raster layer.

Figure 4. Calculating surface distance to source cells on an example surface. (a) The example surface. (b) The source raster layer with 4 source cells. (c) Initial candidate list with four source cells and 0 surface distance. (d) to (g) Surface distance calculation for the source cells and the evolution of the candidate cell list. (h) to (j) Surface distance calculation for three additional cells and the evolution of the candidate cell list. Values in shaded cells are surface distance.

While the extraction operation is called as many times as the number of cells on the surface raster layer, the insertion operation is called many more times than the extraction operation. As the size of the raster layer increases, so does the number of cells in the list, and the number of iterations will increase quadratically. This means the extraction and insertion operations will be called more times with more cells in the list. Designing efficient data structures and algorithms for the candidate list and its operations becomes crucial to an efficient implementation of the algorithm.

The candidate cell list behaves as a priority queue (Cormen et al. 2001). Each item in a priority queue has an associated priority. When an item needs to be removed from the queue, it selects the item with the highest or lowest priority. It does not matter how the items in a priority queue are sorted, as long as it can always find the highest or lowest priority item when it is needed. A priority queue which selects the highest priority item from the queue is called a max-priority queue, and the priority queue which removes the lowest priority item is called a min-priority queue. Air traffic control uses a max-priority queue concept. Planes that are trying to land and are running out of fuel have top priority. Other planes trying to land have a second priority. Planes on the ground have a third priority because they are in a safer position than planes in the air. Over time, new planes with third priority will be added to the queue and some of the priorities might change because planes trying to land will eventually run low on fuel.
In our case, the candidate cell list behaves as a min-priority queue. The items in the min-priority queue are cells. The priority associated with each cell is its surface distance. In each iteration, the cell with the lowest priority (i.e., smallest surface distance) is extracted from the list. Surface distance to the eight immediate neighbor cells of the min-cell is then calculated. If the neighbor cells are new to the list they are inserted into the min-priority queue. If the neighbor cells exist in the list and their new surface distance is less than their existing values, their priorities (i.e., surface distance) in the min-priority queue will be decreased.

One simple way to build a min-priority queue is to use a linked list and keep the items sorted in order of increasing priority. To add a new item to the queue, it searches through the list until the position where the new item belongs is found and inserts the item at that position. Sometimes the new item will go near the head of the list, and sometimes it will go near the rear, but on average it will fall somewhere in the middle. Thus, adding a new item to the queue takes O(N) run time where N is the number of items in the list. To decrease the priority of an existing item, it searches through the list to find the new position where the reduced priority belongs and moves the item to the position. This will take, on average, the same as adding a new item, O(N) run time. To extract the lowest priority item from the list, it simply removes the first item. Because the list is kept sorted in ascending priority order, the first item in the list has the lowest priority. So, it takes O(1) run time to extract the lowest priority item.

A better scheme is to use a binary heap (Cormen et al. 2001). A binary heap is a complete binary tree which has as many nodes as it can hold at all levels except for the leaves. Any nodes present on the leaves are pushed to the left. In a min-heap, each tree node is at most as large as its two children.
While the two children must be at least as large as their parent, either may be larger than the other. Because each node is at most as large as the two nodes below, the root node is always the smallest node in a min-heap. This makes min-heaps a good data structure for implementing min-priority queues. In addition, complete binary trees have a number of important properties. First, they are the shortest trees that can hold a given number of nodes. Second, if a complete binary tree contains N nodes, it will have O(log(N)) height. This fact is important because many algorithms traverse binary trees from the top to the bottom or vice versa. An algorithm that does this once has O(log(N)) run time. Another particularly useful property of complete binary trees is that they can be stored very compactly in arrays. When the nodes are numbered from top to bottom and from left to right they can be placed in this order. The tree's root belongs in position 0. The children of node m belong in positions 2m + 1 and 2m + 2. Figure 6a gives an example of a min-heap with its array storage.

Figure 5. The number of cells in the candidate cell list while calculating surface distance for a 101 by 101 surface raster layer with a single source cell at the center of the layer.

To remove the smallest item from the heap, the last item is moved to the top of the tree. The item is then pushed down until it reaches its final position and the tree is again a min-heap. Because the tree has height of log(N), this process can take, at most, log(N) steps. This means a min-heap based priority queue takes O(log(N)) run time to extract the smallest item, where N is the number of items in the queue. To add a new item to the min-heap, the item is placed at the bottom of the tree and then pushed upward until the tree is again a min-heap. Because the tree has height of log(N), this process can take, at most, O(log(N)) run time.
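The array storage and the two operations just described (adding an item and removing the smallest item) can be sketched as a small class. The class and method names are hypothetical, and the sketch stores plain numbers rather than raster cells. The insertion order used below is the one listed in the caption of Figure 6a.

```python
class MinHeap:
    """Array-based binary min-heap; children of node m sit at 2m+1 and 2m+2."""

    def __init__(self):
        self.a = []

    def push(self, item):
        """Place the item at the bottom, then push it upward."""
        a = self.a
        a.append(item)
        i = len(a) - 1
        while i > 0:
            parent = (i - 1) // 2
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        """Remove the root, move the last item to the top, push it down."""
        a = self.a
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last
            i, n = 0, len(a)
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and a[left] < a[smallest]:
                    smallest = left
                if right < n and a[right] < a[smallest]:
                    smallest = right
                if smallest == i:
                    break
                a[i], a[smallest] = a[smallest], a[i]
                i = smallest
        return top

heap = MinHeap()
for value in (15, 14, 13, 9, 10, 12, 4, 3, 1, 8, 6, 7, 11, 2, 5):
    heap.push(value)
after_inserts = list(heap.a)
smallest = heap.pop()
after_extract = list(heap.a)
print(after_inserts, smallest, after_extract)
```

Each `push` or `pop` walks at most one root-to-leaf path, which is where the O(log(N)) bound comes from.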
Decreasing an item's priority is similar to adding a new item. After the priority of the item is reduced, the item is pushed upward until the tree is again a min-heap. This will take O(log(N)) run time too. Compared with the linked list implementation, as the number of items in the queue increases, min-heap based min-priority queues will eventually out-perform the linked list based implementation, especially when the insertion operation is called more often than the extraction operation. Figures 6b and 6c give examples of extracting a node and decreasing the priority of a node in a min-heap.

Figure 6. An example min-heap. (a) The min-heap and its array storage after items 15, 14, 13, 9, 10, 12, 4, 3, 1, 8, 6, 7, 11, 2, and 5 are inserted into the heap. (b) The min-heap and its array storage after item 1 is extracted. (c) The min-heap and its array storage after item 11's priority in (b) decreased to 4.
Array storage, listed from index 0:
(a) 1, 3, 2, 4, 6, 9, 5, 15, 10, 13, 8, 14, 11, 12, 7
(b) 2, 3, 5, 4, 6, 9, 7, 15, 10, 13, 8, 14, 11, 12
(c) 2, 3, 4, 4, 6, 5, 7, 15, 10, 13, 8, 14, 9, 12

A crucial operation implicitly used in the insertion operation is to determine whether a min-cell's neighbor cells exist in the candidate cell list before adding them into the list or updating their surface distance. The operation has to search through the list based on the location of the cells in the list. Since each iteration calls this operation eight times, its efficiency greatly impacts the overall efficiency of the algorithm. Both linked-list and min-heap based min-priority queues could use a sequential search to implement the operation. However, as the number of cells in the list increases, this operation becomes the bottleneck of the algorithm. To improve the performance, a location index for the cells is needed. The location index could be implemented using a hash table or a binary tree data structure.

For a hash-table index, the average time to search for a cell is O(1) while the worst case is O(N), where N is the number of cells in the list. Hash table size and hash function can greatly affect the efficiency of a hash-table based index. For a binary-tree index with randomly inserted cells, search time is O(log(N)). The worst case occurs when ordered cells are added. In this case the search time is O(N), the same as a sequential search. To avoid the worst case in a binary tree, a red-black tree could be used to maintain a balanced tree (Cormen et al. 2001). Both average and worst-case search time for a red-black tree is O(log(N)).

With all those considerations, the data structure designed for the candidate cell list is shown in Figure 7. The list has a min-heap and a location index. The items in the heap are pointers to the raster cells which are immediate neighbors to the source cells with calculated least surface distance. Cell surface distance is the key in the min-heap. Cells are stored in a dynamic array which expands and shrinks as cells are inserted and extracted from it. The location index shown in Figure 7 is a hash table. Each hash table node stores a location key based on the row and column of a cell, and a pointer to the cell in the min-heap. Each cell in the list stores its location (row and column), surface distance, and a pointer to its position in the min-heap. Maintaining the position pointer adds extra complexity to the extraction and insertion operations, which move cells in the min-heap.

Figure 7. Data structure designed for the candidate cell list (a min-heap and a hash table location index).

Analysis and Discussion

It will be interesting to see the performance of different implementations of the proposed method. Both hypothetical and real surface datasets are used to test the implementations. The datasets include three hypothetical slant planes with a single source cell at the center of the planes and a USGS 7.5-minute DEM with five randomly selected source cells. All three slant planes have a 30° angle with the horizontal plane and a cell size of one unit. The sizes of the three slant planes are 11 (row) by 11 (column), 101 by 101, and 201 by 201 cells, respectively. The USGS 7.5-minute 10-meter DEM is the Mount Guyot quadrangle on the border of North Carolina and Tennessee, which has M14 by 1164 cells. All the implementations are programmed in Visual Basic and the tests are run on a 1.2 GHz Mobile Pentium Dell Precision M50 laptop with 1 GB RAM.

Table 1 shows the difference between a linked list and a min-heap based implementation. The linked list based implementation uses a sorted doubly linked list to manage the cells while the min-heap implementation uses a heap array. Both implementations use the same hash table as their location index. While the linked-list implementation dramatically increases its running time with larger surface raster layers, the min-heap implementation increases its running time slowly and linearly. It is obvious from the table that the use of the min-heap data structure plays a key role for an efficient implementation of the modified Dijkstra's algorithm for surface distance calculation.

Cells       | Location index | 11 x 11 (seconds) | 101 x 101 (seconds) | 201 x 201 (seconds) | USGS DEM (minutes)
Linked list | Hash table     | 2                 | 8                   | 49                  | 400.28
Min-heap    | Hash table     | 2                 | 1                   | 3                   | 8.37
Table 1. Performance difference between a linked-list and a min-heap based implementation.

Table 2 shows the effect of a location index. Both implementations use the same array-based min-heap to manage cells. One implementation uses a hash-table based index while the other does not have a location index. With small raster layers, the implementation without a location index performs better than the implementation with a location index because of the extra space and time needed to maintain the index. The location index is used to search through the list and determine whether a cell exists in the list. This operation is called whenever the min-cell's 8 neighbor cells are inserted into the list. With large raster layers, the implementation with a hash-table based location index performs better than the implementation without an index, which uses a much slower sequential search.

Cells    | Location index | 11 x 11 (seconds) | 101 x 101 (seconds) | 201 x 201 (seconds) | USGS DEM (minutes)
Min-heap | None           | 1                 | 3                   | 4                   | 74.31
Table 2. Effect of location index.

Table 3 compares three different location indices. All implementations in Table 3 are based on min-heap but with different index methods: hash table, binary tree, and red-black tree. All the implementations run much faster than the linked-list based implementation. Although the hash-table index performs slightly better than the binary tree and red-black tree indices, the difference among them is relatively small. While the binary-tree index has the potential to degrade into a sequential search when the cells are inserted in order, the red-black tree index can avoid this with its tree balancing capability. The performance of the hash-table index strongly depends on the size of the table and the hash function. A small table size or an inappropriate hash function can quickly turn the hash-table index into a sequential search.

Table 3. Differences among different location indices.

Cells             | Location index | 11 x 11 (seconds) | 101 x 101 (seconds) | 201 x 201 (seconds) | USGS DEM (minutes)
Min-heap (object) | None           | 1                 | 9                   | 50                  | 423.62
Table 4. Overhead of object-based implementation.

Table 4 illustrates the performance overhead introduced by the object-oriented technology.
Both implementations use a min-heap array without a location index. The structure-based implementation stores cells directly in the min-heap array while the object-oriented implementation stores pointers to the cell objects in its min-heap array. The structure-based implementation is much more efficient than the object-oriented implementation, and the overhead introduced by object referencing is evident in Table 4. However, the structure-based implementation has to update its location index whenever cells in the min-heap array change their location. This happens when the min-cell is extracted from the min-heap and when a cell's surface distance needs to be reduced. The cells must be searched and located in the location index before an update can be made. This yields inefficiency with the structure-based implementation. The object-oriented implementation, however, is free of updating its location index since both the min-heap array and the location index store pointers to the cell objects.

Dijkstra's algorithm is not the only way to solve the shortest-path problem and works only on graphs with nonnegative edge weights. The Bellman-Ford algorithm (Cormen et al. 2001) solves the problem in the general case in which edge weights may be negative. When modified for calculating surface distance, the Bellman-Ford algorithm continuously scans the surface distance output raster layer. In each scan, it updates a cell's surface distance based on the surface distance of its eight neighbor cells. The algorithm stops when no update occurs during a scan, which means all cells have obtained their least surface distance. Although the Bellman-Ford algorithm is much easier to implement, it runs much slower than Dijkstra's algorithm when tested (Table 5).

With an efficient surface distance calculation method and tool available, errors associated with different approaches of delineating surface buffers can now be investigated and compared. The first approach simply delineates 2D buffers and treats them as surface buffers.
The second method uses the cost distance function with a friction layer calculated as the secant of a surface slope layer. The final approach, named topo-distance hereafter, uses the method proposed in this article. Buffers in 2D can be calculated by the cost distance function with a constant friction layer on which all the cells have a friction of 1.

The convenient network scheme used in the cost distance function and also in the topo-distance method does not come free. Since the network scheme limits movement direction to eight neighbor cells, errors occur for the cells not aligned with those eight directions (Goodchild 1977; Huber and Church 1985). As illustrated in Figure 8, the distance from cell A to cells B and D is correctly calculated; however, this is not the case for cell C. With the network scheme, distance from cell A to cell C is calculated as A-B-C or A-D-C, which is longer than the true shortest distance shown as the dashed line in Figure 8. Distance error depends on where the cells are located and how far away they are from the source cell(s).

Figure 8. Distance error incurred because of limited movement directions.

Figure 9 illustrates the true 2D distance and the distance calculated by the cost distance function for an example 201 by 201 raster layer. It also illustrates the spatial distribution of distance error. As shown in Figure 8, the cost distance function always overestimates 2D distance. It is clear from Figure 9c that larger errors (overestimations) appear at the cells between the eight movement directions and further away from the source cell at the center of the raster layer. Because of the error inherent in the cost distance function, most raster GIS packages also provide a function just for 2D distance. This function is often referred to as Euclidean distance in some raster GIS packages.

Algorithm      11 x 11 (seconds)   101 x 101 (seconds)   201 x 201 (seconds)   USGS DEM (minutes)
Bellman-Ford   2                   [illegible]           206                   855.87
Dijkstra       2                   3                     8                     37
Table 5. Performance difference between the Bellman-Ford and Dijkstra's algorithm. Dijkstra's algorithm is based on a min-heap with a hash table location index.

With the Euclidean distance function, distance at each cell is calculated directly as the length of the line connecting it to a source cell using the Pythagorean theorem. The Euclidean distance function is usually preferred while calculating 2D distance. However, when barriers exist in space, the Euclidean distance function has to give way to the cost distance function since it cannot handle barriers.

With two methods available for calculating 2D distance, a total of four methods (i.e., the Euclidean distance, 2D cost distance, cost distance, and topo-distance) are tested and compared. All the methods are first tested on hypothetical slant planes having different angles (0°, 15°, 30°, 45°, 60°, and 75°) with the horizontal plane (Figure 10). All the planes have 201 (rows) by 201 (columns) cells with a cell size of one unit. Elevation increases from 0 at the right edge of the planes to maximum values at the left edge of the planes. Surface distance and 50-unit buffers to a single source cell at the center of the planes are calculated. The slant planes are chosen because true surface distance can be accurately calculated.

True surface distance on the planes is calculated by following several steps using map algebra operations available in most raster GIS packages. First, 2D distance to the source cell is calculated by using the Euclidean distance function. The 2D distance raster layer stores at each cell the horizontal distance to the source cell. Next, a constant raster layer is created with all the cells having the same elevation as that of the source cell. An elevation difference raster layer is then calculated by subtracting the constant raster layer from the slant planes. The elevation difference raster layer stores the vertical difference between every cell and the source cell. Finally, the true surface distance raster layer is calculated by applying the Pythagorean theorem to the 2D distance and the elevation difference raster layers. Each distance raster layer calculated by the four methods is compared with the true surface distance raster layer. Percentage distance error is calculated as 100 * (distance - true distance) / true distance. From error layers, mean, standard deviation, and maximum absolute error are obtained.

Figure 9. Spatial distribution of 2D distance errors. Dark shades represent small values while light shades represent large values. (a) Distance in 2D to the center cell calculated by the cost distance function. (b) True 2D distance to the center cell. (c) Distance difference between (a) and (b).

Figure 10. A hypothetical slant plane having a 30° angle with the horizontal plane. The single source cell is located at the center.

Table 6 shows the mean errors associated with the four methods on the six slant planes. From mean errors it is obvious that none of the four methods consistently calculates surface distance accurately. Topo-distance consistently overestimates the surface distance because of the network scheme. This is clearer when the errors for the horizontal plane (first row in Table 6) are examined: all three methods with an inherent network scheme (i.e., 2D cost distance, cost distance, and topo-distance) overestimate their distance. The Euclidean distance method consistently underestimates the surface distance because it treats 2D distance as surface distance, which is always longer than or equal to 2D distance. The cost distance method (with slope friction layer), on the other hand, consistently overestimates surface distance because it assumes the maximum slope in all directions. The 2D cost distance method (with a constant friction of 1) is slightly different from other methods.
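The true-distance construction and the percentage error defined above can be checked for a single cell with a short Python sketch. It assumes a plane rising along the x direction, an offset (dx, dy) from the source cell, and closed forms for the network-based methods that hold only on an unobstructed uniform plane; all function names are hypothetical. The secant-friction function is a simplification that applies one constant friction over the whole plane.

```python
import math

def true_surface_distance(dx, dy, slope_deg):
    """Pythagorean theorem applied to 2D distance and elevation difference."""
    dz = dx * math.tan(math.radians(slope_deg))   # elevation difference to the source
    return math.sqrt(dx * dx + dy * dy + dz * dz)

def euclidean_2d(dx, dy):
    """Planimetric distance, as a Euclidean distance function would return it."""
    return math.hypot(dx, dy)

def network_2d(dx, dy):
    """2D distance when movement is limited to the eight neighbor directions."""
    a, b = abs(dx), abs(dy)
    return max(a, b) + (math.sqrt(2) - 1) * min(a, b)

def cost_distance_secant(dx, dy, slope_deg):
    """Network distance with a constant friction equal to the secant of the slope."""
    return network_2d(dx, dy) / math.cos(math.radians(slope_deg))

def percent_error(distance, true_distance):
    return 100.0 * (distance - true_distance) / true_distance

# 30-degree plane, 50 units from the source along the slope direction
t = true_surface_distance(50, 0, 30)
print(round(percent_error(euclidean_2d(50, 0), t), 1))   # Euclidean underestimates
# Horizontal plane, a cell lying between two movement directions
print(round(percent_error(network_2d(12, 5), true_surface_distance(12, 5, 0)), 1))
```

On the 30° plane the Euclidean error along the slope direction is cos(30°) - 1, about -13.4 percent, and it is zero perpendicular to the slope. On the horizontal plane the network scheme overestimates by up to roughly 8.2 percent for cells lying about halfway between two movement directions.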
The 2D cost distance method overestimates the distance for lower slope planes but underestimates the distance for higher slope planes. This occurs because there are two error sources working against each other. While treating 2D distance as surface distance underestimates surface distance, the network scheme used in the method overestimates surface distance. For lower slope planes, overestimation is more severe than underestimation and the method overestimates the distance overall. The situation reverses with higher slope planes. Although none of the four methods is exact, topo-distance performs consistently, its only error being the overestimation caused by the network scheme. As slope increases, topo-distance performs better than the other three methods with their inherent methodological fault. This is further verified by examining maximum absolute errors and standard deviations of the errors associated with the methods as shown in Figures 11 and 12. Both the maximum absolute error and the standard deviation of topo-distance are minimal when compared to the other three methods as slope increases.

Vol. 32, No. [illegible]

It is interesting to see how the distance error is distributed. Figure 13 shows the spatial distribution of the percentage distance error associated with the four methods for the 30° slant plane shown in Figure 10. The Euclidean distance method, which has an error range between -13.4 percent and 0, underestimates surface distance except in the direction perpendicular to the slope direction of the slant plane, where 2D distance is equal to surface distance (Figure 13a). The other three methods show a similar network effect to that seen in Figure 9. The 2D cost distance method, which has an error range between -13.4 percent and 6.2 percent, gives correct surface distance in the direction perpendicular to the slope direction (Figure 13b).
As it moves away from that direction, treating 2D distance as surface distance underestimates surface distance, while the network scheme overestimates 2D distance in the locations between the eight movement directions. The battle between those two opposite factors is obvious in Figure 13b. The cost distance method, which has an error range between 0 and 22.7 percent, overestimates surface distance except in the slope direction where the slope is at its maximum. Maximum overestimation occurs at the location close to the direction perpendicular to the slope direction, where both the network scheme and the use of maximum slope overestimate the surface distance (Figure 13c). Topo-distance, which has an error range between 0 and 4.9 percent, produces the same error pattern as in Figure 9c. It overestimates surface distance except in the eight movement directions (Figure 13d).

Plane slope (degrees)   Euclidean distance (%)   2D cost distance (%)   Cost distance (%)   Topo-distance (%)
[values illegible in source]
Table 6. Mean percentage distance error associated with four distance calculation methods.

Figure 11. Maximum absolute errors associated with the methods.

Figure 12. Error standard deviations associated with the methods.

Buffers of 50-unit width are also delineated with the four methods on the 30° slant plane shown in Figure 10. Figure 14 shows the difference. The buffer delineated with the Euclidean distance method is the largest because the method consistently underestimates the surface distance and, therefore, more cells are included in the buffer. The cost distance method creates the smallest buffer because the method consistently overestimates surface distance using the network scheme and maximum slope in all directions. Topo-distance generates a smaller buffer than the true buffer because the method overestimates surface distance with the network scheme. Some parts of the buffer generated by the 2D cost distance method are inside the true buffer while other parts of the buffer are outside the true buffer. In the direction perpendicular to the slope direction (vertical direction), 2D distance is equal to surface distance. However, the network scheme overestimates 2D distance close to that direction, and this makes the top and bottom portions of the buffer smaller than the true buffer. In the direction parallel to the slope direction (horizontal direction), overestimation works against underestimation. On the 30° plane, underestimation is larger than overestimation. This makes the left and right portions of the buffer larger than the true buffer.

Tables 7 to 10 show the accuracy and the commission, omission, and total errors associated with 50-unit buffers delineated by the four methods on different slant planes. Accuracy is defined as the percentage of areas correctly computed by the methods. Commission error is the percentage of areas that are identified as inside the buffer by the methods but are actually outside the true buffer, and omission error is the percentage of areas that are identified as outside the buffer by the methods but are actually inside the true buffer. Total error is simply the sum of the commission and omission errors. The Euclidean distance method overestimates its buffers by including too many cells which are outside the true buffer and therefore has larger commission errors, especially with higher slope planes. All three network based methods underestimate their buffers with smaller slope planes. As slopes increase, the 2D cost distance method quickly overestimates its buffers by including cells lying outside the true buffer. The cost distance method underestimates its buffers by omitting cells which are inside the true buffer and therefore has larger omission errors, especially with higher slope planes. Although topo-distance does underestimate its buffers, it is the only method that has a well constrained omission error for a range of slopes. All other methods either significantly overestimate or underestimate their buffers with higher slopes.

Slope (degrees)   Accuracy (%)   Commission error (%)   Omission error (%)   Total error (%)
0                 100.0          0.0                    [illegible]          [illegible]
15                96.7           3.3                    0                    3.3
30                86.5           13.4                   0                    13.4
45                70.3           [illegible]            [illegible]          [illegible]
60                49.8           50.2                   0                    50.2
75                [illegible]    [illegible]            [illegible]          [illegible]
Table 7. Accuracy and errors associated with 50-unit buffers delineated by the Euclidean distance method on different slant planes.

Slope (degrees)   Accuracy (%)   Commission error (%)   Omission error (%)   Total error (%)
[values illegible in source]
Table 8. Accuracy and errors associated with 50-unit buffers delineated by the 2D cost distance method on different slant planes.

Slope (degrees)   Accuracy (%)   Commission error (%)   Omission error (%)   Total error (%)
[values illegible in source]
Table 9. Accuracy and errors associated with 50-unit buffers delineated by the cost distance method on different slant planes.

Slope (degrees)   Accuracy (%)   Commission error (%)   Omission error (%)   Total error (%)
[values illegible in source]
Table 10. Accuracy and errors associated with 50-unit buffers delineated by the topo-distance method on different slant planes.

The four methods are also tested on a real surface dataset which is a watershed (Figure 15) derived from the Mount Guyot USGS 7.5-minute 10-meter DEM on the border of North Carolina and Tennessee. Elevations within the watershed vary between 1006 and 1799 meters with a mean elevation of 1372 meters. Slope varies between 0° and 70° with a mean slope of 4°. The stream network is derived from the USGS DEM. Surface distance and stream buffers are investigated and compared. Since true surface distance is unknown, all other methods are compared with the topo-distance method. Table 11 shows the statistics of the percentage distance error associated with the methods when compared with the topo-distance method.
By examining the mean, maximum, and minimum errors in Table 11, it is clear that both the Euclidean distance method and the 2D cost distance method underestimate their surface distance, because they treat 2D distance as surface distance. It is not surprising that the 2D cost distance method consistently underestimates its surface distance. The 2D cost distance method has a net effect of underestimation when compared with topo-distance because both methods overestimate distance with the network scheme.

Figure 13. Spatial distribution of distance errors associated with the methods. (a) Errors with the Euclidean distance method. (b) Errors with the 2D cost distance method. (c) Errors with the cost distance method. (d) Errors with the topo-distance method.

Although the mean error for the cost distance method is in the positive realm, it is surprising to see a negative minimum error. Theoretically, the cost distance method has a much higher overestimation of surface distance when compared with topo-distance and should have a positive minimum error. However, this is not the case with the watershed DEM, where 3 percent of the cells have negative errors. A closer examination of the slopes at those negative error cells reveals that the assumed slope, which is derived from the DEM and used in the cost distance method, may not be the maximum at all. As an example shown in Figure 16, the assumed "maximum" slope at cell A is 1.03°, while the slope between cell A and cell B is 20.8° and the slope between cell A and cell C is 19.3°. It is clearer by examining the profile along cells A and B that the "maximum" slope, which is calculated as the angle of the best fitted plane, is much smaller than the slopes between cell A and B or C. Thus, in practice, slopes derived by GIS from raster surfaces may not be the maximum rate of change because of the inherent discrete nature and the limitation of cell size. This explains why the cost distance method has negative errors.
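The gap between a fitted-plane slope and the slope between two neighboring cells can be illustrated with a small Python sketch. The 3 x 3 elevation window below is hypothetical (a narrow valley, not the cells of Figure 16), the least-squares plane fit is one common way to derive slope and is assumed here rather than taken from any particular GIS package, and the function names are illustrative only.

```python
import math

def fitted_plane_slope_deg(window, cell_size):
    """Slope (degrees) of the least-squares plane through a 3x3 elevation window."""
    east = sum(row[2] for row in window)
    west = sum(row[0] for row in window)
    north = sum(window[0])
    south = sum(window[2])
    dz_dx = (east - west) / (6.0 * cell_size)     # least-squares gradient in x
    dz_dy = (north - south) / (6.0 * cell_size)   # least-squares gradient in y
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def neighbor_slope_deg(window, cell_size, row, col):
    """Slope (degrees) between the center cell and the neighbor at (row, col)."""
    run = cell_size * math.hypot(row - 1, col - 1)
    rise = abs(window[row][col] - window[1][1])
    return math.degrees(math.atan2(rise, run))

# Hypothetical 10-meter cells: a valley running north-south through the center column
valley = [
    [104.0, 100.2, 104.0],
    [104.0, 100.0, 104.0],
    [104.0,  99.8, 104.0],
]
print(fitted_plane_slope_deg(valley, 10.0))     # well under 1 degree
print(neighbor_slope_deg(valley, 10.0, 1, 2))   # center to east neighbor, about 21.8 degrees
```

Because the two valley walls cancel in the plane fit, the fitted slope at the center cell is nearly flat while the climb to either side is steep. A secant friction computed from the fitted slope therefore understates the cost of moving to those neighbors, which is the mechanism behind the negative errors described above.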
Buffers of 100-meter width are also delineated and compared (Table 12) for the streams. The columns are the same as in Table 7, except that the reference buffer used here is the buffer generated by topo-distance. Both the Euclidean distance and the 2D cost distance method overestimate their buffers and have larger commission errors. The cost distance method delineates a buffer almost the same as topo-distance with a small omission error. This small difference between the cost distance method and topo-distance has several explanations. First, the mean slopes in the watershed (30°) and in the 100-meter buffers (34°) are relatively small. Second, the smaller "maximum" slope used in the cost distance method reduces its overestimation. Last, as shown in Figure 13c, the cost distance method calculates accurate surface distance along the slope direction, although it overestimates the distance in other locations. While calculating the surface distance to the streams, the cost distance method minimizes its errors as most of the streams are perpendicular to the valley slopes.

Figure 14. Buffers delineated with different methods (topo-distance, true buffer, 2D cost distance, cost distance, Euclidean distance).

Method               Minimum   Maximum   Mean    Standard deviation
Cost distance        [illegible]
2D cost distance     -41.6     0.0       -11.6   4.8
Euclidean distance   -44.0     0.0       -18.3   4.9
Table 11. Percentage distance errors associated with different methods in the watershed.

Figure 15. Watershed and streams derived from USGS 7.5-minute 10-meter DEM.

Figure 16. Cells with negative errors and a closer examination of one negative cell. Slope at cell A (the "maximum" slope) is 1.03° and the slope between cell A and B is 20.8°, which is larger than the assumed "maximum" slope.

Method               Accuracy   Commission error   Omission error   Total error
[values illegible in source]
Table 12. Accuracy and errors associated with different methods while delineating a 100-meter buffer for the stream network.

Conclusions

Delineating accurate buffers on surfaces is a necessary GIS function used in environmental analysis in mountainous areas to make informed decisions. Neither treating 2D distance as surface distance nor using the cost distance function with a slope friction is a viable solution. The topo-distance method presented in the article views raster surface layers as networks and provides a solution to calculating surface distance in the raster data model. This method is implemented with efficient data structures and algorithms. The analysis shows that Dijkstra's algorithm outperforms the Bellman-Ford algorithm. The efficiency of the modified Dijkstra's algorithm strongly depends on the data structures and associated algorithms used in its implementation. For small datasets, the difference among various implementations is indistinguishable. With large datasets, the difference is significant, ranging from a few minutes to several hours on the terrain dataset tested. The use of a min-heap for the candidate cell list is the key to significantly improving the efficiency of the method, and adding a location index to the list further enhances the performance of the method. The combination of a min-heap and a hash-table based location index gives the best test results, although improvements might still be made by using the Fibonacci heap (Cormen et al. 2001). While the implementation is efficient as tested, it has not been tested on very large raster datasets where the raster layers have to be divided into smaller blocks and individually loaded and processed in computer memory.
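The idea of viewing a raster surface layer as a network can be sketched in a few lines of Python. This is a minimal illustration, not the implementation tested in this article: it assumes the edge weight between two neighbor cells is the straight-line 3D distance between their centers, and it uses Python's `heapq` with stale-entry skipping in place of a min-heap with a location index. The function name `topo_distance` and its parameters are hypothetical.

```python
import heapq
import math

def topo_distance(dem, cell_size, sources):
    """Least accumulated surface distance from the source cells to every cell."""
    rows, cols = len(dem), len(dem[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue   # stale entry: this cell was already reached by a shorter path
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                run = cell_size * math.hypot(dr, dc)   # horizontal step length
                rise = dem[nr][nc] - dem[r][c]         # elevation difference
                nd = d + math.hypot(run, rise)         # surface length of the step
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# A one-row plane rising at 30 degrees: the result matches 4 / cos(30 degrees)
plane = [[c * math.tan(math.radians(30)) for c in range(5)]]
print(topo_distance(plane, 1.0, [(0, 0)])[0][4])
```

A surface buffer then follows by thresholding the returned layer, for example keeping the cells whose distance is at most 50 units. Pushing duplicate entries and skipping stale ones avoids the location index altogether at the cost of a larger heap, which is a different trade-off from the indexed min-heap evaluated here.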
Some efficient in-memory algorithms and their implementations may turn out to be inefficient while handling very large raster datasets (Gao and Menon 1991).

Analysis on the hypothetical slant planes reveals that topo-distance is not perfect. As with the cost distance function in most raster GIS packages, which adopts the network scheme and limits movement to eight neighbor directions, topo-distance overestimates surface distance. However, unlike other methods which significantly either overestimate or underestimate surface distance for higher slope planes, topo-distance has well constrained errors for a range of slopes. Goodchild (1977) analyzed the error caused by the network scheme for 2D distance calculation and provided an error bound. Methods have been proposed to reduce the error by incorporating more neighbor cells (Huber and Church 1985; Douglas 1994; Xu and Lathrop 1995). Future improvement on topo-distance may include more neighbor cells or other feasible solutions.

Topo-distance only works in the raster data model. Developing a method for calculating surface distance on TIN surfaces presents a challenge. Unlike the raster method with a finite number of directions to move from one cell, a TIN method could provide an infinite number of directions to move from one point to another. While this increases the complexity of deriving surface distance, it allows an increased level of detail and is not limited by pre-defined directions in the raster data model. It is therefore well worth investigating in future research.

REFERENCES

Barnes, S. 2002. River buffer rules face opposition from property rights advocates. Watauga Mountain Times, April 18, 2002, p. 1H.
Chrisman, N. 1997. Exploring geographic information systems. New York, New York: John Wiley and Sons.
Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein. 2001. Introduction to algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.
Douglas, D. H. 1994. Least cost path in GIS using an accumulated cost surface and slope lines. Cartographica 31(3): 37-51.
Gao, B., and S. Menon. 1991. A [illegible] approach to a class of propagation functions. In Proceedings of the 6th International Symposium on Spatial Data Handling, Edinburgh, Scotland. pp. 177-189.
Goodchild, M. F. 1977. An evaluation of lattice solutions to the problem of corridor location. Environment and Planning A 9: 727-38.
Hodgson, M. E. 1998. Comparison of angles from surface slope/aspect algorithms. Cartography and Geographic Information Systems 25(3): 173-85.
Holder, W. J. 1992. Assessment of riparian buffers effectiveness using GIS technology: Laurel Springs, Ashe County, North Carolina. Unpublished Master's thesis, Appalachian State University.
Huber, D. L., and R. L. Church. 1985. Transmission corridor location modeling. Journal of Transportation Engineering 111(2): 114-30.
Phillips, J. D. 1989. Effect of buffer zones on estuarine and riparian land use in eastern North Carolina. Southeastern Geographer XXIX(2): 136-49.
Stratton, W. L. 1993. Generating stream buffers using a geographic information system. Unpublished Master's thesis, University of North Carolina at Charlotte, Charlotte, North Carolina.
Xiang, W. 1993. A GIS method for riparian water quality buffer generation. International Journal of Geographical Information Systems 7(1): 57-70.
Xiang, W. 1996. GIS-based riparian buffer analysis: Injecting geographic information into landscape planning. Landscape and Urban Planning 34(1): 1-10.
Xiang, W., and W. L. Stratton. 1996. The b-function and variable stream buffer mapping: A note on "A GIS method for riparian water quality buffer generation". International Journal of Geographical Information Systems 10(4): 499-510.
Xu, J., and R. G. Lathrop. 1995. Improving simulation accuracy of spread phenomena in a raster-based Geographic Information System. International Journal of Geographical Information Systems 9: 153-68.
