
International Journal of Advanced Computer Science, Vol. 3, No. 5, Pp. 227-230, May, 2013.

Identical Entity Geometric Matching from Multi-source Spatial Data


Liu Hai-yan, Wang Xin, Liu Xin-gui, Liu Chen-fan
Manuscript received: 14 Mar. 2013; revised: 27 Mar. 2013; accepted: 3 Apr. 2013; published: 15 Apr. 2013

Keywords
Spatial Data Matching, Identical Entity, Geometric Matching, Multi-source Spatial Data, Vector Data and Image Matching

Abstract The integration of spatial data cannot be achieved through data format conversion alone; it also requires data matching and related techniques. As a key technique in spatial data integration and updating, identical entity matching establishes relationships between geographic entities from multi-source spatial databases, with the aim of making comprehensive use of multi-source data. This paper discusses several algorithms for the geometric matching of entities with the same name, including geometric matching of same-scale road networks, geometric matching of different-scale road networks, and matching of vector data with imagery. The matching methods can be used for map updating and merging, integrated information sharing, and data variance analysis.

1. Introduction
In practice, the integration of cartographic data cannot be achieved simply through data format conversion or an overall coordinate transformation; it also requires spatial data matching and related techniques. Existing research [1, 2, 3, 4, 5] shows that feature matching of spatial data still lacks a complete and integrated framework. This paper describes three approaches to identical entity matching from multiple sources. Since no matching method suits every entity, errors must be repaired by manual editing after automated matching. The matching results can be used for map updating and merging, integrated information sharing, and data variance analysis.

The matching of spatial data includes both the matching of different entities and the matching of entities with the same name. The matching of different entities is mainly used in GIS analysis. For example, we can match the road data in a 1:1 million map against settlement data to find how many settlements with a population over 0.5 million are connected by a given road, and thus how important that road is. The matching of same-name entities is the process of identifying the same surface feature or set of surface features (i.e., same-name entities) in the real world across different map sources by analysing the differences and similarities between spatial entities [6, 7].

Spatial data is generally considered to have three basic features: spatial feature, property feature, and time feature [8]. The spatial feature, i.e., location, means that the same geographic entity has approximately the same location in different databases; the property feature means that descriptions of the same entity in different data sources are the same or similar. These two features make the matching of same-name entities technically feasible and form its basis. The time feature adds a temporal element to spatial data: outdated information loses its present value, which makes it necessary to study same-name entities and is the root motivation for matching them.

Same-name entity matching is based on similarity indices such as distance measures, geometric shape, topological relations, graphic structure, and attribute information. According to the similarity index used, matching methods fall into three main types: geometric matching, topological matching, and semantic matching [9].

Geometric matching is simple and easy to implement, and it is the most commonly used method for matching same-name entities. Topological matching overcomes the positional inaccuracy of points and the geometric sensitivity of distance-based methods, but a tiny difference in the topological relations of the same surface feature can cause the match to fail. Topological matching is therefore mainly used to narrow the search scope or to check the result of geometric matching, and it is rarely used on its own. Semantic matching is property matching: if the two source data sets are defined by the same property fields and their semantic information is known, semantic similarity is effective in speeding up same-name entity matching and improving its accuracy. Its shortcoming is that it depends heavily on the data model and the property data model, so it is rarely used.

This work was supported by the National High Technology Research and Development Program of China (2012AA12A404), the National Natural Science Foundation of China (41201391), and the Plan for Scientific Innovation Talent of Henan Province.
Liu Hai-yan, Zhengzhou Institute of Surveying and Mapping, Zhengzhou 450052, China (liu2000@vip.sina.com).
Wang Xin, Liu Xin-gui, Liu Chen-fan, Zhengzhou Institute of Surveying and Mapping, Zhengzhou 450052, China.


A. Qualification
The prerequisites for matching entities with the same name mainly include [10]:
1) The two source databases cover the same area, or at least overlap in part, i.e., their map extents are equal, intersecting, or one contains the other.
2) Differences between the reference systems of the two source databases have been eliminated, mainly the map projection system, coordinate system, and mapping control network. This can be done by unifying or rectifying the data through general projection transformation, coordinate transformation, etc.
3) The two source databases have been brought into a consistent format and logical standard. For example: eliminate differences in data storage format; eliminate differences in how the same surface feature is represented, e.g., if a feature is a line entity in one database but an area entity in the other, convert the entity type; and eliminate irregular topological relations, e.g., redundant entities, unreasonable connections, or "pseudo nodes".
4) The map scales of the two source databases should not differ too much. If they do, the difference between two same-name entities can be huge, many more factors must be considered in data matching, and setting fixed matching parameters becomes meaningless.
5) The two source databases have the same or similar time phase, i.e., the time span between them should not be very large. If the time span is large, the differences between same-name entities grow and the matching rate decreases.

B. Process of the Matching
The technical process of matching entities with the same name is shown in Figure 1. Normally, before two map databases are matched, the data must be preprocessed. Preprocessing includes unifying the data storage format, map projection system, and coordinate system; constructing and integrating topological relations; making the spatial entity indices consistent; unifying the entity types of similar surface features; and compressing redundant data points. No matching method suits every entity, so after the matching process, matching errors and failures must be corrected by manual editing. The matching results can be applied to map updating and merging, integrated information sharing, and data variance analysis.

Fig. 1 Process of the Matching (flowchart nodes: Data set 1, Data set 2, Data preprocessing, Matching, Post-matching processing, Map updating, Map merging, Quality evaluation and variance analysis, Map integration, Information sharing)

International Journal Publishers Group (IJPG)

2. Geometric Matching for Entities with the Same Names

Geometric matching of entities with the same name depends on the calculation of geometric similarity. Some common geometric similarity rules are [11] [12] [13]:
1) Distance: Barring errors, the same real-world object has approximately the same geographic location in different maps. Comparing the distance between entities is the simplest and most intuitive approach to entity matching. The distance between point entities is the Euclidean distance; the distance between two line entities is the Hausdorff distance; distance measures between area entities include centroid distance, average distance between corresponding vertices, boundary distance, etc. Distance is the main basis for point-to-point, line-to-line, and area-to-area matching.
2) Direction: The direction of a line entity can be measured by the angle between the X axis and the line from its start point to its end point; the direction of entities is also often measured using the bounding box. Direction can serve as a checking criterion in line-line or area-area matching.
3) Position relations: Whether a point entity lies inside or outside an area entity, or whether a point entity lies on a line entity. Position relations often act as the main rule in area-area matching.
4) Shape features of the entity: Shape features of a line include length, longest chord, degree of bending, etc.; shape features of an area include minimum bounding rectangle, perimeter, area, density, shape ratio, Fourier shape coefficients, etc. Shape features are often used as checking criteria in line-line and area-area matching.
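The distance and direction rules among the geometric similarity rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, lines are assumed to be lists of (x, y) vertices in a common planar coordinate system, and the Hausdorff distance is computed only over vertices, which approximates the continuous Hausdorff distance between the lines.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def directed_hausdorff(a, b):
    """Largest distance from any vertex of a to its nearest vertex of b."""
    return max(min(euclidean(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def direction_deg(line):
    """Angle in degrees between the X axis and the start-to-end vector."""
    (x0, y0), (x1, y1) = line[0], line[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

# Two nearly parallel roads, 0.5 map units apart (illustrative data only).
road_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
road_b = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
print(hausdorff(road_a, road_b))
print(direction_deg(road_a), direction_deg(road_b))
```

A small Hausdorff distance together with similar direction angles would support treating the two lines as candidates for the same entity; the acceptance thresholds depend on the data and are not specified here.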


A. Geometry Matching for Same Scale Road Network
The basic steps are:
1) Build a buffer with an appropriate radius around the reference line, search for the target lines that fall completely within the buffer zone, and select them as candidate matches for the reference line.
2) Match all selected lines according to their geometric similarity.
In Figure 2, the dotted lines represent the buffer constructed from the road network of the 1:500000 aviation map. The bold green lines represent the roads selected from the road network of the 1:500000 topographic map by geometric matching. 100 roads from the aviation map were selected for matching; the matching results are shown in Table 1.
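The buffer selection step can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper's code: all names are hypothetical, each line has at least two vertices, and "completely within the buffer" is tested by sampling points along the target line (controlled by the assumed `step` parameter) rather than by constructing an exact buffer polygon.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def point_line_distance(p, line):
    """Shortest distance from point p to a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def densify(line, step):
    """Vertices of line plus intermediate points at most step apart."""
    points = [line[0]]
    for a, b in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(b[0] - a[0], b[1] - a[1]) / step))
        for k in range(1, n + 1):
            points.append((a[0] + (b[0] - a[0]) * k / n,
                           a[1] + (b[1] - a[1]) * k / n))
    return points

def within_buffer(target, reference, radius, step=0.1):
    """True if every sampled point of target lies within radius of reference."""
    return all(point_line_distance(p, reference) <= radius
               for p in densify(target, step))

def select_candidates(reference, targets, radius):
    """Names of target lines that fall (approximately) inside the buffer."""
    return [name for name, line in targets.items()
            if within_buffer(line, reference, radius)]

reference = [(0.0, 0.0), (10.0, 0.0)]
targets = {
    "near": [(1.0, 0.3), (9.0, 0.4)],   # stays close to the reference
    "far": [(1.0, 0.3), (9.0, 3.0)],    # leaves the buffer at its end
}
print(select_candidates(reference, targets, radius=1.0))
```

The selected candidates would then be ranked by geometric similarity, for example by distance and direction.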
Fig. 2 An Example of Geometry Matching for the Same Scale Roads.

TABLE 1 GEOMETRIC MATCHING RATE OF SAME SCALE ROADS
Number of roads    Matching number    Incorrect matching number    Failure to match    Matching rate
100                90                 7                            3                   90%

B. Geometry Matching for Different Scale Road Network
The basic steps are:
1) Build a buffer with an appropriate radius around the reference lines (from the 1:1000000 aviation map), search for the target lines (from the 1:500000 aviation map) that fall completely within the buffer zone, and select them as candidate matches for the reference line.
2) Calculate the distances between each point on the selected line and the reference line; take the minimum value as the distance from that point to the reference line.
3) Set an appropriate distance matching parameter (the "matching tolerance"), count the points whose distance to the reference line is less than the matching tolerance, and calculate their percentage of the total number of points on the selected line; if the percentage is greater than the threshold, the selected line is considered to match the reference line.
After setting the matching parameters, the final matching result is shown in Figure 3 and the matching rate is shown in Table 2.

Fig. 3 An example of Geometry Matching for Multi-scale Roads

TABLE 2 GEOMETRY MATCHING RATE OF ROADS OF MULTI SCALE
Columns: Index of roads; Ideal matching number; Matching number; Incorrect or failure of matching; Matching rate
Index of roads: 1 2 3 4 100
Counts: 6 1 5 3 0 3 1 0 1 5 0 5 3 1 4
Matching rate: 87.9% 100% 100% 100% 701%
Average matching rate: 90.45%
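The tolerance test used for different-scale road networks can be sketched as below. The function names, sample coordinates, tolerance, and threshold are all assumptions chosen for illustration; the paper does not give concrete values.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def point_line_distance(p, line):
    """Minimum distance from point p to any segment of the reference line."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def matching_ratio(selected, reference, tolerance):
    """Share of points on the selected line closer than tolerance to reference."""
    close = sum(1 for p in selected
                if point_line_distance(p, reference) < tolerance)
    return close / len(selected)

def is_match(selected, reference, tolerance, threshold):
    """The selected line matches if the share of close points exceeds threshold."""
    return matching_ratio(selected, reference, tolerance) > threshold

reference = [(0.0, 0.0), (10.0, 0.0)]
selected = [(0.0, 0.1), (2.0, 0.2), (4.0, 0.1), (6.0, 0.9), (8.0, 0.2)]
print(matching_ratio(selected, reference, tolerance=0.5))
print(is_match(selected, reference, tolerance=0.5, threshold=0.7))
```

In this example four of the five points lie within the tolerance, so the line is accepted at a threshold of 0.7 but would be rejected at a threshold of 0.8 or higher.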

C. Vector Data and Image Matching
The basic steps are:
1) Extract the coordinates of the water area boundary after the image is converted to grayscale.
2) Calculate the centroid of each extracted water area feature.
3) Determine whether the centroid lies within a water polygon in the vector topographic map.
In Figure 4, (a) is a screenshot of the TM image, (b) is the grayscale result, (c) shows the extracted coordinates of the water area, and (d) is the matching result after calculating the centroid of the extracted water area (marked by "+"). The area circled by the green line is the extracted target area that matches the reference area from the 1:250000 water areas. All area water features of the TM image data were included in the matching. The matching rate is shown in Table 3.
TABLE 3 VECTOR DATA AND WATER IMAGE MATCHING RATE
Number of facet water areas    Matching numbers    Numbers of incorrect matching    Number of failure of matching    Matching rate
63                             60                  2                                1                                95%
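The centroid test in the vector data and image matching steps can be sketched as follows. The sketch assumes the water boundary has already been extracted from the image as a simple polygon ring in the same coordinate system as the vector map; the extraction itself is not shown. The function names are hypothetical, the centroid uses the standard shoelace formula, and the containment check is a ray-casting test, which may differ from the method the authors used.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross                  # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("degenerate polygon")
    return (cx / (3.0 * area2), cy / (3.0 * area2))

def point_in_polygon(point, ring):
    """Ray-casting (even-odd) test for a point against a polygon ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def match_water_area(extracted_ring, vector_polygons):
    """Name of the vector polygon containing the centroid, or None."""
    centroid = polygon_centroid(extracted_ring)
    for name, ring in vector_polygons.items():
        if point_in_polygon(centroid, ring):
            return name
    return None

# Illustrative data: one extracted water area and two vector water polygons.
extracted = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
vector_polygons = {
    "lake": [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)],
    "pond": [(10.0, 10.0), (12.0, 10.0), (12.0, 12.0), (10.0, 12.0)],
}
print(polygon_centroid(extracted))
print(match_water_area(extracted, vector_polygons))
```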

(a) Original Image (b) Gray Scale Image (c) Extracted Water Features (d) Result of Area Centroid Matching

Fig. 4 Vector Data and Image Matching

3. Conclusions
The presented matching methods involve many matching parameters, such as buffer width, line length, matching tolerance, and ratio, which are used to eliminate incorrect lines or areas. The key issue is the parameter values: values that are too large or too small greatly affect the matching results. The checking parameters depend largely on the features of the data, and it is difficult to choose appropriate values at the start of matching. Automatic incremental adjustment can be used to obtain them: first, set an arbitrary small value and check whether a correct line satisfying the matching condition exists; if not, increase the value step by step over further iterations until the matching target line is found. How to refine the classification and further quantify the matching parameters according to the features of the data, by analysing the matching results obtained with different parameters, is a problem that urgently needs to be solved.
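The automatic incremental adjustment described above can be sketched as a simple search loop. The function name, the start value, the step size, and the upper bound are assumptions introduced for illustration; the paper does not state how large the increment is or when the search stops.

```python
def find_parameter(has_match, start, step, maximum):
    """Raise a parameter from start in increments of step until
    has_match(value) is true; return that value, or None if the
    parameter would exceed maximum."""
    value = start
    while value <= maximum:
        if has_match(value):
            return value
        value += step
    return None

# Hypothetical distances from two candidate lines to the reference line.
candidate_distances = [0.8, 1.7]

def has_match(buffer_radius):
    """True if at least one candidate falls inside the buffer."""
    return any(d <= buffer_radius for d in candidate_distances)

print(find_parameter(has_match, start=0.25, step=0.25, maximum=5.0))
```

Bounding the search with a maximum value is a design choice of this sketch: without it, a reference line that has no counterpart in the other data set would keep enlarging the parameter until an unrelated line is accepted.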

References
[1] Saalfeld, A. Automated Map Compilation [J]. International Journal of Geographical Information Systems, 1988, 2(3): 217-228.
[2] Cobb, M., Chung, M., and Foley, H. A Rule-based Approach for the Conflation of Attributed Vector Data [J]. GeoInformatica, 1998, 2(1): 7-35.
[3] Walter, V. and Fritsch, D. Matching Spatial Data Sets: A Statistical Approach [J]. International Journal of Geographical Information Systems, 1999, 13(5): 445-473.
[4] Filin, S. and Doytsher, Y. A Linear Mapping Approach to Map Conflation: Matching of Polylines [J]. Surveying and Land Information Systems, 1999, 59(2): 107-114.
[5] He, J.B., Ke, Z.Y., and Chen, C.S. Land Resources Environment and Regional Economic Information System Integration Scheme, Space Informatics and Its Application: RS, GPS, GIS and Its Integration [M]. Wuhan: Wuhan University of Science and Technology of Surveying and Mapping Press, 1998.
[6] Liu, D.Q. and Su, S.W. Space Database Position Matching Method and Its Application [J]. Surveying and Mapping, 2005, 30(2): 78-80.
[7] Liu, D.Q. and Su, S.W. Many Space Database Position Matching Method and Its Application [J]. Surveying and Mapping, 2005, 30(2): 78-80.
[8] Hua, Y.X., Wu, S., and Zhao, J.X. Geographic Information System Principle and Technology [M]. Beijing: PLA Publishing House, 2001 (159-165).
[9] Filin, S. and Doytsher, Y. Detection of Corresponding Objects in Linear-based Map Conflation. Surveying and Land Information Systems, 2000, Vol. 60, No. 2.
[10] Li, D.R., Gong, J.Y., and Zhang, Q.P. Map Database Combine with Technology [J]. Science of Surveying and Mapping, 2004, 29(1): 1-4.
[11] Liu, Z.Y. Map Database Entity Matching and Merged Technology Research [D]. Hehai University master's degree papers, 2006.
[12] Yuan and Tao. Development of Conflation Components [C], in Proceedings of Geoinformatics'99 Conference, 1999.
[13] Zhang, Q.P. Map Database Entity Matching and Merged Technology Research [D]. Wuhan University degree papers, 2002.

Liu Hai-yan, PhD, Professor, Department of Cartography, Zhengzhou Institute of Surveying and Mapping, China; Visiting Scholar (2012.12-2013.12), Department of Geography, University of California, Santa Barbara, CA, United States.
Wang Xin, Ms., Engineer, Department of Cartography, Zhengzhou Institute of Surveying and Mapping, China.
Liu Xin-gui, PhD, Associate Professor, Department of Cartography, Zhengzhou Institute of Surveying and Mapping, China.
Liu Chen-fan, PhD Student, Zhengzhou Institute of Surveying and Mapping, China.
