Kun-Chi Lai - Applying the DBSCAN Algorithm and Google Maps to the Study of Large-Scale Species Occurrence Records



Applying the DBSCAN Algorithm and Google Maps to the Study of Large-Scale Species Occurrence Data

1 2 3 4 5
Kun-Chi Lai, Elie Chen, You-Sheng Li, Kwang-Tsao Shao

TaiBIF: 26 datasets; 1.5 million species occurrence records; 85% geo-referenced.
Geospatial Knowledge Discovery. DBSCAN: (Eps), (MinPts).
Convex hull, Polygon clipping, Google Maps.

DBSCAN Google Maps


1 PhD Student, Department of Computer Science, National Chengchi University /
  Project Manager of Taiwan Biodiversity Information Facility, Biodiversity Research Center,
  Academia Sinica
2 PhD Student, Institute of Marine Biology, National Taiwan Ocean University
3 Software Engineer of Taiwan Biodiversity Information Facility, Biodiversity Research
  Center, Academia Sinica
4 Software Engineer, Research Center for Information Technology Innovation, Academia
  Sinica
5 Research Fellow and Executive Officer for Systematics and Biodiversity Information
  Division, Biodiversity Research Center, Academia Sinica

Abstract
Primary species occurrence data comprise records of animal and plant specimens held in
museums and herbaria, together with species observations. The TaiBIF data portal has so far
integrated 26 datasets totaling more than 1.5 million species occurrence records, 85% of which
are geo-referenced. Geospatial clustering is an important method of geospatial knowledge
discovery for exploring spatial data. In this paper we present a density-based clustering
method that uses the DBSCAN algorithm to draw distribution maps of arbitrary shape,
controlled by two parameters: Eps, the radius of the ε-neighborhood, and MinPts, the minimum
number of points within that neighborhood. Visualizing the DBSCAN results for occurrence
data on Google Maps can help in understanding and discovering the knowledge embedded in
species' geographical distributions, leading to better conservation efforts.
Keywords: species occurrence data, cluster analysis, biodiversity informatics

1.
2001 (Global Biodiversity Information Facility, GBIF)
GBIF
(Darwin Core) DiGIR, TAPIR, BioCASE
(Hill et al., 2009) 2.7
GIS

GBIF

(Geospatial Knowledge Discovery) (Data Mining)

(Cluster Analysis) (Miller and Han, 2009)

150

(Biodiversity informatics)

(Peterson et al., 2010)
Google Maps
(Zang et al., 2008) Google Maps


Google Maps

2.
2.1

( )
(Chapman, 2005)

GIS
GBIF

1
0.1
(GBIF Data Portal, 2011)
(Encyclopedia of Life, EOL)
(Encyclopedia of Life, 2011)

Hijmans 50*50
DIVA-GIS (Hijmans and Spooner, 2001) Flemons GBIF
GBIF-MAPA (Service-Oriented Architecture, SOA)
(Flemons et al., 2007)

2.2
2002
Dublin Core

(TaiBIF) TAPIR (TDWG Access Protocol for Information Retrieval)

TAPIR (Customization) GBIF
extension XML file Darwin Core TaiBIF ( , 2010)

150
TaiBIF
40*40
10*10
2*2 3

2010

(a) 40
(b) 10
Figure 1.

3.
3.1 DBSCAN

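Cluster analysis partitions data into groups (clusters). Clustering methods are commonly
divided into four families: (1) partitioning methods, such as K-means; (2) hierarchical
methods; (3) density-based methods; and (4) grid-based methods. Density-based methods
additionally identify noise and outliers.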

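Density-based methods include DBSCAN, OPTICS, and DENCLUE. DBSCAN (Density-Based Spatial
Clustering of Applications with Noise) takes two parameters: the neighborhood radius (Eps)
and the minimum number of points (MinPts) that must lie within a point's Eps-neighborhood
for it to count as a core point. Three relations define its clusters:
(1) Directly density-reachable: a point lies within the Eps-neighborhood of a core point
(in Figure 3, D and E, and E and F).
(2) Density-reachable: two points are linked by a chain of directly density-reachable
points (in Figure 3, D, E, and F).
(3) Density-connected: two points are both density-reachable from a common third point
(in Figure 3, B and C through A).
(Han and Kamber, 2006)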

Figure 3. DBSCAN
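
As a minimal sketch of how DBSCAN grows clusters from Eps and MinPts, the following Python code may help. It is not the implementation used in this study: it assumes planar Euclidean distance on (x, y) pairs, a brute-force neighbor search, and an Eps-neighborhood that counts the point itself. The names `region_query`, `dbscan`, and `NOISE` are illustrative only.

```python
from math import hypot

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


# Two tight groups of four points each, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

For real occurrence records given as longitude and latitude, a great-circle distance would likely be a better choice than the planar distance assumed here.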

3.2
Darwin Core TAPIR
TaiBIF ()
DBSCAN
Google Maps 4

4
DBSCAN
MinPts

Convex hull
Incremental, Jarvis's March (Gift Wrap), Divide and Conquer, Quickhull. Quickhull
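
Quickhull, one of the convex hull algorithms named here, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in this study: points are treated as planar (x, y) tuples, and the function names `cross`, `_hull_side`, and `quickhull` are hypothetical. The hull is returned in clockwise order starting from the leftmost point.

```python
def cross(o, a, b):
    """Cross product of (a - o) and (b - o).

    Positive when b lies to the left of the directed line o -> a.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(a, b, pts):
    """Hull vertices strictly left of a -> b, ordered from a to b (ends excluded)."""
    left = [p for p in pts if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from line a-b is guaranteed to be on the hull.
    far = max(left, key=lambda p: cross(a, b, p))
    return _hull_side(a, far, left) + [far] + _hull_side(far, b, left)


def quickhull(points):
    """Convex hull of 2D points, clockwise from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # leftmost and rightmost points are on the hull
    return [a] + _hull_side(a, b, pts) + [b] + _hull_side(b, a, pts)


# A square with one interior point; the interior point is dropped.
hull = quickhull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
# hull == [(0, 0), (0, 2), (2, 2), (2, 0)]
```

Applied to the members of one DBSCAN cluster, the resulting vertex list could serve as the outline of a polygon overlay on a map.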

2010

5
MinPts
convex hull 6
35 MinPts 515
( 6)

Eps = 5; MinPts = 15    Eps = 4; MinPts = 15    Eps = 3; MinPts = 15
= 3                     = 7                     = 6
Figure 6. Eps, MinPts
DBSCAN
() MinPts
outliers outliers MinPts
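The k-dist function can guide the choice of MinPts and Eps. For a point p, k-dist is
dist(p, q), where q is the k-th nearest neighbor of p, and k corresponds to MinPts
(Ester et al., 1998). When the k-dist values of all points are sorted, the threshold point
of the k-dist graph gives Eps for the chosen MinPts.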
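
The k-dist idea can be sketched as follows. This is a hedged illustration rather than the procedure used in this study: it assumes planar Euclidean distance and a brute-force search, and it excludes the point itself when ranking neighbors. The helper `suggest_eps` is a hypothetical heuristic that takes the value just after the largest drop in the sorted k-dist list as the threshold point; in practice the threshold is often chosen by inspecting the graph.

```python
from math import hypot


def k_dist(points, k):
    """k-dist of every point, sorted in descending order.

    For each point p, k-dist is dist(p, q) where q is the k-th nearest
    neighbor of p (p itself is not counted). Requires len(points) > k.
    """
    values = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i)
        values.append(dists[k - 1])
    return sorted(values, reverse=True)


def suggest_eps(sorted_k_dist):
    """Hypothetical heuristic: the k-dist value just after the largest drop."""
    drops = [sorted_k_dist[i] - sorted_k_dist[i + 1]
             for i in range(len(sorted_k_dist) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return sorted_k_dist[i + 1]


# Two tight groups of four points each, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
graph = k_dist(points, k=3)   # the isolated point has by far the largest 3-dist
eps = suggest_eps(graph)      # 3-dist shared by the clustered points
```

Points to the left of the threshold in the sorted graph would be treated as noise, and the remaining points as members of some cluster.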

Polygon Clipping
: Sutherland-Hodgman clipping, Weiler-Atherton clipping, Vatti clipping, Greiner
subject polygon, clipping polygon
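
Of the clipping algorithms listed, Sutherland-Hodgman is the simplest to illustrate. The sketch below is not necessarily the method applied in this study; it assumes planar (x, y) vertices and a convex clipping polygon whose vertices are given in counter-clockwise order. The names `clip_polygon`, `inside`, and `intersection` are illustrative only.

```python
def clip_polygon(subject, clip):
    """Clip the subject polygon against a convex, counter-clockwise clip polygon.

    Returns the vertices of the clipped polygon (empty if nothing remains).
    """
    def inside(p, a, b):
        # p is on the interior (left) side of the directed clip edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_list, output = output, []
        if not input_list:
            break  # the subject has been clipped away entirely
        prev = input_list[-1]
        for cur in input_list:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))  # entering
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))      # leaving
            prev = cur
    return output


# Two overlapping squares; the result is their common unit square.
subject = [(0, 0), (2, 0), (2, 2), (0, 2)]
clip = [(1, 1), (3, 1), (3, 3), (1, 3)]
clipped = clip_polygon(subject, clip)
# clipped == [(1.0, 1.0), (2.0, 1.0), (2, 2), (1.0, 2.0)]
```

A non-convex clipping polygon, such as a coastline, would need one of the more general algorithms in the list.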

4.

1999

(8)

8 (, 1999)
DBSCAN 4 MinPts 11
9a 9b 8
DBSCAN

4 DBSCAN

(a)
(b) DBSCAN (Eps = 0.04, MinPts = 11)
Figure 9. DBSCAN

5.

DBSCAN
()

4

Google Maps

Google Maps
GIS

Web 2.0 Google Maps

(scientific workflow)

References
Chapman, A.D. (2005) Uses of primary species-occurrence data, version 1.0. Global
Biodiversity Information Facility.
Encyclopedia of Life (2011) Retrieved from http://www.eol.org
Ester, M., Kriegel, H.P., Sander, J., Xu, X. (1998) Clustering for Mining in Large Spatial
Databases. Special Issue on Data Mining, KI-Journal, ScienTec Publishing, 1, 1-7.
Finley, D.R. (2007) Point-In-Polygon Algorithm - Determining Whether A Point Is Inside A
Complex Polygon. http://www.alienryderflex.com/polygon/
Flemons, P., Guralnick, R., Krieger, J., Ranipeta, A., Neufeld, D. (2007) A web-based GIS
tool for exploring the world's biodiversity: The Global Biodiversity Information
Facility Mapping and Analysis Portal Application (GBIF-MAPA). Ecological
Informatics, 2(1), 49-60.
GBIF Data Portal (2011) Retrieved from http://data.gbif.org
Han, J., Kamber, M. (2006) Data Mining: Concepts and Techniques (2nd ed.). Morgan
Kaufmann.
Hijmans, R.J., Spooner, D.M. (2001) Geographic distribution of wild potato species.
American Journal of Botany, 88(11), 2101-2112.
Hill, A.W., Guralnick, R., Flemons, P., Beaman, R., Wieczorek, J., Ranipeta, A. et al. (2009)
Location, location, location: utilizing pipelines and services to more effectively
georeference the world's biodiversity data. BMC Bioinformatics, 10, S3.
Jaffe, A., Naaman, M., Tassa, T., Davis, M. (2006) Generating summaries and visualization
for large collections of geo-referenced photographs. In: Proceedings of the 8th ACM
international workshop on Multimedia information retrieval, 89-98.
Li, W., Ong, E., Xu, S., Hung, T. (2005) A Point Inclusion Test Algorithm for Simple
Polygons. In: Computational Science and Its Applications - ICCSA 2005, 3480, 769-775.
Miller, H.J., Han, J. (2009) Geographic Data Mining and Knowledge Discovery (2nd ed.). CRC
Press.
Mücke, E. (2009) Computing Prescriptions: Quickhull: Computing Convex Hulls Quickly.
Computing in Science & Engineering, 11(5), 54-57.
Peterson, A.T., Knapp, S., Guralnick, R., Soberón, J., Holder, M.T. (2010) The big questions
for biodiversity informatics. Systematics and Biodiversity, 8(2), 159-168.
Shao, K.T., Peng, C.I., Yen, E., Lai, K.C., Wang, M.C., Lin, J. et al. (2007) Integration of
Biodiversity Databases in Taiwan and Linkage to Global Databases. Data Science
Journal, 6, S2-S10.
Zang, N., Rosson, M.B., Nasser, V. (2008) Mashups: who? what? why?. CHI '08 extended
abstracts on Human factors in computing systems, 3171-3176.
(1999)
288

2010
Biodiversity Science, 2010, Vol. 18 (5),
pp. 444-453, ISSN 1005-0094
2010 Google Map
--2010
ISBN 9789860258349
