ANALYSIS-READY FORMAT
ABSTRACT

Diverse storage formats, archive dispersal, and inconsistent naming make it difficult for researchers and the general public to find and access remote sensing data. To facilitate the use of remote sensing data, this paper provides an integrated framework for reading remote sensing data directly in a widely compatible, analysis-ready format, the NumPy ndarray. The framework is composed of two main components. One is the raster data processing and storage model: all the operational gridded remote sensing data are split into tiles and reorganized into n-dimensional arrays, which are then serialized into netCDF and stored in a distributed file system. The other is the spatiotemporal filter, which achieves parallel queries and has been encapsulated into Internet-accessible application programming interfaces (APIs). Finally, a scenario of calculating NDVI over a specified spatiotemporal range illustrates the efficiency and convenience of our platform for remote sensing data analysis.

Index Terms: remote sensing data, spatiotemporal retrieval, analysis-ready format, multi-dimensional array

1. INTRODUCTION

With the rapid development of remote sensing technology, the volume of global coverage image data has increased greatly. To obtain higher-quality decision-making results, many applications, such as environmental change monitoring, use remote sensing data more and more frequently for comprehensive analysis.

To transform remote sensing data into scientific understanding, a great deal of research has been done, such as change detection [1], image fusion [2], edge detection [3], and land use and land cover extraction [4].

However, the diverse storage formats, archive dispersal, and inconsistent naming make it difficult for researchers and the general public to find and access these data. To obtain a long time series of data for a certain region, for instance, one needs to find the image data by region and temporal query first, and then download and read them. This requires the user to know the storage structure, index fields, and the data format, and thus fails to satisfy the demand for fast data discovery and on-demand reading in the big data era. How to remove barriers to data availability and accessibility has therefore become one of the key issues to be solved in geosciences and spatial information sciences.

For processing remote sensing data, there is growing interest in open source alternatives to commercial software [5]. Python has become one of the fastest-growing programming languages in the remote sensing community in the last decade, and many libraries for processing remote sensing data can be accessed through Python. The Geospatial Data Abstraction Library (GDAL; www.gdal.org), for instance, provides a single abstract model for reading and writing all of the common image file formats and has been widely used. The Remote Sensing and GIS Library (RSGISLib; www.rsgislib.org) offers more than 300 commands for processing remote sensing data, including stacking image bands, image segmentation, and image-to-image registration. The Scikit-learn Python library [6], which requires data to be presented as NumPy ndarrays, contains a number of machine learning algorithms, including random forests and neural network models. All these libraries are leading us to a new world of processing remote sensing data.

Therefore, to facilitate the use of remote sensing data, we propose an architecture for data access in an analysis-ready format, the NumPy ndarray, the most popular data format for n-dimensional data [7]. Through the data access API we designed and implemented, users are able to read remote sensing image values for a given spatiotemporal range and other query conditions.

This paper is organized as follows. Section 2 proposes the system architecture of the platform. Section 3 gives a detailed description of the data storage model. Section 4 presents the spatiotemporal query process. Section 5 presents a case of querying and calculating NDVI using Landsat data. Finally, Section 6 offers conclusions and future directions.
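The tiling step of the storage model summarized in the abstract (gridded data split into tiles and reorganized as an n-dimensional array) can be illustrated with a minimal NumPy sketch. The function name `split_into_tiles`, the (bands, rows, cols) input layout, the tile size, and the zero padding of edge tiles are all assumptions made for illustration; the netCDF serialization and the distributed storage are not shown.

```python
import numpy as np


def split_into_tiles(image, tile, fill=0):
    """Split a (bands, rows, cols) raster into square tiles.

    Returns an array of shape (tiles_y, tiles_x, bands, tile, tile).
    Edge tiles are padded with `fill` (an assumption of this sketch).
    """
    bands, rows, cols = image.shape
    ny = -(-rows // tile)  # ceiling division: number of tile rows
    nx = -(-cols // tile)  # ceiling division: number of tile columns

    # Pad the raster so both spatial dimensions are multiples of `tile`.
    padded = np.full((bands, ny * tile, nx * tile), fill, dtype=image.dtype)
    padded[:, :rows, :cols] = image

    # (bands, ny, tile, nx, tile) -> (ny, nx, bands, tile, tile)
    return padded.reshape(bands, ny, tile, nx, tile).transpose(1, 3, 0, 2, 4)


# Hypothetical 2-band, 5 x 6 pixel raster split into 4 x 4 tiles.
image = np.arange(2 * 5 * 6).reshape(2, 5, 6)
tiles = split_into_tiles(image, tile=4)
```

Indexing `tiles[iy, ix]` then yields one tile with all of its bands, which is the unit a storage layer could serialize independently.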
[Flow chart labels: Request Parse; Spatiotemporal Filter (Spatial mask, Temporal Filter); Locating Data; Data Storage; MBR in WGS84; Grids intersect with satellite image; Grid intersection; reprojection; Grids transformation; Satellite image; Image tiles array]
Fig. 3 Flow chart of a spatial region query (figure label: Region Queries)

Grid_X increase by 1 with the longitude increase by one

analysis of the spatial query conditions and the tiles stored. Finally, the NumPy arrays obtained from multiple cluster nodes are aggregated. After that, a number of image processing functions available through NumPy can be used for analysis.

Fig. 3 depicts the process of a region query, which finds the data that intersect or are covered by a given region.

Because it is very common for a single user to repeat the same queries, caching is used to avoid searching again for values that have already been requested. Query conditions and results stored in the cache are expired by an improved Least Recently Used (LRU) algorithm, which guarantees that each query is stored only once. When subsequent requests for the same data arrive, the earlier retrieval results are returned by checking the cache. If the query has not been performed before, the query polygon is transformed into WGS84 and its Minimum Bounding Rectangle (MBR) is calculated. Grid_X and Grid_Y can then be obtained according to equations 1 and 2, and the grids covered by the MBR can be retrieved. Then, by intersection analysis, the intersection geometries are returned.

5. APPLICATION ANALYSIS

This section presents a real-world scenario of calculating NDVI from Landsat data for given spatial and temporal criteria.

Landsat data, distributed by USGS in GeoTIFF format, is one of the longest available series of satellite observations. In our work, 2.5 TB of Landsat data has been reorganized and stored, and the data retrieval method has been encapsulated as an API. Users can retrieve all the Landsat data stored in our system through the data access API from our Jupyter environment.

To illustrate how the platform can be leveraged for data access and analysis, Fig. 4 gives the details of the real-time NDVI calculation from Landsat remote sensing data.
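The region query path described in Section 4 (check the cache first; on a miss, compute the MBR of the WGS84 polygon and enumerate the grids it covers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: it uses a plain LRU policy rather than the improved variant, and because equations 1 and 2 are not reproduced here, the grid indexing (1-degree cells counted from longitude -180 and latitude -90) is hypothetical, as are the names `QueryCache`, `mbr`, `grids_covering`, and `region_query`. Reprojection to WGS84 and the final intersection analysis are omitted.

```python
import math
from collections import OrderedDict


class QueryCache:
    """Plain LRU cache keyed by the query conditions (illustrative only)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value  # a dict key holds each query only once
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


def mbr(polygon):
    """Bounding rectangle (min_lon, min_lat, max_lon, max_lat) of WGS84 vertices."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def grids_covering(box, cell_deg=1.0, origin=(-180.0, -90.0)):
    """Grid indices (Grid_X, Grid_Y) of cells that overlap or touch the MBR.

    ASSUMPTION: a regular grid of `cell_deg` cells counted from `origin`;
    the paper's own equations 1 and 2 may differ.
    """
    min_lon, min_lat, max_lon, max_lat = box
    x0 = math.floor((min_lon - origin[0]) / cell_deg)
    x1 = math.floor((max_lon - origin[0]) / cell_deg)
    y0 = math.floor((min_lat - origin[1]) / cell_deg)
    y1 = math.floor((max_lat - origin[1]) / cell_deg)
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def region_query(polygon, cache):
    """Return candidate grids for a polygon, reusing cached results."""
    key = tuple(polygon)
    hit = cache.get(key)
    if hit is not None:
        return hit
    cells = grids_covering(mbr(polygon))
    cache.put(key, cells)
    return cells


# Hypothetical query polygon given as (lon, lat) vertices in WGS84.
polygon = [(116.2, 39.6), (117.4, 39.6), (117.4, 40.3), (116.2, 40.3)]
cache = QueryCache(capacity=2)
cells = region_query(polygon, cache)
```

A second call to `region_query` with the same polygon returns the cached list without recomputing the grid cover.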
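For reference, the NDVI step itself reduces to a few NumPy operations once the band values are available as ndarrays. On Landsat TM, band 3 is red and band 4 is near infrared, so NDVI = (band4 - band3) / (band4 + band3). The sketch below uses small stand-in arrays in place of values returned by the platform's data access API, whose signature is not reproduced here; the function name `ndvi` and the choice to return NaN where both bands are zero are assumptions.

```python
import numpy as np


def ndvi(red, nir):
    """Compute NDVI = (nir - red) / (nir + red) element-wise.

    Pixels where nir + red == 0 are returned as NaN instead of
    raising a divide-by-zero warning (a choice made for this sketch).
    """
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# Stand-in values for band3 (red) and band4 (near infrared).
band3 = np.array([[50, 0], [100, 30]])
band4 = np.array([[150, 0], [100, 90]])
result = ndvi(band3, band4)
```

Because the result is an ordinary ndarray, it can be passed directly to other NumPy-based tools for further analysis or display.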
Fig. 4 Spatiotemporal data access and NDVI calculation

As Fig. 4 shows, given the spatial query region and other search criteria, such as "LANDSAT", "L45TM", "EPSG:4326", and a time period, calling the "info_by_geom" function locates the results by returning the info data. Calling the "query_by_geom" function then retrieves the results. Finally, NDVI is calculated by reading the values of band3 and band4. The NDVI results are displayed below.

6. CONCLUSIONS

In this paper, a platform providing analysis-ready remote sensing data is presented. It consists of two main modules, one that splits remote sensing data into tiles and one that implements the spatiotemporal query algorithm. All the tiles are stored in a distributed file system, and metadata are stored to facilitate the processing of queries. A two-step spatiotemporal query algorithm is implemented. By executing user queries in parallel, the platform saves cluster resources and can support a large number of concurrent queries. In addition, caching is used to reduce system response time and improve data retrieval efficiency.

Our platform makes it easy to use remote sensing data. Instead of downloading massive amounts of data and clipping out the values of the desired region, users can access and read the desired data simply by connecting to the Internet and calling the encapsulated APIs. This can encourage innovation in data-driven analysis and promote the wider use of remote sensing data in various fields.

This paper addresses the challenges of efficient remote sensing data access and makes remote sensing data easier to process. However, the remote sensing data we provide have not been processed to remove cloud, and the corresponding quality data are not integrated. How to improve our data storage model and data retrieval method to support data storage with more information and of higher quality is another interesting topic, and it will be the next step of our research.

7. ACKNOWLEDGMENTS

8. REFERENCES

[1] T. Leichtle, C. Geiß, M. Wurm, T. Lakes, and H. Taubenböck, "Unsupervised change detection in VHR remote sensing imagery – an object-based clustering approach in a dynamic urban environment," Int. J. Appl. Earth Obs. Geoinf., vol. 54, pp. 15–27, 2017.

[2] H. S. Jung and S. W. Park, "Multi-sensor fusion of Landsat 8 thermal infrared (TIR) and panchromatic (PAN) images," Sensors (Switzerland), vol. 14, no. 12, pp. 24425–24440, 2014.

[3] M. Han, X. Yang, and E. Jiang, "An Extreme Learning Machine based on Cellular Automata of edge detection for remote sensing images," Neurocomputing, vol. 198, pp. 27–34, 2016.

[4] P. Tokarczyk, J. D. Wegner, S. Walk, and K. Schindler, "Features, color spaces, and boosting: New insights on semantic classification of remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 1, pp. 280–295, 2015.

[5] V. T. T, "Object-based remote sensing image analysis with OSGeo tools," AGSE 2012–FOSS4G-SEA, p. 79, 2012.

[6] F. Pedregosa, R. Weiss, and M. Brucher, "Scikit-learn: Machine Learning in Python," J. Mach. Learn. Res., vol. 12, no. Oct, pp. 2825–2830, 2011.

[7] Astropy Collaboration et al., "Astropy: A community Python package for astronomy," Astron. Astrophys., vol. 558, p. A33, 2013.