
Geographic Service Discovery

for the Internet of Things

Martin Bauer and Salvatore Longo

NEC Laboratories Europe,
Kurfürsten-Anlage 36,
69115 Heidelberg, Germany
{martin.bauer,salvatore.longo}@neclab.eu

Abstract. In the Internet of Things vision, physical things become part
of the Internet and, as a result, the Internet extends into the physical
world. Applications start to be aware of the users’ environment and
users can have mediated interactions with the physical world through
the Internet of Things. Within the physical world, the spatial structure,
which can be described by geographical coordinates, is relevant for find-
ing services. Thus geographic service discovery becomes a core part of
an Internet of Things infrastructure. The key problem to solve is how
to efficiently extract the set of services whose geographic service area
overlaps with the geographic scope of the request from the potentially
huge number of services in the Internet of Things.
In this paper, we investigate the use of spatial indexes for the effi-
cient discovery of services within an area specified by geographic coordi-
nates. An experimental evaluation based on a prototype implementation
demonstrates the feasibility of the approach by measuring the perfor-
mance with respect to the request throughput under varying parameter
settings and configurations.

Keywords: Internet of Things, service discovery, spatial index,
geographic area, throughput evaluation.

1 Introduction

Today more and more products are being equipped with sensors and actuators,
and many of these are connected to the Internet. In the Internet of Things (IoT)
vision, these sensors and actuators become part of an IoT infrastructure that is
accessible to a variety of applications, typically through IoT services hosted in the
cloud. This enables a new class of opportunistic IoT applications that are not
configured or hard-wired for specific IoT services; they need to discover relevant
ones, e.g. in the current physical environment of a user, making discovery a key
functionality for an IoT infrastructure, as identified in the functional view of the
Architectural Reference Model (IoT ARM) [1]. The IoT ARM was developed in the
European Project IoT-A [2].

R. Hervás et al. (Eds.): UCAmI 2014, LNCS 8867, pp. 424–431, 2014.

© Springer International Publishing Switzerland 2014

With the number of connected IoT devices growing into the billions – e.g.
Cisco forecasts 50 billion devices connected by 2020 [3] – the discovery function-
ality needs to be highly scalable. To achieve the required scalability, a distributed
discovery approach is needed since the throughput of a single node in the cloud
is always limited. The number of nodes involved in each discovery operation
should be limited as well, as this may become the bottleneck for aggregation.
Geographic location based on geographic coordinates is a selective criterion
for the distribution, i.e. one node is responsible for a certain geographic area.
Geographic coordinates are easy to determine, e.g. by using GPS or selecting a
location on a map, and are also highly selective, e.g. there may be millions of services
providing air quality information, but only a few are related to the location of
interest. Within a single node, efficient access can be achieved by using a spatial
index structure. Some spatial index structures like quadtrees [4] or kd-trees are
used for indexing point locations, whereas R-trees [5] and their variants can also
be used for indexing area locations, which we need for indexing service areas.
In this paper, we propose a cloud-based distributed service discovery for the
Internet of Things, based on geographic scopes. Geographic Information Systems
(GIS) have been using spatial data infrastructures with catalogues for geograph-
ical information [6]. These are utilized for storing and accessing large amounts
of relatively static geographic information like roads or buildings, but not for
the discovery of services. On the other hand, there have been proposals for
ontology-based service discovery using symbolic locations [7]. The focus of our
approach is on using geographic scopes based on geographic coordinates. The
core contribution lies in measuring key performance aspects for evaluating the
practical feasibility of such an approach – something we have not been able to
find elsewhere.
Section 2 gives an overview of the geographic service discovery architecture
and the functionalities provided. Section 3 presents the evaluation of our
prototype. We first look at the performance of a single node. We then analyze
the distributed case, both with a single provider, which allows a perfect
geographic partitioning where each node is responsible for a distinct
geographic area, and with multiple providers, where geographic service areas
overlap. Finally, we provide a conclusion and an outlook on future work in
Section 4.

2 Approach
The core idea of geographic discovery is to find information related to a
geographic area. The geographic area is given as a geographic scope. In addition, the
information to be discovered needs to be specified. The result of a geographic
discovery request is all the information whose geographic location matches the
geographic scope. Geographic location can be given as a point location or an
area location. For example, a point location may be suitable for determining the
location of small objects, whereas services may have larger service areas, e.g. the
area covered by a video camera.
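
The matching of a geographic scope against point and area locations can be illustrated with a minimal sketch. It assumes axis-aligned rectangles in longitude/latitude that do not cross the antimeridian, and it treats touching borders as a match; the class name `Rect`, its methods, and all coordinates are hypothetical and not taken from the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in geographic coordinates (assumed, in degrees)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains_point(self, lon: float, lat: float) -> bool:
        # A point location matches if it lies inside or on the border of the scope.
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)

    def overlaps(self, other: "Rect") -> bool:
        # An area location matches unless it lies entirely to one side of the scope.
        return not (other.min_lon > self.max_lon or other.max_lon < self.min_lon
                    or other.min_lat > self.max_lat or other.max_lat < self.min_lat)


# Hypothetical example values.
scope = Rect(8.60, 49.35, 8.75, 49.45)         # geographic scope of a request
camera_area = Rect(8.70, 49.40, 8.80, 49.50)   # service area overlapping the scope
far_area = Rect(9.00, 50.00, 9.10, 50.10)      # service area outside the scope

print(scope.overlaps(camera_area))        # area location vs. scope
print(scope.overlaps(far_area))
print(scope.contains_point(8.67, 49.40))  # point location vs. scope
```

A point location is the degenerate case of an area location, so an index that handles rectangles can also handle points.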

2.1 Functionality

Overall, the approach follows the service-oriented architecture (SOA) paradigm [8].
In a typical interaction, a client queries the geographic discovery for service
descriptions (represented in RDF/XML), providing a service specification, which
states what services are of interest to the client, and a geographic scope, which
describes the geographic area for which the services are requested. The geographic
scope is then matched against the geographic service areas, and the service
descriptions are filtered according to the service specification. The matching
service descriptions are returned
to the client. Subsequently the client may call one or more of the services using the
information provided.
To serve the different needs of applications, we see the requirement to sup-
port synchronous one-time discovery requests, as well as requests for continuous
asynchronous notifications informing about changes.
In addition, management operations for inserting, updating and deleting ser-
vice descriptions with the respective service areas are needed in order to update
index structures. For the purpose of this paper we only evaluate synchronous
discovery requests (with rectangles specified by the coordinates of two diagonal
vertices as scopes).

2.2 Architecture

In the following subsections, we first describe the internal architecture of a single
geographic index server and then the distributed architecture that
we propose to achieve the required scalability as well as the envisioned
multi-provider approach.

Geographic Index Server. The geographic index server implements the discovery
and management operations described above, using a REST-like binding.
Its internal subcomponents are the discovery indexer, which is based on the
spatial index, and the object information index. The discovery indexer implements
the core logic of the geographic index server. It indexes the geographic
information using either an in-memory spatial index based on an R-Tree [5] or a
persistent spatial index implementation, which internally also uses an R-Tree
index. The in-memory object information index stores other information associated
with the services, such as the output of a service or the service type. We chose
the standard R-Tree data structure because we need a spatial index structure that
can handle rectangular geographic areas, as we are indexing service areas.
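
The interplay of the two subcomponents can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: the spatial index is replaced by a plain dictionary that is scanned linearly (an R-Tree would prune whole subtrees instead), and the names `GeoIndexServer`, `upsert`, `delete`, `discover`, `scope_from_diagonal`, and the example URLs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]                # (lon, lat)
Rect = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def scope_from_diagonal(p: Point, q: Point) -> Rect:
    """Build a rectangle from any two diagonally opposite vertices."""
    return (min(p[0], q[0]), min(p[1], q[1]), max(p[0], q[0]), max(p[1], q[1]))


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GeoIndexServer:
    """Hypothetical single node: spatial index plus object information index."""

    def __init__(self) -> None:
        # Stand-in for the spatial index (assumption: linear scan, not an R-Tree).
        self._areas: Dict[str, Rect] = {}
        # Object information index: non-spatial attributes such as the service type.
        self._info: Dict[str, Dict[str, str]] = {}

    def upsert(self, locator: str, area: Rect, info: Dict[str, str]) -> None:
        # Insert and update both have to touch the two indexes together.
        self._areas[locator] = area
        self._info[locator] = dict(info)

    def delete(self, locator: str) -> None:
        self._areas.pop(locator, None)
        self._info.pop(locator, None)

    def discover(self, scope: Rect, service_type: Optional[str] = None) -> List[str]:
        # Step 1: spatial filter on the service areas.
        hits = [loc for loc, area in self._areas.items() if overlaps(area, scope)]
        # Step 2: filter by the service specification (here only the type).
        if service_type is not None:
            hits = [loc for loc in hits if self._info[loc].get("type") == service_type]
        return sorted(hits)


server = GeoIndexServer()
server.upsert("http://example.org/services/cam-1", (8.70, 49.40, 8.80, 49.50), {"type": "camera"})
server.upsert("http://example.org/services/air-7", (8.60, 49.30, 8.72, 49.42), {"type": "air-quality"})

scope = scope_from_diagonal((8.75, 49.45), (8.65, 49.35))
all_hits = server.discover(scope)
cameras = server.discover(scope, service_type="camera")
print(all_hits)
print(cameras)
```

Applying the spatial filter first keeps the second step cheap, because only the services inside the scope are checked against the service specification.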

Distributed Architecture. Due to the large number of IoT services and the
required throughput to serve the expected number of application requests, a
single geographic index server will not be sufficient. Therefore, we propose a
distributed hierarchical architecture as shown in Figure 1. We introduce cata-
logue servers that do not store the service areas of IoT services, but rather the
service areas of geographic index servers. So for the discovery of IoT services
first the top-level catalogue server is contacted, which then uses the geographic
scope to identify the (small) set of geographic index servers that have overlap-
ping service areas. The request is then forwarded to this subset and the results
are aggregated.
In principle, a hierarchy of catalogue servers can be used as indicated in
Figure 1, since catalogue servers can transparently be used instead of geographic
index servers.
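
The forwarding and aggregation step of a catalogue server can be sketched as follows. The sketch assumes that every child (geographic index server or lower-level catalogue server) is represented by a callable taking a rectangular scope and returning a list of locators; `CatalogueServer`, `register`, `make_leaf`, and the example data are hypothetical. Requests are forwarded sequentially here for simplicity, whereas a real deployment would probably forward them in parallel.

```python
from typing import Callable, Dict, List, Tuple

Rect = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)
DiscoverFn = Callable[[Rect], List[str]]


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class CatalogueServer:
    """Stores the service areas of its children, not those of IoT services."""

    def __init__(self) -> None:
        self._children: List[Tuple[Rect, DiscoverFn]] = []

    def register(self, area: Rect, discover: DiscoverFn) -> None:
        self._children.append((area, discover))

    def discover(self, scope: Rect) -> List[str]:
        results = set()
        for area, child_discover in self._children:
            # Forward only to children whose service area overlaps the scope.
            if overlaps(area, scope):
                results.update(child_discover(scope))
        # Aggregate: duplicates from overlapping children are merged.
        return sorted(results)


def make_leaf(name: str, services: Dict[str, Rect], log: List[str]) -> DiscoverFn:
    """Hypothetical stand-in for a geographic index server."""
    def discover(scope: Rect) -> List[str]:
        log.append(name)  # record which servers were actually contacted
        return [sid for sid, area in services.items() if overlaps(area, scope)]
    return discover


contacted: List[str] = []
west = make_leaf("west", {"cam-1": (0.0, 0.0, 1.0, 1.0)}, contacted)
east = make_leaf("east", {"air-7": (5.0, 0.0, 6.0, 1.0)}, contacted)

top = CatalogueServer()
top.register((0.0, 0.0, 4.0, 4.0), west)
top.register((4.5, 0.0, 8.0, 4.0), east)

found = top.discover((0.5, 0.5, 2.0, 2.0))
print(found, contacted)

# A catalogue server exposes the same discover signature as its children,
# so it can itself be registered below another catalogue server.
root = CatalogueServer()
root.register((0.0, 0.0, 8.0, 4.0), top.discover)
found_via_root = root.discover((0.5, 0.5, 2.0, 2.0))
```

Because the catalogue server and its children share one interface, the hierarchy can be deepened without changes on the client side.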

Fig. 1. Distributed geo-discovery architecture

If the distributed geo-discovery infrastructure is operated by a single operator,
the geographic areas served by geographic index servers can be partitioned
as shown in the bottom right part of Figure 1, which would be useful for
limiting the number of servers that have to be contacted for executing a
certain request. For the Internet of Things, we consider a scenario with multiple
operators more likely. These operators may also want to keep their core informa-
tion to themselves for both business and privacy reasons. As a result, there may
be multiple geographic index servers responsible for a certain area as shown in
the bottom left part of Figure 1. In Section 3 we provide an initial comparison
of the two cases to get an idea of the performance penalty that has to be paid in the
multi-operator case with overlapping service areas.

3 Evaluation

In this section, we investigate the performance and scalability of our geographic
discovery approach. First we describe the testbed configuration together with the
evaluation methodology. The main operation evaluated is the service discovery
based on geographic scopes. We show how our approach performs with respect
to throughput in different settings. For a single geographic index server, the
parameters we vary are the number of service descriptions stored, the number of
requests executed, the available network bandwidth, the size of the result set, and
the use of persistent and in-memory spatial index implementations. Finally, we
evaluate a distributed setting with partitioned as well as overlapping geographic
areas.

3.1 Testbed Configuration and Evaluation Methodology


Our testbed configuration has a client-server structure, where the client estab-
lishes several HTTP connections to the server on which the geographic index
server is running. The connection between server and client is a point-to-point
100Mbit LAN connection. The server on which the geographic index server was
running has the following configuration:

– CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
– OS: Ubuntu Server 12.10
– MemTotal: 8GB

The overall evaluation is focused on a throughput analysis, where the throughput
is the average rate of successfully communicated and executed requests. The
main goal is to understand the request load supported by a geographic index server
running on a single node. For this reason we look at the throughput of the dis-
cover operation and how different network bandwidths affect this. Two tools
were used for performing the measurements: ab (Apache Benchmark) [9] and
Netem [10]. The first is an open source Apache tool for benchmarking HTTP
servers and the second is a tool for emulating different network bandwidths.

3.2 Experimental Results


We first investigated how a single geographic index server performs and then
evaluated the distributed approach. Before running the tests, we analyzed which
parameters could influence the performance and identified the following:

– the number of inserted service descriptions
– the available network bandwidth
– the response body size

The tests were selected based on the identified relevant parameters.

Single Geographic Index Server Evaluation. First we analyzed how the
geographic discovery performs for the discover operation, taking into account the
network layer. For this evaluation we ran tests with different numbers of discovery
requests (100, 1,000, 10,000, 100,000) and a fixed response body size (making sure that only
5 locators, i.e. the URLs where the service descriptions are stored, were returned
per request, encapsulated in an XML message with a response body size of
1578 bytes). The achieved average throughput was around 2,000 requests/second.
The tests showed that the number of inserted service descriptions has little,
if any, influence on the discovery throughput. Tests performed on the internal
index structure show that, regardless of whether 500 or 100,000 services are
inserted in the geographic index server, the geographic discovery operation is
only marginally affected. Therefore, all following tests are based on a
geographic index server pre-populated with 10,000 services.
The next step was to analyze how the available network bandwidth limits
the geographic discovery throughput using different network configurations. For
changing the network configuration, we used the netem [10] tool with the following
Ethernet configurations: 1Mbit, 10Mbit and 100Mbit uplink/downlink.

Fig. 2. Geographic discovery evaluation on changing network bandwidth

Fig. 3. Throughput comparison between in-memory and persistent implementation

As shown in Figure 2, there is a strong relation between the bandwidth
and the throughput. With a 10Mbit bandwidth, the maximum throughput
achieved was about 700 requests/second, but with 100Mbit we got around
2,000 requests/second. The bandwidth limits the information transfer rate and
this directly affects the discovery throughput, indicating that the limiting factor
is the size of the response messages and not the index lookup.
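
How the bandwidth caps the throughput can be estimated with a short back-of-the-envelope calculation. The sketch assumes that 1 Mbit equals 10^6 bits and that the link carries nothing but response bodies of 1578 bytes, i.e. it ignores HTTP headers, TCP/IP overhead and request traffic; the function name is hypothetical and the result is only a rough upper bound, not a measurement.

```python
RESPONSE_BODY_BYTES = 1578  # response body size reported for 5 locators


def max_requests_per_second(link_mbit: float,
                            response_bytes: int = RESPONSE_BODY_BYTES) -> float:
    """Upper bound on throughput if the link only carried response bodies."""
    bytes_per_second = link_mbit * 1_000_000 / 8  # assumption: 1 Mbit = 10^6 bit
    return bytes_per_second / response_bytes


for link_mbit in (1, 10, 100):
    bound = max_requests_per_second(link_mbit)
    print(f"{link_mbit:>3} Mbit/s: at most {bound:.0f} requests/second")
```

Under these assumptions the bound for 10Mbit is about 792 requests/second, which is close to the roughly 700 requests/second measured. For 100Mbit the bound is about 7,921 requests/second, well above the measured 2,000, so per-request overheads that this simple model leaves out presumably also contribute there.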
The next step was to compare the current implementation of the geographic
discovery, based on an in-memory spatial index, with a persistent implementation
that uses PostGIS for the spatial indexing. Figure 3 shows the comparison. It
is evident that the in-memory approach is more efficient in terms of throughput
compared to the persistent one. In addition to that, it is interesting to see how
the throughput decreases with increasing response body size (i.e. number of
discovered services). As also shown in Figure 3, if the response body size is
small, i.e. 5 discovered services, the throughput is about 2,000 requests/second, but
with 3,200 services the performance drops to less than 200 requests/second.
This demonstrates how the available network bandwidth and the number of
discovered services can significantly influence the overall performance.

Catalogue Approach Evaluation. The evaluation of a single geographic index
server was the basis for understanding how the distributed geographic discovery
architecture described in Section 2 could perform. We evaluate the catalogue
approach with a focus on the benefits it introduces and the price we need
to pay in terms of throughput.

Fig. 4. Catalogue server throughput evaluation: single vs. multi-domain

The evaluation of the distributed approach with one catalogue server took
into consideration the single and multi-domain approaches. In the first case there
is a single operator that will serve a particular geographic area that could be
partitioned as shown on the right side of the tree in Figure 1. In this case
each geographic index server could be assigned to a specific area without any
overlaps. In the case of the multi-domain approach the overlap between areas
cannot be prevented as shown on the left side of the tree in Figure 1. The
tests were performed using the same testbed configuration with one catalogue
and four geographic index servers running on the server. Results are shown in
Figure 4. Compared to the single geographic index server evaluation, the
overall performance decreased and this seems reasonable because we introduced
an additional layer between the test client and the geographic index server.
The maximum catalogue throughput achieved in this environment was about
650 requests per second for the single domain approach. The penalty is roughly
2/3 of the throughput achieved on a single server (2,000 requests/second). The
performance comparison shows that the penalty for having overlapping service
areas is visible, but limited as shown in Figure 4.

4 Conclusion
As can be seen from the evaluation, the available network bandwidth plays an
important role for the overall performance of the geographic discovery
infrastructure. This shows that a high selectivity of the request, i.e. limiting
the result set early in the process, is important. Using geographic scopes already provides
relatively high selectivity as compared to other parts of the service description.
In addition, the network bandwidth should be taken into account when choosing
the representation of the information, e.g. a plain RDF/XML-based represen-
tation is relatively verbose and thus has a negative impact on the throughput.
A distributed setting with a single catalogue server has a lower performance,
because the catalogue server has to wait for and aggregate responses from the
geographic index servers. On the positive side, it is comparatively cheap to
replicate catalogue servers as the set of geographic index servers is expected to be
relatively stable compared to the set of IoT services, so the overhead of keeping
replicas synchronized is low. Based on the measurements we took, we believe that
it will be possible to build a scalable geographic discovery infrastructure for the
Internet of Things. As a next step, we plan to analyze a large scale IoT scenario
with respect to the discovery request load it generates and evaluate what geo-
graphic discovery infrastructure configuration is needed to support such a load
and whether such a configuration seems viable from a business perspective.

Acknowledgment. This paper describes work undertaken in the context of
the projects Internet of Things Architecture (IoT-A) and MobiNet. IoT-A and
MobiNet are Large Scale Collaborative Projects supported by the European 7th
Framework Programme under the contract numbers 257521 and 318485,
respectively.

References
1. Bassi, A., Bauer, M., Fiedler, M., Kramp, T., van Kranenburg, R., Lange, S.,
Meissner, S. (eds.): Enabling Things to Talk: Designing IoT solutions with the IoT
Architectural Reference Model. Springer, Heidelberg (2013)
2. IoT-A European Project, http://www.iot-a.eu
3. Cisco IoT Forecast, http://share.cisco.com/internet-of-things.html
4. Finkel, R., Bentley, J.: Quad trees a data structure for retrieval on composite keys.
Acta Informatica 4(1), 1–9 (1974)
5. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Pro-
ceedings of the 1984 ACM SIGMOD International Conference on Management of
Data, SIGMOD 1984, pp. 47–57. ACM, New York (1984)
6. Groot, R., McLaughlin, J.: Geospatial data infrastructure - Concepts, cases, and
good practice. Oxford University Press (2000)
7. Lutz, M.: Ontology-based descriptions for semantic discovery and composition of
geoprocessing services. Geoinformatica 11(1), 1–36 (2007)
8. Papazoglou, M.P., Traverso, P., Dustdar, S., Leymann, F.: Service-oriented com-
puting: State of the art and research challenges. Computer 40(11), 38–45 (2007)
9. Apache Benchmark Tool, http://httpd.apache.org/docs/2.2/programs/ab.html
10. Netem, Network Emulator Tool,
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
