Textbook Database and Expert Systems Applications 29Th International Conference Dexa 2018 Regensburg Germany September 3 6 2018 Proceedings Part Ii Sven Hartmann Ebook All Chapter PDF
Sven Hartmann · Hui Ma
Abdelkader Hameurlain
Günther Pernul
Roland R. Wagner (Eds.)
LNCS 11030
Lecture Notes in Computer Science 11030
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Editors
Sven Hartmann
Clausthal University of Technology
Clausthal-Zellerfeld
Germany
Hui Ma
Victoria University of Wellington
Wellington
New Zealand
Abdelkader Hameurlain
Paul Sabatier University
Toulouse
France
Günther Pernul
University of Regensburg
Regensburg
Germany
Roland R. Wagner
Johannes Kepler University
Linz
Austria
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume contains the papers presented at the 29th International Conference on
Database and Expert Systems Applications (DEXA 2018), which was held in
Regensburg, Germany, during September 3–6, 2018. On behalf of the Program
Committee, we commend these papers to you and hope you find them useful.
Database, information, and knowledge systems have always been a core subject of
computer science. The ever-increasing need to distribute, exchange, and integrate data,
information, and knowledge has added further importance to this subject. Advances in
the field will help facilitate new avenues of communication, proliferate
interdisciplinary discovery, and drive innovation and commercial opportunity.
DEXA is an international conference series that showcases state-of-the-art research
activities in database, information, and knowledge systems. The conference and its
associated workshops provide a premier annual forum to present original research
results and to examine advanced applications in the field. The goal is to bring together
developers, scientists, and users to extensively discuss requirements, challenges, and
solutions in database, information, and knowledge systems.
DEXA 2018 solicited original contributions dealing with any aspect of database,
information, and knowledge systems. Suggested topics included, but were not limited
to:
– Acquisition, Modeling, Management, and Processing of Knowledge
– Authenticity, Privacy, Security, and Trust
– Availability, Reliability, and Fault Tolerance
– Big Data Management and Analytics
– Consistency, Integrity, Quality of Data
– Constraint Modeling and Processing
– Cloud Computing and Database-as-a-Service
– Database Federation and Integration, Interoperability, Multi-Databases
– Data and Information Networks
– Data and Information Semantics
– Data Integration, Metadata Management, and Interoperability
– Data Structures and Data Management Algorithms
– Database and Information System Architecture and Performance
– Data Streams and Sensor Data
– Data Warehousing
– Decision Support Systems and Their Applications
– Dependability, Reliability, and Fault Tolerance
– Digital Libraries and Multimedia Databases
– Distributed, Parallel, P2P, Grid, and Cloud Databases
– Graph Databases
– Incomplete and Uncertain Data
– Information Retrieval
Roland R. Wagner, to our publication chair, Vladimir Marik, and to our workshop
chairs, A Min Tjoa and Roland R. Wagner.
We wish to express our deep appreciation to Gabriela Wagner of the DEXA
conference organization office. Without her outstanding work and excellent support, this
volume would not have seen the light of day.
Finally, we would like to thank Günther Pernul and his team for being our hosts during the
wonderful days in Regensburg.
General Chairs
Abdelkader Hameurlain IRIT, Paul Sabatier University, Toulouse, France
Günther Pernul University of Regensburg, Germany
Roland R. Wagner Johannes Kepler University Linz, Austria
Publication Chair
Vladimir Marik Czech Technical University, Czech Republic
Program Committee
Slim Abdennadher German University, Cairo, Egypt
Hamideh Afsarmanesh University of Amsterdam, The Netherlands
Riccardo Albertoni Institute of Applied Mathematics and Information
Technologies - Italian National Council of Research,
Italy
Idir Amine Amarouche University Houari Boumediene, Algeria
Rachid Anane Coventry University, UK
Annalisa Appice Università degli Studi di Bari, Italy
Mustafa Atay Winston-Salem State University, USA
Faten Atigui CNAM, France
Spiridon Bakiras Hamad bin Khalifa University, Qatar
Zhifeng Bao National University of Singapore, Singapore
Ladjel Bellatreche ENSMA, France
Nadia Bennani INSA Lyon, France
Karim Benouaret Université Claude Bernard Lyon 1, France
Benslimane Djamal Lyon 1 University, France
Morad Benyoucef University of Ottawa, Canada
Catherine Berrut Grenoble University, France
Athman Bouguettaya University of Sydney, Australia
Omar Boussaid University of Lyon/Lyon 2, France
Stephane Bressan National University of Singapore, Singapore
Barbara Catania DISI, University of Genoa, Italy
Michelangelo Ceci University of Bari, Italy
Richard Chbeir UPPA University, France
Additional Reviewers
Valentyna Tsap Tallinn University of Technology, Estonia
Liliana Ibanescu AgroParisTech, France
Cyril Labbé Université Grenoble-Alpes, France
Zouhaier Brahmia University of Sfax, Tunisia
Dunren Che Southern Illinois University, USA
Feng George Yu Youngstown State University, USA
Gang Qian University of Central Oklahoma, USA
Lubomir Stanchev Cal Poly, USA
Jorge Martinez-Gil Software Competence Center Hagenberg, Austria
Loredana Caruccio University of Salerno, Italy
Valentina Indelli Pisano University of Salerno, Italy
Jorge Bernardino Polytechnic Institute of Coimbra, Portugal
Bruno Cabral University of Coimbra, Portugal
Paulo Nunes Polytechnic Institute of Guarda, Portugal
William Ferng Boeing, USA
Amin Mesmoudi LIAS/University of Poitiers, France
Sabeur Aridhi LORIA, University of Lorraine - TELECOM Nancy,
France
Julius Köpke Alpen Adria Universität Klagenfurt, Austria
Marco Franceschetti Alpen Adria Universität Klagenfurt, Austria
Meriem Laifa Bordj-Bouarreridj University, Algeria
Sheik Mohammad Mostakim Fattah University of Sydney, Australia
Mohammed Nasser Mohammed Ba-hutair University of Sydney, Australia
Ali Hamdi Fergani Ali University of Sydney, Australia
Masoud Salehpour University of Sydney, Australia
Adnan Mahmood Macquarie University, Australia
Wei Emma Zhang Macquarie University, Australia
Zawar Hussain Macquarie University, Australia
Hui Luo RMIT University, Australia
Sheng Wang RMIT University, Australia
Lucile Sautot AgroParisTech, France
Jacques Fize Cirad, Irstea, France
Information Retrieval
Uncertain Information
Data Streams
Learning
Emerging Applications
Creating Time Series-Based Metadata for Semantic IoT Web Services
Kasper Apajalahti
Data Mining
Privacy
Text Processing
Data Semantics
Efficient Top-k Cloud Services Query Processing Using Trust and QoS
Karim Benouaret, Idir Benouaret, Mahmoud Barhamgi,
and Djamal Benslimane
Answering Top-k Queries over Outsourced Sensitive Data in the Cloud
Sakina Mahboubi, Reza Akbarinia, and Patrick Valduriez
Social Networks
Community Structure Based Shortest Path Finding for Social Networks
Yale Chai, Chunyao Song, Peng Nie, Xiaojie Yuan, and Yao Ge
1 Introduction
It has recently been shown that 90% of Web mail traffic is machine generated [1, 7,
8, 10]. As mentioned in [1], “A common characteristics of these machine generated
messages is that most of them are highly structured documents, with rich HTML
formatting, and they are repeated over and over, modulo minor variations, in the
global mail corpus. These characteristics clearly facilitate the application of
automated data extraction and learning methods at a very large scale”. With a significant
chunk of web mail traffic composed of machine generated emails, it becomes a natural
goal to automatically extract the information from these emails that must be acted
upon by the recipient by a scheduled time, e.g., payment of a utility bill by the due date.
Therefore, we define actionable information as “information that must be acted upon by
scheduled time, by the recipient”. The requirement to extract actionable information
from machine generated emails is present in many domains, such as flight information,
shipment arrival, and payment due dates [5].
A big challenge faced by such systems is that these emails contain personal
information. It is therefore difficult to get access to a large corpus of labeled data to build
annotators based on supervised methods [1, 3]. Moreover, even though machine
generated emails are structured, their structure changes over time. Further, new service
providers join continuously. Hence, it becomes difficult for supervised methods to
model the tail traffic. For example, hotel reservation emails come from more than 8000
different small providers, which account for more than 50% of all such emails in our
database. Similarly, we have over one thousand small service providers for flight data,
representing more than 40% of all flight reservation emails. There exist supervised
methods [8, 9] that work on labeled data with varying degrees of success.
In this paper, we present a novel method to extract actionable information from
emails. We show that the region in the email containing actionable information can be
represented as a combination of one or more small template trees. These template trees
have a significantly simpler structure than the original email. Further, these
template trees are highly repetitive across a diverse set of emails from a given service
provider and contain all the information of our interest. We develop a principled and
scalable approach which exploits the semi-structured format of the emails to extract
these template trees. The training data required to build supervised annotators
over email data can be reduced significantly if such small and well-structured snippets
can be identified in the emails for information extraction.
The only information needed by our system is a domain specific dictionary. We call it
domain knowledge. Domain knowledge has been used in earlier systems to
automatically extract information from HTML web pages [15]. Similarly, there are
existing systems to extract wrappers from HTML pages [16]. However, unlike a
wrapper, a template tree is a sub-tree in the email HTML DOM tree. In [17], the authors
present a system to extract templates of entire web pages in an unsupervised manner.
In contrast, we templatize only the structure of the core region of the emails,
containing the information of our interest, with the aid of domain knowledge.
Nevertheless, the work in [16, 17] shows that machine generated HTML documents
(web pages or emails) have a fixed structure, encoded with the actual data. We present
a novel system that builds on these ideas to extract actionable information from emails.
We use the term ‘domain’ for an entire service. For instance, for extracting
information from flight related emails, ‘flight’ is the domain. A specific vendor in a
domain is called a service provider, or just ‘provider’. In the ‘flight’ domain, each airline is a
provider (e.g., ‘American Airlines’). We will use emails from the flight domain as a
running example throughout the paper.
Domain specific dictionaries contain keywords and regex patterns specific to that
domain. Dictionaries apply to an entire domain and are not specific to any
provider. Therefore, they are built using public information, and it is much easier to
acquire and learn domain specific dictionaries than to acquire labelled data for
every provider (cf. Sect. 3.2).
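As a rough illustration of what such a dictionary might look like, the sketch below encodes a few keywords and regex patterns for the ‘flight’ domain and counts how many domain tokens a piece of text contains. The dictionary layout, the keyword list, the two patterns, and the names `FLIGHT_DICTIONARY` and `count_domain_tokens` are all assumptions made for this example; the paper does not specify them.

```python
import re

# Hypothetical dictionary for the 'flight' domain. The keywords and patterns
# are illustrative assumptions, not the dictionary used by the authors.
FLIGHT_DICTIONARY = {
    "keywords": {"departure", "arrival", "flight", "gate", "terminal"},
    "patterns": {
        # 12-hour clock times such as "10:45 AM"
        "TIME": re.compile(r"\b(?:1[0-2]|0?[1-9]):[0-5]\d\s?(?:AM|PM)\b", re.IGNORECASE),
        # Two uppercase letters followed by 2-4 digits, e.g. "AA 123"
        "FLIGHT_NO": re.compile(r"\b[A-Z]{2}\s?\d{2,4}\b"),
    },
}


def count_domain_tokens(text, dictionary):
    """Count keyword hits plus regex matches found in one piece of text."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    keyword_hits = sum(1 for word in words if word in dictionary["keywords"])
    pattern_hits = sum(
        len(pattern.findall(text)) for pattern in dictionary["patterns"].values()
    )
    return keyword_hits + pattern_hits
```

Because nothing in the dictionary refers to a particular airline, the same object could in principle be applied to emails from any provider in the domain, which is the property the text relies on.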
The technique described in this paper is applicable to any domain for which
domain specific information can be represented in a dictionary. This is a highly generic
requirement, satisfied by most domains that produce machine generated emails.
Template Trees: Extracting Actionable Information 5
1.1 Contributions
2 Related Work
One of the first methods to learn templates from email was reported in [11]. The authors learn
templates from the email subject, over a representative sample of emails. In [3], the
authors present a statistical method to extract product names from email. Their system
introduces the concept of email templates and domain specific dictionaries, but the
focus of the work in [3] is to annotate product names in the extracted text. In [8], the
authors propose a method to discover templates in email, but there a template is a
fixed phrase repeating across emails, whereas in this paper
a template is a fixed DOM structure along with phrases.
In [4], the authors present a method to mine ‘Data Records’ in a Web page. Although
the technique is not tailored for emails, they extract a ‘Data Record’ as a core region from
an HTML web page. Their concept of a ‘core region’ is similar to our concept of a core region.
However, they use an edit-distance based method to identify the core region and make
specific assumptions about the email structure. Even a minor variation in the email
structure would make the method unscalable.
In [2, 6], authors present a method to categorize incoming emails into categories
such as bills, itineraries, promotions, receipts, etc. In [2], the authors propose a text
based template generation method, as opposed to tree-based templates in our system. In
[6], the objective is to assign one of the predefined categories to an incoming email. In
[7], the objective is to predict the distribution of the category of mail that may arrive
over a given time window.
In [10], the authors exploit the DOM tree of the email HTML structure to cluster emails into
one of m prespecified clusters. The idea of exploiting the HTML DOM structure to cluster
similar emails was first explored in [1]. However, template trees are different from just
matching the DOM structure (Sect. 4.1).
6 M. K. Agarwal and J. Singh
3 Preliminaries
As reported in [1], machine generated emails are HTML formatted structured
documents, modulo minor variations. We exploit this observation in our methodology.
There are two types of nodes in this tree: internal nodes, which are non-text nodes,
and leaf nodes, which are text nodes. We keep a hash table with each node’s DeweyId as the
key. For an internal node, the entry points to its children and its parent node. For a leaf node, it
points to its text value and parent node. The tree and the hash table are prepared in a single
pass, while parsing the email.
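The single-pass construction described above can be sketched with Python's standard `html.parser` module. This is only an illustration under stated assumptions: the record layout (`tag`, `text`, `parent`, `children`), the class name `DeweyTreeBuilder`, the helper `build_dewey_table`, the numbering of children from 0, and the handling of void tags are choices made for this example, and the input is assumed to be well-formed HTML with a single root element.

```python
from html.parser import HTMLParser


class DeweyTreeBuilder(HTMLParser):
    """Builds a DeweyId-keyed hash table while the HTML is being parsed."""

    # Elements that never have a closing tag (assumed subset, for illustration).
    VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}

    def __init__(self):
        super().__init__()
        self.table = {}  # DeweyId -> node record
        self.stack = []  # DeweyIds of the currently open elements

    def _new_id(self):
        # The root is "0"; the k-th child of node p gets the id "p.k".
        if not self.stack:
            return "0"
        parent_id = self.stack[-1]
        return f"{parent_id}.{len(self.table[parent_id]['children'])}"

    def _attach(self, node_id):
        if self.stack:
            self.table[self.stack[-1]]["children"].append(node_id)

    def handle_starttag(self, tag, attrs):
        node_id = self._new_id()
        parent_id = self.stack[-1] if self.stack else None
        # Internal node: points to its children and its parent.
        self.table[node_id] = {"tag": tag, "parent": parent_id, "children": []}
        self._attach(node_id)
        if tag not in self.VOID_TAGS:
            self.stack.append(node_id)

    def handle_endtag(self, tag):
        if self.stack and self.table[self.stack[-1]]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        node_id = self._new_id()
        # Leaf node: points to its text value and its parent.
        self.table[node_id] = {"text": text, "parent": self.stack[-1]}
        self._attach(node_id)


def build_dewey_table(html):
    """Parse an HTML string once and return the DeweyId hash table."""
    builder = DeweyTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.table
```

With this layout, looking up a node's parent or children by DeweyId is a constant-time dictionary access, and the id itself encodes the path from the root.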
Our objective is to identify the smallest sub-tree in the email tree that contains all
the text nodes of our interest. A text node is of interest if it contains a token from
the domain dictionary. This sub-tree is called the core region and is denoted by iTree
(short for information tree). In the next section, we show how we prepare the domain
specific dictionary.
[Figure: DOM tree whose nodes are labeled with DeweyIds such as 0, 0.0.0.0, and 0.0.0.0.1]
Fig. 2. The structure of the DOM tree of HTML in Fig. 1, encoded with the DeweyIds
We now present our technique to identify the core region efficiently. While processing
the text nodes in the HTML DOM tree of an email, we look for the presence of domain
specific tokens in them. A node n containing w tokens is assigned a score of w. This score
travels all the way from that node up to the root of the email DOM tree: the score of each
node from node n to the root is incremented by w. Therefore, the common parent of two
nodes n1, with score w1, and n2, with score w2, will have a score of w1 + w2, and so on.
Consequently, the root of the tree will have the highest score (wh). We then traverse back
down from the root to find the deepest node with a score of wh. This node is
the lowest common ancestor in the email tree of all the tokens of our
interest (no token exists outside the sub-tree rooted at it). Thus, the sub-tree rooted at the
deepest node (farthest from the root) with the highest score represents the core region in
the email. We call this node ‘C’ and the tree rooted at C the iTree (i.e., Information Tree).
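The score propagation and the walk back down from the root can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the table layout (a dictionary keyed by DeweyId whose records carry `parent` plus either `children` or `text`), the names `find_core_region`, `EXAMPLE_TABLE`, and `count_keywords`, and the three example keywords are assumptions introduced here.

```python
def find_core_region(table, count_tokens):
    """Return the DeweyId of node C (root of the iTree), or None if no token is found."""
    scores = {node_id: 0 for node_id in table}
    for node_id, node in table.items():
        if "text" not in node:
            continue
        w = count_tokens(node["text"])
        # Add w to the text node and to every ancestor up to the root.
        current = node_id
        while w and current is not None:
            scores[current] += w
            current = table[current]["parent"]

    root = next(i for i, n in table.items() if n["parent"] is None)
    highest = scores[root]  # wh: the root sees every token
    if highest == 0:
        return None

    # Walk down while some child still carries the full score wh.
    core = root
    while True:
        full = [c for c in table[core].get("children", []) if scores[c] == highest]
        if not full:
            return core
        core = full[0]


# Hand-written example tree; keywords are illustrative assumptions.
KEYWORDS = {"departure", "flight", "gate"}


def count_keywords(text):
    return sum(1 for word in text.lower().split() if word in KEYWORDS)


EXAMPLE_TABLE = {
    "0": {"tag": "body", "parent": None, "children": ["0.0", "0.1"]},
    "0.0": {"tag": "p", "parent": "0", "children": ["0.0.0"]},
    "0.0.0": {"text": "Thank you for booking", "parent": "0.0"},
    "0.1": {"tag": "tr", "parent": "0", "children": ["0.1.0", "0.1.1"]},
    "0.1.0": {"tag": "td", "parent": "0.1", "children": ["0.1.0.0"]},
    "0.1.0.0": {"text": "Departure", "parent": "0.1.0"},
    "0.1.1": {"tag": "td", "parent": "0.1", "children": ["0.1.1.0"]},
    "0.1.1.0": {"text": "Flight gate 12", "parent": "0.1.1"},
}
```

In the example, the two scoring text nodes contribute 1 and 2, so the root and node 0.1 both reach a score of 3 while neither child of 0.1 does; `find_core_region(EXAMPLE_TABLE, count_keywords)` therefore stops at "0.1". Since at most one child of a node can carry the full score when that score is positive, the downward walk never has to choose between branches.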
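[Figure: panel (a) shows an iTree with the text values “Departure”, “Alaska”, “Date”, “24 Jun, Wed”, “Time”, and “10:45 AM”; in the template panels the values are replaced by the placeholders <CITY>, <DATE>, and <TIME>]
Fig. 3. Template structure in (b) and final template in (c) of iTree in (a)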
AN ORATION,
DELIVERED FEBRUARY 24, 1775,
BEFORE
THE AMERICAN PHILOSOPHICAL SOCIETY,
HELD AT PHILADELPHIA,
FOR PROMOTING USEFUL KNOWLEDGE.
BY DAVID RITTENHOUSE, A. M.
MEMBER OF THE SAID SOCIETY.
(INSCRIBED)
Gentlemen,
The Egyptians too, we are told, had observations of the stars for
one thousand five hundred years before the Christian era. What they
were, is not known; but probably the astronomy of those ages
consisted in little more than remarks on the rising and setting of the
fixed stars, as they were found to correspond with the seasons of the
year;[A4] and, perhaps, forming them into constellations. That this was
done early, appears from the book of Job, which has by some been
attributed to Moses, who is said to have been learned in the
sciences of Egypt.[A5] “Canst thou bind the sweet influences of
Pleiades, or loose the bands of Orion? Canst thou bring forth
Mazzaroth in his season, or canst thou guide Arcturus with his
sons?” Perhaps too, some account might be kept of eclipses of the
sun and moon, as they happened, without pretending to predict them
for the future. These eclipses are thought by some to have been
foretold by the Jewish prophets in a supernatural way.
Next follows the noble Tycho,[A10] who with great labour and
perseverance, brought the art of observing the heavens to a degree
of accuracy unknown to the ancients; though in theory he mangled
the beautiful system of Copernicus. The whimsical Kepler, too,
(whose fondness for analogies frequently led him astray, yet
sometimes happily conducted him to important truths) did notable
services to astronomy: and from his time down to the present, so
many great men have appeared amongst the several nations of
Europe, rivalling each other in the improvement of astronomy, that I
should trespass on your patience were I to enumerate them. I shall
therefore proceed to what I proposed in the second place, and take
notice of some of the most important discoveries in this science.
The first discovery then, which paved the way for others more
curious, seems to have been the circular figure of the earth, inferred
from observing the meridian altitudes of the sun and stars to be
different in distant places. This conclusion would probably not be
immediately drawn, but the appearance accounted for, by the
rectilinear motion of the traveller; and then a change in the apparent
situations of the heavenly bodies would only argue their nearness to
the earth: and thus would the observation contribute to establish
error, instead of promoting truth, which has been the misfortune of
many an experiment. It would require some skill in geometry, as well
as practice in observing angles, to demonstrate the spherical figure
of the earth from such observations.[A12]
But this difficulty being surmounted, and the true figure of the earth
discovered, a free space would now be granted for the sun, moon,
and stars to perform their diurnal motions on all sides of it; unless
perhaps at its extremities to the north and south; where something
would be thought necessary to serve as an axis for the heavens to
revolve on. This Mr. Crantz in his very entertaining history of
Greenland informs us, is agreeable to the philosophy of that country,
with this difference perhaps, that the high latitude of the Greenlander
makes him conclude one pole only, necessary: He therefore
supposes a vast mountain situate in the utmost extremity of
Greenland, whose pointed apex supports the canopy of heaven, and
whereon it revolves with but little friction.
For two or three centuries before and after the beginning of the
Christian era, astronomy appears to have been held in considerable
repute; yet very few discoveries of any consequence were made,
during that period and many ages following.
The ancients were not wanting in their endeavours to find out the
true dimensions of the planetary system. They invented several very
ingenious methods for the purpose; but none of them were at all
equal, in point of accuracy, to the difficulty of the problem. They were
therefore obliged to rest satisfied with supposing the heavenly
bodies much nearer to the earth than in fact they are, and
consequently much less in proportion to it. Add to this, that having
found the earth honoured with an attendant, while they could
discover none belonging to any of the other planets, they supposed it
of far greater importance in the Solar System than it appears to us to
be: And the more praise is due to those few, who nevertheless
conceived rightly of its relation to the whole.
Amongst the fixed stars too, Galileo pursued his enquiries. The
Milky-Way, which had so greatly puzzled the ancient Philosophers,
and which Aristotle imagined to be vapours risen to an extraordinary
height, he found to consist of an innumerable multitude of small
stars; whose light appears indistinct and confounded together to the
naked eye. And in every part of the heavens, his telescope shewed
him abundance of stars, not visible without it. In short, with such
unabated ardour did this great man range through the fields of
Astronomy, that he seemed to leave nothing for others to glean after
him.
As the apparent motion of the fixed stars, arising from this cause,
was observed to complete the intire circle of its changes in the space
of a year, it was for some time supposed to arise from an annual
parallax, notwithstanding its inconsistency in other respects with
such a supposition. But this obstacle being removed, there followed
the discovery of another apparent motion in the heavens, arising
from the nutation of the earth’s axis; the period whereof is about
nineteen years. Had it not been so very different from the period of
the former, the causes of both must have been almost inexplicable.
This latter discovery is an instance of the superior advantages of
accurate observation: For it was well known that such a nutation
must take place from the principles of the Newtonian Philosophy; yet
a celebrated astronomer had concluded from hypothetical reasoning,
that its quantity must be perfectly insensible.
The way being cleared thus far, Dr. Bradley assures us, from his
most accurate observations, that the annual parallax cannot exceed
two seconds, he thinks not one; and we have the best reason to
confide in his judgment and accuracy. From hence then we draw this
amazing conclusion; that the diameter of the earth’s orb bears no
greater proportion to the distance of the stars which Bradley
observed, than one second does to the radius; which is less than as
one to 200,000. Prodigiously great as the distance of the fixed stars
from our sun appears to be, and probably their distances from each
other are no less, the Newtonian Philosophy will furnish us with a
reason for it: That the several systems may be sufficiently removed
from each other’s attraction, which we are very certain must require
an immense distance; especially if we consider that the cometic part,
of our system at least, appears to be the most considerable though
so little known to us. The dimensions of the several parts of the
planetary system, had been determined near the truth by the
astronomers of the last age, from the parallax of Mars. But from that
rare phenomenon the transit of Venus over the sun’s disk, which has
twice happened within a few years past, the sun’s parallax is now
known beyond dispute to be 8 seconds and an half, nearly; and
consequently, the sun’s distance almost 12,000 diameters of the
earth.
But how does Astronomy change the scene!—Take the miser from
the earth, if it be possible to disengage him; he whose nightly rest
has been long broken by the loss of a single foot of it, useless
perhaps to him; and remove him to the planet Mars, one of the least
distant from us: Persuade the ambitious monarch to accompany him,
who has sacrificed the lives of thousands of his subjects to an
imaginary property in certain small portions of the earth; and now
point it out to them, with all its kingdoms and wealth, a glittering star
“close by the moon,” the latter scarce visible and the former less
bright than our Evening Star:—Would they not turn away their
disgusted sight from it, as not thinking it worth their smallest
attention, and look for consolation in the gloomy regions of Mars?[A32]