Professional Documents
Culture Documents
Discovery Science 17th International Conference DS 2014 Bled Slovenia October 8 10 2014 Proceedings 1st Edition Sašo Džeroski
Discovery Science 17th International Conference DS 2014 Bled Slovenia October 8 10 2014 Proceedings 1st Edition Sašo Džeroski
https://textbookfull.com/product/algorithmic-learning-
theory-25th-international-conference-alt-2014-bled-slovenia-
october-8-10-2014-proceedings-1st-edition-peter-auer/
https://textbookfull.com/product/discovery-science-21st-
international-conference-ds-2018-limassol-cyprus-
october-29-31-2018-proceedings-larisa-soldatova/
https://textbookfull.com/product/serious-games-development-and-
applications-5th-international-conference-sgda-2014-berlin-
germany-october-9-10-2014-proceedings-1st-edition-minhua-ma/
Hybrid Learning Theory and Practice 7th International
Conference ICHL 2014 Shanghai China August 8 10 2014
Proceedings 1st Edition Simon K. S. Cheung
https://textbookfull.com/product/hybrid-learning-theory-and-
practice-7th-international-conference-ichl-2014-shanghai-china-
august-8-10-2014-proceedings-1st-edition-simon-k-s-cheung/
https://textbookfull.com/product/formal-modeling-and-analysis-of-
timed-systems-12th-international-conference-
formats-2014-florence-italy-september-8-10-2014-proceedings-1st-
edition-axel-legay/
https://textbookfull.com/product/smart-health-international-
conference-icsh-2014-beijing-china-
july-10-11-2014-proceedings-1st-edition-xiaolong-zheng/
https://textbookfull.com/product/reversible-computation-6th-
international-conference-rc-2014-kyoto-japan-
july-10-11-2014-proceedings-1st-edition-shigeru-yamashita/
https://textbookfull.com/product/conceptual-modeling-33rd-
international-conference-er-2014-atlanta-ga-usa-
october-27-29-2014-proceedings-1st-edition-eric-yu/
Sašo Džeroski
Pance Panov
Dragi Kocev
Ljupco Todorovski (Eds.)
LNAI 8777
Discovery Science
17th International Conference, DS 2014
Bled, Slovenia, October 8–10, 2014
Proceedings
123
Lecture Notes in Artificial Intelligence 8777
Discovery Science
17th International Conference, DS 2014
Bled, Slovenia, October 8-10, 2014
Proceedings
13
Volume Editors
Sašo Džeroski
Panče Panov
Dragi Kocev
Jožef Stefan Institute
Department of Knowledge Technologies
Jamova cesta 39, 1000 Ljubljana, Slovenia
E-mail: {saso.dzeroski,pance.panov,dragi.kocev}@ijs.si
Ljupčo Todorovski
University of Ljubljana
Faculty of Administration
Gosarjeva 5, 1000 Ljubljana, Slovenia
E-mail: ljupco.todorovski@ijs.si
This year’s International Conference on Discovery Science (DS) was the 17th
event in this series. Like in previous years, the conference was co-located with
the International Conference on Algorithmic Learning Theory (ALT), which is
already in its 25th year. Starting in 2001, ALT/DS is one of the longest-running
series of co-located events in computer science. The unique combination of recent
advances in the development and analysis of methods for discovering scientific
knowledge, coming from machine learning, data mining, and intelligent data
analysis, as well as their application in various scientific domains, on the one
hand, with the algorithmic advances in machine learning theory, on the other
hand, makes every instance of this joint event unique and attractive.
This volume contains the papers presented at the 17th International Confer-
ence on Discovery Science, while the papers of the 25th International Conference
on Algorithmic Learning Theory are published by Springer in a companion vol-
ume (LNCS Vol. 8776). We had the pleasure of selecting contributions from 62
submissions by 178 authors from 22 countries. Each submission was reviewed by
at least three Program Committee members. The program chairs eventually de-
cided to accept 30 papers, yielding an acceptance rate of 48%. The program also
included three invited talks and two tutorials. In the joint DS/ALT invited talk,
Zoubin Ghahramani gave a presentation on “Building an Automated Statis-
tician.” DS participants also had the opportunity to attend the ALT invited
talk on “Cellular Tree Classifiers”, which was given by Luc Devroye. The two
tutorial speakers were Anuška Ferligoj (“Social Network Analysis”) and Eyke
Hüllermeier (“Online Preference Learning and Ranking”).
This year, both conferences were held in Bled, Slovenia, and were organized
by the Jožef Stefan Institute (JSI) and the University of Ljubljana. We are
very grateful to the Department of Knowledge Technologies (and the project
MAESTRA) at JSI for sponsoring the conferences and providing administrative
support. In particular, we thank the local arrangement chair, Mili Bauer, and
her team, Tina Anžič, Nikola Simidjievski, and Jurica Levatić from JSI for their
efforts in organizing the two conferences. We would like to thank the Office of
Naval Research Global for the generous financial support provided under ONRG
GRANT N62909-14-1-C195.
We would also like to thank all authors of submitted papers, the Program
Committee members, and the additional reviewers for their efforts in evaluating
the submitted papers, as well as the invited speakers and tutorial presenters.
We are grateful to Sandra Zilles, Peter Auer, Alexander Clark and Thomas
Zeugmann for ensuring a smooth coordination with ALT, Nikola Simidjievski
for putting up and maintaining our website, and Andrei Voronkov for making
VI Preface
Program Chair
Sašo Džeroski Jožef Stefan Institute, Slovenia
Program Co-chairs
Panče Panov Jožef Stefan Institute, Slovenia
Dragi Kocev Jožef Stefan Institute, Slovenia
Program Committee
Albert Bifet HUAWEI Noahs Ark Lab, Hong Kong and
University of Waikato, New Zealand
Hendrik Blockeel KU Leuven, Belgium
Ivan Bratko University of Ljubljana, Slovenia
Michelangelo Ceci Università degli Studi di Bari Aldo Moro, Italy
Simon Colton Goldsmiths College, University of London, UK
Bruno Cremilleux Université de Caen, France
Luc De Raedt KU Leuven, Belgium
Ivica Dimitrovski Ss. Cyril and Methodius University, Macedonia
Tapio Elomaa Tampere University of Technology, Finland
Bogdan Filipič Jožef Stefan Institute, Slovenia
Peter Flach University of Bristol, UK
Johannes Fürnkranz TU Darmstadt, Germany
Mohamed Gaber University of Portsmouth, UK
Joao Gama University of Porto and INESC Porto, Portugal
Dragan Gamberger Rudjer Boskovic Institute, Croatia
VIII Organization
Additional Reviewers
Zoubin Ghahramani
Department of Engineering,
University of Cambridge,
Trumpington Street
Cambridge CB2 1PZ, UK
zoubin@eng.cam.ac.uk
References
1. Duvenaud, D., Lloyd, J.R., Grosse, R., Tenenbaum, J.B., Ghahramani, Z.: Structure
Discovery in Nonparametric Regression through Compositional Kernel Search. In:
ICML 2013 (2013), http://arxiv.org/pdf/1302.4922.pdf
2. Lloyd, J.R., Duvenaud, D., Grosse, R., Tenenbaum, J.B., Ghahramani, Z.: Auto-
matic Construction and Natural-language Description of Nonparametric Regression
Models. In: Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2014
(2014), http://arxiv.org/pdf/1402.4304.pdf
Social Network Analysis
Anuška Ferligoj
Faculty of Social Sciences,
University of Ljubljana
anuska.ferligoj@fdv.uni-lj.si
References
1. Batagelj, V., Doreian, P., Ferligoj, A., Kejžar, N.: Understanding Large Temporal
Networks and Spatial Networks: Exploration, Pattern Searching, Visualization and
Network Evolution. Wiley, Chichester (2014)
2. Carrington, P.J., Scott, J., Wasserman, S. (eds.): Models and Methods in Social
Network Analysis. Cambridge University Press, Cambridge (2005)
3. Doreian, P., Batagelj, V., Ferligoj, A.: Generalized Blockmodeling. Cambridge Uni-
versity Press, Cambridge (2005)
4. de Nooy, W., Mrvar, A., Batagelj, V.: Exploratory Social Network Analysis with
Pajek, Revised and expanded second edition. Cambridge University Press, Cam-
bridge (2011)
Social Network Analysis XV
5. Scott, J., Carrington, P.J. (eds.): The SAGE Handbook of Social Network Analysis.
Sage, London (2011)
6. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications.
Cambridge University Press, Cambridge (1994)
Cellular Tree Classifiers
*
The full paper can be found in Peter Auer, Alexander Clark, Sandra Zilles, and
Thomas Zeugmann, Proceedings of the 25th International Confer- ence on Algo-
rithmic Learning Theory (ALT-14), Lecture Notes in Computer Science Vol. 8776,
Springer, 2014.
Online Preference Learning and Ranking
Eyke Hüllermeier
Department of Computer Science
University of Paderborn, Germany
eyke@upb.de
References
1. Ailon, N., Karnin, Z., Joachims, T.: Reducing dueling bandits to cardinal bandits.
In: Proc. ICML, Beijing, China (2014)
2. Busa-Fekete, R., Szörényi, B., Weng, P., Cheng, W., Hüllermeier, E.: Top-k selec-
tion based on adaptive sampling of noisy preferences. In: Proc. ICML, Atlanta,
USA (2013)
3. Busa-Fekete, R., Hüllermeier, E., Szörényi, B.: Preference-based rank elicitation
using statistical models: The case of Mallows. In: Proc. ICML, Beijing, China
(2014)
4. Busa-Fekete, R., Szörényi, B., Hüllermeier, E.: PAC rank elicitation through adap-
tive sampling of stochastic pairwise preferences. In: Proc. AAAI (2014)
5. Busa-Fekete, R., Hüllermeier, E.: A survey of preference-based online learning with
bandit algorithms. In: Proc. ALT, Bled, Slovenia (2014)
6. Cohen, W.W., Schapire, R.E., Singer, Y.: Learning to order things. Journal of
Artificial Intelligence Research 10 (1999)
7. Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer (2011)
8. Fürnkranz, J., Hüllermeier, E., Vanderlooy, S.: Binary decomposition methods for
multipartite ranking. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor,
J. (eds.) ECML PKDD 2009, Part I. LNCS, vol. 5781, pp. 359–374. Springer,
Heidelberg (2009)
*
The full version of this paper can be found in Peter Auer, Alexander Clark, Sandra
Zilles, and Thomas Zeugmann, Proceedings of the 25th International Confer- ence
on Algorithmic Learning Theory (ALT-14), Lecture Notes in Computer Science Vol.
8776, Springer, 2014.
XVIII E. Hüllermeier
9. Urvoy, T., Clerot, F., Féraud, R., Naamane, S.: Generic exploration and k-armed
voting bandits. In: Proc. ICML, Atlanta, USA (2013)
10. Vembu, S., Gärtner, T.: Label ranking: A survey. In: Fürnkranz, J., Hüllermeier,
E. (eds.) Preference Learning. Springer (2011)
11. Yue, Y., Broder, J., Kleinberg, R., Joachims, T.: The K-armed dueling bandits
problem. Journal of Computer and System Sciences 78(5), 1538–1556 (2012)
12. Yue, Y., Joachims, T.: Interactively optimizing information retrieval systems as a
dueling bandits problem. In: Proc. ICML, Montreal, Canada (2009)
13. Yue, Y., Joachims, T.: Beat the mean bandit. In: Proc. ICML, Bellevue, Washing-
ton, USA (2011)
14. Zoghi, M., Whiteson, S., Munos, R., de Rijke, M.: Relative upper confidence bound
for the k-armed dueling bandit problem. In: Proc. ICML, Beijing (2014)
Table of Contents
Erratum
1 Introduction
In data analysis, the analyst aims at finding novel ways to summarize the data to
become easily understandable [6]. The interpretation aspect is especially valued
among application specialists who may not understand the data analysis process
itself. Semi-automated data analysis is hence possible for the end user if data anal-
ysis processes are supported by easily accessible tools and methodologies for pat-
tern/model construction, explanation and exploration. This work draws together
2 Methodology
The dataset under study describes DNA copy number amplifications in 4,590 can-
cer patients. The data describes 4,590 patients as data instances, with attributes
being chromosomal locations indicating aberrations in the genome. These aberra-
tions are described as 1’s (amplification) and 0’s (no amplification). Authors in [12]
describe the amplification dataset in detail. In this paper, we consider datasets in
different resolutions of chromosomal regions as defined by International System of
Cytogenetic Nomenclature (ISCN) [15].
4 P.R. Adhikari et al.
J
J
d
p(x|Θ) = πj P (x | θj ) = πj xi
θji (1 − θji )1−xi . (1)
j=1 j=1 i=1
Given:
– set of training examples in empirical data expressed as RDF triples
– domain knowledge in the form of ontologies, and
– an object-to-ontology mapping which associates each object from the
RDF triplets with appropriate ontological concepts.
Find:
– a hypothesis (a predictive model or a set of descriptive patterns), ex-
pressed by domain ontology terms, explaining the given empirical data.
The Hedwig system automatically parses the RDF triples (a graph) for the
subClassOf hierarchy, as well as any other user-defined binary relations. Hedwig
also defines a namespace of classes and relations for specifying the training ex-
amples to which the input must adhere. The rules generated by Hedwig system
using beam search are repetitively specialized and induced as discussed in [20].
6 P.R. Adhikari et al.
The significance of the findings is tested using the Fisher’s exact test. To cope
with the multiple–hypothesis testing problem, we use Holm–Bonferroni direct
adjustment method with α = 0.05.
The barycenters of each matrix row are best understood as centers of gravity
of a stick divided into m sections corresponding to the row entries. An entry of
1 denotes a weight on that section. One step of the barycentric method now:
calculate the barycenters for each matrix row and sort the matrix rows in order
of increasing barycenters. In this way, the method calculates the best possible
permutation of rows that exposes the banded structure of the input matrix. It
does not, however, find any permutation of columns. In our application, neigh-
boring columns of a matrix represent chromosome regions that are in physical
proximity to one another, the goal is to only find the optimal row permutation
while not permuting the matrix columns.
The image of the banded structure can then be overlaid with a visualization
of clusters, as described in Section 2.2. Because the rows of the matrix represent
instances, highlighting one set of instances (one cluster) means highlighting sev-
eral matrix rows. If the discovered clusters are exposed by the matrix structure,
we can expect that several adjacent matrix rows will be highlighted, forming a
wide band. Furthermore, all the clusters can be simultaneously highlighted be-
cause each sample belongs to one and only one cluster. The horizontal colored
overlay of the clusters in Figure 3 can also be supplemented with another colored
vertical overlay of the rules explaining the clusters as discussed in Section 2.3.
If an important chromosome region is discovered for the characterization of a
cluster, we highlight the corresponding column. In the case of composite rules of
the type Rule 1: Cluster3(X) ← 1q43-44(X) ∧ 1q12(X) , both chromo-
somal regions 1q43 − −44 and 1q12 are understood as equally important and are
Explaining Mixture Models 7
therefore both highlighted. If a chromosome band appears in more than one rule,
this is visualized by a stronger highlight of the corresponding matrix column.
We used the mixture models trained in our earlier contribution [13]. Through
a model selection procedure documented in [16], the number of components for
modeling the chromosome 1 had been chosen to be J = 6, denoting presence of
six clusters in the data.
Fig. 2. Mixture model for chromosome 1. Figure shows first 10 dimensions of the total
28 for clarity.
We used the barycentric method, described in Section 2.4, to extract the banded
structure in the data. The black color indicates ones and white color denotes ze-
ros in the data. The banded data was then overlaid with different colors for
8 P.R. Adhikari et al.
Fig. 4. Rules induced for cluster 3 (left) and visualizations of rules and columns for
cluster 3 (right) with relevant columns highlighted. A highlighted column denotes that
an amplification in the corresponding region characterizes the instances of the partic-
ular cluster. A darker hue means that the region appears in more rules. The numbers
on top right correspond to rule numbers. For example, the notations “1, 3” on top of
rightmost column of cluster 3 indicates that the chromosome region appears in rules 1
and 3 tabulated in the left panel.
Table on the left panel of the Figure 4 show the rules induced for cluster 3,
together with the relevant statistics. The induced rules quantify the clustering
results obtained in Section 3.1 and confirmed by banded matrix visualization in
Section 3.2. The banded matrix visualization depicted in Figure 3 shows that
cluster 3 is marked by the amplifications in the regions 1q11–44. However, the
rules obtained in Table on the left panel of the Figure 4 show that amplifications
in all the regions 1q11–44 do not equally discriminate cluster 3. For example, rule
Rule 1: Cluster3(X) ← 1q43-44(X) ∧ 1q12(X) characterizes cluster 3
10 P.R. Adhikari et al.
best with a precision of 1. This means that amplifications in regions 1q43-44 and
1q12 characterizes cluster 3. It also covers 81 of the 88 samples in cluster 3. Nev-
ertheless, amplifications in regions 1q11–44 shown in Figure 3 as discriminating
regions, appear in at least one of the rules in the table on the left panel of the
Figure 4 with varying degree of precision. Similarly, the second most discrimi-
nating rule for cluster 3 is: Rule 2: Cluster3(X) ← 1q11(X) which covers
78 positive samples and 9 negative samples.
The rules listed in the table on the left panel of Figure 4 also capture the
multiresolution phenomenon in the data. We input only one resolution of data
to the algorithm but the hierarchy of different resolutions is used as background
knowledge. For example, the literal 1q43–44 denotes a joint region in coarse
resolution thus showing that the algorithm produces results at different reso-
lutions. The results at different resolutions improve the understandability and
interpretability of the rules [8]. Furthermore, other information added to the
background knowledge are amplification hotspots, fragile sites, cancer genes,
which are discriminating features of cancers but do not show to discriminate
any specific clusters present in the data. Therefore, such additional information
would be better utilized in situations where the dataset contains not only cancer
samples but also control samples which is unfortunately not the situation here
as our dataset has only cancer cases.
1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the terms
of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.
• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.F.
1.F.4. Except for the limited right of replacement or refund set forth in
paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.