
User-Centric Pattern Mining on Knowledge Graphs: an Archaeological Case Study

W.X. Wilcke a,b,∗, V. de Boer b, M.T.M. de Kleijn a, F.A.H. van Harmelen b, H.J. Scholten a

a Department of Spatial Economics, Vrije Universiteit, Amsterdam, The Netherlands
b Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands
∗ Corresponding author. Email addresses: w.x.wilcke@vu.nl (W.X. Wilcke), v.de.boer@vu.nl (V. de Boer), mtm.de.kleijn@vu.nl (M.T.M. de Kleijn), frank.van.harmelen@vu.nl (F.A.H. van Harmelen), h.j.scholten@vu.nl (H.J. Scholten)

Abstract
In recent years, there has been a growing interest from the digital humanities in knowledge graphs as a data modelling paradigm. Already, this has led to the creation of many such knowledge graphs, many of which are now available as part of the Linked Open Data cloud. This presents new opportunities for data mining. In this work, we develop, implement, and evaluate (both data-driven and user-driven) an end-to-end pipeline for user-centric pattern mining on knowledge graphs in the humanities. This pipeline combines constrained generalized association rule mining with natural-language output and facet rule browsing to allow for transparency and interpretability—two key domain requirements. Experiments in the archaeological domain show that domain experts were positively surprised by the range of patterns that were discovered and were overall optimistic about the future potential of this approach.
Keywords: User-Centric, Pattern Mining, Generalized Association Rules, Knowledge Graphs, Digital
Humanities, Archaeology

1. Introduction

Digital humanities communities have shown a growing interest in the knowledge graph as a data modelling paradigm [1]. Already, this interest has inspired several large-scale international projects—amongst which are Europeana (www.europeana.eu), CARARE (www.carare.eu), and ARIADNE (www.ariadne-infrastructure.eu)—to actively explore the creation and publication of knowledge graphs in their respective domains. These knowledge graphs, and many others like them, have been made available as part of the Linked Open Data (LOD) cloud—a vast and internationally distributed network of heterogeneous knowledge—bringing large amounts of structured data within arm's reach of humanities researchers, who are now looking for ways to analyse this wealth of knowledge. This presents new opportunities for data mining [2].

Data mining is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [3]. These patterns describe regularities in a dataset which can help researchers gain more insight into their data. Researchers can then use this insight as a starting point to form new research hypotheses, as support for existing ones, or simply to get a better understanding of their data [4]. This entire process can take weeks or even months of hard work in the traditional setting. However, by incorporating data mining into the workflow, much of this time can be saved through the automatic discovery of potentially-relevant patterns. This makes data mining interesting as a support tool for humanities researchers.

Of course, the idea of using data mining as a support tool in the humanities is, in itself, not novel. There have been various attempts before, for instance to classify coins [5] or to cluster cultural heritage [6]. However, the majority of these studies involved mining unstructured data, most commonly in the form of text mining, whereas mining
tured data has thus far been largely limited to tab- we conducted experiments in the archaeological do-
ular data and tailored to specific use cases. With main, specifically on data from various excavations,
the growing popularity of knowledge graphs in the during which domain experts were asked to evalu-
humanities, mining patterns from these structures ate a set of candidate rules on interestingness.
becomes ever more important to researchers in this By placing domain experts at the centre of both
domain. the pipeline and its evaluation, as opposed to data
This work present the MINing On Semantics scientists, we contribute to an as yet largely unex-
pipeline (MINOS) for pattern mining on knowledge plored niche in this intersecting field of data min-
graphs in the humanities. Its aim is to support do- ing, knowledge graphs, and humanities. Concretely,
main experts in their analyses of such knowledge our main contributions are 1) insight into some
graphs by helping them discover useful and inter- of the challenges and possible solutions for intro-
esting patterns in their data. To this end, the MI- ducing data science tools to the humanities, 2) a
NOS pipeline places users in the centre by letting pipeline design for pattern mining on knowledge
them guide the mining process towards their topics graphs which is tailored to domain experts rather
of interest and by letting them focus the results via than to data scientists, and 3) a user-driven evalu-
a facet pattern browser. ation of our design choices instead of only a data-
Under the hood, MINOS employs generalized as- driven one.
sociation rule mining (ARM). An association rule With these contributions, our research aims to
is an implication of the form X =⇒ y, where the add to the interdisciplinary field of the Digital Hu-
presence of a set of items X implies the presence of manities. For this reason, we will refrain from de-
another item y. These implications are learned by veloping an ARM algorithm from scratch, but in-
iterating over a dataset of examples, called trans- stead focus on how we can augment such an algo-
actions. Generalized ARM works largely the same, rithm with complementary components to make it
except that the antecedent X holds the item classes into an effective tool for Humanities researchers.
rather then the items themselves. A concise overview of related work is given next,
This method was specifically chosen to help over- followed by an overview of the pipeline, the dataset,
come two key issues with technological acceptance and the experimental setup. This paper then dis-
in the humanities, namely transparency and inter- cusses the results from the user-driven evaluation,
pretability [7, 8]. With transparency, we refer to and concludes with a reflection on the chosen ap-
the ease with which a method and its underlying proach in light of these results.
theory can be understood: a black box method, for
example, is less transparent than a glass box one. 2. Related Work
With interpretability, we mean how easily one can
interpret the results of a method: it is, for instance, Studies on data mining in the humanities have
typically more difficult to interpret an n-order ten- thus far largely focussed on unstructured data (text
sor than it is to interpret a set of symbolic state- mining), whereas data mining on semi-structured
ments. or structured data has been explored less fre-
Generalized ARM satisfies both of these con- quently [4, 10]. An example of the latter kind is
straints: it employs basic statistical know-how to discussed in [11], in which the authors propose min-
produce human-readable rules in an overall deter- ing association rules from excavation data—sites as
ministic process. A limited background in statis- transactions, artefacts as items—using the proven
tics, which most humanities researchers possess, Apriori algorithm. This task is similar to that de-
therefore already suffices to understand how these scribed in this work, but it is executed on a rela-
rules map back to the input data and to check tional database rather than a graph-shaped knowl-
whether they are valid. This allows humanities re- edge base.
searchers to put their trust in both the method and Rule mining on (graph-shaped) knowledge bases
its results [9]. can take many different forms. Initial efforts pri-
Of course, this trust is only gained if the pro- marily focused on Inductive Logic Programming
duced rules provide useful and interesting patterns (ILP), due to its natural fit to logic-based sys-
which can help these researchers to get a better un- tems [12]. Performance issues led to the develop-
derstanding of their data. We call this the effective- ment of derivatives. Most well-known is arguably
ness of the approach. To assess this effectiveness AMIE [13], whose main difference is its use of the
2
Partial Completeness Assumption to cope with the lack of negative examples. In either case, however, the rule-generation process is more complex than that of traditional ARM, making it less transparent for non-experts and thus a less suitable choice for the humanities domain.

A handful of studies have focussed on applying ARM algorithms to knowledge graphs. The most straightforward approach simply converts all triples to transactions (as used in traditional ARM) and then feeds these to the Apriori algorithm [14]. This has the downside, however, of 1) forcing relational data into an unnatural shape and potentially losing information in the process, and 2) limiting the possible exploitation of both relations (via the graph's structure) and semantics [15]. Fortunately, these caveats can largely be avoided by using an ARM algorithm that is specifically tailored to knowledge graphs.

Several of these tailored ARM algorithms exist, for instance SWApriori, an adaptation of the common Apriori algorithm to knowledge graphs [16]. Its main selling points are its ability to discover patterns which span multiple triples, i.e., a path, rather than just a single one, and its ability to mine multi-relational patterns. However, all semantic information is disregarded early on for efficiency reasons, hence reducing the dataset to a directed graph.

An alternative that does address this information is SWARM [17], which exploits RDF and RDFS semantics to generalize patterns. Hereto, SWARM uses the rdf:type relation to infer the classes of every resource (item) in a set X, and then computes an inheritance tree for all resources in X using the rdfs:subClassOf relation. Resources in the same branch are grouped under their top-most common class, and are therefore said to share the same patterns.

SWARM's ability to exploit common semantics for generalization is unique amongst ARM algorithms for knowledge graphs (note that ARM algorithms that exploit general semantics as background knowledge do exist, for example [18]). For this very reason, we have chosen SWARM for our implementation of the MINOS pipeline (see Section 3).

Other alternatives are discussed in [19] (aligning locations), [20] (constructing ontologies), and [21] (mapping categories), but these are designed to fit a specific task and are therefore less applicable to this work.

Common amongst all these algorithms, however, is that they treat resources as items—the things we are trying to find implications between. Another commonality is the focus on the graph's structure—its vertices and edges—whereas the values of its literals are left unaddressed. Therefore, similar values are still treated as completely different items.

Moving from the level of ARM to that of the pipeline, we can draw parallels between the approach presented by Nebot & Berlanga [22, 23] and the MINOS pipeline introduced in this work. Similar to MINOS, the scope of the mining process is tailored to the users' interests, provided via a user-defined pattern. However, where our pipeline encodes patterns as triples with optional unbound variables, Nebot & Berlanga ask users to construct a formal SPARQL query using an extended grammar. While this offers more flexibility than our approach, it also increases the difficulty of entering such patterns for anyone not familiar with SPARQL.

Parallels can also be drawn with the approach presented in [24, 25]. Here, the authors perform dimension reduction by eliminating duplicate item sets prior to mining, and by removing unwanted candidate rules afterwards. This latter step is, again, similar to the data-driven filter used in the MINOS pipeline, whereas the former step is implicitly dealt with by the uniqueness property of URIs. A final similarity between both pipelines is the use of generalized association rules.

A last but nevertheless important distinction concerns the evaluation of candidate rules: none of the cited work so far has gone beyond a data-driven (objective) evaluation, despite its well-known limitations for assessing the interestingness of patterns [26, 27]. Instead, this interestingness can only be assessed properly by combining a data-driven evaluation with a user-driven one. In addition to the common data-driven evaluation, we have therefore asked the local archaeological community to assess a set of candidate rules via an online survey (see Section 5.2).

3. The MINOS pipeline

The MINOS pattern mining pipeline combines an off-the-shelf ARM algorithm with a simple facet rule browser, and a number of crucial pre- and post-processing components. These components enable users to integrate their interests into the process by restricting the search space beforehand, and by filtering the results afterwards. Hereto, the
pre-processing components translate user-provided target patterns into SPARQL queries, use these queries to retrieve relevant resources from the LOD cloud, and perform context sampling to enrich them with additional information. The post-processing components constrain the mined rules based on the user-specified patterns, and present them to the user for evaluation using the facet rule browser. A flowchart of this structure is provided in Figure 1. The processes depicted in this flowchart will be discussed next.

3.1. Data Retrieval

To start the mining process, users are asked to provide a target pattern which describes their current interest. This pattern takes the form of one or more triples and serves as a language bias which restricts the search space to the relevant subgraph [28]. Each triple in this pattern can convey a semantic range by leaving variables uninstantiated: ( , p, o) to specify all entities for which (p, o) holds, ( , p, ) to specify all entities for which p holds, ( , , ) to denote the entire graph, etc.

The next step is the automatic translation from the provided target pattern to a SPARQL CONSTRUCT query. This query is then used to construct an in-memory copy of the corresponding subgraph. For instance, the pattern [( , rdf:type, :Burial Ground) ∧ ( , crm:contains, :Human Remains)] results in a graph which holds only instances of burial grounds at which human remains were found. We call these instances the target entities. Note that not all triples in a pattern have to be hard constraints: users can specify soft constraints as well. If, for instance, the second triple in the example pattern were a soft constraint, then ContextFind entities for which this (p, o)-pair is unknown would also be included in the resulting graph. Entities with conflicting relations, however, are still excluded.
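To make this translation step concrete, the sketch below shows one way a target pattern could be turned into such a CONSTRUCT query; the pattern encoding (None marking an unbound variable), the prefix block, the file name, and the function name are illustrative assumptions, not MINOS's actual code.

```python
from rdflib import Graph

PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX : <http://example.org/pakbon/>"""

def pattern_to_construct(pattern):
    """Translate a target pattern -- a list of (s, p, o) triples in which
    None marks an unbound variable -- into a SPARQL CONSTRUCT query."""
    lines = []
    for i, (s, p, o) in enumerate(pattern):
        s = s if s is not None else "?e"        # shared target-entity variable
        p = p if p is not None else f"?p{i}"
        o = o if o is not None else f"?o{i}"
        lines.append(f"{s} {p} {o} .")
    body = " ".join(lines)
    return f"{PREFIXES}\nCONSTRUCT {{ {body} }} WHERE {{ {body} }}"

# The example pattern from the text: burial grounds containing human remains.
pattern = [(None, "rdf:type", ":Burial_Ground"),
           (None, "crm:contains", ":Human_Remains")]

graph = Graph().parse("pakbon.ttl")   # hypothetical local copy of the data
subgraph = graph.query(pattern_to_construct(pattern)).graph
print(len(subgraph))                  # size of the in-memory subgraph
```

Handling soft constraints would require relaxing the corresponding triple in the WHERE clause (e.g., wrapping it in OPTIONAL), which is omitted here for brevity.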
3.2. Context Sampling

To find only relevant patterns we must first accurately capture the target entities' semantic representations. These representations, which we call their contexts, include all information which might be relevant to them during the mining process. Hence, contexts serve a role similar to that of the "individuals" in traditional data mining. Capturing these contexts is done using context sampling, which involves supplementing all target entities with additional triples, retrieved from the original graph, which are directly or indirectly related to each entity.

In our implementation of the MINOS pipeline we chose a local-neighbourhood-based sampling strategy. This strategy was selected for its simplicity—it only requires a single parameter—and for its ability to produce good approximations without the need for background knowledge. The strategy assumes that the semantic representation of a target entity diffuses with distance: closely-related entities are more relevant than those further away in the graph. To reflect this, a local-neighbourhood strategy samples the neighbouring nodes of a target pattern up to a certain depth.
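The sketch below illustrates such a depth-bounded sample, assuming rdflib and a walk over outgoing edges only; whether MINOS also follows incoming edges is not prescribed here.

```python
from rdflib import Graph, URIRef

def sample_context(graph, entity, depth=3):
    """Collect all triples reachable from `entity` within `depth` hops,
    breadth-first over outgoing edges."""
    context = Graph()
    frontier = {entity}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for s, p, o in graph.triples((node, None, None)):
                if (s, p, o) not in context:
                    context.add((s, p, o))
                    if isinstance(o, URIRef):   # literals terminate a path
                        next_frontier.add(o)
        frontier = next_frontier
    return context
```

With depth=3, this mirrors the maximum context sampling depth used in the experiments of Section 5.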
Figure 1: A flowchart of the MINOS pipeline. At start, a SPARQL query, automatically generated from a user-provided target
pattern, is executed to retrieve a relevant subset from the LOD-cloud. After capturing the contexts of the target entities in
this subset, these contexts are passed to the ARM algorithm. The resulting rules are then first filtered based on a predefined
constraint template, with the remainder being presented to domain experts for evaluation.

3.3. Rule Mining

At the heart of the pipeline lies the ARM algorithm. For the reasons explained before—the exploitation of RDF and RDFS semantics to generalize patterns—we have chosen SWARM for our implementation of the pipeline. To understand how SWARM operates, we will now first expand on our earlier brief introduction to ARM.

An association rule is an implication of the form X =⇒ y, where X is a finite set of items and y is a single item not present in X [29]. With generalized ARM, X is reduced to the set of item classes C1, C2, ..., Cn, where Ci ∈ X if it covers at least θ% of the items. An example of such a rule might be {Cemetery, BurialGround} =⇒ HumanRemains, implying that human remains are often found at both cemeteries and burial grounds.

SWARM extends the notion of item sets from sets of items with the same type to semantic item sets, which contain items (entities) that frequently share (p, o)-pairs. Instances of cemeteries and burial grounds, for instance, are likely to share the (:contains, :Human Remains)-pair. Once all semantic item sets in the graph have been computed, SWARM proceeds by joining similar item sets into common behaviour sets. In our example case, a likely combination might occur with instances of crypts and catacombs. The rationale underlying this is the assumption that entities with largely overlapping contexts (sets of properties and instances within a set distance) are likely to also share other, as yet unknown, (p, o)-pairs. Put differently: these entities follow the same pattern. To quantify this, SWARM introduces the similarity factor, which serves as a boundary that separates similar from dissimilar item sets. As a final step, SWARM generalizes the discovered patterns by exploiting class inheritance, ultimately producing rules of the form ∀χ(Class(χ, c) → (P(χ, φ) → Q(χ, ψ))). Here, χ, φ, and ψ are entities, c a class, and P and Q predicates.

There are several measures available to assess the interestingness of the resulting rules. For our data-driven evaluation we use the well-known confidence and support measures. The confidence score of a rule X =⇒ y conveys its strength and equals the proportion of triples (transactions) which satisfy X that also satisfy y. Its support score represents the rule's significance and equals the proportion of triples which satisfy both X and y.
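In code, the two measures reduce to simple set arithmetic over the transactions. The toy transactions below are fabricated solely to exercise the running example.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions satisfying the antecedent, the fraction
    that also contains the consequent."""
    sup_x = support(transactions, antecedent)
    if sup_x == 0:
        return 0.0
    return support(transactions, antecedent | {consequent}) / sup_x

# Toy transactions built from the running example.
transactions = [
    {"Cemetery", "BurialGround", "HumanRemains"},
    {"Cemetery", "BurialGround", "HumanRemains"},
    {"Cemetery", "BurialGround"},
    {"Crypt", "HumanRemains"},
]
rule_x = {"Cemetery", "BurialGround"}
print(confidence(transactions, rule_x, "HumanRemains"))   # 2/3, strength
print(support(transactions, rule_x | {"HumanRemains"}))   # 0.5, significance
```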
3.4. Dimension Reduction

Association rule mining algorithms commonly produce a large number of candidate rules, resulting in their own knowledge management problem [30]. To combat this, our pipeline includes a data-driven filter which discards unwanted rules based on a predefined set of constraints. These constraints can include minimum and maximum values for the support and confidence scores, but also restrictions on types, on predicates, and on both entire antecedents and consequents.

By default, the set of constraints filters all rules which are too common or too rare, or which are generally unwanted. Typical examples of such unwanted rules are those which include domain-independent relations—owl:sameAs, skos:inSchema, dcterms:medium, etc.—which do not contribute to the pattern mining process. Optionally, the default filter can be overridden by providing a custom constraints template at start. This allows us to further reduce the number of candidate rules by tightening the constraints.

Note that the filter is run after the rules have been produced. This is a deliberate design choice that allows users to revert any or all of its effects without having to repeat the whole mining process, and therefore offers more flexibility during the analysis of the results. Hereto, the unfiltered result set is kept in memory.
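A minimal version of such a constraint template and filter might look as follows; the threshold values and the attributes assumed on a rule object are illustrative, not the pipeline's actual defaults.

```python
DEFAULT_CONSTRAINTS = {
    "min_support": 0.05, "max_support": 0.95,      # drop the too rare and too common
    "min_confidence": 0.50, "max_confidence": 1.0,
    "banned_predicates": {"owl:sameAs", "skos:inSchema", "dcterms:medium"},
}

def passes(rule, c=DEFAULT_CONSTRAINTS):
    """Check a candidate rule against a constraint template; `rule` is
    assumed to expose support, confidence, and its predicates."""
    if not (c["min_support"] <= rule.support <= c["max_support"]):
        return False
    if not (c["min_confidence"] <= rule.confidence <= c["max_confidence"]):
        return False
    return not (set(rule.predicates) & c["banned_predicates"])

def refilter(all_rules, constraints):
    """Re-apply a (possibly loosened) template to the unfiltered set,
    which stays in memory so no re-mining is needed."""
    return [r for r in all_rules if passes(r, constraints)]
```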
3.5. Rule Browser

Candidate rules which pass the filter are presented to users in an interactive facet browser. To improve their interpretability, these rules are automatically translated to natural language by exploiting the label attributes of both entities and properties. The resulting translations are then inserted into predefined sentence templates, one per language.

To change which rules are shown, users can supply additional information about their interest. This involves modifying the same parameters as those available for the constraint templates discussed previously, and thus includes restrictions both on assigned scores and on contents. Rules which are deemed worthy of saving can be exported to a human-readable file, or can be stored in a binary file which can later be reopened by the rule browser for further analysis and mutation.

For our implementation of the pipeline we opted for a virtual-terminal interface, which offered the necessary balance between simplicity and flexibility needed during our interactive sessions with domain experts. If desired, however, a full-fledged web interface can be used instead by running the rule browser via a client-side environment.
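The translation step can be as simple as filling a per-language sentence template with the rdfs:label of each resource in a rule. The sketch below assumes a rule object with a type, an antecedent object, and a consequent object, mirroring the shape of the rules in Table 2; the English template wording follows the translations shown in Table 3, and the Dutch variant is a hypothetical addition.

```python
from rdflib.namespace import RDFS

TEMPLATES = {
    "en": "For every {type} holds: if it concerns {antecedent}, then {consequent}.",
    "nl": "Voor elke {type} geldt: als het {antecedent} betreft, dan {consequent}.",
}

def label(graph, resource):
    """Prefer a human-readable rdfs:label; fall back to the local name."""
    value = graph.value(resource, RDFS.label)
    return str(value) if value else str(resource).rsplit("/", 1)[-1]

def verbalize(graph, rule, lang="en"):
    """Fill the sentence template for `lang` with the labels of the
    resources making up the rule."""
    return TEMPLATES[lang].format(
        type=label(graph, rule.rule_type),
        antecedent=label(graph, rule.antecedent_object),
        consequent=label(graph, rule.consequent_object),
    )
```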
4. The package-slip Knowledge Graph

Excavation data is a valuable source of information in many archaeological studies [9]. These studies are therefore likely to benefit from pattern mining on this type of data, which makes it a suitable choice to base our case study on. In agreement with domain experts, we therefore selected the package-slip knowledge graph to run our experiments with.

Package slips are detailed summarizations of entire excavation projects. They are structured as specified by the SIKB Protocol 0102, a Dutch standard on modelling and sharing excavation data of various degrees of granularity (see www.sikb.nl/datastandaarden/richtlijnen/sikb0102). Specifically, package slips capture the following aspects:

• General information about the individuals, companies, and organisations involved, as well as the various locations at which the excavations took place.

• Final and intermediate reports made during the project, as well as different forms of media such as photos, drawings, and videos. Each of these is accompanied by meta-data and (cross-)references to their respective file, subject, and mentioning.

• Detailed information about all artefacts discovered during the project, as well as their (geospatial and stratigraphic) relation and the archaeological context in which they were found.

• Fine-grained information about the precise locations, and their geometries, at which artefacts were discovered, archaeological contexts were observed, and media was created.

In its specification, the SIKB protocol makes use of the Extensible Markup Language (XML) to define the various concepts that make up a package slip. The limitations of this language for sharing and integrating data motivated the Dutch ARIADNE partners, amongst whom were we, to design an alternative data model based on semantic web standards.

The semantic package slip is primarily built on top of the CIDOC conceptual reference model (CRM) and its archaeological extension CRM English Heritage (CRM-EH). Every excavation project is of the type DHProject (Dutch Heritage Project, a subclass of CRM-EH's EHProject) and links to the discovered artefacts via production events. To classify these artefacts, and all related archaeological contexts, the package-slip graph includes a multi-schema SKOS vocabulary which contains more than 7,000 archaeological concepts.

A small and partial example of a package slip is depicted in Figure 2. There, a single artefact (crmeh:ContextFind) is shown with its archaeological context (crmeh:Context). The artefact also holds several attributes of various types, and is itself part of a unit (crm:Collection). These units are virtual containers that group artefacts which were found in the same archaeological context. In contrast, bulk finds (crmeh:BulkFind) are physical containers, often a box, which group artefacts with similar storage requirements (e.g. on humidity, temperature, or oxygen). These are linked to a Dutch Heritage project via a production event.

4.1. Knowledge Integration

At present (the situation in July 2018), the package-slip knowledge graph (pakbon-ld.spider.d2s.labs.vu.nl) contains the aggregated data from little over 70 package slips, totaling roughly 425,000 triples [31]. These package slips were converted from existing XML files (the conversion tool is available at gitlab.com/wxwilcke/pakbonLD), and were subsequently integrated into a single coherent graph by using a central triple store which acted as an authority server: hash values were generated for all entities in a package slip—starting from the outer edge of the graph and recursively updating all encountered parents—which were then cross-referenced with those on the server. By using this approach, we were able to unify different resources about the same thing, most particularly about people, companies, and locations.
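One way such bottom-up hashing could be realized is sketched below: an entity's hash is derived from its (predicate, object)-pairs, recursing into child resources so that structurally identical descriptions yield identical hashes that can be cross-referenced with the authority server. This is an illustration of the idea under stated assumptions, not the converter's actual code.

```python
import hashlib
from rdflib import URIRef

def entity_hash(graph, entity, seen=frozenset()):
    """Hash an entity by its outgoing (predicate, object)-pairs, recursing
    into child resources so equal descriptions yield equal hashes."""
    if entity in seen:                        # guard against cycles
        return "cycle"
    parts = []
    for _, p, o in graph.triples((entity, None, None)):
        if isinstance(o, URIRef):
            parts.append(f"{p} {entity_hash(graph, o, seen | {entity})}")
        else:
            parts.append(f"{p} {o}")          # literals hash by raw value
    return hashlib.sha1(" ".join(sorted(parts)).encode("utf-8")).hexdigest()
```

Entities whose hash matches one already known to the server can then be merged with the server's canonical resource, which is how duplicate people, companies, and locations would be unified.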
The size of the package-slip knowledge graph is expected to grow consistently, as the production of package slips has recently been made mandatory for all Dutch institutes and companies which are involved with archaeological data. This increases the relevancy of this case study for the Dutch archaeological community, and thus adds support for our choice of the package slip.
Figure 2: Small and partial example from the package-slip knowledge graph. Each project produces (amongst others) one or more containers. These containers are composed of one or more bulk finds, which in turn are composed of one or more artefacts. Each of these artefacts is part of a unit whose members share a common (multilayered) context. Note that, in this figure, solid and opaque circles represent instance and vocabulary resources, respectively. Three dots represent literals, and opaque boxes represent classes.

5. Experiments

To assess the effectiveness of the MINOS pipeline, we conducted four experiments on the package-slip knowledge graph (see Section 4). Each of these experiments addressed a different granularity of the package-slip graph, to investigate the effects of these granularities on the usefulness of the discovered patterns for domain experts. In order from coarse-grained to fine-grained, these are:

Projects, which, amongst others, are of a project class, are held at a specific location, and during which one or more artefacts are discovered.

Artefacts, which, amongst others, are of an artefact class, have dimensions and mass, consist of one or more parts, and are found in a certain archaeological context and under specific conditions.

Archaeological Contexts, which, amongst others, are of a context class, have a geometry, a structure, and a dating, and which consist of one or more subcontexts.

Archaeological Subcontexts, which, amongst others, have a certain shape, colour, and texture, which consist of zero or more nested substructures, and which hold an interpretation of their significance as provided by archaeologists in situ.

The bottom three granularities—artefacts, contexts, and subcontexts—roughly correspond to the interests of three domain experts, with whom we had several interviews during the experiment design phase. The most coarse-grained granularity (projects) was added by us to create a balanced cross-section of the graph. From here on, we refer to these granularity levels as topics of interest (ToI).

All experiments were run using our implementation of the MINOS pipeline (gitlab.com/wxwilcke/MINOS). Hereto, all four experiments were given the same set of hyperparameters (Table 1). Concretely, we set the local-neighbourhood context sampling depth (CSD) to a maximum of three hops, and varied the similarity factor (SF) over three values which range from weakly similar to strongly similar. Each experiment was therefore run three times (CSD × SF) with different hyperparameters. Because different similarity factors may result in distinctly different, yet possibly useful, patterns, we aggregated their output to produce a single result set per experiment.
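Under stated assumptions (a `mine` callable standing in for the SWARM implementation), the per-experiment procedure amounts to a small grid over the similarity factors, with the outputs pooled into one result set:

```python
SIMILARITY_FACTORS = (0.3, 0.6, 0.9)   # weakly to strongly similar
SAMPLING_DEPTH = 3                     # context sampling depth (CSD), in hops

def run_experiment(subset, mine):
    """Run the miner once per similarity factor and pool the outputs,
    deduplicating rules that several runs discover independently."""
    results = set()
    for sf in SIMILARITY_FACTORS:
        results |= set(mine(subset, depth=SAMPLING_DEPTH, similarity_factor=sf))
    return results
```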
Each experiment produced up to roughly 1,800 candidate rules, brought down from about 36,000 on average by the default filter settings. From these 1,800 rules, we created a more manageable evaluation sample by taking the top fifty candidates, ranked on their confidence (first) and support (second) scores. We motivate our choice of confidence as the primary measure by the emphasis on producing correct, yet previously unknown, patterns.
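The sampling itself is a lexicographic sort, confidence before support (rule attributes assumed as in the earlier filter sketch):

```python
def evaluation_sample(rules, n=50):
    """Keep the top-n candidates, ordered by confidence first, support second."""
    return sorted(rules, key=lambda r: (r.confidence, r.support), reverse=True)[:n]
```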
Five candidate rules are listed in Table 2. These have been hand-picked from the evaluation sample, and are representative of the result set as a whole. We will use these five candidates as running examples in our data-driven evaluation.

5.1. Data-Driven Evaluation

To assess the general effectiveness of the pipeline, we have chosen not to incorporate any dataset-specific adjustments into the data-driven evaluation. The reason for this is that we are not interested in how well our approach works on this particular dataset, but rather in how effective the support and confidence metrics are as criteria for our data-driven filter and, by extension, for the perceived interestingness of the discovered patterns to domain experts. For the purpose of this evaluation, we limit ourselves to the evaluation sample created earlier.

An inspection of the evaluation sample reveals that many of the candidate rules apply to classes other than the four selected topics of interest. Two examples of such rules are listed in Table 2: both rules R3 and R4 apply to entities of the SiteSubDivision class, which is not explicitly listed in any of the target patterns. In fact, 16% of all rules in the evaluation sample apply to this class. If we look at the package-slip data model, however, we observe that said class lies within three hops of the DHProject class. Therefore, the presence of these unexpected classes can likely be attributed to the chosen context sampling strategy. Indeed, the same is the case for all other unexpected classes, such as ContextStuff, Collection, and Place Appellation, which apply to 11.5%, 8%, and 6% of all rules in the evaluation sample, respectively.

Another look at the evaluation sample reveals that many of the candidate rules actually describe very similar patterns. This is especially evident with time spans, which are present in nearly half (43%) of the patterns. In fact, of the five examples listed in Table 2, all but one involve a time span as consequent. The reason for their frequent occurrence can likely be traced back to the package-slip vocabulary: many of the artefact classes are in a many-to-one relationship with specific time spans. For instance, all known belly amphorae stem from the pre-classical period (612 BCE – 480 BCE). These predefined relationships are treated as patterns by the ARM algorithm and, because they are overrepresented in the data, score high on confidence and support. This also holds for other predefined relationships, such as those that involve geographical information: map areas are paired with their respective toponymies, and provinces are paired with municipalities.

A final revelation of the evaluation set is that both support and confidence scores alone provide a poor measure of pattern interestingness: over 96% and 99% of the patterns have support and confidence scores, respectively, close (δ ≤ 0.05) or equal to 0 or 1. This makes it difficult to distinguish between patterns which are actually interesting and those which describe predefined and therefore uninteresting relationships. While this difficulty is a known problem with ARM [30], the extreme form it takes here is likely caused by the nature of the package-slip data: nearly all information that might be interesting to domain experts (artefacts, archaeological contexts, etc.) is in a one-to-one relationship with each other. Therefore, this information is not shared within, nor between, package-slip instances.

5.2. User-Driven Evaluation

To assess the interestingness of the produced rules from a domain expert's point of view, we asked the local archaeological community—a Facebook group with over 3,000 members from various Dutch academic and commercial organizations—to participate in an online survey (Fig. 3). At the start of this survey, participants were greeted with a concise summary of this research and with a detailed description of the task we asked them to perform: to rate a selection of forty candidate rules on plausibility (can this be true in the context of the data?), on relevancy (can this support you during your research?), and on newness (is this unknown to you?). Once completed, the participants were also asked some final questions about their familiarity with the domain, and whether this approach to rule mining could contribute to their studies.

A five-point Likert scale was offered as answer template throughout the entire survey.
Table 1: Configurations of the four experiments. Column names indicate topic of interest (ToI), target pattern, number of facts in the subset, context sampling depth (CSD), and similarity factors (SFs).

ToI         | Target Pattern                                             | # facts | CSD | SFs
Projects    | [( , rdf:type, :DHProject)]                                | 21.9k   | 3   | (0.3, 0.6, 0.9)
Artefacts   | [( , rdf:type, crmeh:ContextFind)]                         | 192.8k  | 3   | (0.3, 0.6, 0.9)
Contexts    | [( , rdf:type, crmeh:Context)]                             | 82.2k   | 3   | (0.3, 0.6, 0.9)
Subcontexts | [( , rdf:type, crmeh:Context) ∧ ( , pbont:trench type, )] | 59.5k   | 3   | (0.3, 0.6, 0.9)

Table 2: Five example rules selected from the top-50 candidates of all four experiments. Each rule applies to resources of type TYPE, and consists of an antecedent and a consequent in the form of an IF-THEN statement. The last two columns list their confidence and support scores.

ID | Semantic Association Rule                                                                                              | Conf. | Supp.
R1 | TYPE crmeh:ContextFind; IF :has artefact type, :Adze; THEN crm:has time-span, [:Neolithic Period, :Bronze Age]         | 1.00  | 0.08
R2 | TYPE crmeh:ContextFind; IF :has artefact type, :Raw Earthenware (Nijmeegs/Holdeurns); THEN crm:has time-span, [:Early Roman Age, :Late Roman Age] | 1.00 | 0.90
R3 | TYPE crmeh:SiteSubDivision; IF :has location type, :Base Camp; THEN crm:has time-span, [:Neolithic Period, :New Time]  | 1.00  | 0.26
R4 | TYPE crmeh:SiteSubDivision; IF :has location type, :Flint Carving; THEN crm:has time-span, [:Mesolithic Period, :Neolithic Period] | 0.50 | 0.06
R5 | TYPE crmeh:Context; IF :has trench type, :Esdek; THEN :has sample method, :Levelling                                   | 1.00  | 0.71

Table 3: Translation in natural language of the five example rules given in Table 2. The last three columns list their normalized median and (highest) mode scores for plausibility (P), relevancy (R), and novelty (N).

ID | Translation                                                                                                           | P Mdn (Mode) | R Mdn (Mode) | N Mdn (Mode)
R1 | For every artefact holds: if it concerns an adze, then it dates from the Neolithic period to the Bronze Age.          | 0.75 (0.75)  | 0.75 (0.75)  | 0.75 (0.75)
R2 | For every artefact holds: if it concerns raw pottery Nijmeegs/Holdeurns, then it dates from the Early Roman to the Late Roman period. | 0.75 (0.75) | 0.50 (0.75) | 0.25 (0.25)
R3 | For every site holds: if it concerns a base camp, then it dates from the early Neolithic period to Recent times.      | 0.25 (0.25)  | 0.25 (0.25)  | 0.50 (0.75)
R4 | For every site holds: if it concerns flint carving, then it dates from the Mesolithic to the Neolithic period.        | 0.25 (0.25)  | 0.25 (0.25)  | 0.25 (0.25)
R5 | For every context holds: if it concerns an esdek, then the collection method involves levelling.                      | 0.25 (0.25)  | 0.25 (0.25)  | 0.75 (0.75)
To stress the importance of the participants' opinions, our Likert scale ranged from strongly disagree to strongly agree, and our questions were stated such that a higher agreement corresponds with a higher interestingness. Additionally, if desired, participants could enter further remarks as free-form text.

All forty candidate rules were randomly sampled from the evaluation sample—ten per experiment (stratified)—and automatically translated to natural language using predefined language templates (see Section 3.5; in some cases, minor edits were made to improve the flow of a sentence). We have listed five of these translated rules in Table 3. Each of these rules is the translation of the rule with the same identifier in Table 2.

5.2.1. Survey Analysis

Twenty-one people participated in our survey. A statistical analysis of their answers is provided in Table 4. For each of the four experiments, this table lists the normalized median and mode scores for plausibility, relevancy, and novelty. Also listed are the overall scores per experiment and per metric, and the score for the entire approach as a whole. For all scores holds that higher is better.

Looking at the four separate experiments, it is clear that project- and artefact-related patterns are rated higher than those of contexts and subcontexts, with an overall median score of 0.50 versus 0.25, respectively. Patterns about artefacts also score relatively high on the mode (0.75), exceeding that of all other experiments (0.25). This implies a cautiously positive opinion towards artefact-related patterns. Contributing to this positiveness are the plausibility and relevancy ratings, on which artefacts score better than any of the other three experiments. Statistical tests (Kruskal-Wallis with α = 0.01) indicate that these findings are significant, and suggest a dependency between various metrics and the topics of interest: p = 7.66 × 10⁻¹³ for plausibility, p = 6.97 × 10⁻⁵ for relevancy, and p = 1.50 × 10⁻³ for novelty.

The approach as a whole scores best on plausibility and novelty, both of which have a median of 0.50. Between these two metrics, plausibility takes the cake with a mode of 0.75 versus 0.25 for novelty. This implies a cautiously positive opinion towards the plausibility of the discovered patterns. More negative are the relevancy ratings, with a median and mode of 0.25. Together, the metrics combine to an overall score with median 0.50 and mode 0.25, implying a neutral to slightly negative opinion about the approach as a whole. Remarks made by domain experts suggest that this stems from the frequent occurrence of patterns that are either too general, too trivial, or which describe predefined one-to-one or many-to-one relationships. This corresponds with our findings during the data-driven evaluation. Nevertheless, a median and mode score of 0.75 was given to the final separate question about the overall potential usefulness of this approach, implying a more positive standpoint.

Table 5 lists the inter-rater agreements per experiment and per metric. Krippendorff's alpha (αK) is used for this purpose, as it allows comparisons over ordinal data with more than two raters. Overall, the ratings are in fair agreement with αK = 0.25. A similar agreement is found between the different topics of interest, which range from 0.18 to 0.30. A more extreme difference is found between the different metrics: a moderate agreement on plausibility (αK = 0.41), and only a slight agreement on both relevancy (αK = 0.08) and novelty (αK = 0.09). This stark difference may be caused by the different familiarities and experiences of the domain experts: whereas plausibility comes down to everyday archaeological knowledge, novelty (and, to a lesser degree, relevancy) is far more dependent on one's own view of the domain.
We can get a better understanding of these scores by reading the remarks left by the domain experts. Overall, these remarks suggest a disconnect between how the evaluation task was instructed and how the experts performed it: rather than assessing the patterns within the (limited) context of the dataset, our panel of experts appears to have judged the patterns against the knowledge in the archaeological domain as a whole. Numerous patterns have therefore been scored as implausible only because they do not necessarily hold outside of the dataset. Similarly, remarks on relevance and novelty seem to indicate that the raters only assessed that exact pattern instance, and did not consider the potential of such patterns on different datasets. Interestingly, some experts appear to have used this limited window to try to understand the knowledge creation process of fellow archaeologists: whether, for instance, the choice of an unexpected class could indicate an alternate interpretation of the same facts.
Figure 3: Collage of various screenshots of the web survey used during the user-driven evaluation. Participants were first asked
about their background and experience, and were then given a summary of this research and a detailed description of the task.
Next, they were presented with 40 candidate rules and asked to rate these on interestingness. Afterwards, participants were
also asked about their opinion on the pipeline as a whole.

Table 4: Normalized separate and overall median and (highest) mode scores for plausibility (P), relevancy (R), and novelty (N) per topic of interest, as provided by 21 raters.

ToI         | P Mdn (Mode) | R Mdn (Mode) | N Mdn (Mode) | Overall
Projects    | 0.50 (0.25)  | 0.25 (0.25)  | 0.50 (0.75)  | 0.50 (0.25)
Artefacts   | 0.75 (0.75)  | 0.50 (0.25)  | 0.25 (0.25)  | 0.50 (0.75)
Contexts    | 0.25 (0.25)  | 0.25 (0.25)  | 0.50 (0.75)  | 0.25 (0.25)
Subcontexts | 0.25 (0.25)  | 0.25 (0.25)  | 0.50 (0.25)  | 0.25 (0.25)
Overall     | 0.50 (0.75)  | 0.25 (0.25)  | 0.50 (0.25)  | 0.50 (0.25)

Table 5: Inter-rater agreements (Krippendorff's alpha: αK) over all raters per topic of interest (left) and per metric (right). Overall αK = 0.25.

ToI         | αK        Metric       | αK
Projects    | 0.22      Plausibility | 0.41
Artefacts   | 0.30      Relevancy    | 0.08
Contexts    | 0.24      Novelty      | 0.09
Subcontexts | 0.18
6. Discussion

Our analysis of the survey's results indicates that the panel of experts was cautiously positive about the plausibility of the produced patterns. This (slight) positiveness does not come as a surprise, as association rules describe the actual patterns which exist in the data, rather than predict new ones. We can further explain this observation by our decision to order the candidate rules on confidence—favouring accuracy above coverage—and by the fact that the package-slip knowledge graph only contains curated data. Given that this is the case, however, we may wonder why the patterns in our evaluation sample were not rated even more positively on plausibility.

A possible reason is the presence of errors in the data, but these would likely be incidental and should therefore have only a minor effect on the mean plausibility score. A more likely reason is that we used a suboptimal set of hyperparameters during our experiments. Prime suspects are the chosen similarity factors, which determine how much groups of entities (the semantic item sets) are permitted to overlap before they are combined into more general groups (the common behaviour sets). Concretely, a too-low value can result in overgeneralization: two or more item sets are erroneously attributed the same pattern. This results in the generation of rules which do not hold for all members of a set, despite a possibly high confidence score implying the opposite, and which are thus likely to score poorly on plausibility in the eyes of domain experts.
We can solve this problem to a certain extent by increasing the similarity factor to a value at which only minor differences between set members are accepted. By doing so, however, we risk trading one undesirable situation for another: rather than overgeneralizing, we might undergeneralize, such that similar item sets are prevented from forming common behaviour sets. In the worst case, we might even end up joining only very large item sets—the similarity factor is covariant with set size—which have a near-perfect overlap, hence favouring sets with (p, o)-pairs that frequently co-occur across the graph. Examples of such patterns were present in our evaluation set, most particularly those that implied relationships between two vocabulary concepts (types of artefact and time spans, provinces and municipalities, etc.) which naturally belong together.

Unfortunately, the optimal value(s) for the similarity factor are unknown beforehand, which is why we varied over three different values—0.3, 0.6, and 0.9—during our experiments. As explained before, our choice of these specific values came forth from our preliminary findings, which showed that these values result in distinctly different, yet potentially useful, patterns. Given the occurrence of both too-general and too-specific patterns in our evaluation set, however, we can now surmise that the outer two values were likely too low and too high, respectively. This suggests that there exists but a narrow range of optimal values in the similarity-factor spectrum for which SWARM yields desirable patterns. Determining this optimal range would, unfortunately, require a great deal of time and effort from our domain experts, which is why no further preliminary experiments have been run.

While the chosen hyperparameters thus seem largely correlated with the plausibility score, our results suggest that it is the characteristics of the dataset which are correlated with the relevancy and novelty scores. On both of these scores, our group of experts was less positive about the presented patterns. The remarks left reveal that many of these patterns were seen as trivial. Others were seen as tautologies, or were only thought to be applicable in a specific context. These remarks support our findings made during the data-driven evaluation that the confidence and support metrics are ineffective measures for assessing the interestingness of patterns to domain experts. Both measures are strongly influenced by the size, variety, and quirks of the data. Peculiarities of the package-slip data, specifically its inherent hierarchical structure, are likely the reason why this issue manifested itself even more evidently in this work.

Another reason for the unsatisfying novelty and relevancy scores may be found in the similarity between patterns: many of the produced rules describe variations of the same pattern, with only different vocabulary concepts in the consequent. One pattern, for example, might imply that some pots are made from red clay, whereas another pattern might imply that other pots are made from grey clay. Our dataset contained over 7,000 of these concepts, separated into 22 categories (e.g., material and period). If left unchecked, it is therefore likely for similar patterns to make their way into the result set.

Additionally, many of the potentially interesting data points were encoded via textual or numerical literals. SWARM, like almost all other methods in this field, lacks the ability to compare literals
based on their raw values [15]. Instead, resources are only compared via their URIs, which are unique and therefore always spaced at equal distance from each other in the search space. Because of this limitation, our pipeline was unable to differentiate between closely (or distantly) related literals. This became especially apparent with geometries, measurements, and descriptions, all three of which are abundantly found in archaeological data.

A further reason for the novelty and relevancy scores may lie with the ontology that is used in the package-slip data model: many of the properties defined in this model are expressed via rather long paths (up to five hops). This characteristic is directly inherited from the CIDOC CRM ontology, which specifies that entities and properties are to be linked via various events. Travelling along these long paths is computationally expensive, which is why we had decided to leave such paths unexplored in favour of the more local patterns.

Zooming out from the individual scores can give us an idea about the perceived overall effectiveness of the pipeline. The remarks left by the experts suggest that this effectiveness may have been influenced considerably by the size and scope of the dataset. Indeed, the relatively low number of integrated package slips might not have been enough to allow well-generalized patterns to emerge, rather than the more specific patterns which only make sense in a narrow context (e.g., within a single excavation). Similarly, the limited scope of the data—the package slips were supplied by just two companies, each with a particular specialisation—meant that experts with different specialisations (e.g., a different culture or period) might have had insufficient experience to evaluate the discovered patterns on the criteria we asked them to.

A final aspect worth noting is the relatively low number of raters we were able to muster, despite our greatest efforts, and the effect this may have had on the outcome of our evaluation: the influence of possible outliers is greater, and the findings are less generalizable beyond our chosen use case. An analysis of the survey's access logs indicates that many potential raters did not actually complete the survey, but instead gave up at an earlier point. From the remarks left by those who did complete the survey, we believe that this was likely due to a combination of there being too many patterns to evaluate, and the limited novelty and relevance that these patterns provided to them.

7. Conclusion

In this work, we introduced the user-centric MINOS pipeline for pattern mining on knowledge graphs in the humanities. With this pipeline, we aim to support domain experts in their analyses of such knowledge graphs by helping them discover useful and interesting patterns in their data. Our pipeline therefore emphasizes the importance of these experts and their requirements, rather than those of the usual data scientists. This has led to several design choices, the most particular of which is the use of generalized association rules to overcome the lack of transparency and interpretability—two key issues with technological acceptance in the humanities—that persists with many other methods.

To assess the effectiveness of our pipeline we conducted experiments in the archaeological domain. These experiments were designed together with domain experts and were evaluated both objectively (data-driven) and subjectively (user-driven). The results indicate that the domain experts were cautiously positive about the plausibility of the discovered patterns, but less so about their novelty and their relevance to archaeological research. Instead, a large number of patterns were discarded by our experts for describing trivialities or tautologies. Nevertheless, on average, the experts were positively surprised by the range of patterns that our pipeline was able to discover, and were optimistic about the future potential of this approach for archaeological research.

During our research we encountered several challenges which limited the effectiveness of our pipeline. We were unable to address them at the time and therefore offer them as suggestions for future work. For the most part, these challenges concern the nature of the data and the inability of the mining algorithms to exploit it: concretely, the inability to 1) exploit common semantics other than RDF and RDFS (e.g., SKOS and, to a lesser degree, CIDOC CRM), and 2) cope with knowledge encoded via literal attributes (rather than only via the graph's structure), which makes up the majority of knowledge in the humanities.

Solving these challenges would unlock a wealth of additional knowledge which is currently left unused, and which can potentially lead to more useful and more interesting patterns which humanities researchers can use to further their research.
Acknowledgements

We wish to express our deep gratitude to domain experts Milco Wansleeben and Rein van 't Veer for their enthusiastic encouragement and useful critiques during the various steps that have led to this work. We also wish to thank all domain experts who participated in our survey for their willingness to sacrifice their free time, and without whom we would not have been able to complete this research.

This research has been partially funded by the ARIADNE project through the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193.

References

[1] M. Hallo, S. Luján-Mora, A. Maté, J. Trujillo, Current state of linked data in digital libraries, Journal of Information Science 42 (2) (2016) 117–127.
[2] A. Rapti, D. Tsolis, S. Sioutas, A. Tsakalidis, A survey: Mining linked cultural heritage data, in: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), ACM, 2015, p. 24.
[3] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery in databases, AI Magazine 17 (3) (1996) 37.
[4] J. Hagood, A brief introduction to data mining projects in the humanities, Bulletin of the American Society for Information Science and Technology 38 (4) (2012) 20–23.
[5] C. Velu, P. Vivekanadan, K. Kashwan, Indian coin recognition and sum counting system of image data mining using artificial neural networks, International Journal of Advanced Science and Technology 31 (2011) 67–80.
[6] K. Makantasis, A. Doulamis, N. Doulamis, M. Ioannides, In the wild image retrieval and clustering for 3D cultural heritage landmarks reconstruction, Multimedia Tools and Applications 75 (7) (2016) 3593–3629.
[7] L. Manovich, Trending: The promises and the challenges of big social data, Debates in the Digital Humanities 2 (2011) 460–475.
[8] B. Rieder, T. Röhle, Digital methods: Five challenges, in: Understanding Digital Humanities, Springer, 2012, pp. 67–84.
[9] H. Selhofer, G. Geser, D2.1: First report on users' needs, Tech. rep., ARIADNE (2014). URL: http://ariadne-infrastructure.eu/Resources/D2.1-First-report-on-users-needs
[10] H. Kamada, Digital humanities roles for libraries?, College & Research Libraries News 71 (9) (2010) 484–485.
[11] H.-P. Kriegel, P. Kröger, C. H. Van Der Meijden, H. Obermaier, J. Peters, M. Renz, Towards archaeoinformatics: Scientific data management for archaeobiology, in: International Conference on Scientific and Statistical Database Management, Springer, 2010, pp. 169–177.
[12] V. Tresp, M. Bundschus, A. Rettinger, Y. Huang, Towards machine learning on the semantic web, in: URSW (LNCS vol.), Springer, 2008, pp. 282–314.
[13] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, Fast rule mining in ontological knowledge bases with AMIE+, The VLDB Journal 24 (6) (2015) 707–730.
[14] T. Anbutamilazhagan, M. K. Selvaraj, A novel model for mining association rules from semantic web data, Elysium Journal 1 (2).
[15] X. Wilcke, P. Bloem, V. de Boer, The knowledge graph as the default data model for learning on heterogeneous knowledge, Data Science (1) (2017) 1–19.
[16] R. Ramezani, M. Saraee, M. Nematbakhsh, SWApriori: A new approach to mining association rules from semantic web data, Journal of Computing and Security 1 (2014) 16.
[17] M. Barati, Q. Bai, Q. Liu, SWARM: An approach for mining semantic association rules from semantic web data, Springer International Publishing, Cham, 2016, pp. 30–43. doi:10.1007/978-3-319-42911-3_3.
[18] R. G. Miani, C. A. Yaguinuma, M. T. Santos, M. Biajiz, NARFO algorithm: Mining non-redundant and generalized association rules based on fuzzy ontologies, Enterprise Information Systems (2009) 415–426.
[19] T. Kauppinen, K. Puputti, P. Paakkarinen, H. Kuittinen, J. Väätäinen, E. Hyvönen, Learning and visualizing cultural heritage connections between places on the semantic web, in: Proceedings of the Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS 2009), the 6th Annual European Semantic Web Conference (ESWC 2009), Citeseer, 2009.
[20] J. Völker, M. Niepert, Statistical schema induction, in: Extended Semantic Web Conference, Springer, 2011, pp. 124–138.
[21] J. Kim, E.-K. Kim, Y. Won, S. Nam, K.-S. Choi, The association rule mining system for acquiring knowledge of DBpedia from Wikipedia categories, in: NLP-DBPEDIA@ISWC, 2015, pp. 68–80.
[22] V. Nebot, R. Berlanga, Mining association rules from semantic web data, in: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Springer, 2010, pp. 504–513.
[23] V. Nebot, R. Berlanga, Finding association rules in semantic web data, Knowledge-Based Systems 25 (1) (2012) 51–62.
[24] R. G. L. Miani, E. R. H. Junior, Exploring association rules in a large growing knowledge base, International Journal of Computer Information Systems and Industrial Management Applications (2015) 106–114.
[25] R. G. L. Miani, E. R. Hruschka, Analyzing the use of obvious and generalized association rules in a large knowledge base, in: Hybrid Intelligent Systems (HIS), 2014 14th International Conference on, IEEE, 2014, pp. 1–6.
[26] K. McGarry, A survey of interestingness measures for knowledge discovery, The Knowledge Engineering Review 20 (1) (2005) 39–61.
[27] A. A. Freitas, On rule interestingness measures, Knowledge-Based Systems 12 (5) (1999) 309–315.
[28] C. d'Amato, A. G. Tettamanzi, T. D. Minh, Evolutionary discovery of multi-relational association rules from ontological knowledge bases, in: Knowledge Engineering and Knowledge Management: 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings, Springer, 2016, pp. 113–128.
[29] R. Agrawal, T. Imieliński, A. Swami, Mining association rules between sets of items in large databases, in: ACM SIGMOD Record, Vol. 22, ACM, 1993, pp. 207–216.
[30] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, A. I. Verkamo, Finding interesting rules from large sets of discovered association rules, in: Proceedings of the Third International Conference on Information and Knowledge Management, ACM, 1994, pp. 401–407.
[31] X. Wilcke, H. Dimitropoulos, D16.3: Final report on data mining, Tech. rep., ARIADNE (2017). URL: http://ariadne-infrastructure.eu/Resources/D16.3-Final-Report-on-Data-Mining