W.X. Wilcke a,b,∗, V. de Boer b, M.T.M. de Kleijn a, F.A.H. van Harmelen b, H.J. Scholten a
a Department of Spatial Economics, Vrije Universiteit, Amsterdam, The Netherlands
b Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands
Abstract
In recent years, there has been a growing interest from the digital humanities in knowledge graphs as
a data modelling paradigm. Already, this has led to the creation of many such knowledge graphs, many of
which are now available as part of the Linked Open Data cloud. This presents new opportunities for data
mining. In this work, we develop, implement, and evaluate (both data-driven and user-driven) an end-to-
end pipeline for user-centric pattern mining on knowledge graphs in the humanities. This pipeline combines
constrained generalized association rule mining with natural language output and facet rule browsing to allow
for transparency and interpretability—two key domain requirements. Experiments in the archaeological
domain show that domain experts were positively surprised by the range of patterns that were discovered
and were overall optimistic about the future potential of this approach.
Keywords: User-Centric, Pattern Mining, Generalized Association Rules, Knowledge Graphs, Digital
Humanities, Archaeology
step, SWARM generalizes the discovered patterns by exploiting class inheritance, ultimately producing rules of the form ∀χ(Class(χ, c) → (P(χ, φ) → Q(χ, ψ))). Here, χ, φ, and ψ are entities, c a class, and P and Q predicates.
There are several measures available to assess the interestingness of the resulting rules. For our data-driven evaluation we use the well-known confidence and support measures. The confidence score of a rule X ⇒ y conveys its strength and equals the proportion of triples (transactions) which satisfy X that also satisfy y. Its support score represents the rule's significance and equals the proportion of triples which satisfy both X and y.

3.4. Dimension Reduction

Association rule mining algorithms commonly produce a large number of candidate rules, resulting in their own knowledge management problem [30]. To combat this, our pipeline includes a data-driven filter which discards unwanted rules based on a predefined set of constraints. These constraints can include minimum and maximum values for the support and confidence scores, but also restrictions on types, on predicates, and on both entire antecedents and consequents.
By default, the set of constraints filters all rules which are too common or too rare, or which are generally unwanted. Typical examples of such unwanted rules are those which include domain-independent relations—owl:sameAs, skos:inSchema, dcterms:medium, etc.—which do not contribute to the pattern mining process. Optionally, the default filter can be overridden by providing a custom constraints template at start. This allows us to further reduce the number of candidate rules by tightening the constraints.
Note that the filter is run after the rules have been produced. This is a deliberate design choice that allows users to revert any or all of its effects without having to repeat the whole mining process, and therefore offers more flexibility during the analysis of the results. Hereto, the unfiltered result set is kept in memory.

3.5. Rule Browser

Candidate rules which pass the filter are presented to users in an interactive facet browser. To improve their interpretability, these rules are automatically translated to natural language by exploiting the label attributes of both entities and properties. The resulting translations are then inserted into predefined sentence templates; one per language.
To change which rules are shown, users can supply additional information about their interest. This involves modifying the same parameters as those available for the constraints templates discussed previously, and thus includes both restrictions on assigned scores and on contents. Rules which are deemed worthy of saving can be exported to a human-readable file, or can be stored in a binary file which can later be reopened by the rule browser for further analysis and mutation.
For our implementation of the pipeline we opted for a virtual-terminal interface, which offered us the necessary balance between simplicity and flexibility needed during our interactive sessions with domain experts. If desired, however, a full-fledged web interface can be used instead by running the rule browser via a client-side environment.
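The two interestingness measures defined in Section 3 can be made concrete with a minimal sketch. This is our own illustration, not the pipeline's code; the set-based transaction encoding and the example values are assumptions.

```python
# Sketch of the support and confidence measures: a "transaction" is a set
# of items, X is the antecedent itemset, and y a single consequent item.

def support(transactions, X, y):
    """Proportion of transactions which satisfy both X and y."""
    hits = sum(1 for t in transactions if X <= t and y in t)
    return hits / len(transactions)

def confidence(transactions, X, y):
    """Proportion of transactions satisfying X that also satisfy y."""
    covered = [t for t in transactions if X <= t]
    return sum(1 for t in covered if y in t) / len(covered) if covered else 0.0

# Toy data, invented for illustration:
transactions = [
    {"type:Amphora", "material:clay", "period:pre-classical"},
    {"type:Amphora", "material:clay"},
    {"type:Coin", "material:bronze"},
]
X, y = {"type:Amphora"}, "material:clay"
print(support(transactions, X, y))     # 2/3: two of the three transactions match
print(confidence(transactions, X, y))  # 1.0: every amphora here is clay
```

A rule with high confidence but low support is accurate yet rare; the data-driven filter of Section 3.4 therefore bounds both scores.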
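The behaviour of the default post-hoc filter (Section 3.4) can be sketched as follows. The rule representation, function names, and threshold values below are our own assumptions, not those of the actual implementation.

```python
# Sketch of the post-hoc rule filter: the unfiltered result set stays in
# memory, so filtering can be redone or reverted with different
# constraints without re-running the miner.

UNWANTED_PREDICATES = {"owl:sameAs", "skos:inSchema", "dcterms:medium"}

def apply_filter(rules, min_support=0.01, max_support=0.95, min_confidence=0.5):
    """Keep only the rules that satisfy the score and content constraints."""
    kept = []
    for rule in rules:
        if not (min_support <= rule["support"] <= max_support):
            continue  # too rare or too common
        if rule["confidence"] < min_confidence:
            continue  # too weak
        if rule["predicate"] in UNWANTED_PREDICATES:
            continue  # domain-independent relation
        kept.append(rule)
    return kept

all_rules = [  # toy result set, kept around for later re-filtering
    {"predicate": "crm:P45_consists_of", "support": 0.10, "confidence": 0.80},
    {"predicate": "owl:sameAs", "support": 0.30, "confidence": 0.90},
    {"predicate": "crm:P4_has_time-span", "support": 0.001, "confidence": 1.0},
]
print([r["predicate"] for r in apply_filter(all_rules)])  # only the first rule survives
```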
4. The package-slip Knowledge Graph

Excavation data is a valuable source of information in many archaeological studies [9]. These studies are therefore likely to benefit from pattern mining on this type of data, which makes it a suitable choice to base our case study on. In agreement with domain experts, we therefore selected the package-slip knowledge graph to run our experiments with.
Package slips are detailed summarizations of entire excavation projects. They are structured as specified by the SIKB Protocol 0102, which is a Dutch standard on modelling and sharing excavation data of various degrees of granularity⁵. Specifically, package slips capture the following aspects:

• General information about individuals, companies, and organisations which are involved, as well as the various locations at which the excavations took place.

• Final and intermediate reports made during the project, as well as different forms of media such as photos, drawings, and videos. Each of these is accompanied by meta-data and (cross) references to their respective file, subject, and mentioning.

• Detailed information about all artefacts discovered during the project, as well as their (geospatial and stratigraphic) relation and the archaeological context in which they were found.

• Fine-grained information about the precise locations and their geometries at which artefacts were discovered, archaeological contexts were observed, and where media was created.

In its specification, the SIKB protocol makes use of the Extensible Markup Language (XML) to define the various concepts that make up a package slip. The limitations of this language for sharing and integrating data motivated the Dutch ARIADNE partners, amongst whom were we, to design an alternative data model based on semantic web standards.
The semantic package slip is primarily built on top of the CIDOC conceptual reference model (CRM) and its archaeological extension CRM English Heritage (CRM-EH). Every excavation project is of the type DHProject (Dutch Heritage Project, a subclass of CRM-EH's EHProject) and links to the discovered artefacts via production events. To classify these artefacts, and all related archaeological contexts, the package-slip graph includes a multi-schema SKOS vocabulary which contains more than 7.000 archaeological concepts.
A small and partial example of a package slip is depicted in Figure 2. There, a single artefact (crmeh:ContextFind) is shown with its archaeological context (crmeh:Context). The artefact also holds several attributes of various types, and is itself part of a unit (crm:Collection). These units are virtual containers that group artefacts which were found in the same archaeological context. In contrast, bulk finds (crmeh:BulkFind) are physical containers, often a box, which group artefacts with similar storage requirements (e.g. on humidity, temperature, or oxygen). These are linked to a Dutch Heritage project via a production event.

4.1. Knowledge Integration

At present⁶, the package-slip knowledge graph⁷ contains the aggregated data from a little over 70 package slips, totaling roughly 425.000 triples [31]. These package slips were converted from existing XML files⁸, and were subsequently integrated into a single coherent graph by using a central triple store which acted as an authority server: hash values were generated for all entities in a package slip—starting from the outer edge of the graph and recursively updating all encountered parents—which were then cross-referenced with those on the server. By using this approach, we were able to unify different resources about the same thing, most particularly about people, companies, and locations.
The size of the package-slip knowledge graph is expected to grow consistently, as the production of package slips has recently been made mandatory for all Dutch institutes and companies which are involved with archaeological data. This increases the relevancy of this case study for the Dutch archaeological community, and thus adds support for our choice of the package slip.

5 www.sikb.nl/datastandaarden/richtlijnen/sikb0102
6 Situation as of July 2018
7 pakbon-ld.spider.d2s.labs.vu.nl
8 The conversion tool is available at gitlab.com/wxwilcke/pakbonLD
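The hash-based integration step of Section 4.1 can be approximated with the following sketch. The graph encoding and names are ours, and a real implementation would hash canonical RDF serializations rather than ad-hoc strings; this only illustrates the recursive, leaf-to-parent hashing idea.

```python
import hashlib

def entity_hash(entity, graph):
    """Hash an entity from the graph's outer edge inward: leaf hashes feed
    into their parents' hashes, so identical substructures collide on purpose."""
    parts = sorted(graph[entity].get("attributes", []))
    for child in sorted(graph[entity].get("children", [])):
        parts.append(entity_hash(child, graph))
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# Two package slips describing the same company yield the same hash,
# so an authority server can merge them into a single resource.
graph_a = {"company1": {"attributes": ["name:DigCo", "city:Amsterdam"]}}
graph_b = {"companyX": {"attributes": ["name:DigCo", "city:Amsterdam"]}}
assert entity_hash("company1", graph_a) == entity_hash("companyX", graph_b)
```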
Figure 2: Small and partial example from the package slip knowledge graph. Each project produces (amongst others) one
or more containers. These containers are composed of one or more bulk finds, which in turn are composed of one or more
artefacts. Each of these artefacts is part of a unit of which its members share a common (multilayered) context. Note that,
in this figure, solid and opaque circles represent instance and vocabulary resources, respectively. Three dots represent literals,
and opaque boxes represent classes.
about 36.000 on average using the default filter settings. From these 1800 rules, we created a more manageable evaluation sample by taking the top fifty candidates based on their confidence (first) and support (second) scores. We motivate our choice of confidence as the primary measure by the emphasis on producing correct, but also as yet unknown, patterns.
Five candidate rules are listed in Table 2. These have been hand-picked from the evaluation sample, and are representative of the result set as a whole. We will use these five candidates as running examples in our data-driven evaluation.

5.1. Data-Driven Evaluation

To assess the general effectiveness of the pipeline, we have chosen not to incorporate any dataset-specific adjustments into the data-driven evaluation. The reason for this is that we are not interested in how well our approach works on this particular dataset, but rather how effective the support and confidence metrics are as criteria for our data-driven filter and, by extension, for the perceived interestingness of the discovered patterns to domain experts. For the purpose of this evaluation, we limit ourselves to the evaluation sample created earlier.
An inspection of the evaluation sample reveals that many of the candidate rules apply to classes other than the four selected topics of interest. Two examples of such rules are listed in Table 2: both rules R3 and R4 apply to entities of the SiteSubDivision class, which is not explicitly listed in any of the target patterns. In fact, 16% of all rules in the evaluation sample apply to this class. If we look at the package-slip data model, however, we observe that said class lies within three hops of the DHProject class. Therefore, the presence of these unexpected classes can likely be attributed to the chosen context sampling strategy. Indeed, the same is the case for all other unexpected classes, such as ContextStuff, Collection, and Place Appellation, which apply to 11.5%, 8%, and 6% of all rules in the evaluation sample.
Another look at the evaluation sample reveals that many of the candidate rules actually describe very similar patterns. This is especially evident with time spans, which are present in nearly half (43%) of the patterns. In fact, of the five examples listed in Table 2, all but one involve a time span as consequent. The reason for their frequent occurrence can likely be traced back to the package-slip vocabulary: many of the artefact classes are in a many-to-one relationship with specific time spans. For instance, all known belly amphorae stem from the pre-classical period (612 BCE – 480 BCE). These predefined relationships are treated as patterns by the ARM algorithm, and, due to being overrepresented in the data, score high on confidence and support. This is also true for other predefined relationships, such as those that involve geographical information: map areas are paired with their respective toponymies, and provinces are paired with municipalities.
A final revelation of the evaluation set is that both support and confidence scores alone provide a poor measure of pattern interestingness: over 96% and 99% of the patterns have support and confidence scores close (δ ≤ 0.05) or equal to 0 or 1, respectively. This makes it difficult to distinguish between patterns which are actually interesting and those which describe predefined and therefore uninteresting relationships. While this difficulty is a known problem with ARM [30], the extreme form it takes here is likely caused by the nature of the package slip data: nearly all information that might be interesting to domain experts (artefacts, archaeological contexts, etc.) is in a one-to-one relationship with each other. Therefore, this information is not shared within, and also not between, package-slip instances.

5.2. User-Driven Evaluation

To assess the interestingness of the produced rules from a domain expert's point of view, we asked the local archaeological community—a Facebook group with over 3000 members from various Dutch academic and commercial organizations—to participate in an online survey (Fig. 3). At the start of this survey, participants were greeted with a concise summary of this research and with a detailed description of the task we asked them to perform: to rate a selection of forty candidate rules on plausibility (can this be true in the context of the data?), on relevancy (can this support you during your research?), and on newness (is this unknown to you?). Once completed, the participants were also asked some final questions about their familiarity with the domain, and whether this approach of rule mining could contribute to their studies.
A five-point Likert scale was offered as answer template throughout the entire survey. To stress the importance of the participants' opinions, our Likert scale ranged from strongly disagree to
Table 1: Configurations of the four experiments. Column names indicate topic of interest (ToI), target pattern, number of
facts in the subset, context sampling depth (CSD), and similarity factors (SFs).
Table 2: Five example rules selected from the top-50 candidates of all four experiments. Each rule applies to resources of type
TYPE, and consists of an antecedent and a consequent in the form of an IF-THEN statement. The last two columns list
their confidence and support scores.
Table 3: Translation in natural language of the five example rules given in Table 2. The last three columns list their normalized
median and (highest) mode scores for plausibility (P ), relevancy (R), and novelty (N ).
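As a toy sketch of the template-based translation behind such natural-language rules (Section 3.5): the template wording, rule encoding, and labels below are our own assumptions, not the actual templates, which exploit the label attributes in the knowledge graph itself.

```python
# Hypothetical sentence template; one such template exists per language.
TEMPLATE = ("IF a {type} {antecedent_property} {antecedent_value}, "
            "THEN it {consequent_property} {consequent_value}.")

def translate(rule, labels):
    """Replace each URI in the rule by its human-readable label, then fill
    the predefined sentence template."""
    return TEMPLATE.format(**{k: labels.get(v, v) for k, v in rule.items()})

rule = {  # invented rule encoding for illustration
    "type": "crmeh:ContextFind",
    "antecedent_property": "crm:P45_consists_of",
    "antecedent_value": "vocab:red_clay",
    "consequent_property": "crm:P4_has_time-span",
    "consequent_value": "vocab:pre_classical",
}
labels = {  # invented label attributes
    "crmeh:ContextFind": "context find",
    "crm:P45_consists_of": "consists of",
    "vocab:red_clay": "red clay",
    "crm:P4_has_time-span": "dates from",
    "vocab:pre_classical": "the pre-classical period",
}
print(translate(rule, labels))
# IF a context find consists of red clay, THEN it dates from the pre-classical period.
```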
strongly agree, where our questions were stated such that a higher agreement corresponds with a higher interestingness. Additionally, if desired, participants could enter further remarks as free-form text.
All forty candidate rules were randomly sampled from the evaluation sample—ten per experiment (stratified)—and automatically¹⁰ translated to natural language using predefined language templates (see Section 3.5). We have listed five of these translated rules in Table 3. Each of these rules is the translation of the rule with the same identifier in Table 2.

5.2.1. Survey Analysis

Twenty-one people participated in our survey. A statistical analysis of their answers is provided in Table 4. For each of the four experiments, this table lists the normalized median and mode scores for plausibility, relevancy, and novelty. Also listed are the overall scores per experiment and per metric, and the score for the entire approach as a whole. For all scores, higher is better.
Looking at the four separate experiments, it is clear that project- and artefact-related patterns are rated higher than those of contexts and subcontexts, with an overall median score of 0.50 versus that of 0.25, respectively. Patterns about artefacts also score relatively high on the mode (0.75), exceeding that of all other experiments (0.25). This implies a cautiously positive opinion towards artefact-related patterns. Contributing to this positiveness are the plausibility and relevancy ratings, on which artefacts score better than any of the other three experiments. Statistical tests (Kruskal-Wallis with α = 0.01) indicate that these findings are significant, and suggest a dependency between various metrics and the topics of interest: p = 7.66 × 10⁻¹³ for plausibility, p = 6.97 × 10⁻⁵ for relevancy, and p = 1.50 × 10⁻³ for novelty.
The approach as a whole scores best on plausibility and novelty, both of which have a median of 0.50. Between these two metrics, plausibility takes the cake with a mode of 0.75 versus that of 0.25 for novelty. This implies a cautiously positive opinion towards the plausibility of the discovered patterns. More negative are the relevancy ratings, with a median and mode of 0.25. Together, the metrics combine to an overall score with median 0.50 and mode 0.25, implying a neutral to slightly negative opinion about the approach as a whole. Remarks made by domain experts suggest that this stems from the frequent occurrence of patterns that are either too general, too trivial, or which describe predefined one-to-one or many-to-one relationships. This corresponds with our findings during the data-driven evaluation. Nevertheless, a median and mode score of 0.75 was given to the final separate question about the overall potential usefulness of this approach, implying a more positive standpoint.
Table 5 lists the inter-rater agreements per experiment and per metric. Krippendorff's alpha (αK) is used for this purpose, as it allows comparisons over ordinal data with more than two raters. Overall, the ratings are in fair agreement with αK = 0.25. A similar agreement is found between the different topics of interest, which range from 0.18 to 0.30. A more extreme difference is found between the different metrics: a moderate agreement on plausibility (αK = 0.41), and only a slight agreement on both relevancy (αK = 0.08) and novelty (αK = 0.09). This stark difference may be caused by the different familiarities and experiences of the domain experts: whereas plausibility comes down to everyday archaeological knowledge, novelty (and to a lesser degree, relevancy) is far more dependent on one's own view of the domain.
We can get a better understanding of these scores by reading the remarks that have been left by the domain experts. Overall, these remarks suggest a disconnect between how the evaluation task was instructed and how the experts performed it: rather than assessing the patterns within the (limited) context of the data set, our panel of experts appears to have judged the patterns against the knowledge in the archaeological domain as a whole. Numerous patterns have therefore only been scored as implausible because they do not necessarily hold outside of the data set. Similarly, remarks on relevance and novelty seem to indicate that the raters only assessed that exact pattern instance, and did not consider the potential of such patterns on different data sets. Interestingly, some experts appear to have used this limited window to try to understand the knowledge creation process of fellow archaeologists: whether, for instance, the choice of an unexpected class could indicate an alternate interpretation of the same facts.

10 In some cases, minor edits were made to improve the flow of the sentence.
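The per-metric summaries reported above can be reproduced in outline as follows. This is our own sketch, with invented ratings; it only shows how five-point Likert scores map to the normalized median and (highest) mode values of Table 4.

```python
from statistics import median, multimode

def summarize(ratings):
    """Normalize 1..5 Likert ratings to [0, 1] and return the median and,
    in case of ties, the highest mode."""
    normalized = [(r - 1) / 4 for r in ratings]
    return median(normalized), max(multimode(normalized))

# Invented ratings for one pattern (1 = strongly disagree ... 5 = strongly agree):
ratings = [4, 4, 3, 5, 2, 4, 3]
print(summarize(ratings))  # (0.75, 0.75)
```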
Figure 3: Collage of various screenshots of the web survey used during the user-driven evaluation. Participants were first asked
about their background and experience, and were then given a summary of this research and a detailed description of the task.
Next, they were presented with 40 candidate rules and asked to rate these on interestingness. Afterwards, participants were
also asked about their opinion on the pipeline as a whole.
Table 4: Normalized separate and overall median and (highest) mode scores for plausibility (P ), relevancy (R), and novelty
(N ) per topic of interest as provided by 21 raters.
Table 5: Inter-rater agreements (Krippendorff’s alpha: αK ) over all raters per topic of interest (left) and per metric (right).
Overall αK = 0.25.
ToI αK Metric αK
Projects 0.22 Plausibility 0.41
Artefacts 0.30 Relevancy 0.08
Contexts 0.24 Novelty 0.09
Subcontexts 0.18
6. Discussion

Our analysis of the survey's results indicates that the panel of experts was cautiously positive about the plausibility of the produced patterns. This (slight) positiveness does not come as a surprise, as association rules describe the actual patterns which exist in the data, rather than predict new ones. We can further explain this observation by our decision to order the candidate rules on confidence—favoring accuracy above coverage—and because the package-slip knowledge graph only contains curated data. Given that this is the case, however, we may wonder why the patterns in our evaluation sample were not rated even more positively on plausibility.
A possible reason is the presence of errors in the data, but these would likely be incidental and should therefore have only a minor effect on the mean plausibility score. A more likely reason is that we used a suboptimal set of hyperparameters during our experiments. Prime suspects are the chosen similarity factors, which determine how much groups of entities (the semantic item sets) are permitted to overlap before they are combined into more general groups (the common behaviour sets). Concretely, a too low value can result in overgeneralization: two or more item sets are erroneously attributed the same pattern. This results in the generation of rules which do not hold for all members of a set, despite a possibly high confidence score implying the opposite, and which are thus likely to score poorly on plausibility in the eyes of domain experts.
We can solve this problem to a certain extent by increasing the similarity factor to a value at which only minor differences in set members are accepted. By doing so, however, we risk trading one undesirable situation for another: rather than overgeneralizing, we might undergeneralize such that similar item sets are prevented from forming common behaviour sets. In the worst case, we might even end up joining only very large item sets—the similarity factor is covariant with set size—which have a near-perfect overlap, hence favouring sets with (p, o)-pairs that frequently co-occur across the graph. Examples of such patterns were present in our evaluation set, most particularly those that implied relationships between two vocabulary concepts (types of artefact and time spans, provinces and municipalities, etc.) which naturally belong together.
Unfortunately, the optimal value(s) for the similarity factor are unknown beforehand, which is why we varied over three different values—0.3, 0.6, and 0.9—during our experiments. As explained before, our choice for these specific values came forth from our preliminary findings, which showed that these different values result in distinctly different, yet potentially useful, patterns. Given the occurrence of both too general and too specific patterns in our evaluation set, however, we can now surmise that the outer two values were likely too low and too high, respectively. This suggests that there exists but a narrow range of optimal values in the similarity factor spectrum for which SWARM yields desirable patterns. Determining this optimal range would require a great deal of time and effort from our domain experts, unfortunately, which is why no further preliminary experiments have been run.
While the chosen hyperparameters thus seem to largely correlate with the plausibility score, our results suggest that it is the characteristics of the dataset which are correlated with the relevancy and the novelty scores. On both of these scores, our group of experts was less positive about the presented patterns. The remarks left reveal that many of these patterns were seen as trivial. Others were seen as tautologies, or were only thought to be applicable in a specific context. These remarks support our findings made during the data-driven evaluation that the confidence and support metrics are ineffective measures for assessing the interestingness of patterns to domain experts. Both measures are strongly influenced by the size, variety, and quirks of the data. Peculiarities of the package slip data, specifically its inherent hierarchical structure, are likely the reason which made this issue manifest itself even more evidently in this work.
Another reason for the unsatisfying novelty and relevancy scores may be found in the similarity between patterns: many of the produced rules describe variations of the same pattern, with only different vocabulary concepts in the consequent. One pattern, for example, might imply that some pots are made from red clay, whereas another pattern might imply that other pots are made from grey clay. Our dataset contained over 7.000 of these concepts, separated into 22 categories (e.g., material and period). If left unchecked, it is therefore likely for similar patterns to make their way into the result set.
Additionally, many of the potentially interesting data points were encoded via textual or numerical literals. SWARM, as well as almost all other methods in this field, lacks the ability to compare literals
based on their raw values [15]. Instead, resources are only compared via their URIs, which are unique and therefore always spaced at equal distance from each other in the search space. Because of this limitation, our pipeline was unable to differentiate between closely (or distantly) related literals. This became especially apparent with geometries, measurements, and descriptions, all three of which are abundantly found in archaeological data.
A further reason for the novelty and relevancy scores may lie with the ontology that is used in the package slip data model: many of the properties defined in this model are expressed via rather long paths (up to five hops). This characteristic is directly inherited from the CIDOC CRM ontology, which specifies that entities and properties are to be linked via various events. Travelling along these long paths is computationally expensive, which is why we had decided to leave such paths unexplored in favor of the more local patterns.
Zooming out from the individual scores can give us an idea about the perceived overall effectiveness of the pipeline. The remarks left by the experts suggest that this effectiveness may have been influenced considerably by the size and scope of the dataset. Indeed, the relatively low number of integrated package slips might not have been enough to allow for well-generalized patterns to emerge, rather than the more specific patterns which only make sense in a narrow context (e.g., within a single excavation). Similarly, the limited scope of the data—the package slips were supplied by just two companies, each with a particular specialisation—meant that experts with different specialisations (e.g., a different culture or period) might have had insufficient experience to evaluate the discovered patterns on the criteria we asked them to.
A final aspect worth noting is the relatively low number of raters we were able to muster, despite our greatest efforts, and the effect this may have had on the outcome of our evaluation: the influence of possible outliers is greater, and the findings are less generalizable beyond our chosen use case. An analysis of the survey's access logs indicates that many potential raters did not actually complete the survey, but instead gave up at an earlier point. From the remarks left by those who did complete the survey, we believe that this was likely due to a combination of there being too many patterns to evaluate, and the limited novelty and relevance that these patterns provided to them.

7. Conclusion

In this work, we introduced the user-centric MINOS pipeline for pattern mining on knowledge graphs in the humanities. With this pipeline, we aim to support domain experts in their analyses of such knowledge graphs by helping them discover useful and interesting patterns in their data. Our pipeline therefore emphasizes the importance of these experts and their requirements, rather than those of the usual data scientists. This has led to several design choices, the most particular of which is the use of generalized association rules to overcome the lack of transparency and interpretability—two key issues with technological acceptance in the humanities—that persists with many other methods.
To assess the effectiveness of our pipeline we conducted experiments in the archaeological domain. These experiments were designed together with domain experts and were evaluated both objectively (data-driven) and subjectively (user-driven). The results indicate that the domain experts were cautiously positive about the plausibility of the discovered patterns, but less so about their novelty and their relevance to archaeological research. Instead, a large number of patterns were discarded by our experts for describing trivialities or tautologies. Nevertheless, on average, the experts were positively surprised by the range of patterns that our pipeline was able to discover, and were optimistic about the future potential of this approach for archaeological research.
During our research we encountered several challenges which limited the effectiveness of our pipeline. We were unable to address them at that time and therefore offer these as suggestions for future work. For the most part, these challenges concern the nature of the data and the inability of the mining algorithms to exploit this. Concretely, the inability to 1) exploit common semantics other than RDF and RDFS (e.g., SKOS and, to a lesser degree, CIDOC CRM), and 2) cope with knowledge encoded via literal attributes (rather than only via the graph's structure), which make up the majority of knowledge in the humanities.
Solving these challenges would unlock a wealth of additional knowledge which is currently left unused, and which can potentially lead to more useful and more interesting patterns which humanities researchers can use to further their research.
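The literal-comparison limitation raised in Section 6 can be illustrated with a small sketch of our own, with invented values: URI comparison treats every pair of distinct resources as equally different, whereas a raw-value comparison of numeric literals preserves their actual closeness.

```python
def uri_distance(a, b):
    """URIs are opaque identifiers: identical or maximally distant."""
    return 0.0 if a == b else 1.0

def literal_distance(a, b, scale=100.0):
    """A raw-value comparison for numeric literals (our illustration only)."""
    return min(abs(a - b) / scale, 1.0)

# Two nearly identical measurements look maximally different by URI...
print(uri_distance("ex:measurement12", "ex:measurement13"))  # 1.0
# ...but nearly identical by raw value.
print(literal_distance(24.9, 25.1))  # 0.002
```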
Acknowledgements

We wish to express our deep gratitude to domain experts Milco Wansleeben and Rein van 't Veer for their enthusiastic encouragement and useful critiques during the various steps that have led to this work. We also wish to thank all domain experts who participated in our survey for their willingness to sacrifice their free time, and without whom we would not have been able to complete this research.
This research has been partially funded by the ARIADNE project through the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193.

References

[1] M. Hallo, S. Luján-Mora, A. Maté, J. Trujillo, Current state of linked data in digital libraries, Journal of Information Science 42 (2) (2016) 117–127.
[2] A. Rapti, D. Tsolis, S. Sioutas, A. Tsakalidis, A survey: Mining linked cultural heritage data, in: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), ACM, 2015, p. 24.
[3] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery in databases, AI Magazine 17 (3) (1996) 37.
[4] J. Hagood, A brief introduction to data mining projects in the humanities, Bulletin of the American Society for Information Science and Technology 38 (4) (2012) 20–
Statistical Database Management, Springer, 2010, pp. 169–177.
[12] V. Tresp, M. Bundschus, A. Rettinger, Y. Huang, Towards machine learning on the semantic web, in: URSW (LNCS vol.), Springer, 2008, pp. 282–314.
[13] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, Fast rule mining in ontological knowledge bases with AMIE+, The VLDB Journal 24 (6) (2015) 707–730.
[14] T. Anbutamilazhagan, M. K. Selvaraj, A novel model for mining association rules from semantic web data, Elysium Journal 1 (2).
[15] X. Wilcke, P. Bloem, V. de Boer, The knowledge graph as the default data model for learning on heterogeneous knowledge, Data Science (1) (2017) 1–19.
[16] R. Ramezani, M. Saraee, M. Nematbakhsh, SWApriori: a new approach to mining association rules from semantic web data, Journal of Computing and Security 1 (2014) 16.
[17] M. Barati, Q. Bai, Q. Liu, SWARM: An Approach for Mining Semantic Association Rules from Semantic Web Data, Springer International Publishing, Cham, 2016, pp. 30–43. doi:10.1007/978-3-319-42911-3_3.
[18] R. G. Miani, C. A. Yaguinuma, M. T. Santos, M. Biajiz, NARFO algorithm: Mining non-redundant and generalized association rules based on fuzzy ontologies, Enterprise Information Systems (2009) 415–426.
[19] T. Kauppinen, K. Puputti, P. Paakkarinen, H. Kuittinen, J. Väätäinen, E. Hyvönen, Learning and visualizing cultural heritage connections between places on the semantic web, in: Proceedings of the Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS 2009), 6th Annual European Semantic Web Conference (ESWC 2009), Citeseer, 2009.
[20] J. Völker, M. Niepert, Statistical schema induction, in: Extended Semantic Web Conference, Springer, 2011, pp. 124–138.
[21] J. Kim, E.-K. Kim, Y. Won, S. Nam, K.-S. Choi, The association rule mining system for acquiring knowl-
23.
[5] C. Velu, P. Vivekanadan, K. Kashwan, Indian coin edge of dbpedia from wikipedia categories., in: NLP-
DBPEDIA@ ISWC, 2015, pp. 68–80.
recognition and sum counting system of image data
mining using artificial neural networks, International [22] V. Nebot, R. Berlanga, Mining association rules from
semantic web data, in: International Conference on In-
Journal of Advanced Science and Technology 31 (2011)
67–80. dustrial, Engineering and Other Applications of Ap-
plied Intelligent Systems, Springer, 2010, pp. 504–513.
[6] K. Makantasis, A. Doulamis, N. Doulamis, M. Ioan-
nides, In the wild image retrieval and clustering for 3d [23] V. Nebot, R. Berlanga, Finding association rules in
cultural heritage landmarks reconstruction, Multimedia semantic web data, Knowledge-Based Systems 25 (1)
Tools and Applications 75 (7) (2016) 3593–3629. (2012) 51–62.
[7] L. Manovich, Trending: The promises and the chal- [24] R. G. L. Miani, E. R. H. Junior, Exploring association
lenges of big social data, Debates in the digital human- rules in a large growing knowledge base, International
ities 2 (2011) 460–475. Journal of Computer Information Systems and Indus-
[8] B. R. T. Röhle, Digital methods: Five challenges, in: trial Management Applications (2015) 106–114.
Understanding digital humanities, Springer, 2012, pp. [25] R. G. L. Miani, E. R. Hruschka, Analyzing the use of ob-
67–84. vious and generalized association rules in a large knowl-
[9] H. Selhofer, G. Geser, D2.1: First report on users needs, edge base, in: Hybrid Intelligent Systems (HIS), 2014
Tech. rep., ARIADNE (2014). 14th International Conference on, IEEE, 2014, pp. 1–6.
URL http://ariadne-infrastructure.eu/Resources/ [26] K. McGarry, A survey of interestingness measures for
D2.1-First-report-on-users-needs knowledge discovery, The knowledge engineering review
[10] H. Kamada, Digital humanities roles for libraries?, Col- 20 (1) (2005) 39–61.
lege & Research Libraries News 71 (9) (2010) 484–485. [27] A. A. Freitas, On rule interestingness measures,
[11] H.-P. Kriegel, P. Kröger, C. H. Van Der Meijden, Knowledge-Based Systems 12 (5) (1999) 309–315.
H. Obermaier, J. Peters, M. Renz, Towards archaeo- [28] C. d’Amato, A. G. Tettamanzi, T. D. Minh, Evolution-
informatics: scientific data management for archaeobi- ary discovery of multi-relational association rules from
ology, in: International Conference on Scientific and ontological knowledge bases, in: Knowledge Engineer-
14
ing and Knowledge Management: 20th International
Conference, EKAW 2016, Bologna, Italy, November 19-
23, 2016, Proceedings 20, Springer, 2016, pp. 113–128.
[29] R. Agrawal, T. Imieliński, A. Swami, Mining associa-
tion rules between sets of items in large databases, in:
Acm sigmod record, Vol. 22, ACM, 1993, pp. 207–216.
[30] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivo-
nen, A. I. Verkamo, Finding interesting rules from large
sets of discovered association rules, in: Proceedings of
the third international conference on Information and
knowledge management, ACM, 1994, pp. 401–407.
[31] X. Wilcke, H. Dimitropoulos, D16.3: Final report on
data mining, Tech. rep., ARIADNE (2017).
URL http://ariadne-infrastructure.eu/Resources/
D16.3-Final-Report-on-Data-Mining
15