
Knowledge Graph Curation:

A Practical Framework
Elwin Huaman and Dieter Fensel
Semantic Technology Institute (STI) Innsbruck
Department of Computer Science,
University of Innsbruck, Austria

IJCKG 2021
Outline

● What?
Basics - Research questions

● How?
Approach - Solution

● Why?
Motivation

Elwin Huaman | IJCKG 2021 | 08/12/2021 2


What?
Basics - Research questions



What are Knowledge Graphs (KGs)?
Over the last decade, creating and especially maintaining large KGs has gained attention.

Which KG is best for me?
What about their:
● Quality
● Correctness
● Completeness



What is Knowledge Graph Curation?
It is part of the knowledge graph lifecycle.

How to curate KGs?

● How to assess their quality?

● How to improve their correctness?

● How to improve their completeness?

[Fensel et al., 2020]



How?
Approach - Solution



How to Curate Knowledge Graphs?
The first step to curate KGs is to evaluate their quality.

How to assess KG quality?

1. Accessibility
2. Accuracy
3. Appropriate amount
4. Believability
5. Completeness
6. Concise representation
7. Consistent representation
8. Cost-effectiveness
9. Ease of manipulation
10. Ease of operation
11. Ease of understanding
12. Free-of-error
13. Interoperability
14. Objectivity
15. Relevancy
16. Reputation
17. Security
18. Timeliness
19. Traceability
20. Variety
[Fensel et al., 2020]
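
One way to turn per-dimension results into a single assessment figure is a weighted average. The sketch below is only an illustration under stated assumptions: the function name `aggregate_quality`, the scores, and the weights are hypothetical and are not taken from the talk or from [Fensel et al., 2020]. It assumes each dimension has already been measured as a score in [0, 1].

```python
from typing import Dict


def aggregate_quality(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1].

    Dimensions without an entry in `weights` count with weight 0.
    """
    total_weight = sum(weights.get(dim, 0.0) for dim in scores)
    if total_weight == 0:
        raise ValueError("at least one scored dimension needs a non-zero weight")
    weighted_sum = sum(scores[dim] * weights.get(dim, 0.0) for dim in scores)
    return weighted_sum / total_weight


# Hypothetical scores and weights, for illustration only.
scores = {"Accuracy": 0.5, "Completeness": 1.0, "Timeliness": 0.0}
weights = {"Accuracy": 2.0, "Completeness": 1.0, "Timeliness": 1.0}

overall = aggregate_quality(scores, weights)
print(f"Overall quality score: {overall:.2f}")
```

Weights let a domain emphasize the dimensions that matter most to it; a domain that depends on fresh data could, for example, give Timeliness a larger weight.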





How to Curate Knowledge Graphs?
The cleaning task aims to improve the correctness of KGs.

How to improve KG correctness?

❏ Detecting errors
❏ Correcting errors

● Verification
○ Check schema conformance and integrity constraints.
■ RDFUnit, SHACL, ShEx, SPIN, Stardog ICV, ...
● Validation
○ Compare with "real" world, a.k.a. Fact Checking.
■ COPAAL, DeFacto, FactCheck, FacTify, Leopard, Surface, Tracy

[Fensel et al., 2020]
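
The verification step above can be sketched in plain Python. This is a simplified stand-in, not SHACL, ShEx, or any of the listed tools: it checks only two kinds of integrity constraints (required properties per class, and properties allowed at most one value). The function name `verify`, the constraint format, and the example triples are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def verify(triples: List[Triple],
           required: Dict[str, Set[str]],
           functional: Set[str]) -> List[str]:
    """Return a list of constraint violations found in `triples`.

    required:   class -> properties every instance of that class must have
    functional: properties that may have at most one value per subject
    """
    types = defaultdict(set)    # subject -> classes
    values = defaultdict(set)   # (subject, property) -> objects
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        else:
            values[(s, p)].add(o)

    violations = []
    # Required-property check for every typed subject.
    for s in sorted(types):
        for cls in sorted(types[s]):
            for prop in sorted(required.get(cls, set())):
                if not values.get((s, prop)):
                    violations.append(f"{s}: missing required property {prop}")
    # Cardinality check for functional properties.
    for (s, p) in sorted(values):
        count = len(values[(s, p)])
        if p in functional and count > 1:
            violations.append(f"{s}: property {p} has {count} values, expected at most 1")
    return violations


# Hypothetical data, for illustration only.
triples = [
    ("ex:hotel1", "rdf:type", "schema:Hotel"),
    ("ex:hotel1", "schema:name", "Hotel Alpenblick"),
    ("ex:hotel1", "schema:telephone", "+43 1"),
    ("ex:hotel1", "schema:telephone", "+43 2"),
    ("ex:hotel2", "rdf:type", "schema:Hotel"),
]
report = verify(triples,
                required={"schema:Hotel": {"schema:name"}},
                functional={"schema:telephone"})
for line in report:
    print(line)
```

In practice a constraint language such as SHACL or ShEx expresses these rules declaratively; the sketch only shows the kind of error a verifier detects.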



How to Curate Knowledge Graphs?
The enrichment task aims to improve the completeness of KGs.

How to improve KG completeness?

❏ Finding relevant KGs
❏ Duplicate detection
❏ Entity fusion

● Duplicate detection
○ Identifying duplicates of the same entity in a single or various KGs.
■ ADEL, DDaaS, Dedupe, DuDe, Duke, Legato, LIMES, SERIMI, Silk, ...
● Entity fusion
○ Resolving conflicting property value assertions.
■ FAGI, Sieve, SLIPO Toolkit, …

[Fensel et al., 2020]
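
The two enrichment steps can be illustrated with a minimal sketch. It is not how any of the listed tools work: duplicates are found here by comparing normalized names with Python's `difflib.SequenceMatcher`, and conflicts are resolved by simply preferring one source. The function names, the 0.9 threshold, the "prefer one source" strategy, and the example records are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Entity = Dict[str, str]  # property -> value


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def find_duplicates(entities: Dict[str, Entity],
                    threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Return pairs of entity ids whose normalized names are similar enough."""
    ids = sorted(entities)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            similarity = SequenceMatcher(
                None,
                normalize(entities[a]["name"]),
                normalize(entities[b]["name"]),
            ).ratio()
            if similarity >= threshold:
                pairs.append((a, b))
    return pairs


def fuse(a: Entity, b: Entity, prefer: str = "a") -> Entity:
    """Merge two records; on conflicting values keep the preferred source's value."""
    first, second = (a, b) if prefer == "a" else (b, a)
    fused = dict(second)
    fused.update(first)  # values from the preferred record win
    return fused


# Hypothetical records from two KGs, for illustration only.
entities = {
    "kg1:e1": {"name": "Hotel  Alpenblick", "phone": "+43 512 000"},
    "kg2:e7": {"name": "hotel alpenblick", "phone": "+43 512 999",
               "email": "info@example.org"},
    "kg2:e9": {"name": "Pension Sonne", "phone": "+43 512 111"},
}

duplicates = find_duplicates(entities)
fused = fuse(entities["kg1:e1"], entities["kg2:e7"], prefer="a")
print(duplicates)
print(fused)
```

The pairwise comparison is quadratic in the number of entities, so a real setting would need some form of blocking or indexing before matching.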



How to Curate Knowledge Graphs?
[Figure: Knowledge Graph Curation Framework. Labels recoverable from the diagram: KGs; Mapping & Indexing; Domain Specif.; Weights; Quality Metrics; Assessing KGs; Assessment Report; Verifier; Constraints; Verification Report; Validator; Validation Strategies; Instance Validation; Triple Validation; Validation Report; Instance Matching; Configuration Learning; Duplicates Report; Entity Fusion; Fusion Strategies; Fusion Report.]

[Fensel et al., 2020]



Why?
Motivation



Why Curation of Knowledge Graphs?

Are they needed?

● World Wide Web
○ E.g. Google as a Query Answering Engine
● Virtual intelligent agents
○ E.g. Bots
● Physical intelligent agents
○ E.g. Autonomous cars

In May 2016 Joshua Brown was killed by his car because its autopilot mixed up a very long car (large wheelbase) with a traffic sign.

[Image: This is what the autopilot “saw”.]

Why not connect the car to a Knowledge Graph containing traffic data that simply knows that there is no traffic sign?



Insights & Limitations
❏ Assessment
❏ Automation
❏ Cost-effectiveness
❏ Dynamic-data
❏ Prevention
❏ Reproducibility
❏ Re-usability
❏ User-in-the-loop
❏ Scalability



Summary
● A practical framework
○ Assessment
■ Quality Dimensions
○ Cleaning
■ Verification
■ Validation
○ Enrichment
■ Duplicate Detection
■ Entity Fusion
● Insights and limitations



Thank you!
@ElwinHuaman
