Meng Wang
Southeast University
[Timeline figure: milestones in visual and multimodal knowledge resources]
• ImageNet / Visipedia (2009, 2010): semi-supervised learning; built upon the backbone of WordNet knowledge
• NEIL: Image Graph Miner (2013): automatically extracts images and discovers common-sense relationships
• Semantic Web efforts: interlinking multimedia formats; applying Linked Data principles to multimedia fragments
• IMGpedia (2015, 2017); multimodal knowledge graphs (2019)
• Still many unsolved problems
Input (Question) → Goal (Answer)
Visual generalisation vs. symbolic generalisation: VQA, Commonsense QA, KBQA, and Machine Reading Comprehension
Cognitive Theory from a Knowledge Graph Perspective
Two major schools of thought: neural systems + symbolism
Neural (System 1) approaches are:
• powerful for some problems
• robust to data noise
• hard to understand or explain
• poor at symbol manipulation
• unclear how to effectively use background knowledge
"Neural + Symbolic": The Development of Cognitive Reasoning from the Knowledge Graph Perspective, Communications of the CCF, 2020, Vol. 16, No. 8
Neural + Symbolic:
• a powerful machine learning paradigm
• robust to data noise
• easy for humans to understand and assess
• good at symbol manipulation
• works seamlessly with background knowledge
[Figure: the neural-symbolic spectrum — representation learning, meta learning, deep learning (DL), reinforcement learning, and GNNs on the neural side; knowledge representation (KR), databases, knowledge graphs, reasoning, semantic search, QA, and knowledge recommendation on the symbolic side]
1. Multimodal is complementary
Multimodal knowledge: different modalities describe the same thing.
What we extract is knowledge, not features.
[Figure: the same fact expressed as an image, as text, and as a KG]
2. More cross-modal relations, more details, and more answers
Cross-modal entity grounding, VQA, complicated scene understanding
Example: "Yao Ming and Tracy McGrady of the Houston Rockets visit Beijing in 2004"
3. Cross-modal Disambiguation:
Heterogeneous in modality, but correlated in semantics
Result: Yao Ming (100), Will Smith (39), Mei Lanfang (36)
Visual Entity Disambiguation
Example: "Liu Huan was spotted at a supermarket in the US, bought an $8 loaf of bread, and readily signed an autograph for a fan."
Enhance KG to Multimodal KG
[Scene-graph figure: from detection, Man1 (Yao Ming) holds a Ball (Spalding), stands next to Man2 (McGrady), and is beside/at a Rostrum (Tian'anmen); Man2 wears a Jacket (Houston Rockets)]
Application
1. Enhance KG to Multimodal KG
[Pipeline figure: entities (city, sight, person) are taken from KG sources (Wikipedia) and images are collected from image sources (Bing, Google, Yahoo); images are clustered by feature, filtered for diversity, and linked by image relations such as sameAs]
[Ontology figure: the Richpedia visual relation ontology (rpo)]
• Classes: rpo:Image, rpo:KGentity, rpo:Descriptor, rpo:ImageSimilarity, rpo:DescriptorType (with rdfs:subClassOf subclasses rpo:GHD, rpo:CLD, rpo:CM, rpo:GLCM, rpo:HOGS, rpo:HOGL)
• Object properties: rpo:sameAs, rpo:contain, rpo:Imageof, rpo:Describes, rpo:sourceImage, rpo:targetImage
• Datatype properties: rpo:pixel (xsd:string), rpo:height (xsd:float), rpo:width (xsd:float), rpo:value (xsd:string), rpo:Similarity (xsd:float)
Visual relation ontology example (City wd:Q84, Name: London):
<wd:Q84, wdt:P31, wd:Q515>
<rp:0000001, rpo:imageof, wd:Q84>
<rp:0000001, rpo:sameAs, rp:0000002>
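The example triples above can be materialised directly with Apache Jena, the framework used in the business layer later in this tutorial. The sketch below is a minimal, hedged illustration: the rp and rpo namespace IRIs are assumptions for the example and may differ from the actual Richpedia vocabulary.

```java
// Minimal sketch (Apache Jena): building the Richpedia example triples above.
// The rp/rpo namespace IRIs are assumptions for illustration only.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class RichpediaExample {
    public static void main(String[] args) {
        String WD  = "http://www.wikidata.org/entity/";
        String WDT = "http://www.wikidata.org/prop/direct/";
        String RP  = "http://richpedia.cn/resource/";   // assumed
        String RPO = "http://richpedia.cn/ontology#";   // assumed

        Model model = ModelFactory.createDefaultModel();
        Resource london = model.createResource(WD + "Q84");
        Resource city   = model.createResource(WD + "Q515");
        Resource img1   = model.createResource(RP + "0000001");
        Resource img2   = model.createResource(RP + "0000002");

        Property instanceOf = model.createProperty(WDT, "P31");
        Property imageOf    = model.createProperty(RPO, "imageof");
        Property sameAs     = model.createProperty(RPO, "sameAs");

        // <wd:Q84, wdt:P31, wd:Q515>
        model.add(london, instanceOf, city);
        // <rp:0000001, rpo:imageof, wd:Q84>
        model.add(img1, imageOf, london);
        // <rp:0000001, rpo:sameAs, rp:0000002>
        model.add(img1, sameAs, img2);

        // Serialise the model to Turtle on stdout
        model.write(System.out, "TURTLE");
    }
}
```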
1. Enhance KG to Multimodal KG
Richpedia is a knowledge graph G = (E, R), where E is the set of entities and R is the set of relations.
Richpedia.cn
[Scene-graph figure, as above: Man1 (Yao Ming) holds a Ball, stands next to Man2 (McGrady), and is beside/at the Rostrum (Tian'anmen); Man2 wears a Jacket (Houston Rockets); the relation distribution is unbalanced and long-tail]
Application
2. Visual Relation Detection
The aim of visual relation detection is to provide a comprehensive understanding of an image
by describing all the objects within the scene, and how they relate to each other
[Figure: input image → feature space → relation space → output scene graph, using statistical, spatial, and semantic information]
Related work: Visual Relationship Detection with Language Priors (ECCV 2016); VTransE (CVPR 2017); Neural Motifs: Scene Graph Parsing with Global Context (CVPR 2018)
2. Long-tail Visual Relation Detection
Our method:
• feature sparsity can be alleviated by using a feature-level attention mechanism
• message passing is achieved from predicates or objects in different images
2. Long-tail Visual Relation Detection
Weitao Wang, Meng Wang, Sen Wang, Guodong Long, Lina Yao, and Guilin Qi. One-Shot Learning for Long-Tail Visual Relation Detection. AAAI 2020.
2. Unbalanced Visual Relation Detection
[Figure: low-frequency relations ("pizza on a plate", "man on an elephant", "boy carrying surfboard") vs. high-frequency relations ("pizza on plate", "man on elephant", "boy on surfboard"); (a) nonstandard labels, (b) feature overlap]
We use memory features to transfer high-frequency relation features to low-frequency relation features.
2. Unbalanced Visual Relation Detection
Weitao Wang, Ruyang Liu, Meng Wang, Sen Wang, Xiaojun Chang, and Yang Chen.
Memory-Based Network for Scene Graph with Unbalanced Relations. 28th ACM MM 2020.
Enhance KG to Multimodal KG: Cross-modal Entity Linking (from classes to entities)
[Scene-graph figure: from detection, Man1 is linked to Yao Ming, Man2 to McGrady, the Ball to Spalding, the Rostrum to Tian'anmen, and the Jacket to the Houston Rockets; relations: hold, stand next to, beside/at, wear]
Application
3. Cross-modal Entity Linking
3. Cross-modal Entity Linking
Learning to Rank with Modal Attention
3. Cross-modal Entity Linking
[Figure: extracted relation triples — (Man1, hold, Ball), (Man1, beside/at, Rostrum), (Man2, wear, Jacket)]
3. Cross-modal Entity Linking
Qiushuo Zheng, Meng Wang, Guilin Qi, Chaoyu Bai. Semantic Visual Entity Linking Based on Multimodal Learning. AAAI 2021 (submitted).
• Why Multimodal?
• Construction
• Inference
• Challenges
Multi-modal KG Completion
Logan IV R. L., Humeau S., Singh S. Multimodal Attribute Extraction. NIPS, 2017.
A recent attempt at reasoning over multimodal KGs: Multi-modal KG Completion
Input: item details (image + text)
Output: extracted attribute values, e.g. [three-seater, white/light colour, not included]
Multimodal machine translation developed earlier than today's multimodal knowledge graphs.
Is the multi-modal KG really helpful?
— We need to systematically understand in which scenarios multimodal entity linking and entity alignment are actually needed, and in which scenarios adding them only introduces noise.
Adversarial Evaluation of Multimodal Machine Translation. EMNLP 2018. (CCF B)
— Questions whether multimodal information is useful, and systematically summarises which scenarios need it.
"This dominance effect corroborates the seminal work of Colavita (1974) in Psychophysics, where it has been demonstrated that visual stimuli dominate over the auditory stimuli when humans are asked to perform a simple audiovisual discrimination task."
— Verifies at a finer granularity in which scenarios multimodal information is useful: roughly 90% of cases rely on image information and only 10% on text, which raises the question of whether today's text-centric multimodal knowledge graphs, with images only as auxiliary information, are the right design.
Is the multi-modal KG really helpful?
Realistic Re-evaluation of Knowledge Graph Completion Methods: An
Experimental Study. SIGMOD 2020. (CCF A)
Is multimodal information only needed in very specific cases for KGs?
• Why Multimodal?
• Construction
• Inference
• Challenges
Challenges
• Rules
[Architecture figure]
• Multimodal knowledge graph display
• Business layer: Apache Jena (select the corresponding entity; select multimodal knowledge information)
• Data layer: AllegroGraph (text knowledge graph + image knowledge graph)
AllegroGraph
• AllegroGraph is a modern, high-performance, and durable graph database.
[Architecture figure, as above: display → Apache Jena business layer → AllegroGraph data layer]
Jena
• Apache Jena is an open-source Java framework for Semantic Web and ontology operations. It provides RDF and SPARQL APIs to query and modify ontologies and to perform ontology reasoning, and it offers TDB and Fuseki to store and manage triples. There are two major versions, Jena1 and Jena2.
[Figure: Jena's layered architecture — Model layer, EnhGraph layer, Graph layer]
Using Jena
• First configure Jena on the machine and set the required environment variables; the main functions Jena provides are (see the sketch after this list):
1. Write RDF
2. Store RDF as an XML file
3. Read RDF from an XML file
4. Query by URI: model.getResource(URI).getProperty()...
5. Query by any (S, P, O) pattern: model.listSubjectsWithProperty(), model.listStatements()...
6. Process an RDF model as a whole, including merging and splitting RDF models
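A minimal sketch of the functions listed above, using the standard Jena RDF API; the local file name and the rpo namespace IRI are illustrative assumptions rather than the real Richpedia configuration.

```java
// Minimal sketch of the Jena functions listed above. The file name and the
// rpo namespace IRI are illustrative assumptions, not the real Richpedia setup.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class JenaUsageSketch {
    public static void main(String[] args) throws Exception {
        String RPO = "http://richpedia.cn/ontology#";            // assumed namespace
        String IMG = "http://richpedia.cn/resource/0000001";     // assumed image IRI

        // 1-2. Write RDF and store it as an RDF/XML file
        Model model = ModelFactory.createDefaultModel();
        Property imageOf = model.createProperty(RPO, "imageof");
        Resource image = model.createResource(IMG);
        model.add(image, imageOf, model.createResource("http://www.wikidata.org/entity/Q84"));
        try (OutputStream out = new FileOutputStream("richpedia-sample.rdf")) {
            model.write(out, "RDF/XML");
        }

        // 3. Read RDF back from the XML file (RDF/XML is the default syntax)
        Model loaded = ModelFactory.createDefaultModel();
        try (InputStream in = new FileInputStream("richpedia-sample.rdf")) {
            loaded.read(in, null);
        }

        // 4. Query by URI: getResource(URI).getProperty(...)
        Statement stmt = loaded.getResource(IMG).getProperty(imageOf);
        System.out.println("imageof -> " + stmt.getObject());

        // 5. Query by an (S, P, O) pattern: listStatements(subject, predicate, object)
        StmtIterator it = loaded.listStatements(null, imageOf, (RDFNode) null);
        while (it.hasNext()) {
            System.out.println(it.nextStatement());
        }

        // 6. Process models as wholes: merge two models into one
        Model merged = model.union(loaded);
        System.out.println("merged size = " + merged.size());
    }
}
```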
Architecture
[Architecture figure]
• Interaction layer: Node.js, JavaScript (multimodal knowledge graph display)
• Business layer: Apache Jena (select the corresponding entity; select multimodal knowledge information)
• Data layer: AllegroGraph (text knowledge graph + image knowledge graph)
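To illustrate how the business layer could talk to the data layer, the sketch below issues a SPARQL query through Jena against a SPARQL endpoint. The endpoint URL and the rpo namespace are assumptions for the example, not the actual Richpedia or AllegroGraph configuration.

```java
// Hedged sketch: the business layer (Jena) querying the data layer over SPARQL.
// The endpoint URL and rpo namespace below are assumptions, not real settings.
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class RichpediaSparqlClient {
    public static void main(String[] args) {
        String endpoint = "http://localhost:10035/repositories/richpedia"; // assumed endpoint

        // Retrieve images linked to London (wd:Q84) via rpo:imageof
        String queryString =
            "PREFIX rpo: <http://richpedia.cn/ontology#>\n" +   // assumed namespace
            "PREFIX wd:  <http://www.wikidata.org/entity/>\n" +
            "SELECT ?image WHERE { ?image rpo:imageof wd:Q84 } LIMIT 10";

        Query query = QueryFactory.create(queryString);
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution sol = results.next();
                System.out.println(sol.getResource("image"));
            }
        }
    }
}
```

The same pattern works against any SPARQL 1.1 endpoint, so the data layer could be swapped without changing the business-layer code.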
Online Website
• Website: www.Richpedia.cn
[Website screenshot: main features of Richpedia.cn]
• Download link
• SPARQL query example, query box, and query result
• Query category selection (query the textual KG or the visual KG)
• Tutorial
• Ontology (OWL) linking and resource linking
• Bottom links