
“Semantic Search – Fact and Fiction” Workshop

“Semantic Idol” Demonstration Booklet

Thursday November 12 – Friday November 13, 2009

Rome, Italy
Planning

Each selected project will have:

- a 5-minute introduction
- a 10-minute demo
- 5 minutes of Q&A

Due to the tight schedule, no overtime will be allowed.

The expert panel will be composed of:


- Wernher Behrendt
(http://www.salzburgresearch.at/contact/team_detail.php?person=35 ): Semantic
Expert. Wernher will represent the interests of IKS.
- Sandro Groganz (http://sandro.groganz.com/). Open Source Marketing expert.
Sandro will represent the interests of the end-users.
- Stéphane Croisier (http://stephanecroisier.jahia.com/). Co-Founder and Product
Strategy Manager at Jahia. Stéphane will represent the interests of the CMS
industry.

Thursday November 12, 2009


17:15 - 18:30

1. DERI (Axel Polleres - http://axel.deri.ie/~axepol/ - and Juergen Umbrich)


2. Trialox (Reto Bachmann-Gmür - http://trialox.org/ )
3. KiWi (Rolf Sint - http://showcase.kiwi-project.eu/KiWi/wiki/home.seam?cid=9588 )

Friday November 13, 2009


10:00 - 13:00

4. Yahoo! Research (Peter Mika - http://research.yahoo.com/Peter_Mika )


5. Salsadev (Stéphane Gamard - http://www.salsadev.com/ )
6. Scribo / Nuxeo (Olivier Grisel and Stefane Fermigier - http://www.scribo.ws /
http://www.nuxeo.com )
7. Zemanta (Tomaž Šolc - http://www.zemanta.com/)
8. Trezorix (Sander van der Meulen - http://www.trezorix.com )
9. Sourcesense (Tommaso Teofili - http://www.sourcesense.com/ )
10. Semantic Technology LAB (Aldo Gangemi and Alfio Massimiliano Gliozzo -
http://stlab.istc.cnr.it/ )
11. Semantic MediaWiki (Tran Duc Thanh, Markus Krötzsch - Karlsruher Institut
für Technologie - http://www.aifb.uni-karlsruhe.de/ )

(Moderated by Stéphane Croisier from Jahia – http://www.jahia.com )

Some related user stories

http://wiki.iks-project.eu/index.php/User-stories
Feel free to add new user stories to complete the use cases.

Story 01: Search and Disambiguation in Docs


I have a collection of 30'000 documents, and I want to find the five documents that talk
about or were edited by John Smith. The problem is, there are three John Smiths in my
company, and the other two appear in lots of documents.

Story 03: Similarity-based Image Search


I'm working with a digital asset management system, and I want to find images that are
similar to the one I'm looking at, either in terms of the real-world objects that the images
represent, or in terms of graphical similarity (colors, shapes, etc.).

Story 04: Spatio-temporal Content Queries in near-natural Language


When visiting a house rental website, I can formulate queries like “recent pages that talk
about houses to rent in the French part of Switzerland” and the website search engine
understands them.

Story 05: Assistance with Semantic Tagging


To create content in my CMS, I type plain text, and the system offers a list of tags that
describe my content, and a list of links to entities (people, companies, etc.) that my text
talks about. I can then interactively refine those lists of tags and links.

Story 06: Context-aware Content Delivery


I'm a hotel manager and I'm adding info about a music show that takes place in my hotel
next Friday. Internet users should be able to find this info using queries like "events that
take place at the end of next week within 10km of where I am now", without having to
know about my website.

Story 09: Similarity-based Document Search


I consult or create a new document in my CMS by typing in an HTML edit form or by
uploading a document with textual content (PDF, office file, XML file, ...). I want the
user interface to show the list of the 5 most similar documents already in the CMS, based
on the latent semantic meaning of the terms occurring in those documents, without having
to manually tag structured document properties such as Dublin Core subjects.
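As a toy Java sketch of the ranking skeleton behind this story: score every stored document against the new one and keep the top 5. Raw term-frequency vectors and cosine similarity stand in here for the latent semantic vectors the story actually asks for; the corpus and all names are illustrative only.

    import java.util.*;

    // Toy sketch for Story 09: rank documents by cosine similarity to a query
    // document. A real system would use latent semantic vectors, not raw counts.
    public class SimilarDocs {
        static Map<String, Integer> termFreq(String text) {
            Map<String, Integer> tf = new HashMap<>();
            for (String tok : text.toLowerCase().split("\\W+"))
                if (!tok.isEmpty()) tf.merge(tok, 1, Integer::sum);
            return tf;
        }

        static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0, na = 0, nb = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
                na += e.getValue() * e.getValue();
            }
            for (int v : b.values()) nb += v * v;
            return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
        }

        public static void main(String[] args) {
            Map<String, String> corpus = Map.of(     // invented mini-corpus
                "doc1", "houses to rent in the french part of switzerland",
                "doc2", "semantic search over document collections",
                "doc3", "rental houses and apartments in switzerland");
            Map<String, Integer> query = termFreq("house rental in switzerland");
            corpus.entrySet().stream()
                  .sorted((x, y) -> Double.compare(
                          cosine(query, termFreq(y.getValue())),
                          cosine(query, termFreq(x.getValue()))))
                  .limit(5)
                  .forEach(e -> System.out.println(e.getKey()));
        }
    }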

1st Participant: DERI
Thursday November 12, 2009: 17h20 – 17h40

Fri, 03 Jul 2009 from Stephane Corlosquet <stephane.corlosquet at deri dot org>

Below is the architecture that DERI would like to suggest for the IKS Semantic Search
Engine. The figure [1] shows a set of CMS sites complying with the best practices of
RDF data publishing, which include RDFa, a local schema export (site vocabulary) and a
SPARQL endpoint. We have worked on a set of modules for Drupal, detailed in a
technical report [2], but their features could be generalized to other CMSs. The sites can
request to be included in the IKS search engine via a form on the IKS search engine site
or programmatically via a ping. Pings are also used when a specific resource/page has
been updated on a given site, so that the search engine can schedule a recrawl of the
resource as soon as possible.

The semantic search engine stack is composed of several layers of data gathering,
parsing, validation and indexing. The search engine first gathers the data by crawling the
sites; it then parses the RDF data with the any23 parser [3], a Java library that extracts
structured data in RDF format from a variety of Web documents (it supports
microformats, RDFa and other common RDF serialization formats). If needed, the
NxParser [4] cleans up the data and formats it as N-Quads [5]. Before a site can be
included in the IKS search engine, it first goes through the RDFAlerts validator, which
ensures the RDF data contained in the sites complies with the RDF publishing best
practices. RDFAlerts also does some RDF consistency checking. Additionally, other
IKS-specific policies regarding the sites included in the search engine could be added
here. Finally, the SWSE engine [6] takes care of the indexing and storage of the data.
Powered by YARS2, it provides distributed storage and retrieval facilities. The indexing
structures are optimized for retrieval of RDF statements including context (quads) while
minimizing the need for joins, plus Lucene fulltext indexing for efficient keyword
searches. SWSE's SPARQL endpoint allows plugging in any RDF visualization tool,
e.g. VisiNav [7]. See the screencast at [8] (1'36) for the possibilities offered by VisiNav.
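For illustration, this is how a client could pose a query against a SPARQL endpoint such as the one SWSE exposes, using the Apache Jena ARQ API. The endpoint URL is a placeholder, not the actual service address, and the GRAPH pattern simply shows how the quad (context) indexing mentioned above lets a query recover which crawled site a statement came from.

    import org.apache.jena.query.*;

    public class EndpointQuery {
        public static void main(String[] args) {
            // Placeholder address: substitute the actual SPARQL endpoint here.
            String endpoint = "http://example.org/sparql";
            // The GRAPH variable exposes the quad context, i.e. which crawled
            // site/page each statement came from.
            String query =
                "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
                "SELECT ?source ?name WHERE { " +
                "  GRAPH ?source { ?person foaf:name ?name } " +
                "} LIMIT 10";
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("name") + " found in " + row.get("source"));
                }
            }
        }
    }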

[1] http://srvgal65.deri.ie/files/iks_search_engine_cloud.pdf
[2] http://www.deri.ie/fileadmin/documents/DERI-TR-2009-04-30.pdf
[3] http://code.google.com/p/any23/
[4] http://sw.deri.org/2006/08/nxparser/
[5] http://sw.deri.org/2008/07/n-quads/
[6] http://www.swse.org/
[7] http://visinav.deri.org/
[8] http://www.youtube.com/watch?v=r4WgTRIRoa0

Mon, 05 Oct 2009 - From: Axel Polleres <axel.polleres at deri dot org>

First of all kindest apologies for the late response to [1].

We have had to sort things out since the main developer of the SWSE search engine and
architecture, Andreas Harth, moved to AIFB, Uni Karlsruhe, and in the course of the
move we had some delays in answering the question of what setup we could provide.

The current, but unofficial, status of SWSE/yars2 is the following:


* we are working on a licensing model for the software
* as a short-term goal we are working on making at least a non-commercial/academic
binary available (discussions on whether/how we'll open source the system are ongoing)
* if needed we could discuss whether/how we can make binaries available for project use
within IKS for testing purposes

What we can offer out-of-the-box without additional resources is:


* a single server setup hosted on one of our servers
* periodic crawls of a provided list of CMS URIs of interest [as long as we stay below
50M statements and 100K documents overall, the resources we could easily free at the
moment should be sufficient]
* a yars2 instance for the crawled data, including:
  * a SPARQL endpoint on top of the yars2 index
  * an instance of the current SWSE user interface [2] on top of the yars2 index

Additional notes:
- The update frequency of the index mainly depends on the number of statements we
have to parse, clean and process.

We'd hope that is sufficient for the current project needs; if not, please let us know in
what ranges your requirements would be. Without additional resources we are not
capable of offering a more advanced setup of SWSE/yars2 short term (this could include
distributed index builds, distributed yars2 instances, distributed SPARQL processing, and
reasoning [4] on the crawled data), but we'd suggest getting things going small and then
seeing where we get from there.

Such a setup could be the starting point for a semantic search engine for IKS and could,
on top of that, demonstrate the feasibility of a federated CMS infrastructure as we sketch
it in [3, Section 5.2]. We'd be very excited about getting this going in collaboration with
IKS and then exploring further opportunities jointly!

Best,
Axel, Juergen, Aidan

Dr. Axel Polleres


Digital Enterprise Research Institute, National University of Ireland,
Galway
email: axel.polleres at deri dot org url: http://www.polleres.net/

[1] http://lists.iks-project.eu/pipermail/iks-community/2009-July/000028.html
[2] http://swse.deri.org/

[3] Stéphane Corlosquet, Renaud Delbru, Tim Clark, Axel Polleres, and Stefan Decker.
Produce and Consume Linked Data with Drupal! In Proceedings of the 8th International
Semantic Web Conference (ISWC 2009), Lecture Notes in Computer Science,
Washington DC, USA, October 2009. Available at
http://www.polleres.net/publications/corl-etal-2009iswc.pdf
[4] Aidan Hogan, Andreas Harth, and Axel Polleres. Scalable Authoritative OWL
Reasoning for the Web. International Journal on Semantic Web and Information Systems, 5(2), 2009.
Available at http://www.deri.ie/fileadmin/documents/DERI-TR-2009-04-21.pdf

Wed, 4 Nov 2009 From: Axel Polleres <axel.polleres at deri dot org>

I will present (together with Juergen Umbrich, who will send a separate mail) our ideas
on Semantic Search over networked, RDF-enabled Drupal sites [1].

Our approach is to regularly crawl and index those sites with a specialised instance of
our home-grown semantic search engine SWSE [2], which offers not only a search
interface but also a SPARQL endpoint that lets you query over those sites. Additionally,
if you want specific, current site information, our Drupal modules enable separate live
SPARQL endpoints locally on the sites. See also the architecture that Stéphane posted
earlier on this list [3].

Axel Polleres

1. Stéphane Corlosquet, Renaud Delbru, Tim Clark, Axel Polleres, and Stefan Decker.
Produce and consume linked data with Drupal! In Proceedings of the 8th International
Semantic Web Conference (ISWC 2009), Lecture Notes in Computer Science,
Washington DC, USA, October 2009. Springer. Best Paper Award, In-Use track.
http://www.polleres.net/publications/corl-etal-2009iswc.pdf
2. http://swse.deri.org/
3. http://www.interactive-knowledge.org/content/iks-search-engine-proposal

2nd Participant: Trialox
Thursday November 12, 2009: 17h45 – 18h05

Mon, 19 Oct 2009 From: Reto Bachmann-Gmür <reto.bachmann at trialox dot org>

Dear IKS Community,

Some of you already met me at the IKS requirements meeting in Salzburg; I'm looking
forward to meeting you again, and more of you, next month in Rome.

For those I haven't already met, I'll quickly introduce myself here.

I'm working with trialox [1], a startup founded last year at the University of Zurich.
We're working on open source software that makes it easy to develop semantic-web-
enabled applications. Our system is based on OSGi technologies and supports various
RDF stores as backends. The principal supported languages are Java and Scala. As we
are near the end of a major sprint, I'll be posting more information on this software
foundation very soon.

Building on this foundation, we're also building a Web Content Management System
leveraging semantic web technologies, especially for the benefit of not-for-profit
organizations. We are working together with the WWF [2] to build a system that allows
better access to their vast and distributed content, both on their public website and in
their internal information infrastructure.

All our products are open source, and we're looking to build a community around the
open source projects, as well as to find business partners whom we could help implement
semantic solutions for their customers.

So that's what I've been working on for a bit more than a year now. Before that, I
worked in England for Talis and for HP Laboratories. At HP Labs I worked with the
Jena team and implemented a system for versioning RDF graphs as well as tracking
their provenance.

My passion (or is it addiction?) for the semantic web dates back to 2002. I started by
implementing the Annotea protocol as a decentralized exchange system and continued
the idea of decentralized, trust- and relevance-based information exchange with the
KnoBot open source project.

Cheers,
reto

1. http://trialox.org/
2. http://www.panda.org/

3rd Participant: KiWi
Thursday November 12, 2009: 18h10 – 18h30

Mon, 12 Oct 2009 From: Rolf Sint <rolf.sint at salzburgresearch dot at>

My name is Rolf Sint and I am a researcher and developer at Salzburg Research. I
studied Computer Science and Management at the University of Salzburg. Currently I
work on the EU-funded project KiWi (http://www.kiwi-project.eu/ ) and I will present
the semantic search functionality of KiWi at the next IKS workshop in Rome.

The KiWi system aims to break system boundaries in that it serves as a platform for
implementing and integrating many different kinds of social software services. It also
intends to break information boundaries by allowing users to connect content, and to
connect with each other, in new ways. KiWi is a software platform that allows users to
share and integrate knowledge more easily, naturally and tightly, and to adapt content
and functionality to their personal requirements. In KiWi, navigation and search of
content is a key issue and is realized in several ways. One way to navigate within KiWi
is a very flexible faceted search, which allows dynamic configuration of the search
facets. Please find some screenshots of the current KiWi system attached.
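The mail does not show how KiWi implements its facets internally; the following toy sketch only illustrates what dynamically configured faceting boils down to: for whichever properties the user selects as facets, count the distribution of values over the current result set. The data and field names are invented.

    import java.util.*;

    // Toy sketch of dynamic faceting, not KiWi code: count value distributions
    // for whatever properties the user has configured as facets.
    public class DynamicFacets {
        public static void main(String[] args) {
            List<Map<String, String>> results = List.of(
                Map.of("type", "wiki page", "author", "rolf"),
                Map.of("type", "blog post", "author", "rolf"),
                Map.of("type", "wiki page", "author", "anna"));
            List<String> selectedFacets = List.of("type", "author"); // user-configured

            for (String facet : selectedFacets) {
                Map<String, Integer> counts = new TreeMap<>();
                for (Map<String, String> item : results)
                    counts.merge(item.getOrDefault(facet, "(none)"), 1, Integer::sum);
                System.out.println(facet + ": " + counts);
            }
        }
    }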

You can find the running system at http://showcase.kiwi-project.eu/KiWi and
download KiWi at http://kenai.com/projects/kiwi/downloads

Best regards
Rolf Sint

Knowledge Based Information Systems


Jakob-Haringer Strasse 5/II
5020 Salzburg
Austria

Email rolf.sint at salzburgresearch dot at


Phone +43.662.2288-430
Office Jakob Haringer Str. 5 | Techno 3 | 2.OG
http://www.salzburgresearch.at

4th participant: Yahoo! Research
Friday November 13, 2009: 10h00 – 10h20

Fri, 25 Sep 2009 From: Peter Mika <pmika at yahoo-inc dot com>

Hi All,

My name is Peter Mika, and I work as a researcher and data architect at Yahoo!, based in
Barcelona, Spain. Our research lab is part of Yahoo! Research [1] and was established
in 2006. We cover a wide range of topics, including multimedia, distributed systems,
data mining (in particular, web mining), and NLP.

As a researcher, my personal interests revolve around semantic technologies and the
Semantic Web, and the application of these technologies to web search, from query
interpretation through ranking to result presentation. We are doing quite a few things in
this area; one particular initiative I wanted to mention is that we have recently started
organizing a semantic search evaluation campaign. If anyone is interested, I would be
happy to discuss that as well.

On the product side, I'm working as a data architect on KR questions related to how we
consume and use metadata inside Yahoo!. As an example, many of you might have heard
of SearchMonkey, which allows site owners and developers to create applications that
change the way search results are presented, using metadata associated with those pages
[2]. I also do part of the evangelism, talking to our communities of developers and
publishers, which gives me a fair bit of understanding of how people relate to semantic
technologies 'in the wild'.

Best,
Peter

[1] http://research.yahoo.com
[2] http://developer.search.yahoo.com/start

5th participant: salsaDev
Friday November 13, 2009: 10h25 – 10h45

Wed, 23 Sep 2009 from Stephane Gamard <stephane.gamard at salsadev dot com>

Dear IKS community members,

My name is Stéphane Gamard, founder and CTO of salsaDev, an Information Access
company. We've joined the IKS community, and upon John's recommendation it is my
pleasure to briefly present our company and what we do, and to give you a sneak preview
of what we'll be showing at the upcoming IKS workshop in November.

Information overload, unstructured data, and dirty, mis-tagged or misclassified
information put a strain on the knowledge worker. Timely access to critical and relevant
information can become a daunting and time-consuming task, especially when the user
only has a vague and/or non-explicit idea of the information sought.

salsaDev uses a technology that emerged from language acquisition research at the
Rensselaer Polytechnic Institute to index textual information at a conceptual level. Our
approach to information access is not a replacement solution but a high-value-added
feature: knowledge workers are provided with a sense-centric, meaning-aware access to
their relevant content.

A very pragmatic and typical use case: an IP lawyer, while filing for a patent, must
read, evaluate and discriminate a tremendous amount of non-relevant information (too
often also outside the scope of his own area of expertise). A sense-based system such as
salsaDev's can read the patent application and provide meaning-based related information
that might be of interest.

salsaDev provides a service platform enabling innovative, meaning-based information
management. I've attached a screenshot of our latest demo (which we'll showcase during
the next IKS workshop). In the screenshot, a user highlights parts of a website. Our
Firefox plugin (the embedded JS popup at the bottom right) sends the highlighted content
to salsaDev and retrieves related articles. In this example we've used Wikipedia as the
source of a "somewhat" structured information data-set.
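salsaDev's API itself is not documented in this mail, so the endpoint and payload in the following sketch are invented placeholders; it only shows the round trip the plugin performs: post the highlighted text, receive related articles.

    import java.net.URI;
    import java.net.http.*;

    // Hypothetical client sketch: send highlighted text to a related-content
    // service and print the response. Endpoint and JSON shape are assumptions.
    public class RelatedContent {
        public static void main(String[] args) throws Exception {
            String highlighted = "houses to rent in the french part of switzerland";
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.org/related")) // placeholder URL
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                    "{\"text\": \"" + highlighted + "\"}"))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // expected: a list of related articles
        }
    }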

This is salsaDev in a nutshell (an extended one, I am aware). I am sure this short
presentation raises more questions than it answers, so please feel free to send me any
questions you might have. In the meantime, and in preparation for the next workshop, I
wish you all a very semantic day.

Cheers,
Stephane

6th participant: Scribo / Nuxeo
Friday November 13, 2009: 10h50 – 11h10

Tue, 20 Oct 2009, From: Olivier Grisel <ogrisel at nuxeo dot com>

Dear all,

My name is Olivier Grisel and I am an R&D engineer at Nuxeo, specializing in
semantics-related features, with some background in Machine Learning and Semantic
Web technologies.

Nuxeo EP [1] is an Open Source ECM (Enterprise Content Management) platform based
on a runtime component system with partial OSGi compatibility, featuring a default
Document Management Seam/JSF web application (Nuxeo DM) with workspaces,
document types, workflows, versioning, access rights, publication, etc. Nuxeo DM
already features a Jena-based knowledge base (triple store) to link documents together,
with external URIs or with comment threads. We also have an XHTML-based Ajax
annotation system that uses the RDF Annotea standard as its data model. Furthermore,
Nuxeo uses the Dublin Core standard as the data model for the base document properties.
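As an illustration of what such a Jena-based knowledge base looks like at the API level (this is not actual Nuxeo code, and the URIs are placeholders): a document resource is linked to an external URI and given a Dublin Core property.

    import org.apache.jena.rdf.model.*;
    import org.apache.jena.vocabulary.DC;

    public class DocumentLinks {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();

            // A CMS document, identified by a placeholder URI, with a Dublin Core title.
            Resource doc = model.createResource("http://example.org/docs/report-42");
            doc.addProperty(DC.title, "Quarterly report");

            // Link the document to an external URI, as the mail describes.
            Property relatedTo = model.createProperty("http://example.org/ns#relatedTo");
            doc.addProperty(relatedTo,
                    model.createResource("http://dbpedia.org/resource/Content_management"));

            model.write(System.out, "TURTLE"); // dump the little knowledge base
        }
    }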

As part of the Scribo project [2], we are working on integrating semantic knowledge
extractors to semi-automatically enrich the knowledge base with named entities and
semantic relationships found in unstructured text content, using UIMA components. We
plan to integrate a CRF-based named entity extractor trained on multilingual corpora
such as Wikipedia. CRFs (Conditional Random Fields) are a machine learning method
for labelling token sequences in Natural Language Processing.

We are also working on a Digital Asset Management application (a multimedia
collection management system) and want to make it extract semantic metadata as
automatically as possible, so that browsing the collection in smart ways becomes trivial.
To achieve this goal I started a Python prototype / proof of concept implementing
similarity-based search for pictures [3], based on a semantic hashing algorithm [5] that
takes GIST image descriptors (960 float dimensions) as input [4] and gives a 64-bit
binary code as output, enabling fast database lookups on very large image collections.

The same kind of semantic hashing algorithm should also work on textual content [6]
described with sparse TF-IDF vectors. A preliminary backlog of semantics-related
features for the Nuxeo platform can be found in our Jira instance [7].
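The lookup side of such a semantic hashing scheme is simple enough to sketch: once every image or document is reduced to a 64-bit code, near neighbours are the items whose codes lie at a small Hamming distance, computed with an XOR and a popcount. The codes below are made up; learning them is the hard part and out of scope here.

    import java.util.*;

    // Hamming-distance lookup over 64-bit semantic hash codes.
    public class HammingSearch {
        static int hamming(long a, long b) {
            return Long.bitCount(a ^ b); // number of differing bits
        }

        public static void main(String[] args) {
            Map<String, Long> codes = Map.of(   // hypothetical learned codes
                "img-001", 0xA3F0_19C2_77D4_0B1EL,
                "img-002", 0xA3F0_19C2_77D4_0B1FL,  // 1 bit away from img-001
                "img-003", 0x1111_2222_3333_4444L);
            long query = 0xA3F0_19C2_77D4_0B1EL;

            codes.entrySet().stream()
                 .sorted(Comparator.comparingInt(e -> hamming(query, e.getValue())))
                 .forEach(e -> System.out.println(
                     e.getKey() + " distance=" + hamming(query, e.getValue())));
        }
    }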

[1] http://www.nuxeo.com/en
[2] http://www.scribo.ws/
[3] http://wiki.iks-project.eu/index.php/User-stories#Story_03:_Similarity-based_Image_Search
[4] http://code.oliviergrisel.name/pyleargist/src/tip/README.txt
[5] http://code.oliviergrisel.name/libsgd/src/9f3f374becc8/examples/semantic_hashing.py

[6] http://wiki.iks-project.eu/index.php/User-stories#Story_09:_Similarity_based_document_search
[7] http://jira.nuxeo.org/secure/IssueNavigator.jspa?reset=true&pid=10273&status=1

Looking forward to meeting you all in Roma,


Olivier - http://twitter.com/ogrisel

Wed, 21 Oct 2009 From: Olivier Grisel <ogrisel at nuxeo dot com>

Just to make it more explicit: for the demo session I should be able to showcase the
current state of the Scribo project, which mainly focuses on IKS user story #5, and a
prototype of similarity search in pictures (IKS user story #3).

http://wiki.iks-project.eu/index.php/User-stories#Story_05:_Assistance_with_Semantic_Tagging

http://wiki.iks-project.eu/index.php/User-stories#Story_03:_Similarity-based_Image_Search

Best,
Olivier

7th participant: Zemanta
Friday November 13, 2009: 11h15 – 11h35

Thu, 15 Oct 2009 From: "Tomaž Šolc" <tomaz.solc at zemanta dot com>

Hi everyone!

My name is Tomaž Šolc. I am head of research at Zemanta, working from our
headquarters in Ljubljana, Slovenia. I have a degree in electrical engineering, and I am
developing algorithms for natural language analysis and the proprietary triple store used
by our content suggestion system.

Zemanta's content suggestion system is the main product of our company: it takes a
fragment of plain text as its input and provides images and articles related to the topic of
the text, as well as relevant tags and automatic explanatory in-text links. It achieves that
by first annotating the text with several components (such as named entity extraction,
word sense disambiguation and classification) and then using the annotated text to search
through collections of similarly annotated objects. This system can be used as an
assistant for bloggers and other authors: suggestions can be applied either automatically
or manually to enrich news articles and blog posts.

From the perspective of semantic search, Zemanta is an interesting example of automatic
semantic query construction by extracting key concepts from a longer piece of text. Since
to some degree we use external third-party search APIs, we also had to address the
problem of how to construct traditional keyword queries from semantically annotated
text.

At the demo session of the next IKS workshop I would like to show a live demo of our
system [1] and explain a little of what is happening behind the curtains: what exactly the
annotations look like, how our word sense disambiguation works, and how we use open-
source solutions like Lucene to search large collections of documents.
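As a taste of the last point, the smallest possible Lucene index-and-search loop looks roughly like this. It is not Zemanta's code, and it is written against a recent Lucene API; class names move between Lucene versions.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class LuceneDemo {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory(); // in-memory index for the demo
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("body",
                    "Ljubljana is the capital of Slovenia", Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("body", new StandardAnalyzer())
                    .parse("capital AND slovenia");
                for (ScoreDoc hit : searcher.search(query, 5).scoreDocs)
                    System.out.println(searcher.doc(hit.doc).get("body"));
            }
        }
    }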

[1] http://www.zemanta.com/demo

Best regards
Tomaž
--
Tomaž Šolc, Research & Development
Zemanta Ltd, London / Ljubljana
www.zemanta.com
mail: tomaz at zemanta dot com
blog: http://www.tablix.org/~avian/blog

8th participant: Trezorix
Friday November 13, 2009: 11h40 – 12h00

Tue, 3 Nov 2009 from: "Sander van der Meulen" <sander at trezorix dot nl>

Hi All,

My name is Sander van der Meulen and I am Technical Manager at Trezorix. Trezorix
was founded in the year 2000 and is located in Delft, The Netherlands.

At Trezorix we develop knowledge networks, connecting all sorts of knowledge sources
into networks with optimized findability. Most of this development is done within
projects with knowledge institutions like museums, libraries, ministries and universities,
and with partner companies, for example in the RNA project [1] and the Sterna project
[2].

Our main software product is the RNA Toolset, an innovative semantic-web-based
toolset for working with content, metadata and reference structures. The goal of the RNA
Toolset is to create an open environment for knowledge workers to create and edit their
content, and to enable them to publish that content to a semantically rich search
environment.

The roadmap for the development of the RNA Toolset points to implementing a federated
Sesame/OWLIM RDF layer with RDFS and OWL support as the search platform.
Currently we only have RDF configurations in our test environments. In our production
environments we've successfully implemented Solr as the search platform, providing
superb free-text and facet searching. But the lack of relational constructs and inferencing
capabilities in Solr forces us to move to the richer RDF environment for more complex
knowledge systems.
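For illustration, a free-text query with facet counts against Solr, via the SolrJ client, might look as follows. The core URL and the "subject" facet field are assumptions, not Trezorix's actual schema.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetSearch {
        public static void main(String[] args) throws Exception {
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/knowledge").build()) { // placeholder core
                SolrQuery query = new SolrQuery("birds");  // free-text part
                query.setFacet(true);
                query.addFacetField("subject");            // assumed facet field

                QueryResponse response = solr.query(query);
                System.out.println(response.getResults().getNumFound() + " hits");
                for (FacetField facet : response.getFacetFields())
                    for (FacetField.Count value : facet.getValues())
                        System.out.println(value.getName() + " (" + value.getCount() + ")");
            }
        }
    }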

Looking forward to meeting you all in Rome.

Best regards,

Sander van der Meulen

References:
1. http://www.rnaproject.org/
2. http://www.sterna-net.eu/

9th participant: Sourcesense
Friday November 13, 2009: 12h05 – 12h25

Fri, 23 Oct 2009 From: Tommaso Teofili <tommaso.teofili at gmail dot com>

Hi all,
my name is Tommaso Teofili and I am new to IKS. I am a software engineer at
Sourcesense [1], a European company specialized in the integration of open source
projects. At Sourcesense we strongly believe in open source, and everyone in the
company is encouraged to contribute to the projects they work on. Many of us
contribute and commit to open source projects like Infinispan, JBoss Portal, Alfresco,
Apache POI, Apache Chemistry, Scarlet, WURFL and others [2].

Before joining Sourcesense I started studying, using and then contributing to Apache
UIMA [3] for my graduation thesis (since November 2008); in August 2009 I became a
committer.

At the moment the project is on its way towards the 2.3.0 release and may become an
Apache TLP (top-level project) [4]. During this period I have built some prototype
applications using UIMA for semantic search, one of which I am going to show during
the workshop.

Hope to meet you all in Rome.


Cheers
Tommaso Teofili

[1] : http://www.sourcesense.com
[2] : http://opensource.sourcesense.com
[3] : http://incubator.apache.org/uima
[4] : http://wiki.apache.org/incubator/October2009

10th participant: Semantic Technology LAB
Friday November 13, 2009: 12h30 – 12h50

Fri, 23 Oct 2009 From: Alfio Massimiliano Gliozzo <gliozzo at gmail dot com>

Dear all,

I am Alfio Massimiliano Gliozzo, a researcher at the Semantic Technology Laboratory of
the Italian National Research Council (CNR), where I coordinate the Natural Language
Processing and Information Retrieval area. You can find additional info about me and
my lab here: http://stlab.istc.cnr.it/stlab/User:AlfioGliozzo

My main research topic is hybridizing Information Retrieval, Natural Language
Processing and Machine Learning approaches with knowledge management tools at
scale. One of the applications I am interested in is Knowledge Retrieval, which is about
retrieving structured knowledge relevant to natural language queries. This task can be
performed on large RDF/OWL knowledge bases.

I will present an application of knowledge retrieval at the next IKS workshop, “Semantic
Search – Fact and Fiction”, in Rome. It is a semantic search engine called “Semantic
Scouting”, working on an RDF/OWL ontology describing the CNR organization and
developed collaboratively by almost all members of my lab as a showcase for the
capabilities we are currently developing here.

CNR is the largest research institution in Italy, employing more than 20k researchers
organized into departments and institutes, subdivided into research units characterized by
different competences, research programs, and laboratories. We migrated the information
spread across different CNR databases into a common RDF/OWL knowledge base
containing both texts (e.g. the titles of the papers written by each researcher) and
structured data (e.g. relations between researchers and their institutes) [1]. The result is a
critical mass of data representing around 30k instances organized into 50 classes and
1.8M triples.

Further, we expanded the knowledge base by performing some simple inference (e.g.
materializing the co-authorship relation), and we automatically generated relations to
linked open data resources, in particular DBpedia categories, by exploiting advanced
text processing techniques.
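The co-authorship example can be made concrete: one way to materialize such a relation is a SPARQL CONSTRUCT over the knowledge base, here via Apache Jena. The property URIs are placeholders, not the actual CNR ontology.

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class CoAuthorship {
        public static void main(String[] args) {
            Model kb = ModelFactory.createDefaultModel();
            // kb.read("cnr-knowledge-base.ttl");  // load the real data here

            // Two authors of the same paper are co-authors.
            String construct =
                "PREFIX ex: <http://example.org/ns#> " +
                "CONSTRUCT { ?a ex:coAuthorWith ?b } " +
                "WHERE { ?paper ex:author ?a, ?b . FILTER(?a != ?b) }";
            try (QueryExecution qe = QueryExecutionFactory.create(construct, kb)) {
                Model inferred = qe.execConstruct();
                kb.add(inferred);  // expand the knowledge base with the new relations
                System.out.println("added " + inferred.size() + " co-authorship triples");
            }
        }
    }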

Then we developed a knowledge retrieval engine whose output is entities of different
types and whose input is queries in either Italian or English. Using such entities as entry
points, we can further explore the ontology in two different ways: browsing the graph of
relations around each entity, or opening forms presenting relevant attributes and
relations.

The result is a running system that I will show at the workshop and that will soon be
delivered as a service within the CNR intraweb.

Looking forward to meeting you all in Rome,

Alfio Massimiliano Gliozzo


Research Scientist
Semantic Technology LAB (STLAB)
Institute for Cognitive Sciences and Technology (ISTC)
Italian National Research Council (CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
alfio.gliozzo@istc.cnr.it
http://stlab.istc.cnr.it/stlab/User:AlfioGliozzo

[1] Alfio Gliozzo, Aldo Gangemi, Valentina Presutti, Elena Cardillo, Enrico Daga,
Alberto Salvati and Gianluca Troiani. “A Semantic Web Layer to Enhance Legacy
Systems.” In Proceedings of the 6th International Semantic Web Conference, Busan,
Korea, 2007.

11th participant: Semantic MediaWiki
Friday November 13, 2009: 12h55 – 13h15

Wed, 4 Nov 2009 From: "Duc Tran" <Tran at aifb.uni-karlsruhe dot de>

Hi all,

I am looking forward to attending the IKS Semantic Search Workshop.

Here is some info about my contribution:

"At AIFB (Karlsruhe Institute of Technology) I work on storage, query processing, query
interfaces and ranking over integrated collections of structured (RDF) data and text (DB
& IR). I will demonstrate the search solutions we have developed. One is a semantic
search extension to SMW (http://semanticweb.org/wiki/Special:ATWSpecialSearch) that
computes completions and translations of keywords. This results in expressive structured
queries that can be used to retrieve precise answers from the semantic wiki. The other,
called the Information Workbench (http://iwb.fluidops.com/), supports the lifecycle of
“interacting with data”, i.e. from data integration, to semantic search, data manipulation,
presentation and visualization, up to data publishing."
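As a toy illustration of the keyword-translation idea only (not the AIFB implementation, which also computes completions and ranks candidate queries): match keywords against known property labels and assemble the matches into a structured query. The labels and the query shape are invented.

    import java.util.*;

    public class KeywordTranslation {
        public static void main(String[] args) {
            // Hypothetical schema: human-readable labels mapped to properties.
            Map<String, String> propertyLabels = Map.of(
                "works at", "ex:worksAt",
                "located in", "ex:locatedIn");
            String input = "person works at KIT";

            // Turn each matched label into a triple pattern; the text after the
            // label becomes the value.
            List<String> patterns = new ArrayList<>();
            for (Map.Entry<String, String> p : propertyLabels.entrySet()) {
                if (input.contains(p.getKey())) {
                    String value = input.substring(
                        input.indexOf(p.getKey()) + p.getKey().length()).trim();
                    patterns.add("?x " + p.getValue() + " \"" + value + "\"");
                }
            }
            System.out.println("SELECT ?x WHERE { " + String.join(" . ", patterns) + " }");
            // prints: SELECT ?x WHERE { ?x ex:worksAt "KIT" }
        }
    }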

Feel free to contact me if you have any questions on these demos.

Cheers, Thanh.

------------------------------------------------------------

Tran Duc Thanh (Kim Duc Thanh)

Institut AIFB - Geb. 05.20


Karlsruher Institut für Technologie (KIT)
76128 Karlsruhe

Tel.: +49 (721) 608-4754


Fax: +49 (721) 608-6080
Mobile: +49 (1515) 8872883
E-Mail: dtr at aifb.uni-karlsruhe dot de
WWW: http://sites.google.com/site/kimducthanh
