Report on the ‘Metasearching, Better Searching?’ Conference, 22nd July 2004
Contents
1. INTRODUCTION
2. CONFERENCE TOPICS
  2.1 An Overview of Metasearching
    Cross-Searching
    Harvesting
    Hybrid
    Scraping Content
    Metasearching for all sectors
    Web Services
    Metasearch Requirements
    Metadata Issues
    Trust Issues
    Knowledge Bases
    NISO MetaSearch Initiative
  2.2 The Integration of Course Management Systems, Library Systems, OpenURL Resolvers, and Content Repositories
  2.3 The Ex Libris Approach
  2.4 The Knowledge4Health Portal
  2.5 Using Structured Metadata to Streamline and Refine Searching for News and Company Information from Different Collections and Repositories
  2.6 Information Clustering and Natural Language Retrieval
    Vivisimo
    Verity K2
3. REFERENCES
1. Introduction
The ‘Metasearching, Better Searching?’ conference was held at the Saïd Business
School in Oxford on 22nd July 2004. The conference synopsis was:
Metasearching is a collective term for tools of this kind that aid searching
and make it more powerful.
• An overview of Metasearching
• The integration of course management systems, library systems, OpenURL
resolvers, and content repositories
• The Ex Libris approach
• The Knowledge4Health portal
• Using structured metadata to streamline and refine searching for news and
company information from different collections and repositories
• Information clustering and natural-language retrieval
2. Conference Topics
An indication of the scale of the problem can be seen in the figures for the JISC
Information Environment in 2001, when there were 206 collections plus content
from projects such as 5/99 and X4L.
Report on the ‘Metasearching, Better Searching?’ Conference, Oxford, 22nd July 2004
Users require an effective means to search across all these varying resources.
Metasearching aims to solve the problems of searching across disparate
resources. This can be achieved via cross-searching and harvesting.
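As a rough illustration of the two approaches, the sketch below contrasts them in Python. The two in-memory "sources" and all function names are hypothetical stand-ins, not part of any system described at the conference. Cross-searching sends the query to every source at search time, whereas harvesting copies records into a local index beforehand and searches only that index.

```python
# Hypothetical in-memory "sources"; real ones would be remote services.
SOURCES = {
    "library_catalogue": ["Metasearch in libraries", "Cataloguing rules"],
    "eprint_archive": ["Harvesting metadata with OAI-PMH", "Metasearch portals"],
}

def search_source(records, query):
    """Case-insensitive substring match against one source's records."""
    return [r for r in records if query.lower() in r.lower()]

def cross_search(query):
    """Cross-searching: query every source live and combine the hits."""
    hits = []
    for name, records in SOURCES.items():
        hits.extend((name, r) for r in search_source(records, query))
    return hits

def harvest():
    """Harvesting: copy all records into one local index ahead of time."""
    return [(name, r) for name, records in SOURCES.items() for r in records]

def search_harvested(index, query):
    """Search only the local index; no source is contacted at query time."""
    return [(name, r) for name, r in index if query.lower() in r.lower()]

local_index = harvest()
print(cross_search("metasearch"))
print(search_harvested(local_index, "metasearch"))
```

In this toy setting both routes return the same hits. In practice the trade-off is between freshness (cross-searching) and speed and control over the index (harvesting).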
Cross-Searching
Harvesting
Hybrid
Scraping Content
Many services do not support Z39.50, SRW or OAI-PMH. In this situation, the
services’ Web interface has to be used and software has to be written to ‘scrape’
content out of the HTML/CGI search results. This is often difficult, laborious and
unreliable.
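To give a flavour of why this is laborious, here is a minimal scraping sketch using Python's standard `html.parser` module. The sample page, its `class="hit"` markup and the `ResultScraper` class are invented for illustration. A real service's HTML would differ and could change without notice, which is the source of the unreliability noted above.

```python
from html.parser import HTMLParser

# Hypothetical results page; a real service's markup would differ.
SAMPLE_HTML = """
<html><body>
<ul id="results">
  <li class="hit"><a href="/record/1">Metasearch in practice</a></li>
  <li class="hit"><a href="/record/2">OpenURL resolvers</a></li>
</ul>
</body></html>
"""

class ResultScraper(HTMLParser):
    """Collect (title, link) pairs from <a> tags inside <li class="hit">."""

    def __init__(self):
        super().__init__()
        self.in_hit = False
        self.current_href = None
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "hit":
            self.in_hit = True
        elif tag == "a" and self.in_hit:
            self.current_href = attrs.get("href")

    def handle_data(self, data):
        # Text inside the link is taken as the record title.
        if self.current_href is not None and data.strip():
            self.results.append((data.strip(), self.current_href))

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None
        elif tag == "li":
            self.in_hit = False

scraper = ResultScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.results)
```

Every assumption about the page layout is hard-coded in the parser, so a separate scraper of this kind has to be written and maintained for each service.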
Metasearching for all sectors
The need for metasearching is shared by users in many sectors, e.g. students,
lecturers, researchers, and shoppers looking for a second-hand car or a new house.
Web Services
Andy Powell defined Web Services for the purposes of his presentation as strictly
machine interfaces between services on the Web.
Web Services use SOAP [4] to encapsulate information in XML messages that can
be sent over HTTP, which makes them web friendly. With a SOAP-based search
interface, a service can be integrated into a metasearch service. (NB: it is
important to bear in mind that terms and conditions of use must be taken into
account.)
The service API (Application Programming Interface) defines the kinds of queries
that can be sent and the results retrieved. APIs differ across services.
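The sketch below shows roughly what a SOAP-wrapped search request might look like when built with Python's standard library. The envelope namespace is the SOAP 1.2 one from [4]. The `searchRequest`, `query` and `maximumRecords` elements and the `http://example.org/search` namespace are assumptions made for illustration, not the API of any service mentioned here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
SEARCH_NS = "http://example.org/search"              # hypothetical service namespace

def build_search_envelope(query, max_records=10):
    """Wrap a hypothetical searchRequest in a SOAP 1.2 Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SEARCH_NS}}}searchRequest")
    ET.SubElement(request, f"{{{SEARCH_NS}}}query").text = query
    ET.SubElement(request, f"{{{SEARCH_NS}}}maximumRecords").text = str(max_records)
    return ET.tostring(envelope, encoding="unicode")

xml_message = build_search_envelope("metasearch")
print(xml_message)
```

Because each service defines its own request and response elements, a metasearch service needs one such adapter per API.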
Metasearch Requirements
Metadata Issues
Format
However, domains will continue to develop and use their own metadata schemas,
such as the IEEE-LOM for learning objects. This means that mapping is required to
enable cross searching, but some of the semantic richness of the original resource
may be lost.
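A small sketch can make the loss concrete. The mapping table below is illustrative only and is not an official LOM to Dublin Core crosswalk. The dotted field names are a notation assumed here for LOM categories and elements. Fields with no counterpart in the simpler schema are simply dropped.

```python
# Illustrative mapping only; not an official LOM-to-Dublin-Core crosswalk.
LOM_TO_DC = {
    "general.title": "title",
    "general.description": "description",
    "general.keyword": "subject",
    "general.language": "language",
}

def map_lom_to_dc(lom_record):
    """Return (dc_record, dropped): mapped fields and fields with no target."""
    dc_record, dropped = {}, []
    for field, value in lom_record.items():
        target = LOM_TO_DC.get(field)
        if target is None:
            dropped.append(field)  # semantic richness lost in the mapping
        else:
            dc_record[target] = value
    return dc_record, dropped

lom_record = {
    "general.title": "Introduction to Metasearching",
    "general.keyword": "metasearch",
    "educational.difficulty": "medium",
    "educational.typicalAgeRange": "18-",
}
dc_record, dropped = map_lom_to_dc(lom_record)
print(dc_record)
print(dropped)
```

Here the educational fields have no target, so a cross-search over the mapped records could not filter on them.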
Common Meaning
Metadata Registries
Trust Issues
There are trust issues involved in using a portal, which are generally issues of
authorisation. This area has been researched to some extent by the EDINA GetRef
Service [7] and Shibboleth Authentication software.
Knowledge Bases
One answer is to do away with these local knowledge bases and have a single
knowledge base in the form of a ‘service registry’. This would describe the content
of the collections and the technical interface details. Metadata formats need to be
agreed for how to describe these. The Dublin Core Collection Description Working
Group is looking at how to describe this content. The technical information can be
captured in Web Services Description Language (WSDL) [8], and ZeeRex [9]
(Z39.50 Explain, Explained and Re-Engineered in XML) for Z39.50 enabled
services.
There is a need to agree the way that collection descriptions are made available
to portals, e.g. using Z39.50, SRW, UDDI.
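One way to picture a service registry is as a shared list of entries, each pairing a collection description with its technical interface details. The sketch below is a hypothetical in-memory model: the `ServiceEntry` fields, the example endpoints and the `find` method are assumptions for illustration and do not reflect any agreed metadata format.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceEntry:
    """One registry entry: what a collection holds and how to reach it."""
    collection: str
    description: str
    protocol: str   # e.g. "Z39.50", "SRW", "OAI-PMH"
    endpoint: str   # hypothetical address
    subjects: list = field(default_factory=list)

class ServiceRegistry:
    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def find(self, subject=None, protocol=None):
        """Return entries matching the given subject and/or protocol."""
        return [
            e for e in self._entries
            if (subject is None or subject in e.subjects)
            and (protocol is None or e.protocol == protocol)
        ]

registry = ServiceRegistry()
registry.register(ServiceEntry("Physics abstracts", "Journal abstracts",
                               "Z39.50", "z3950.example.org:210", ["physics"]))
registry.register(ServiceEntry("Health eprints", "Open-access papers",
                               "OAI-PMH", "http://example.org/oai", ["health"]))
print([e.collection for e in registry.find(protocol="Z39.50")])
```

A portal could then ask the registry which collections cover a subject and how to connect to them, instead of maintaining that information locally.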
NISO MetaSearch Initiative
The NISO MetaSearch Initiative [10] is trying to bring the area of metasearching
together. It is looking to enable:
James pointed out that librarians have been dealing with the problem of searching
multiple information resources for many years. This has traditionally been
James noted that merging the results is a complex and major challenge, as the
results returned are ordered in different ways and some method of ranking them is
required. James also made reference to the NISO MetaSearch Initiative, which is
looking to get input from vendors, content providers and the library community.
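One simple way to merge result lists that are each ranked by different, non-comparable criteria is round-robin interleaving with de-duplication. The sketch below illustrates that general technique; it is not the method used by any product discussed at the conference, and the record identifiers are invented.

```python
from itertools import zip_longest

def merge_round_robin(*result_lists):
    """Interleave ranked lists, keeping the first occurrence of each record."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for record in tier:
            if record is not None and record not in seen:
                seen.add(record)
                merged.append(record)
    return merged

catalogue_hits = ["rec-A", "rec-B", "rec-C"]
archive_hits = ["rec-B", "rec-D"]
print(merge_round_robin(catalogue_hits, archive_hits))
# → ['rec-A', 'rec-B', 'rec-D', 'rec-C']
```

Interleaving respects each source's own ordering without pretending that their relevance scores are comparable, but it cannot tell whether one source's top hit is better than another's.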
Ex Libris see the major challenges for the future as bringing in resources that are
not yet integrated and achieving widespread adoption of interoperability standards.
The following discussion noted that there are hardly any SRW-enabled services
currently available and that there has been some resistance from vendors to
creating XML gateways to their systems, as they prefer users to use their web
interface.
The Knowledge4Health portal [13] brings together 11 sets of resources that currently include NHS Direct
Online and zetoc. Athens is used for authentication. The full text of some of the
resources is made available through ‘Dialog Datastar’ [14].
Chris was involved with a company that was addressing a requirement from
investment banks to cross-search the databases of a number of subscription-based
banking news services. This was a complex task that required the
development of a different ‘site agent’ for each of the news services. The ranking
needs were based on the authority of the publication, i.e. results from The
Financial Times are ranked higher than those from The Times, and The Times
results are ranked higher than those from, e.g., The Manchester Evening News.
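Authority-based ranking of this kind can be sketched as a sort keyed on a per-publication weight. In the example below only the relative ordering of the three publications comes from the talk. The numeric weights, the sample headlines and the function name are invented for illustration.

```python
# Authority weights are invented for illustration; only the ordering
# (Financial Times > The Times > Manchester Evening News) is from the talk.
AUTHORITY = {
    "The Financial Times": 3,
    "The Times": 2,
    "The Manchester Evening News": 1,
}

def rank_by_authority(results):
    """Sort (publication, headline) pairs by publication authority, highest first."""
    return sorted(results, key=lambda r: AUTHORITY.get(r[0], 0), reverse=True)

results = [
    ("The Manchester Evening News", "Local bank opens branch"),
    ("The Financial Times", "Bank merger announced"),
    ("The Times", "Merger talks continue"),
]
for publication, headline in rank_by_authority(results):
    print(publication, "-", headline)
```

Publications missing from the table default to a weight of 0 and therefore sort last.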
Chris has worked for a number of companies that have sold solutions to
knowledge-intensive sectors such as banking and the legal sector. He noted that
it was very difficult to apply generic business logic across domains: the legal
sector wanted to search external systems using the same methodologies they
were using to search their internal repositories, a very different approach to the
banking sector. Chris went on to give a demonstration of the Magus research
product, the Vrisko News Tracker system.
Traditional searching methods have some inherent problems, some of which were
identified by the audience at this presentation. Those identified as the
major problems were: not having a very specific search (i.e. wanting to search a
subject area in a more general way), not knowing which terms to use in a search,
and being overwhelmed by the number of results returned. The Institute decided
to investigate complementary approaches. The challenge was to be able to explore
large datasets whilst reducing overload and providing context to search results.
They have been looking at the potential of searching using ‘clustering’, which is
the classification of data using structured taxonomies. They presented case
studies of their observations of using two software products for this, Vivisimo [15]
and Verity K2 [16].
Vivisimo
This software looks for patterns in a retrieved dataset and dynamically builds
taxonomies in real time based on these patterns. The user is presented with a
browse ‘tree’ and can then drill down into the taxonomy to get a more focused set
of results for their research area.
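The general idea of building browse categories from the retrieved results themselves can be shown with a toy example. The sketch below is not Vivisimo's algorithm, which was not described in detail. It simply groups result titles under any non-query term shared by at least two of them and produces a single flat level rather than a tree. The titles, stopword list and function name are assumptions for illustration.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def cluster_by_terms(titles, query, min_size=2):
    """Group titles under each non-query term shared by at least min_size titles."""
    query_terms = set(query.lower().split())
    clusters = defaultdict(list)
    for title in titles:
        terms = set(title.lower().split()) - STOPWORDS - query_terms
        for term in sorted(terms):
            clusters[term].append(title)
    return {t: docs for t, docs in clusters.items() if len(docs) >= min_size}

titles = [
    "Quantum optics in semiconductors",
    "Semiconductors for solar cells",
    "Quantum computing with ions",
    "Solar cells and quantum dots",
]
for label, docs in sorted(cluster_by_terms(titles, "physics").items()):
    print(label, "->", docs)
```

Selecting a label such as "quantum" narrows the result set in the same spirit as drilling down into a browse tree.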
Verity K2
This is a powerful top-end searching technology that includes clustering tools. The
Institute has implemented this for the New National Journal of Physics. However,
performance was an issue, and therefore it was decided to ‘can’ clusters overnight
using the INSPEC classification tree [17].
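‘Canning’ in this sense amounts to moving the expensive grouping work into a batch job and serving precomputed clusters at query time. The sketch below illustrates that pattern only. The records and the classification codes "C1" and "C2" are made up and are not real INSPEC codes, and the function names are hypothetical.

```python
# Hypothetical records tagged with made-up classification codes.
RECORDS = [
    {"title": "Laser cooling of atoms", "code": "C1"},
    {"title": "Atomic clocks", "code": "C1"},
    {"title": "Superconducting films", "code": "C2"},
]

def build_canned_clusters(records):
    """Batch step (e.g. run overnight): group record titles by classification code."""
    canned = {}
    for record in records:
        canned.setdefault(record["code"], []).append(record["title"])
    return canned

def lookup_cluster(canned, code):
    """Query-time step: a dictionary lookup instead of live clustering."""
    return canned.get(code, [])

canned = build_canned_clusters(RECORDS)
print(lookup_cluster(canned, "C1"))
```

The price of the faster lookup is that the clusters are only as fresh as the last batch run.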
• The cost could be prohibitive, as this software does not come cheap (the
presenter declined to give specific figures)
• Performance was an issue, and so canning was considered to be the only
option
• Verity K2 is a good option for records with high quality metadata, but
would not be suitable for records without metadata
• It is not an out-of-the-box solution and the implementation issues turned
out to be complex and time-consuming, though the software did have a
substantial amount of flexibility built into it
3. References
[1] Search/Retrieve Web Service
< http://www.loc.gov/z3950/agency/zing/srw/ >
[2] The Open Archives Initiative Protocol for Metadata Harvesting
< http://www.openarchives.org/OAI/openarchivesprotocol.html >
[3] IMS Digital Repositories Specification
< http://www.imsglobal.org/digitalrepositories/ >
[4] SOAP Version 1.2 Part 1: Messaging Framework
< http://www.w3.org/TR/soap12-part1/ >
[5] DCMI Collection Description Working Group
< http://dublincore.org/groups/collections/ >
[6] Information Environment Service Registry Project
< http://www.mimas.ac.uk/iesr/ >
[7] EDINA GetRef Service
< http://edina.ac.uk/getref/ >
[8] Web Services Description Language (WSDL) 1.1
< http://www.w3.org/TR/wsdl >
[9] ZeeRex: The Explainable “Explain” Service
< http://explain.z3950.org/ >
[10] NISO MetaSearch initiative
< http://www.niso.org/committees/MetaSearch-info.html >
[11] Merlot: Multimedia Educational Resource for Learning and Online Teaching
< http://www.merlot.org/ >
[12] MetaLib: The Library Portal
< http://www.exlibrisgroup.com/metalib.htm >
[13] Knowledge4Health Portal
< http://www.k4h.northbristol.nhs.uk/ >
[14] Dialog Datastar
< http://www.dialog.com/products/productline/datastar.shtml >
[15] Vivisimo clustering engine
< http://vivisimo.com/ >
[16] Verity K2 Enterprise
< http://www.verity.com/products/k2_enterprise/ >
[17] Outline of INSPEC Classification 1999
< http://www.iee.org/publish/support/inspec/document/class/classif.cfm >