Introduction to Metadata 3.0 ©2008 J. Paul Getty Trust

Crosswalks, Metadata Harvesting, Federated
Searching, Metasearching: Using Metadata
to Connect Users and Information

Mary S. Woodley

Since the turn of the millennium, instantaneous access to a wide variety
of content via the Web has ceased to be considered “bleeding-edge
technology” and instead has become expected. In fact, from 2000 to the
time of this writing, there has been continued exponential growth in the
number of digital projects providing online access to a range of
information resources: Web pages, full-text articles and books, cultural heritage
resources (including images of works of art, architecture, and material
culture), and other intellectual content, including born-digital objects.
Users increasingly expect that the Web will serve as a portal to the entire
universe of knowledge. Recently, Google Scholar, Yahoo!, and OCLC’s
WorldCat (a union catalog of the holdings of national and international
libraries) have joined forces to direct users to the closest library that owns
the book they are seeking, whether it is available in print or online or
both.¹ Global access to the universe of traditional print materials and
digital resources has become more than ever the goal of many institutions
that create and/or manage digital resources.

Unfortunately, there are still no magic programming scripts that
can create seamless access to the right information in the right context
so that it can be efficiently retrieved and understood. At this point, most
institutions (including governments, libraries, archives, museums, and
commercial enterprises) have moved from in-house manual systems to
automated systems in order to provide the most efficient means to control

The author would like to thank Karim Boughida of George Washington University for his
invaluable input about metasearching and metadata harvesting and Diane Hillmann, of
Cornell University, who graciously commented on the chapter as a whole. The author takes
full responsibility for any errors or omissions.

¹ For more about the Google and WorldCat partnership, see
http://scholar.google.com/scholar/libraries.html. More information about OCLC’s Open
WorldCat program can be found at http://www.oclc.org/worldcat/open/.

and provide access to their collections and assets.² Some institutions have
a single information system for managing all their content; others support
multiple systems that may or may not be interoperable. Individual
institutions, or communities of similar institutions, have created shared metadata
standards to help to organize their particular content. These standards
might include elements or fields, with their definitions (also known as
metadata element sets or data structure standards);³ codified rules or best
practices for recording the information with which the fields or elements
are populated (data content standards); and vocabularies, thesauri, and
controlled lists of terms or the actual data values that go into the data
structures (data value standards).⁴ The various specialized communities
or knowledge domains tend to maintain their own data structure,
data content, and data value standards, tailored to serve their specific
types of collections and their core users. It is when communities want to
share their content in a broader arena, or reuse the information for other
purposes, that problems of interoperability arise. Seamless, precise retrieval
of information objects formulated according to diverse sets of rules and
standards is still far from a reality.
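
To make the distinction between the three kinds of standards concrete, the following is a minimal Python sketch. It is not drawn from any particular community's standard: the field names, the "Surname, Forename" rule, and the short term list are all hypothetical stand-ins for a data structure standard, a data content standard, and a data value standard, respectively.

```python
# Hypothetical illustration of the three kinds of standards; none of these
# names or rules come from a real metadata standard.

# Data structure standard: which elements (fields) a record may contain.
ELEMENT_SET = {"title", "creator", "object_type"}

# Data value standard: a controlled list of terms allowed in one element.
OBJECT_TYPE_TERMS = {"painting", "photograph", "sculpture"}


def creator_follows_content_rule(name: str) -> bool:
    """Data content standard: record personal names as 'Surname, Forename'."""
    surname, sep, forename = name.partition(", ")
    return bool(sep) and bool(surname.strip()) and bool(forename.strip())


def validate(record: dict) -> list:
    """Return a list of human-readable problems found in the record."""
    problems = []
    for element in record:
        if element not in ELEMENT_SET:
            problems.append(f"unknown element: {element}")
    creator = record.get("creator")
    if creator is not None and not creator_follows_content_rule(creator):
        problems.append(f"creator not in 'Surname, Forename' form: {creator}")
    object_type = record.get("object_type")
    if object_type is not None and object_type not in OBJECT_TYPE_TERMS:
        problems.append(f"object_type not in controlled list: {object_type}")
    return problems
```

In this sketch a record can satisfy one kind of standard and still fail another, which is one way that two collections sharing the same element set may nonetheless hold inconsistent data.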

The development of sophisticated tools that enable users to
discover, access, and share digital content, such as link resolvers and
OAI-PMH harvesters, together with the development of the Semantic Web, has
increased users’ expectations that they will be able to search simultaneously
across many different metadata structures.⁵

The goal of seamless access has motivated institutions to convert
their legacy data, originally developed for in-house use, to standards more
readily accessible for public display or sharing; or to provide a single
interface to search many heterogeneous databases or Web resources at the
same time. Metadata crosswalks are at the heart of our ability to make
this possible, whether they are used to convert data to a new or different

² An excellent survey of the history and future of library automation can be found in
Christine Borgman, “From Acting Locally to Thinking Globally: A Brief History of
Library Automation,” Library Quarterly 67, no. 3 (July 1997): 215–49.
³ What determines the granularity or detail in any element will vary from standard to
standard. In different systems, single instances of metadata may be referred to as fields,
labels, tabs, identifiers, and so on. Margaret St. Pierre and William P. LaPlant Jr., “Issues
in Crosswalking Content Metadata Standards,” in NISO Standards. http://www.niso.org/

⁴ See the Typology of Data Standards in the first chapter of this book.
⁵ The Semantic Web is a collaborative effort led by the W3C, the goal of which is to provide
a common framework that will allow data to be shared and reused across various
applications as well as across enterprise and community boundaries. The Semantic Web is based on
common formats such as RDF (see Tony Gill’s discussion in the preceding chapter), which
make it possible to integrate and combine data drawn from diverse sources. http://www.

standard, to harvest and repackage data from multiple resources, to search
across heterogeneous resources, or to merge diverse information resources.

Definitions and Scope

For the purposes of this chapter, “mapping” refers to the intellectual
activity of comparing and analyzing two or more metadata schemas;
“crosswalks” are the visual and textual product of the mapping process.
A crosswalk is a table or chart that shows the relationships and
equivalencies (and highlights the inevitable gaps) between two or more metadata
formats. An example of a simple crosswalk is given in table 1, where a
subset of elements from four different metadata schemas are mapped to
one another. Table 2 is a more detailed mapping between MARC21 and
Simple Dublin Core. Note that in almost all cases there is a many-to-one
relationship between the richer element set (in this example, MARC) and
the simpler set (Dublin Core).
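
A crosswalk of this kind can also be held as a simple lookup structure. The Python sketch below is only an illustration under stated assumptions: the handful of MARC tags and the Dublin Core elements they point to were chosen for the example and are not a copy of table 1 or table 2, the treatment of 852 as a gap is assumed for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative crosswalk: MARC tag -> Simple Dublin Core element.
# The targets are assumptions for this sketch, not the chapter's tables.
# None marks a gap: a source element with no equivalent in the target set.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "700": "creator",
    "260c": "date",
    "600": "subject",
    "650": "subject",
    "852": None,
}


def invert(crosswalk):
    """Group source elements by target to expose many-to-one relationships."""
    grouped = defaultdict(list)
    for source, target in crosswalk.items():
        if target is not None:
            grouped[target].append(source)
    return dict(grouped)


def apply_crosswalk(record, crosswalk):
    """Convert a {tag: [values]} record; return (converted, unmapped tags)."""
    converted = defaultdict(list)
    unmapped = []
    for tag, values in record.items():
        target = crosswalk.get(tag)
        if target is None:
            # Either an explicit gap or a tag the crosswalk does not cover.
            unmapped.append(tag)
        else:
            converted[target].extend(values)
    return dict(converted), unmapped
```

Inverting the sketch shows the many-to-one pattern directly: both 100 and 700 land in the single element "creator". Once values have been merged in this way, the distinction between the source fields cannot be recovered from the simpler record alone.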

Metadata Mapping and Crosswalks

Crosswalks are used to compare metadata elements from one schema or element set to one or more other schemas. In comparing two metadata element sets or schemas, similarities and differences must be understood on multiple levels so as to evaluate the degree to which the schemas are

Table 1. Example of a Crosswalk of a Subset of Elements from Different Metadata Schemes
Dublin Core
655 Genre/form
Titles or Names
24Xa Title and Title-Related Information
260c Imprint - Date of Publication
1XX Main Entry
7XX Added Entry
Subject Matter
520 Summary, etc.
6xx Subject Headings
Current Location
852 Location