Robyn Ward-MARCXML Running Head: MARCXML AS STANDARD FOR ACME INSTITUTIONAL REPOSITORY

Why MARCXML Should Be Considered For ACME University Institutional Repository
Robyn Ward
Emporia State University

The intention of this paper is to present MARCXML as a viable metadata schema for ACME University’s Digital Institutional Repository. Outside research will be presented to support the choice and recommendation of MARCXML. The limitations and benefits of the schema will be addressed, and its practical uses and implementation will also be presented.

Introduction

According to Perkins (2007), “In an open access environment it is necessary to ensure that all collections meet the minimal requirements for interoperability” (p. 24). Metadata planning is about interoperability and compliance. There are three types of interoperability: (1) semantic, (2) syntactic, and (3) structural. These may occur at a number of different levels, i.e., local, consortial, or within communities of practice. In order to choose a metadata scheme, one must first evaluate the local needs and the functions the metadata must serve. Considering the above criteria, MARCXML will meet the semantic, syntactic, and structural needs of the Institutional Repository, and each will be addressed below as the MARCXML schema is evaluated.

Background and Purpose of MARCXML

MARC (MAchine Readable Cataloging) has been the standard for exchanging bibliographic records between systems for decades. This standard is unlikely to go away for a number of reasons, including but not limited to financial commitment and familiarity within the library community. “The development of an XML version of MARC21 was critical for the format. The economically deep commitment to MARC data
elements, proliferation of schemas beyond the library community control, and the rapidly growing XML tool environment mandated an evolutionary path into XML for MARC 21” (McCallum, 2006, 4). In 2002 the Library of Congress established MARCXML as a standard for expressing MARC data in XML (Extensible Markup Language) syntax. The MARCXML record structure is based on the W3C XML standard. Before MARCXML, during the 1990s, the Library of Congress developed two SGML DTDs for MARC21, one for bibliographic information and the other for authority information. These SGML DTDs have since been converted to XML DTDs. The MARCXML standard grew out of these DTDs but differs from them in many ways. The MARCXML schema supports all MARC-encoded data regardless of format, and it is currently used to aid the interoperability and transfer of cataloging records between metadata standards. According to the MARC21 XML Schema Web site, it is a “framework [that] is intended to be flexible and extensible to allow users to work with MARC data in ways specific to their needs. The framework itself includes many components such as schemas, stylesheets, and software tools.”

Applicable Research

Findings and research from the Los Alamos National Laboratory Research Library will be presented in order to support MARCXML as the standard for ACME University’s digital repository. The Los Alamos “Library Without Walls” team compared five XML schemas for consideration when creating their digital object repository. MARCXML, Dublin Core, PRISM, ONIX, and MODS were all considered viable for their needs. The Los Alamos team conducted a survey of each schema based upon three
distinct requisites for a uniform standard. These included: (1) Granularity, (2)
Transparency, and (3) Extensibility. Other traits the team looked for were: (4) support for hierarchical data structures, (5) cooperative management of the standard, (6) support for simple and complex use, and (7) familiarity or experience with the selected standard. These seven criteria are also important to keep in mind when implementing a standard for the ACME University Digital Repository. The study concluded that MARCXML was a robust schema capable of meeting all of the requirements of granularity, transparency, and extensibility. These three requirements deserve further explanation. Granularity “insures lossless data mapping without blurring the finer shades of meaning intrinsic to the original data”. Transparency: “…this requirement relates to interoperability, requiring a standard widely known throughout the community…”. Extensibility: “since no one metadata standard is appropriate to every situation, standards must permit growth without fracture” (Goldsmith & Knudson, 2006, ¶ 7).

Jeffrey Beall, Catalog Librarian at Auraria Library, University of Colorado at Denver, performed an analysis of twelve metadata schemes that are available for use, and his findings are relevant to this paper. He compiled them in a chart comparing each scheme against the following criteria: granularity, formats of description, content standards, availability of searching systems, level of community or domain specificity, interoperability, proven success (reputation and popularity), training, viability of the organization behind the scheme, ability to handle a particular metadata function, adaptability of the scheme to local needs, scalability, and surrogacy. MARC did well in all categories. MARC has rich granularity. The content standards are flexible though
highly established with AACR2 and Library of Congress Subject Headings. There are many commercial systems available for its use. Training is high and associated with the library community at large. These are just a few samples from the findings on MARC. More detail can be obtained from the Beall article (Beall, 2007, 31).

These are two separate analyses that should be considered when deciding upon the XML schema for the ACME Institutional Repository.

The Low Down or How It Works

The MARCXML framework is a simple XML structure that contains MARC data. The characteristics of the MARCXML schema are as follows. The control fields, including the leader, are treated as data strings; the MARC data fields are treated as elements, with the tags and indicators as attributes. Subfields are treated as sub-elements, with the subfield codes as attributes. MARC data in XML can be presented by writing an XSLT stylesheet. The stylesheet selects the particular MARC elements to be displayed and applies the appropriate markup.

There are three categories of MARCXML consumers. The first category, transformation, consists of conversion between MARCXML and other metadata formats such as Dublin Core. The second, presentation, allows for the display and/or markup of MARC data in some readable form. The third, analysis, involves processing MARC data to produce analytical output such as validation. Validation is important for making sure that the basic XML conforms to the MARCXML schema, that the MARC21 tagging of fields and subfields is correct, and that the MARC record content is valid. The above functionalities of MARCXML are provided through downloadable software offered by the Library of Congress, referred to as the
MARCXML toolkit. Another function provided by the toolkit is the FRBR (Functional
Requirements for Bibliographic Records) display. FRBR describes the structure and relationships of bibliographic and authority records and is intended to be independent of any particular cataloging code or implementation.

Pros and Cons

The MARC format has a number of limitations that must be considered when looking at a metadata schema that will essentially support this existing format. According to the American Library Association report (2005), limitations of MARC include: (1) exclusive record structure and coding, (2) inconsistent granularity, (3) technical obsolescence, and (4) lack of scalability to digital materials (p. 21). The team at the Los Alamos National Laboratory Research Library identified other perceived limitations of MARC and MARCXML: that MARC is too “bibliocentric and rigid”, its declining popularity in the library community, doubts about its viability, and the complexity of the format. These limitations have been shown to be either unfounded or manageable.

The benefits of MARCXML outweigh the limitations. MARCXML can produce an exact equivalent of the MARC21 record, thus allowing lossless conversion to and from MARC21. MARCXML is also widely used and, according to McCallum (2006), is the basis for the international standard for an XML version of the MARC structure that Danish Standards has proposed to ISO (p. 4). The MARCXML structure allows users to more easily write their own tools to ingest, manipulate, and convert MARC data, thus making MARCXML extensible. The architecture also allows different software to be combined to build custom solutions (Library of Congress, 2006). The ability to use external software is a positive in the
above benefit, but it can also be seen as a limitation, because validation of MARC content can only be enforced by external software and not by the schema itself. This is one minor limitation. The MARCXML schema also supports all MARC-encoded data regardless of
format. It also has a number of potential uses, described in turn below. The first use is representing a complete MARC record in XML. It can be used for original resource description in XML syntax and can function as XML metadata that can then be combined with an electronic resource. Second, it can be used as an extension schema to METS (Metadata Encoding and Transmission Standard). METS is a digital “wrapper”: an XML text file that binds together content files and metadata and specifies the logical relationships among them. METS supports metadata standards such as MARCXML, which allows different metadata schemes to be included to describe various facets and representations of an object. Third, MARCXML can represent metadata for OAI (Open Archives Initiative) harvesting. The Open Archives Initiative is dedicated to providing digital library interoperability by defining simple protocols and standards. The protocol’s function is to transfer metadata from a source archive to a destination archive.

Sampling of MARC/XML Put to Use

MARCXML has been used for OCLC’s Terminology Services Project. The intention of the project is to provide web services, i.e., machine-to-machine applications, that can be used in a number of different ways. The project handles knowledge organization vocabularies, i.e., authority files, subject heading systems, thesauri, and classification schemes (McCallum, 2006, 5). It maps one term in one
vocabulary to one or more terms in a different vocabulary. MARCXML is used to normalize the data in MARC21. Normalization is a “formal analytical process by which various metadata formats are standardized to a pre-selected metadata standard” (Hutt, Rose-Sandler, & Westbrook, 2007, 41).

Terry Reese, at Oregon State University, developed MarcEdit, a MARC21 editing utility. MARCXML is central to the crosswalk tools provided in MarcEdit. These crosswalk tools include data conversions from Dublin Core, EAD, and FGDC to MARC21.

At New York University, Bill Jones used MARC/XML to perform routine tasks on the library’s catalog. He used XSL transformations on batches of records to update data within the records all at one time. He changed content in XML and then converted it back into MARC records for the library’s general catalog.

Work has been done at Virginia Tech with the Networked Digital Library of Theses and Dissertations regarding the use of the OAI harvesting protocol. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) supports the use of MARC21 in XML. The Library of Congress American Memory project also exposes its metadata for OAI harvesting in MARCXML.

The Library of Congress makes use of MARCXML to create MODS (Metadata
Object Description Schema, another XML schema for MARC21) records from MARC21 records and to move ONIX data to MARCXML, and it distributes all of its MARC21 cataloging records in the MARCXML schema in addition to the ISO 2709 structure.

MARCXML is also used by the Search/Retrieve URL service and Search/Retrieve Web service (SRU/SRW) protocols. “XML is thus the retrieval vehicle for searches
requesting MARC21 records in their entirety” (McCallum, 2006, 6). SRU/SRW is an XML-based protocol that corresponds to Z39.50 in the binary environment. “A very important aspect of structured search-and-retrieve protocols such as Z39.50 and the SRW/U family is the provision of a structure query language in which rich queries can be expressed” (Taylor & Dickmeiss, 2006, 8). SRW/U provides a text query format known as CQL, or Common Query Language.

Conclusions

The above uses demonstrate the flexibility of the MARCXML format in meeting the current and expanding needs of expressing MARC in an XML environment. MARCXML is able to support external technologies such as OAI-PMH and SRU/W. Regarding structural metadata, MARCXML can be used with METS for wrapping and packaging objects together. MARCXML can be crosswalked to and from other formats with the aid of utilities such as MarcEdit. As presented in this paper, MARCXML converts losslessly to and from MARC21. MARCXML is currently used within the library environment and should grow in use as more libraries experiment and move toward digital collections. MARC21 is the long-established standard for expressing bibliographic data, and MARCXML is the logical XML standard for expressing MARC21 data in the digital environment. MARCXML also complies with the interoperability standards of syntax, semantics, and structure. The evidence presented favors MARCXML as the metadata schema of choice for the upcoming implementation of the Institutional Repository at ACME University.
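
As an illustration of the record structure described under “The Low Down or How It Works”, the following minimal Python sketch reads a small MARCXML record using only the standard library. The sample record, its control number, and the function name summarize_record are hypothetical and were invented for this example; the namespace URI is the one published for the MARC21 slim schema. It is a sketch under these assumptions and does not describe how the Library of Congress toolkit itself is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC21 "slim" XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">acme0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample thesis title /</subfield>
    <subfield code="c">A. Student.</subfield>
  </datafield>
</record>"""


def summarize_record(xml_text):
    """Return (leader, control fields, data fields) for one MARCXML record."""
    record = ET.fromstring(xml_text)

    # The leader is carried as a plain data string.
    leader = record.findtext("marc:leader", namespaces=MARC_NS)

    # Control fields are data strings identified by a tag attribute.
    control = {
        field.get("tag"): field.text
        for field in record.findall("marc:controlfield", MARC_NS)
    }

    # Data fields are elements; tag and indicators are attributes,
    # and each subfield is a child element with a code attribute.
    data = []
    for field in record.findall("marc:datafield", MARC_NS):
        subfields = [
            (sub.get("code"), sub.text)
            for sub in field.findall("marc:subfield", MARC_NS)
        ]
        data.append((field.get("tag"), field.get("ind1"), field.get("ind2"), subfields))

    return leader, control, data


if __name__ == "__main__":
    leader, control, data = summarize_record(SAMPLE_RECORD)
    print(leader)
    print(control)
    for tag, ind1, ind2, subfields in data:
        print(tag, ind1, ind2, subfields)
```

For the sample record, summarize_record returns the leader string, a dictionary holding the single control field 001, and one data field with tag 245, indicators 1 and 0, and subfields a and c. A transformation consumer, such as a Dublin Core crosswalk, could plausibly start from output of this kind, for example by mapping field 245 subfield a to a title element.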

Bibliography

American Library Association. (2005). Update on major metadata standards. In Library Technology Reports, 41(6), 20-33. Retrieved April 9, 2007, from Academic OneFile via Thomson Gale.

Beall, J. (2007). Discrete criteria for selecting and comparing metadata schemes. Against the Grain, 19(1), 28-31.

Clarke, K. S. (2002). Updating MARC records with XMLMARC. In R. Tennant (Ed.), XML in libraries (pp. 3-16). New York: Neal-Schuman Publishers.

Goldsmith, B., & Knudson, F. (2006). Repository librarian and the next crusade. D-Lib Magazine, 12(9). Retrieved April 9, 2007, from http://www.dlib.org/dlib/september06/goldsmith/09goldsmith.html

Hutt, A., Rose-Sandler, T., & Westbrook, B. D. (2007). Balancing the needs of producers and managers of digital assets through extensible metadata normalization. Against the Grain, 19(1), 41-43, 45.

Library of Congress. (2006). MARCXML. Retrieved April 21, 2007, from http://www.loc.gov/standards/marcxml/

McCallum, S. H. (2006). MARC/XML sampler. International Cataloguing and Bibliographic Control, 35(1), 4-6. Retrieved April 15, 2007, from Library Literature & Information Science via Wilson Web.

Perkins, J. (2007). Planning for metadata: the quick tour. Against the Grain, 19(1), 20-27.

Radebaugh, J. (2007). MARC 21 / MARCXML. Computers in Libraries, 27(4), 15.

Taylor, M., & Dickmeiss, A. (2006). Delivering MARC/XML records from the Library of Congress catalogue using the open protocols SRW/U and Z39.50. International Cataloguing and Bibliographic Control, 35(1), 7-10. Retrieved April 15, 2007, from Library Literature & Information Science via Wilson Web.

Wolfe, J., & Anderson, M. (2007). Digital collections, the next generation. Against the Grain, 19(1), 37-40.