From Metadata to MetaDATA

Chapter 2 of the Library Technology Reports issue “Understanding the Semantic Web: Bibliographic Data and Metadata” (January 2010; vol. 46, no. 1), by Karen Coyle

Published by: American Library Association on May 06, 2011
Copyright: Attribution Non-commercial







Library Technology Reports
www.alatechsource.org
January 2010
Understanding the Semantic Web: Bibliographic Data and Metadata
Karen Coyle
Chapter 2
From Metadata to MetaDATA

Chapter Abstract
In our current technology environment, all information goes through computers before reaching a human being, so it is necessary to design our metadata to be data, that is, to give it the ability to be manipulated by computer programs. To keep pace with modern advances in technology, library catalog data must be transformed from being primarily a textual description to a set of data elements to which machine processes can be applied, and these data elements must be compatible with the current mainstream technology that is the World Wide Web. This chapter of “Understanding the Semantic Web: Bibliographic Data and Metadata” examines what steps the library community will need to take to facilitate this transformation.

Change can be difficult, and change within long-standing communities of practice can be particularly difficult. The first hurdle is recognizing that change is necessary. The next is to understand the nature of the change: its goals, its possibilities, and the natural limitations that will inevitably move the effort from an ideal solution to a more realistic one. The last challenge is to arrive at an agreement within the community on a change that will return a good value for the effort it requires.

Among librarians, there has already been a realization that a change is needed when it comes to how libraries present their catalog data. This has been a topic of study and action for well over a decade. Such thinking produced a new model for bibliographic data, the Functional Requirements for Bibliographic Records (FRBR), and a proposed new rule set for cataloging practice, Resource Description and Access (RDA).

Changing the Nature of Library Data
Together these provide a new conceptual foundation but leave us with a key piece missing: how to express our data in a twenty-first-century data format. For this we are given some direction in the report of the Working Group on the Future of Bibliographic Control:

Desired Outcomes:
Library bibliographic data will move from the closed database model to the open Web-based model wherein records are addressable by programs and are in formats that can be easily integrated into Web services and computer applications. This will enable libraries to make better use of networked data resources and to take advantage of the relationships that exist (or could be made to exist) among various data sources on the Web.

The report does not say how library data must change to make this mandate a reality. There will surely be more than one way to accomplish this goal, but a few things are certain: the library catalog data must be transformed from being primarily a textual description to a set of data elements to which machine processes can be applied, and these data elements must be compatible with the current mainstream technology that is the World Wide Web. One possible direction for library data is to join the linked data “cloud,” a growing set of data on the World Wide Web that many see as having great promise for a richer information future.

In our current technology environment, all information goes through computers before reaching a human being, so it is necessary to design our metadata to be data, that is, to give it the ability to be manipulated by computer
programs. In a sense, we did this with the MARC record in the 1960s, but at that time the capabilities for processing data were much, much less advanced than they are today. It was a time before keyword searching, before data mining, and before the concept that information from a wide variety of heterogeneous sources would all intermingle over a single, large network, the Internet.

Libraries were among the first institutions to use computers to process text. In the 1960s, when the MARC format was introduced, it was extremely unusual to process fields of variable length and to process text as it is normally written, using both upper- and lowercase, punctuation, and even accented characters. Libraries developed ways to create mixed character sets with both Latin and non-Latin characters years before other communities found the need to do so. We are no longer alone, however, in our need to process and manipulate text. The development of Unicode, a single character set for all known languages and scripts, and of XML, a data format that is flexible enough to describe very complex texts, has brought us into a world where text processing is no longer the exception in computing.

Today’s data design has to balance the functionality needed for machine processing with the human end user’s need for information in an understandable format. It is definitely not a matter of serving only the machine or only the human reader, but of creating data that can serve both. Compromises will often have to be made. The current version of library data, however, is not serving machine functionality well, so our challenge is to bring our data into the twenty-first century for machine processing and to improve service to our human end users by being able to offer more functionality in our systems. This report presents a sample of some steps that can be taken to accomplish this goal, but please keep in mind that this is not a complete recipe for the future of library data, just some of the ingredients.
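A small illustration of data that can serve both the machine and the human reader: store the machine-usable value once and generate the human-readable form from it. The sketch below uses a hypothetical Python class (it is not part of MARC or of any library system) and assumes a publication date is recorded as a year plus flags for the cataloging conventions of a copyright date (“c1966”), a cataloger-supplied date (“[1966]”), and an uncertain supplied date (“[1966?]”).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PubDate:
    """Hypothetical structured publication date: one stored value, two renderings."""
    year: int
    copyright_date: bool = False  # rendered with a leading "c"
    supplied: bool = False        # supplied by the cataloger: rendered in brackets
    uncertain: bool = False       # supplied and uncertain: "?" inside the brackets

    def display(self) -> str:
        """Human-facing form, following the cataloging conventions above."""
        text = f"{self.year}{'?' if self.uncertain else ''}"
        if self.supplied or self.uncertain:
            text = f"[{text}]"
        if self.copyright_date:
            text = f"c{text}"
        return text

    def fixed_field(self) -> str:
        """Machine-facing form: always exactly four digits."""
        return f"{self.year:04d}"


dates = [
    PubDate(1966),
    PubDate(1966, copyright_date=True),
    PubDate(1966, supplied=True),
    PubDate(1966, supplied=True, uncertain=True),
]

for d in dates:
    # Prints: 1966 -> 1966, c1966 -> 1966, [1966] -> 1966, [1966?] -> 1966
    print(d.display(), "->", d.fixed_field())
```

Because both forms are computed from the same stored year, correcting the year (for example with `dataclasses.replace(d, year=1965)`) changes the display form and the four-digit code together, so the two cannot drift apart.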
D-y he D
The library catalog record is mainly a textual document. It is true that this text is coded in fields and subfields in the machine-readable record, but the physical basis of the record is still primarily text. In essence, the MARC record can be considered one of the first text markup languages, if not the first. The record has some fields with coded data that were designed for the machine processing of that era, which needed data to be stored at a fixed length and with its contents as compact as possible. When the MARC format was developed in the 1960s, the difference between the storage of “eng” versus “English” to describe the language of a work was significant in terms of system capabilities.

The fixed-field data overcomes some of the “text-ness” of the primary bibliographic data. For example, the date of publication in the publication statement can take different forms, such as:

1966 (a simple date)
c1966 (a copyright date)
[1966] (a date supplied by cataloger)
[1966?] (a date supplied by cataloger and uncertain)

In the fixed-field area, the format of the data is strictly controlled as four characters, generally numeric. Each of these would be coded there as “1966”:

740813s1966 enkc b 000 0aeng

The punctuation and other information in the date field, while perhaps useful to human readers (at least to those who know what they mean), are an impediment to machine processing.

Many of the key fields of the bibliographic record, however, are not available in a data format. One example is the ISBN, a very important element in a number of library operations, from acquisitions to linking cover images with the user interface. The ISBN is stored in a subfield in the MARC record, but that subfield can contain other information in textual form:

9781416554950 (trade pbk.)
0817315497 (cloth : alk. Paper)
0415981484 (Hardcover : alk. Paper)
0847829413 (hbk.)
080327946X :

Although it is possible to select the ISBN itself from this string using programming algorithms,*
and all systems that use the ISBN in processing must do so, there seems to be little reason not to provide the ISBN in a form that can readily be manipulated by machines, since that is how it will be used. What causes libraries to continue with practices that aren’t appropriate to this day and age? Habit, and the very important fact that a large body of legacy data is a reflection of those practices.

*As an example, this is a line of code from the Open Library project (http://openlibrary.org) that extracts the ISBN from the MARC subfield using a regular expression: re_isbn = re.compile('([^ ()]+[\dX])(?: \((?:v\. (\d+)(?: : )?)?(.*)\))?')

While the use of a data field for a particular data element may be a solution to the problem of text versus data, one of the results is that many data elements in library records are entered more than once in the same record, in slightly different formats. It is well known in the world of information technology that any time you store the same information in more than one place, you risk those separate versions of the data falling out of sync. For example, someone may discover that the date has been entered incorrectly in a record, as “c1964” instead of “c1965.” That person may correct the display form of the data in the publication statement, but could easily forget that there is also a coded form in the fixed-field area. Adding to this possibility is the fact that in all systems that I have seen, these two data elements are not near each other in the user interface used by the cataloger.

The method of adding some data fields to what is essentially a textual document (the catalog entry) may have been appropriate in the middle of the twentieth century, but twenty-first-century computing provides us with better solutions. Those solutions allow for the coding of data for machine use without sacrificing service to the human user. The use of authority-controlled headings in library data is a good example of where a small change in how we store our data could greatly increase the machine capabilities in relation to library records.

Identify the Data
Some of the information in the library record will, of necessity, be text. The concepts of authority control and headings in library data, however, mean that even many of the text fields are not simply free text but have structure and are controlled as to their content. These headings are often the primary access points that correspond to the information that users have when they approach the library: authors, titles, and subjects. There are separate records in our systems for some of these elements, records that contain additional information needed to provide entry vocabulary for the user and to help catalogers create library data with a certain level of consistency. This isn’t as useful as it could be in automated systems because the connection between the heading in the bibliographic record and that in the authority record is made on the basis of the display text in the fields. Should the display text change, the link between these two elements is broken, and systems cannot bring them together. This loss of connection between the bibliographic and the name authority data can be remedied by making use of identifiers that can be read by machines. Both bibliographic and authority records can contain this identifier, and the display text can be changed as needed without breaking the link. In other words, one links through identifiers, not through display text. Say that you have bibliographic and name records that need to link, and what they have in terms of data is shown in figure 10.

If the name record changes, nothing links the two anymore, as shown in figure 11, because the display form has changed, and the display form was also the linking string.

If one uses identifiers for names, in addition to the display forms and other common references in a name authority record, as shown in figure 12, display forms can change without breaking the link between the records. The bibliographic record now needs to update its display form, but it can do so using the shared identifier. Although the two records are showing different display forms, the link between them is not broken.

In addition, some areas of our data may appear the same in display, but have different coding in the underlying data record. Because of this difference in coding, they will be considered different by machines. For example, the data elements of what libraries call a “main entry” (usually a person or corporate body that creates a resource)
Figure 10. Sample bibliographic and name records, linked. (A bibliographic record and an authority record, each carrying the heading “Smith, John J.”)
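The difference between linking through display text and linking through identifiers can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the record classes, the identifier value `name-0001`, and the revised heading “Smith, John James” are assumptions, not data taken from the figures or from any authority file.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuthorityRecord:
    identifier: str    # hypothetical stable, machine-readable identifier
    display_form: str  # heading as shown to users; free to change


@dataclass
class BibRecord:
    title: str
    heading: str                        # display text stored in the bib record
    authority_id: Optional[str] = None  # machine link to the authority record


def linked_by_text(bib: BibRecord, auth: AuthorityRecord) -> bool:
    """Text-based linking: the display string doubles as the link."""
    return bib.heading == auth.display_form


def linked_by_id(bib: BibRecord, auth: AuthorityRecord) -> bool:
    """Identifier-based linking: display text plays no part in the match."""
    return bib.authority_id is not None and bib.authority_id == auth.identifier


def refresh_heading(bib: BibRecord, authorities: Dict[str, AuthorityRecord]) -> None:
    """Update the bib record's display form by following its identifier."""
    if bib.authority_id in authorities:
        bib.heading = authorities[bib.authority_id].display_form


auth = AuthorityRecord(identifier="name-0001", display_form="Smith, John J.")
bib = BibRecord(title="Example title", heading="Smith, John J.", authority_id="name-0001")
print(linked_by_text(bib, auth), linked_by_id(bib, auth))  # True True

auth.display_form = "Smith, John James"  # hypothetical revised heading
print(linked_by_text(bib, auth), linked_by_id(bib, auth))  # False True

refresh_heading(bib, {auth.identifier: auth})
print(bib.heading)  # Smith, John James
```

The last step mirrors the point in the text: the bibliographic record can refresh its display form through the shared identifier, and the link never depended on the two display strings matching.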
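The Open Library regular expression quoted in the footnote on ISBN extraction can be tried against the ISBN subfield strings shown earlier in this chapter. This is a minimal sketch using only Python’s standard `re` module; the pattern is copied from the footnote but written as a raw string (`r'...'`), which compiles to the same pattern and avoids the invalid-escape warnings that recent Python versions emit. The helper name `split_isbn` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Pattern from the Open Library footnote, written as a raw string.
# Group 1: the ISBN itself; group 2: an optional volume number after "v.";
# group 3: any parenthesized qualifier such as "trade pbk.".
re_isbn = re.compile(r'([^ ()]+[\dX])(?: \((?:v\. (\d+)(?: : )?)?(.*)\))?')


def split_isbn(value: str) -> Tuple[str, Optional[str]]:
    """Separate the ISBN from the qualifying text in a MARC ISBN subfield."""
    match = re_isbn.match(value)
    if match is None:
        raise ValueError(f"no ISBN found in {value!r}")
    return match.group(1), match.group(3)


subfield_values: List[str] = [
    "9781416554950 (trade pbk.)",
    "0817315497 (cloth : alk. Paper)",
    "0415981484 (Hardcover : alk. Paper)",
    "0847829413 (hbk.)",
    "080327946X :",
]

for value in subfield_values:
    isbn, qualifier = split_isbn(value)
    print(f"{isbn:<14} {qualifier}")
```

The sketch makes the text’s point concrete: every system that needs the ISBN has to repeat a parsing step like this one, whereas an ISBN stored as its own data element would need none.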
