
XML for Libraries

Roy Tennant
eScholarship, California Digital Library
escholarship.cdlib.org

Introduction
• Goal: introduce you to XML, explain what it can do in general terms, and highlight particular uses
• Caveat: you will not learn enough to do it without further study

04:24 AM 04:24 AM

Outline
• Introduction to XML
• Serving XML to the Web
• Case Studies
• Tips & Advice
• Resources


Introduction to XML
• Extensible Markup Language
• A method of creating and using tags to identify the structure and contents of a document, not how it should be displayed
• The tags used can be arbitrary or can come from a specification


What it Looks Like
<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>“I’m scared,” I said.</p>
  </chapter>
</book>
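Because the tags describe structure rather than display, software can pull specific pieces out of a document like this one. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the `summarize` function and the `BOOK_XML` constant are illustrative names, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The sample book document, with straight quotes around the attribute value.
BOOK_XML = """<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>“I’m scared,” I said.</p>
  </chapter>
</book>"""


def summarize(xml_text):
    """Return the author, title, and chapter list of a <book> document."""
    book = ET.fromstring(xml_text)
    author = f"{book.findtext('author/firstname')} {book.findtext('author/lastname')}"
    chapters = [
        (chapter.get("number"), chapter.findtext("chaptitle"))
        for chapter in book.findall("chapter")
    ]
    return {"author": author, "title": book.findtext("title"), "chapters": chapters}


if __name__ == "__main__":
    print(summarize(BOOK_XML))
```

The same function would work on any document that uses these tag names, whatever the text inside them.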

Two Types of XML
• Well-Formed
• Valid


Well-Formed XML
• Follows general tagging rules:
– All tags begin and end
• But can be minimized if empty: <br/> instead of <br></br>

– Tag names are case-sensitive, so opening and closing tags must match exactly (lowercase is the usual convention, and XHTML requires it)
– All tags are properly nested:
• <author> <firstname>Mark</firstname> <lastname>Twain</lastname> </author>

– All attribute values are quoted:
• <subject scheme="LCSH">Music</subject>

• Begins with an XML declaration such as <?xml version="1.0"?>, optionally followed by a document type declaration
• Software can make sure a document follows these rules
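As an illustration of software checking these rules, here is a minimal sketch using the parser in Python's standard library, which rejects any document that is not well-formed. The function name `check_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "ok") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser reports the first rule violation it encounters.
        return False, str(err)
    return True, "ok"


if __name__ == "__main__":
    samples = [
        "<author><firstname>Mark</firstname><lastname>Twain</lastname></author>",
        "<author><firstname>Mark</author></firstname>",  # improperly nested
        "<subject scheme=LCSH>Music</subject>",          # unquoted attribute value
        "<br/>",                                          # minimized empty tag
    ]
    for sample in samples:
        print(check_well_formed(sample)[0], sample)
```

A check like this only confirms well-formedness; it says nothing about whether the tags used are the ones a given specification allows.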

Valid XML
• Uses only specific tags and rules as codified by one of:
– A document type definition (DTD)
– A schema definition

• Only the tags listed by the schema or DTD can be used
• Software can take a DTD or schema and verify that a document adheres to the rules
• Editing software can prevent an author from using anything except allowed tags
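To make the idea of validity concrete, the sketch below checks a document against a small table of allowed child elements. This is a simplified stand-in, not a real DTD or schema validator: Python's standard library does not validate against DTDs, and third-party libraries such as lxml are normally used for that. The `ALLOWED_CHILDREN` table and `find_violations` function are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each element maps to the children it may contain.
ALLOWED_CHILDREN = {
    "book": {"author", "title", "chapter"},
    "author": {"lastname", "firstname"},
    "chapter": {"chaptitle", "p"},
    "title": set(),
    "lastname": set(),
    "firstname": set(),
    "chaptitle": set(),
    "p": set(),
}


def find_violations(xml_text):
    """Return a list of messages for elements the content model does not allow."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown root element <{root.tag}>")
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return problems


if __name__ == "__main__":
    print(find_violations("<book><title>T</title><blink>x</blink></book>"))
```

A real DTD or schema can also express ordering, repetition, and attribute rules, which this toy table does not attempt.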

Ways to Use XML
• Behind the scenes as a standard and easily transformed format for information
• As a transfer syntax, to exchange information in a machine-parseable form
• As a method of delivery direct to the user (not recommended)
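The transfer-syntax use can be sketched as a round trip: one system writes a record out as XML, and another parses it back. The `<record>` wrapper and the field names below are assumptions chosen for illustration, and the sketch assumes every field name is a legal XML tag name.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a flat dict of strings as a <record> element."""
    root = ET.Element("record")
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Parse a <record> element back into a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    record = {"title": "Adventures of Huckleberry Finn", "author": "Twain, Mark"}
    wire_format = record_to_xml(record)
    print(wire_format)
    print(xml_to_record(wire_format) == record)
```

Special characters such as `&` are escaped on the way out and restored on the way in, so the receiving system does not need to know how the sender stored the data.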

Why is XML Important?
• It is a standard, easily extensible way to encode loosely structured as well as highly structured information
• Because it is easy to parse, software can transform it in countless ways, thereby allowing:
– Easy migration paths
– Alternative displays
– On-the-fly response to user needs

XML vs. Databases
(a simplistic formula)

• If your information is…
– Tightly structured
– Fixed field length
– Massive numbers of individual items

• You need a database
• If your information is…
– Loosely structured
– Variable field length
– Massive record size

• You need XML

Serving XML to the Web
• Directly in native form
• Transformed to static HTML
• Transformed to HTML dynamically


Transforming XML: XSLT
• Extensible Stylesheet Language Transformations (XSLT)
• A markup language and programming syntax for processing XML
• Is most often used to:
– Transform XML to HTML for delivery to standard web clients
– Transform XML from one set of XML tags to another
– Transform XML into another syntax/system
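To show the kind of mapping an XSLT stylesheet expresses, the sketch below turns a `<book>` document into HTML. It is written in Python rather than XSLT because Python's standard library has no XSLT processor, so treat it as a stand-in for the idea, not as XSLT itself. The function name `book_to_html` and the choice of HTML tags are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def book_to_html(xml_text):
    """Map <book> markup onto HTML elements, one rule per source tag."""
    book = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")

    # <title> becomes the page heading.
    ET.SubElement(body, "h1").text = book.findtext("title")

    # <author> becomes a single byline paragraph.
    first = book.findtext("author/firstname")
    last = book.findtext("author/lastname")
    ET.SubElement(body, "p", {"class": "author"}).text = f"{first} {last}"

    # Each <chapter> becomes a subheading followed by its paragraphs.
    for chapter in book.findall("chapter"):
        ET.SubElement(body, "h2").text = chapter.findtext("chaptitle")
        for para in chapter.findall("p"):
            ET.SubElement(body, "p").text = para.text

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<book><author><lastname>Tennant</lastname><firstname>Roy</firstname></author>"
        "<title>The Great American Novel</title>"
        "<chapter number='1'><chaptitle>It Was Dark and Stormy</chaptitle>"
        "<p>I was scared.</p></chapter></book>"
    )
    print(book_to_html(sample))
```

An XSLT stylesheet states the same rules declaratively, as templates matched against source tags, which is what lets one XML file feed several different displays.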

Required Components for Serving XML to the Web
• An XML-encoded “document”
• An XSLT stylesheet to transform it to HTML or XHTML:
– Static
– Dynamic

• A CSS stylesheet (optional)


XML Web Publishing Software
• Required to:
– Apply dynamic transformations to XML content
– Render HTML dynamically for standard web browsers

• Just beginning to be available:
– Cocoon: http://xml.apache.org/cocoon/
– AxKit: http://axkit.org/


Case Study: Publishing Books @ the California Digital Library
• Goals:
– To create highly usable online versions of books
– To create versions that will migrate easily as technology changes
– To create an infrastructure that will support dynamic presentations of the same content

Case Study: Publishing Books @ the California Digital Library
• Strategy:
✓ Mark up the texts in XML
✓ Serve them dynamically using XML web publishing software (currently Cocoon)
✓ Create different displays for different purposes, and a mechanism for allowing users to select their preferred view
✓ Find and apply an XML-aware search engine
– Create a method by which users can create their own Adobe Acrobat versions


[Diagram sequence: a web server hands requests to AxKit (running on mod_perl) or Cocoon (running on Tomcat). When a client asks for an XML document ("I want this XML doc…"), Cocoon applies an XSLT stylesheet to the XML doc and returns an XHTML document with no display markup, styled by an HTML stylesheet (CSS). The XML doc is labeled Information, the XSLT stylesheet Transformation, and the XHTML document plus CSS Presentation. The XHTML is a dynamic document.]
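The request flow pictured above can be sketched as a tiny WSGI application: a request arrives, the stored XML is transformed on the fly, and XHTML with a link to a CSS stylesheet goes back to the browser. This is only an illustration of the pattern, not how Cocoon or AxKit work internally. The `DOCUMENTS` store, the `/novel` path, and the `/book.css` URL are all hypothetical, and the transformation is done in Python rather than XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical document store: request path -> XML source.
DOCUMENTS = {
    "/novel": "<book><title>The Great American Novel</title></book>",
}


def render_xhtml(xml_text):
    """Transform a <book> into XHTML with no display markup; CSS handles styling."""
    book = ET.fromstring(xml_text)
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = book.findtext("title")
    ET.SubElement(
        head, "link", {"rel": "stylesheet", "type": "text/css", "href": "/book.css"}
    )
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = book.findtext("title")
    return ET.tostring(html, encoding="unicode")


def app(environ, start_response):
    """WSGI entry point: transform the requested XML document on each request."""
    xml_text = DOCUMENTS.get(environ.get("PATH_INFO", ""))
    if xml_text is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found"]
    body = render_xhtml(xml_text).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/xhtml+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve the application. Because the XML is transformed per request, changing the transformation changes every page without touching the source documents.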

Case Study: ILL ASAP
[Diagram: ILL ASAP workflow. Labels, in order: OCLC and Local Catalog → Downloaded Requests → XML File → Internet Explorer with XSL Stylesheet → Printable XHTML File]


Service Tasmania Architecture


Case Study: Univ. of Michigan


Tips and Advice
• Begin transitioning to XML now:
– XHTML and CSS for web files, XML for static documents with long-term worth

• Do not rely on browser support of XML
• DTDs? We don’t need no stinkin’ DTDs!
• Get on the XML4Lib discussion list: http://sunsite.berkeley.edu/XML4Lib/
• Buy my book!

Resources
• Web sites
• Electronic discussions
• Books
• Magazines and journals
• Individuals
