
GBV Adding Digital Content to Library Records

A library union's use case


eSciDoc Days 2008
Konstantin Rekk, Marc-J. Tegethoff
Head Office (VZG) of the Common Library Network (GBV), www.gbv.de

Overview

What's GBV/VZG?
Digital Content: a New Challenge to Libraries
Three Steps to Mate Both Worlds at VZG

Common Library Network: GBV


Members: the Federal States of Bremen, Hamburg, Mecklenburg-Western Pomerania, Lower Saxony, Saxony-Anhalt, Schleswig-Holstein and Thuringia, and the Foundation of Prussian Cultural Heritage (SPK)
VZG, Göttingen

GBV Goals

introduction, maintenance and support of a common, homogeneous library system infrastructure
cataloging and service-oriented network of >800 scientific and public libraries
Partners: the Library Network of Baden-Württemberg (BSZ), of Hesse (HeBIS), the German National Library (DNB), OCLC PICA in Leiden, Netherlands, and the Agence bibliographique de l'enseignement supérieur, France (ABES)

VZG - Head Office of the GBV

customers are libraries, not end users
offer services to libraries which they can use to serve their users
library automation
development of innovative library-specific services for libraries
fundamental task: shared cataloging service

VZG Services Overview


GBV Search & Order: GSO [PSI]
Local Library Systems [LBS 3/4/Sunrise]
Master-Slave
Common Union Catalogue: GVK [CBS]
Export/Import
External Systems: Union Catalogs, WorldCat, ...
new: Catalogue Enrichment, Hosting
Digital Content [eSciDoc/Fedora] [CONTENTdm]

CBS - Central Library System

heart of the library network's IT infrastructure
Union Catalogue (GVK) of bibliographic records: virtual library of, and public access to, the combined resources of all participating libraries
Pica/Pica+ (Pica Cataloguing Rules: 512 pp.)

further services:
  ILL - Online Interlibrary Loan
  document delivery service subito
  additional library-specific services to support library business processes

DMS Goals - Catalog Enrichment and Hosting of Digital Content

application hosting
data backup and storage facility for projects without their own infrastructure
improve searchability and availability
link different kinds of content
cooperate in standardisation of interchange formats, thesauri, classification schemes
software support and development

DMS - Some Constraints

all business processes are built around the catalog/Pica format:
  primary source for search and retrieval
  primary storage for metadata
  primary reference for content models
  primary field of competence of VZG staff
existing infrastructure to integrate with repository and middleware
no publication workflow, rather a cataloging workflow

ZVDD - Central Directory of Digitised Prints

Different Project Categories

ToC for books, abstracts, ...
digitized print publications
integration of full text
born-digital content:
  National Licences: eJournals, ebooks, ...
archives
museums
archeological collections
closed projects versus projects in process
catalog-bound or not

Archive - Digitales Stadtarchiv Duderstadt


Archeology - Viamus (www.viamus.de)

Views: .jpg
360° panorama: .mov
Spoken text: .mp3
Descriptive text: .xml
3D scan: .mts

Museum - Digicult

.tif, .xml

Projects > Issues

old projects with ended funding: save the data!
local legacy databases, applications and formats, island solutions
logical structure of objects not always visible from the metadata (MD); needs analysis of the specific domain
heterogeneous objects
MD from different sources and in different (non-standard) formats

Projects > Issues > Content Models

expertise required for domain-specific object management
data objects modeling
cataloging of DOs still fragmented by business domain, no established guidelines

Projects > Issues > Amount of Data

Storage: 50 TB expected in 2008, about 400 TB over the next years
315,000 documents (ToC), 2,000,000 images
next year >10,000,000 images

Solution? Three Phases:

1. add content to records
2. get SOA-ready
3. add records to content

Phase One: Adding Digital Content To Library Records

essentially adding a link to a suitable category of the record
inject digital content into the library system
catalogue as a means for storing and organizing structural and semantic metadata
Paradigm: one bibliographic record (German: Aufnahme) per DO
hope this will be good enough for most purposes
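The "adding a link to a suitable category of the record" step can be sketched as follows. This is only an illustration under stated assumptions: the slides do not name the category used, so `LINK_TAG`, the subfield code `"u"` and the `Record` class are hypothetical and do not reflect the actual Pica+ layout or any CBS API.

```python
from dataclasses import dataclass, field

# Hypothetical category tag and subfield code; the slides do not say
# which catalog category actually carries the link.
LINK_TAG = "LINK"
URL_SUBFIELD = "u"


@dataclass
class Record:
    """Toy stand-in for a bibliographic record: a list of (tag, subfields)."""
    record_id: str
    fields: list = field(default_factory=list)


def add_content_link(record: Record, url: str, tag: str = LINK_TAG) -> bool:
    """Attach a link to the digital object unless it is already present.

    Returns True if the record was changed, False if the link existed.
    """
    for existing_tag, subfields in record.fields:
        if existing_tag == tag and subfields.get(URL_SUBFIELD) == url:
            return False
    record.fields.append((tag, {URL_SUBFIELD: url}))
    return True


record = Record(record_id="example-1")
changed = add_content_link(record, "http://repository.example.org/objects/1")
```

Making the operation idempotent keeps repeated batch runs from piling up duplicate links on the same record.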

Library Catalog as MD Store for DOs

catalog-bound objects
use existing standards as reference for an object modeling approach
Pica metadata format from the library world
force a mapping: press structural information from the non-library world into the Pica format
concordance Pica <-> non-library formats is not 100% faithful
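A forced, not fully faithful mapping of this kind might look like the sketch below. The concordance table and the category names (`CAT_TITLE` and so on) are invented for illustration and are not real Pica categories; the point is only that elements without a counterpart are reported rather than silently dropped.

```python
# Hypothetical concordance: source element name -> catalog category.
CONCORDANCE = {
    "title": "CAT_TITLE",
    "creator": "CAT_CREATOR",
    "date": "CAT_DATE",
}


def map_to_catalog(source, concordance=CONCORDANCE):
    """Map source metadata onto catalog categories.

    Returns (mapped, unmapped): the mapped values keyed by category, and
    the sorted names of source elements that have no counterpart.
    """
    mapped = {}
    unmapped = []
    for key, value in source.items():
        target = concordance.get(key)
        if target is None:
            unmapped.append(key)  # information lost by the mapping
        else:
            mapped[target] = value
    return mapped, sorted(unmapped)


mapped, unmapped = map_to_catalog(
    {"title": "Vase fragment", "date": "1890", "scan_resolution": "600dpi"}
)
```

Keeping the list of unmapped elements makes the loss visible, which helps when deciding whether the source format needs its own content model.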

Pica Concordance Example

Exchange Formats as Guidelines for Creating Domain Specific Content Models

EAD - Encoded Archival Description (EAD Working Group of the Society of American Archivists, SAA)

EAC - Encoded Archival Context (LEAF project: Linking and Exploring Authority Files)

museumdat (Special Interest Group Documentation of the German Museums Association (DMB)): generalisation of CDWA Lite, compatible with CIDOC-CRM (ISO 21127)

see also FRBR - Functional Requirements for Bibliographic Records

Staged Hierarchical Storage


Diagram labels: Firewall; DMS-Storage (Failover + Test); Fedora master instance, Sun X4600 (SUN SAM-FS); Disc-Cache; Active Files, 40 TB; Coopan; Copy 2; Archiving, LTO3 WORM; 900 GB; 900 GB; LTO1; Quick-File-System; Copy 1; Virtual Tape Library; Copy 3: Tape Robot

Using eSciDoc in Phase One

feeling that our needs are taken into consideration
structured storage
minimal metadata set and pluggable transformation
automatic indexing
SRU interface
REST and SOAP APIs for repository access
... built for SOA
predefined, straightforward, one-size-fits-all container/collection model

Using eSciDoc in Phase One

semi-structured free-form table of contents (ToC) for objects (entry page)
Shibboleth integration
complex queries for selection and aggregation of objects:
  find all objects of type ToC that have been changed in the last 24 h (to update the CBS indexer)
  find all objects of type ToC for export to another library organization
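The "changed in the last 24 h" selection could be expressed as a CQL query sent over SRU, roughly as sketched here. The index names `object.type` and `object.modified` and the base URL are assumptions made for the example, not eSciDoc's actual index names; only the SRU request parameters (`operation`, `version`, `query`, `maximumRecords`) come from the SRU 1.1 standard.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical index names; the real repository may name them differently.
TYPE_INDEX = "object.type"
MODIFIED_INDEX = "object.modified"


def changed_toc_query(now, hours=24):
    """Build a CQL query for ToC objects modified within the last `hours`."""
    since = now.astimezone(timezone.utc) - timedelta(hours=hours)
    stamp = since.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'{TYPE_INDEX} = "ToC" and {MODIFIED_INDEX} >= "{stamp}"'


def sru_url(base_url, cql, maximum_records=100):
    """Wrap a CQL query in an SRU 1.1 searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


now = datetime(2008, 6, 10, 12, 0, 0, tzinfo=timezone.utc)
query = changed_toc_query(now)
url = sru_url("http://repository.example.org/srw/search", query)
```

A nightly job could pass the current time instead of a fixed date and feed the returned records to the catalog indexer.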

Phase Two: SOA-based System Integration

integrate catalogue, repository and web 2.0 functions
provide stable and consistent interfaces to cataloging clients
ensure reliable processing and linking of data
extract common functions from scripts into web services (format transformations, validation, ...)
for specific purposes, use scripts for flexible piping of web services
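The idea of scripts piping web services can be sketched with plain function composition. In this sketch `validate` and `transform` are local stand-ins for calls to hypothetical validation and format-transformation services, and `CAT_TITLE` is an invented category name; none of these names come from the slides.

```python
from functools import reduce


def pipe(*steps):
    """Compose steps left to right into a single callable."""
    def run(payload):
        return reduce(lambda acc, step: step(acc), steps, payload)
    return run


def validate(payload):
    """Stand-in for a validation service: reject records without a title."""
    if not payload.get("title", "").strip():
        raise ValueError("missing title")
    return payload


def transform(payload):
    """Stand-in for a format transformation service."""
    return {"CAT_TITLE": payload["title"].strip()}


# A script for one specific purpose is then just a particular chain.
ingest = pipe(validate, transform)
result = ingest({"title": "  Atlas  "})
```

Each step takes and returns a plain dictionary, so a step can later be replaced by a remote call without changing the scripts that chain it.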

System Integration Plan


Diagram labels: DO Cataloging (Retrieve); DO Ingest (Search); GSO; WinIBW; CBS; Apache Proxy; Broker; eSciDoc; Fedora; Handle System; Common Lib Services

Issue: Transaction Safety, Data Integrity

Phase Three ... long, long ahead in a library far, far ...

full-fledged repository, when the library and digital repository worlds will have become one: adding library records to digital objects?
extended modeling and usage of object relations, connections
extended semantic indexing of heterogeneous MD formats, ontologies and other semweb stuff
complete models for museums, archives, specific domains

Using eSciDoc in Phase Three

hope to reuse or contribute content models according to German and international standards from the community
cataloging workflow support might become interesting
reusing clients and workflow components

What do we need? - Summary

fast ingest (eSciDoc v254: ToC 51 days, VD17 121 days, Gale 4.5 years)
clustering, replication
reduced batch versions of interfaces
directory of reusable DO models, modeling recommendations, best practices (Community)
metadata mapping recommendations (Community)
Shibboleth (already there)
in our case: more put than get
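To get a feeling for what the ingest figures mean per object, here is a small back-of-the-envelope calculation. It assumes that the 51 days quoted for ToC refer to the 315,000 ToC documents listed under "Amount of Data"; the slides do not state this link explicitly, so the result is indicative only.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_object(days, objects):
    """Average ingest time per object for a batch taking `days` days."""
    return days * SECONDS_PER_DAY / objects


def days_for(objects, seconds_each):
    """Days needed to ingest `objects` at `seconds_each` seconds per object."""
    return objects * seconds_each / SECONDS_PER_DAY


# Assumption: the 51 days cover the 315,000 ToC documents.
toc_rate = seconds_per_object(51, 315_000)
print(f"{toc_rate:.1f} s per ToC document")  # prints "14.0 s per ToC document"
```

Under that assumption a single ToC ingest takes about 14 seconds on average, which illustrates why faster ingest, clustering and batch interfaces head the list.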

Questions?

Credits and Contact

Marc.Tegethoff@gbv.de Konstantin.Rekk@gbv.de Frank.Duehrkohp@gbv.de

created by Konstantin Rekk
