
Digital Projects in Special Collections


Digital Collections, Exhibits, and Repositories

What is the difference?

Multi-Institutional Digital Repository: multiple collections from multiple institutions
Institutional Digital Repository: one institution's collections
Thematic Digital Collection: one collection or theme
Digital Exhibit: a selection of items on one theme

Digital Exhibit on the 1906 San Francisco Fire

Alternate approach to same topic

Digitization Project Planning

What work needs to be done
How it will be done (according to which standards, specifications, and best practices)
Who should do the work (and where)
How long the work will take
How much it will cost, both to "resource" the infrastructure and to do the content conversion

Components of Digitization Projects

Planning and Project Management
Selection
File Formats (master & access derivatives)
Conservation Treatment
Reformatting
Metadata Design & Creation
Quality Control
Web Platform (open source vs. proprietary systems)


Selection Criteria
Should they be digitized? (research value)

May they be digitized? (copyright status)

Can they be digitized? (condition and format)

Digitization Standards
Technical Standards:

Federal Agencies Digitization Guidelines Initiative (FADGI)

NARA

California Digital Library (CDL)

University of Colorado

Metadata Requirements

Descriptive Metadata
Technical & Administrative Metadata

Element Sets and Standards

Dublin Core


VRA Core
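To make the element sets above concrete, here is a minimal sketch of a descriptive metadata record built from Dublin Core elements. The element names (title, creator, date, format, rights) are real Dublin Core terms; the sample values and the helper function are hypothetical.

```python
# Sketch: build a simple XML record using Dublin Core elements.
# Element names are standard Dublin Core; sample data is invented.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def dublin_core_record(fields):
    """Build an XML record with one dc: element per dict entry."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record({
    "title": "Photograph of Main Reading Room",
    "creator": "Unknown photographer",
    "date": "1925",
    "format": "image/tiff",
    "rights": "Copyright undetermined",
})
print(record)
```

In practice a digital collection platform (Omeka, CONTENTdm, etc.) generates records like this from a cataloging form; the point here is only that each Dublin Core element is a simple name/value pair.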

Web Platform Options

Open Source Software
Omeka
Greenstone
DSpace
Fedora

Proprietary Software
CONTENTdm (OCLC)
Luna Insight
DigiTool

Web Harvesting involves:

Identifying and collecting web resources
Providing search capability for archived web collections
Managing and preserving web resources

Web Harvesting
The most common web archiving technique uses web crawlers to automate the process of collecting web pages. Web crawlers typically view web pages in the same manner that users with a browser see the Web, and therefore provide a comparatively simple method of remotely harvesting web content.
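The crawl loop described above can be sketched in a few lines: fetch a page, save it, extract its links, and queue any unvisited ones. To keep the example self-contained, a hypothetical in-memory "site" dict stands in for HTTP fetches; a real harvester would use an HTTP client and write pages to an archive format such as WARC.

```python
# Sketch of a breadth-first web crawler. SITE is a stand-in for
# real HTTP fetches so the example runs without a network.
from html.parser import HTMLParser

SITE = {  # hypothetical site: URL -> HTML body
    "/index.html": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "/about.html": '<a href="/index.html">Home</a>',
    "/news.html": '<a href="/about.html">About</a>',
}

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, as a crawler would."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

def crawl(start):
    """Visit each reachable page exactly once; return the archive."""
    seen, queue, archive = {start}, [start], {}
    while queue:
        url = queue.pop(0)
        html = SITE.get(url, "")
        archive[url] = html              # "preserve" the harvested page
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archive

pages = crawl("/index.html")
print(sorted(pages))
```

The `seen` set is what keeps the crawler from revisiting pages; without it, the circular links between these three pages would loop forever.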

Web Crawling Problems

The robots exclusion protocol may deny crawlers access to portions of a website.
Large portions of a website may be hidden in the deep Web.
Crawler traps may cause a crawler to download an infinite number of pages, so crawlers are usually configured to limit the number of dynamic pages they crawl; calendars often cause this problem.
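Two of these constraints can be sketched with the standard library: honoring a robots.txt file and capping the number of pages collected. The robots.txt content, the "ArchiveBot" user agent, and the candidate URLs are hypothetical examples.

```python
# Sketch: respect the robots exclusion protocol and cap the crawl.
# The robots.txt text and URLs below are invented for illustration.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

MAX_PAGES = 100  # page cap guards against crawler traps (e.g. calendars)

candidates = ["/collections/item1.html", "/private/staff.html",
              "/calendar?year=2050"]
fetchable = [u for u in candidates
             if rp.can_fetch("ArchiveBot", u)][:MAX_PAGES]
print(fetchable)
```

A page cap is a blunt instrument: it does not distinguish a calendar trap from a genuinely large site, which is why production crawlers also use per-host limits and URL pattern filters.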

Web Harvesting Resources

International Internet Preservation Consortium

Library of Congress

Archive-It (Service)

American University Digital Collections