
Digital Projects in Special Collections

Susan McElrath, University Archivist, American University
March 7, 2012

Digital Collections, Exhibits, and Repositories


What is the difference?
Repository
multiple collections or institutions

Collection
one collection or theme

Exhibit
one theme, a selection of items

Multi-Institutional Digital Repository

Institutional Digital Repository

Thematic Digital Collection

Digital Exhibit

Digital Exhibit on 1906 San Francisco Fire

Alternate approach to same topic

Digitization Project Planning


What work needs to be done
How it will be done (according to which standards, specifications, best practices)
Who should do the work (and where)
How long the work will take
How much it will cost, both to "resource" the infrastructure and to do the content conversion
http://www.ncecho.org/dig/guide_1planning.shtml
http://www.nyu.edu/its/humanities/ninchguide/II/

Components of Digitization Projects


Planning and Project Management
Selection
File Formats (master & access derivatives)
Conservation Treatment
Reformatting
Metadata Design & Creation
Quality Control
Web Platform


Open source vs. proprietary systems

Preservation

Selection Criteria
Should they be digitized?
Research Value

May they be digitized?


Copyright status

Can they be digitized?


Condition
Format
http://www.nedcc.org/resources/leaflets/6Reformatting/06PreservationAndSelection.php
http://www.dlib.org/dlib/september09/ooghe/09ooghe.html

Digitization Standards
Technical Standards
Federal Agency Digitization Guidelines Initiative (FADGI)
http://www.digitizationguidelines.gov/

NARA
California Digital Library (CDL)


http://www.cdlib.org/services/dsc/tools/docs/cdl_gdi_v2.pdf

University of Colorado
https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf

Metadata Requirements
Descriptive Metadata
Technical & Administrative Metadata

Element Sets and Standards


Dublin Core
http://dublincore.org/documents/dces/

METS/MODS
http://www.loc.gov/standards/mods/
http://www.loc.gov/standards/mets/

VRA Core
http://www.loc.gov/standards/vracore/
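
As an illustration of the Dublin Core element set listed above, the sketch below builds a simple descriptive record as XML. The element names (title, creator, date, type, format, rights, identifier) and the namespace URI come from the Dublin Core element set; the wrapper element name `record`, the helper `build_dc_record`, and all item values are hypothetical and chosen only for the example. It is a minimal sketch, not a schema any particular platform requires.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a <record> element with one dc:* child per (name, value) pair.

    The <record> wrapper is an assumption made for this example; real
    systems wrap Dublin Core elements in their own container formats.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical description of a single digitized photograph.
fields = [
    ("title", "Campus view, looking north"),
    ("creator", "Unknown"),
    ("date", "1925"),
    ("type", "StillImage"),
    ("format", "image/tiff"),
    ("rights", "Copyright status undetermined"),
    ("identifier", "example-0001"),
]

record = build_dc_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the fields as a list of pairs rather than a dictionary allows an element such as creator or subject to repeat, which Dublin Core permits.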

Web Platform Options


Open Source Software
Omeka
Greenstone
DSpace
Fedora

Proprietary Software
CONTENTdm (OCLC)
Luna Insight
DigiTool

Web Harvesting involves:


Identifying and collecting web resources
Providing search capability for archived web collections
Managing and preserving web resources

Web Harvesting
The most common web archiving technique uses web crawlers to automate the process of collecting web pages. Web crawlers typically view web pages in the same manner that users with a browser see the Web, and therefore provide a comparatively simple method of remotely harvesting web content.
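
The core step a crawler repeats, reading a page and discovering the links to visit next, can be sketched with Python's standard library. The example works on an in-memory sample page rather than fetching anything; the class `LinkCollector`, the function `extract_links`, the sample HTML, and the `example.edu` host are all hypothetical. Production harvesting tools do considerably more (fetching, scheduling, storing page captures), so treat this only as an illustration of link discovery.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html, same_host_only=True):
    """Return unique links found in html, in order of first appearance."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for link in collector.links:
        if same_host_only and urlparse(link).netloc != host:
            continue  # stay within the site being harvested
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result


# Hypothetical page content standing in for a fetched web page.
SAMPLE_PAGE = """
<html><body>
<a href="/exhibits/">Exhibits</a>
<a href="finding-aids.html">Finding aids</a>
<a href="http://other.example.org/">Elsewhere</a>
<a href="/exhibits/">Exhibits again</a>
</body></html>
"""

links = extract_links("http://archives.example.edu/index.html", SAMPLE_PAGE)
print(links)
```

Restricting links to the starting host is one simple way to keep a harvest scoped to a single site; whether that scope is appropriate depends on the collection being built.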

Web Crawling Problems


Robots exclusion protocol may deny crawlers access to portions of a website.
Large portions of a web site may be hidden in the deep Web.
Crawler traps may cause a crawler to download an infinite number of pages, so crawlers are usually configured to limit the number of dynamic pages they crawl.
Calendars often cause problems for crawlers.
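
Two of the problems above, robots exclusion and crawler traps, can be illustrated with a small filter over candidate URLs. The sketch uses Python's `urllib.robotparser` to honor a robots.txt file supplied as text, and caps the number of dynamic pages with a simple counter. Treating any URL with a query string as "dynamic" is an assumption made for the example, as are the function names, the `example-crawler` user agent, the `max_dynamic` parameter, and every URL shown.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the site being harvested.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_robots(text):
    """Build a robots.txt parser from text, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def select_urls(candidates, robots, user_agent="example-crawler", max_dynamic=2):
    """Keep URLs that robots.txt allows, capping the dynamic pages kept.

    A URL counts as dynamic here if it has a query string; that rule is
    an assumption for this sketch, not a standard definition.
    """
    selected = []
    dynamic_count = 0
    for url in candidates:
        if not robots.can_fetch(user_agent, url):
            continue  # excluded by the robots exclusion protocol
        if urlparse(url).query:
            if dynamic_count >= max_dynamic:
                continue  # budget spent: guards against endless calendars
            dynamic_count += 1
        selected.append(url)
    return selected


# Hypothetical candidate URLs, including a calendar that never ends.
candidates = [
    "http://archives.example.edu/",
    "http://archives.example.edu/private/donor-files.html",
    "http://archives.example.edu/calendar?month=1",
    "http://archives.example.edu/calendar?month=2",
    "http://archives.example.edu/calendar?month=3",
    "http://archives.example.edu/calendar?month=4",
    "http://archives.example.edu/exhibits/",
]

selected = select_urls(candidates, make_robots(ROBOTS_TXT))
print(selected)
```

A fixed cap is a blunt instrument: it stops the calendar from growing without bound, but it can also cut off legitimate dynamic content, so the limit usually needs tuning per site.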

Web Harvesting Resources


International Internet Preservation Consortium
http://netpreserve.org/about/index.php

Library of Congress
http://www.loc.gov/webarchiving

Archive-It (Service)
www.archive-it.org

American University Digital Collections