Professional Documents
Culture Documents
purpose, functions,
and goals
◦ Collection development
and management
◦ Technical Processing
◦ Index creation
◦ Counter Transactions
◦ Reference work
◦ Preservation
A Service? An Architecture?
A set of Information Resources?
A set of tools to locate, search, retrieve
information?
Possibly the tools to create such resources
and services also fall within the purview of
DLs
Digital face of traditional libraries
Include both digital collections and traditional
Backbone and nervous system of libraries.
Defining the Digital Library
DIGITAL
LIBRARY
ROLES
Articles Photographs
Reports Genome
Books Speech Human Paintings
(Aerial) Models
Manuscripts Music Animal 2D
News Papers Photos Plant Simulations
Movies
Theses 3D
Tech. Reports
The information content is
more important than the Objects should not be “locked”
systems used for its storage, in specific DLs or archives
management and retrieval
Digitization
◦ Converting paper and other media in
existing collections to digital form
Acquisition of original digital works
◦ Created by publishers and scholars like
electronic books, journals, and datasets
Access to external materials
◦ Like Web sites, other library collections, or
publishers' servers
Bibliographic databases that point to
both paper and digital materials
Indexes and finding tools
Collections of pointers to Internet
resources
Directories
Teaching and pedagogic materials
Photographs
Numerical data sets
E-books and e-journals
Six steps
◦ Selecting
◦ Acquiring
◦ Digitization
◦ Organizing
◦ Archiving
◦ Providing Access
Identification of
Selection of Books
Books
Meta Data
Scanning
Process
Image
Scanning Processing & QC
OCR
Publishing
“Conversion of any fixed or analogue
media--such as books, journal articles,
photos, paintings, microforms--into
electronic form through scanning,
sampling, or in fact even re-keying.”
Determine copyright or restrictions
Digital conversion
◦ Outsource or in house?
◦ Text conversion, formats, headers, compression,
and delivery media
◦ Digital capture with camera or scanner ?
◦ File handling
◦ File naming
Preparing the objects
Scanning
Moving files to temporary storage
Value addition- metadata preparation
etc
Long term storage
Derivative image/thumb nail for access
copy
Merging files
Data Workflow Content Project Quality
ManagementManagement
Management
ManagementManagement
Supplier Management
Data Workflow Content Project Quality
Management Management Management Management Management
Supplier Management
Supplier Management
• Processing
• Conversion
• Automatization
• Interfaces - input / output
Data Workflow Content Project Quality
Management Management Management Management Management
Supplier Management
• Style files
• Information /Object models
• Archiving
OCR, is the mechanical or electronic conversion of
scanned images of handwritten, typewritten or printed
text into machine-encoded text.
On the market are many good OCR programs
For example, among many others are:
Read-Iris (http://www.readiris.com/)
Omnipage (http://www.omnipage.com/)
Fine-Reader (http://www.finereader.com/)
Pure image formats: TIFF, JPEG
Open encoded formats: XML, HTML, ASCII,
and Unicode
Hybrid formats: PDF, DjVu – can contain both
image and text
Proprietary formats: Microsoft Word,
WordPerfect
What to digitize?
◦ Selection and policy is important
Collection description is important
◦ such as scope, format, restrictions on access,
ownership etc
Copyright
Access copy and archive copy
File size
Storage media( CD, Hard disc…)
File format ( TIFF,JPEG…)
Preservation of layout
◦ Metadata:
Resource Description – Dublin Core, WHOIS++ Templates,
US-MARC, TEI Headers, Other Open Source and Domain
Specific Standards.
Resource Identification – URN, PURL, DOI, SICI