
Advancement in Library and Information Science

Punit Ralhan

Oxford Book Company
Jaipur, India
ISBN: 978-81-89473-55-6

First Published 2009

Oxford Book Company


267, 10-B-Scheme, Opp. Narayan Niwas,
Gopalpura By Pass Road, Jaipur-302018
Phone: 0141-2594705, Fax: 0141-2597527
e-mail: oxfordbook@sify.com
website: www.oxfordbookcompany.com

© Reserved

Typeset by:
Shivangi Computers
267, 10-B-Scheme, Opp. Narayan Niwas,
Gopalpura By Pass Road, Jaipur-302018

Printed at Mehra Offset Press, Delhi.

All Rights are Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, without the prior written permission of the copyright owner. Responsibility for the facts stated, opinions expressed, conclusions reached and plagiarism, if any, in this volume is entirely that of the Author, according to whom the matter encompassed in this book has been originally created/edited and resemblance with any such publication may be incidental. The Publisher bears no responsibility for them, whatsoever.
Preface

With rapid advances in information and communication technologies, libraries are increasingly consolidating their position as repositories of the most up-to-date information and the latest technology, and as disseminators of the most relevant information content to their users. Libraries have traversed a long way since their inception, with even digital libraries no longer being a figment of the imagination. Such changes have therefore heralded a new age in information access, presenting its own advantages as well as challenges.

This book has been designed as a manual which seeks to explore the technological changes which have invaded the field of library science, most notably that of information system management. Bringing within its purview both concrete and digital libraries, the book delineates the processes, principles and practices which have defined information handling and access in recent years, and discusses what they portend for the future of library and information science. In addition to updating the reader on the latest trends and developments in the field, the text attempts to enhance the reader's understanding of the field of library science itself.

Punit Ralhan
"This page is Intentionally Left Blank"
Contents

Preface
1. Automation of Library Services
2. Digitisation of Library Services
3. ICT Applications in Libraries
4. Features of Digital Library
5. Digital Information Resources
6. Multimedia Systems in Libraries
7. Digital Information Preservation
8. Classification in Digital Libraries
9. Trends in Information Archiving
10. Information Retrieval in Modern Libraries
11. Information Access in Digital Libraries
Bibliography
Index
"This page is Intentionally Left Blank"
1
Automation of Library Services

Library automation, which started in the late 1970s in a few special libraries, has now reached most of the university libraries. It is yet to take off in college libraries in India owing to various problems. Library automation refers to the use of computers and associated peripheral media, such as magnetic tapes, disks and optical media, and the utilization of computer-based products and services in the performance of all types of library functions and operations. Computers are capable of introducing a great degree of automation in operations and functions, since they are electronic, programmable and capable of controlling the processes being performed.

The utilization of computers and related techniques makes it possible to provide the right information to the right reader at the right time, in the right form and in a personal way. Automation provides library services efficiently, rapidly, effectively, adequately and economically. Modern libraries and information centres facilitate free communication, because access to information has become a fundamental right of the clientele.
Automation is economically feasible and technologically required in modern libraries to cope with the requirements of new knowledge, the enormous increase in the collection of materials, and the problems of their acquisition, storage, processing, dissemination and transmission. The capabilities of computers and associated peripheral media, and their application in library activities and services, have led to highly significant quantitative and qualitative improvements, especially in online technology.
Information or knowledge by itself is of no value; it is the use of information that makes it valuable. Computers and their associated peripheral media are therefore being increasingly used in library and information services for acquisition, storage, manipulation, processing, repackaging, dissemination and transmission, and for improving the quality of the products and services of library and information centres.
CHALLENGES IN AUTOMATED LIBRARY SERVICES

Because collections of documents are still on paper, a localised medium, the need for local collections, the space needed for paper documents, the inflexibility of paper documents, the separation of documents from the users, opening hours for the collections, and competition for use of copies of documents all remain as much of a problem in the Automated Library as in the Paper Library.
The catalogue may be used in a number of places. In particular, with remote access to the on-line catalogue, the user is no longer separated from the catalogue, and the separation of catalogue and documents is somewhat diminished since, online, a catalogue can at long last be used in the bookstacks. The Automated Library represents a significant improvement, but for only some of the problems and, aside from the online catalogue, it benefits directly those who are providing the service rather than those who are using the service.
REDESIGNING LIBRARY SERVICES

The elements of the probable form of the electronic library of the twenty-first century were being glimpsed, albeit imperfectly, by perceptive thinkers by the early 1930s. More recently, visions of the library of the future have been associated with speculation on the demise of the book, the supposed obsolescence of librarians, and other questionable rhetoric. Discussion of providing "access" to "information" is commonly incomplete or misleading. The good news is that additional, different means for providing library service are becoming available in a manner unprecedented since the nineteenth century.
The challenge for all concerned with libraries is to determine how, whether, and when these new means should be used. Libraries exist for the benefit of the mind, but they have serious practical problems coping with the acquisition, storage and handling of the documents and records with which they deal. Major constraints arise from the technology used as a means for providing service. Any change in technology that would have a significant effect on the methods available for acquisition, storage, delivery, or searching procedures could have important consequences for library service. Consequently, a continuing quest for technological improvement has been, and should continue to be, important. Those responsible for providing library service have been more or less conscious of the nature of the underlying problems to be solved, and some of the more gifted and farsighted have groped towards radical solutions based on a deep understanding of the nature of the problems.
The term "information" is used with very differing
meanings and is commonly used attributively to refer to
books, journals, databases, and other physical objects
regarded as potentially informative. Access to a potentially
informative document depends on identifying, locating,
and having affordable physical access to it. However, for
someone to become informed, to become more
knowledgeable, requires more: The reader needs to be able
4 Advancement in Library and Information Science

to understand and evaluate what is in it. If what is found


is rejected or not understood, then little informing will have
been achieved. Much has been written in recent years on
the possible impact of new technology on lithe library of the
future." This is nothing new. It could be that long term
visions have a beneficial effect in stimulating debate and
thought.
However, one may suspect that little of the rhetoric and few of the specific technological proposals have been of much direct help to those with the heavy responsibility of planning for the future of any particular library: the administrators, funders, librarians, and library users developing five- or ten-year plans, contemplating the high cost of a major new library building, or worrying about the relationship between the familiar technology of paper and the less familiar, unstable technology of computers. The problems of existing libraries are severe.
Visions of electronic libraries seem uncertain and suspect. Even if such a vision seems good, it is not at all clear that plausible paths of development from here to there have been adequately mapped. This discussion of redesigning library services rests on three assumptions:
- There has been insufficient attention to strategic planning, that is, the making of decisions relative to a three- to ten-year time frame. Researchers seek to examine the middle ground between the large literature on possible options among the tactical and operational decisions made day-to-day and month-by-month and the sweeping visions of endless, interlinked electronic villages. The latter offer little continuity with present experience and can make those who are dependent on existing services understandably nervous. Some people are enthusiasts for electronic solutions; others want to avoid the high cost of continuing present operations.
- A disproportionate amount of attention has been paid to new information technology. It is not really that too much attention has been given to it, but rather that not enough critical attention has been given to the characteristics of the familiar technology of paper. What is familiar tends to be transparent, and it may take some conscious effort to appraise critically and evaluatively what we are so accustomed to.
- There is, in fact, considerable experience on which our strategic planning can be based, more than is generally realised.
The purpose being pursued in library service is the provision of access to books, journals, and other informative materials. Libraries have never had a monopoly, since much of what is in demand is also available in personal collections, in bookshops, from personal contacts, and, indeed, from other sorts of libraries. However, even if it is not a monopoly, this is clearly the major role and niche of library service. Now, in addition to the customary difficulties in providing library service, the radical changes in the technology available as means for providing service leave the future unclear.
In such a situation one needs to be prepared to retreat to first principles. Library service is a busy, service-oriented activity, with a deeply-rooted emphasis, reflected in the professional literature, on practical and technical matters, on means rather than ends, and on tactics rather than strategy. There is much more written, for example, on how to build collections than on the roles that collections play, and much more on how to create catalogs than on how catalogs are used. Nevertheless, there is currently a healthy awareness that major changes are likely and a recognition, for example, of some convergence between library services, computing services, and telecommunications services, of probable changes in the publishing world, and that library management is, at least in part, concerned as much with the management of service as with the management of books.
Foundations of Library Service

Library services have two bases:
- The role of library service is to facilitate access to documents; and
- The mission of a library is to support the mission of the institution or the interests of the population served.
Interpreting these two general statements for any given situation provides the foundations for effective library service. The first statement stimulates us to ask how "facilitate," "access," and "documents" should be interpreted, and how the role of the library service is related to the roles of the book trade, computing, and other services. Hitherto the dominant interpretation has been the judicious assembling of local collections as the only effective means of providing convenient physical access to documents, augmented by bibliographic tools and advice.
The second general statement entails that the determination of what should be done is unique to each specific context. Examining strategies for the development of library services requires attention to the three considerations given below:
- Distinguish between means and ends. The purposes of, and justification for, library service should not be confused with the techniques and technologies adopted as means for providing service, even though options are limited by the available techniques and technologies. The long period of relative stability from the late nineteenth century up to the 1970s in the means for providing library service is just the kind of situation in which it becomes easy for the distinction between ends and means to become blurred. So long as there is but one principal means to an end, more of the end is achieved by more of the means, and the distinction between ends and means has little significance in practice. But this blurring of the distinction hinders dealing effectively with alternative means if and when--as now--they become available. The advent of novel, alternative means for service increases the need to think clearly about the ends of library service. The ends may not change very much, but they are likely to need to be reinterpreted and reaffirmed at intervals in a changing world. In any case, responsible selection of means depends on prior selection of ends.
- Alternative means do need to be explored aggressively, otherwise the options will not be known. Here one needs to distinguish between tactical measures and strategic measures.
- Evaluation both of means and of ends implies consideration not only of what is good and what is not so good, but also of different sorts of goodness.
"How good is it?" is a measure of quality or, in effect, a
measure of capability with respect to serving some actual
or imagined demand. This kind of goodness is appropriate
for the evaluation and measurement of means, of tools and
techniques for prOViding service, as in "a good collection"
or "a good catalogue". Output or performance measures are
commonly of this type. Another form of goodness lies in
the question "How well is it done?", which has to do with
cost-effectiveness, efficiency, and effective management
generally.
Kinds of Library
Modern library service as we know it was largely developed in the second half of the nineteenth century, characterised by:
- The idea of library collections being for service;
- The notion of systematic, purposeful book selection;
- The adoption of a series of technical innovations, such as relative shelf location, improved cataloging codes, more systematic approaches to shelf arrangement and subject classification, card catalogs, and sustained efforts at standardisation and cooperation; and,
- In the twentieth century, a trend towards self-service, with open stacks and public catalogs.
Terminology has evolved, the scale of operation is much increased, and technical refinements have been made. Nevertheless, examination of early issues of College and Research Libraries, of fifty years ago, and of the Library Journal, another fifty years before that, shows that many of their underlying concerns are still strikingly contemporary. The following three types of library provision, based on the technology used, provide a convenient framework for discussing future library service.

Until recently, libraries' technical operations (e.g. purchasing, processing, cataloging, and circulation) and library materials were both based on paper and cardboard: we can call this the "Paper Library." Strictly speaking, libraries have always included materials other than paper, such as clay tablets, vellum, film, and so on, but these other media make little difference.
Over the past two decades, libraries' technical
operations have become based on computer technology
while the library's materials still remain overwhelmingly
on paper and paper-like media: The "Automated Library."
The prospect that library materials, as well as library
operations, will increasingly be in electronic form indicates
a further change in the means of library service: The
"Electronic Library." The concept of the Electronic Library
is important because library materials will increasingly be
available in machine-readable form, users will need access
to them, and access will, therefore, have to be provided.
One can speculate about the eventual balance between
paper materials and electronic materials or, if one wishes,
on the prospects for paperless libraries, but these issues are
of little significance compared with the underlying
assumption that arrangements for access to some materials
in electronic form will have to be provided.
Today libraries are, or are becoming, Automated Libraries, with the imminent prospect of needing to evolve, at least in part, into Electronic Libraries. Since paper documents seem unlikely to disappear, the Automated Library and the Electronic Library can be expected to co-exist indefinitely. More specifically, one can expect, and should plan for, any real library service to be a blend: part Automated Library and part Electronic Library. The shift to computer-based technical operations and, more especially, the advent of library materials in electronic form indicate the prospect of radical changes in the means of library service. Library materials in electronic form differ significantly from traditional media. In particular, unlike paper and microform, it is possible to make electronic media available so that they:
- can be used from a distance,
- can be used by more than one person at a time, and
- can be used in more different ways.
How are the circumstances of library users changing? Some
of those whom the libraries are funded to serve are
themselves adopting electronic habits, making increasing
use of the new information technology of computers,
electronic storage, and telecommunications in addition to
the old information technology of pen, paper, and
photocopier. The new electronic tools provide powerful
options for working with data, text, and images. Consider
the reduction in labour now required for producing revised
documents, for complex calculations, for image
enhancement, and for the statistical analysis of large sets of
data and passages of text.
Library services have to do with support for learning, both the study of what others have discovered and research
to discover what is apparently not yet known. Yet the
librarian's role is often very indirect. The librarian's
concern, rather than being with knowledge itself, is usually
with representations of knowledge--with texts and images.
Further, much of the time, the concern is not really with the
texts themselves, but with text-bearing objects: the millions
of books, journals, photographs, and databases that fill our
libraries' shelves.
Librarians generally assist, not by giving answers directly, but by referring the inquirer to a book. They need to maintain the underlying concern with how individuals acquire knowledge. Librarians must concern themselves with how individuals use information and also with how they become informed and knowledgeable. The old information technology of pen, paper, and, latterly, photocopier did not encourage much departure from library use as "read, think, write." In contrast--for some--the new information technology is transforming the use of library materials, with computer-based techniques for identifying, locating, accessing, transferring, analysing, manipulating, comparing, and revising texts, images, and data. A wholly new dimension of the use of library services is emerging. What would do more for users, for the development of library service, and for rapport with users than providing assistance that keeps pace with these changes?
It seems that the relative stability of the past century is but a prologue to another period of radical change, comparable in significance to that of the late nineteenth century with its exciting renaissance of ideas and techniques. This time the change is enabled less by new ideas than by a change in the underlying technology. As operations and services become more complex and more capital-intensive, ad hoc, unsystematic decision-making can lead library services down unproductive paths. Correcting mistakes becomes expensive and disruptive. Creative planning needs to be central, because of the superiority of planning over merely reacting to events. Funders, providers, and users of library services need to reflect creatively on what they do and why. Planning offers us a chance to create the future.
COLLECTIONS RECONSIDERED

Collection development is not only a matter of the funds used to pay for materials; it also involves the substantial proportion of a library's labour that is devoted to selecting, purchasing, cataloging, and processing the materials, plus the associated administrative overhead and the space needed for these people and for housing the collection. Libraries are chronically short of operating budget and of space, so any activity that accounts for two-thirds of both operating budget and space usage ought to be of great interest. Our perspective on library service is based on experience with libraries in the form of large collections of books. A good starting point is to ask why libraries develop collections.
Collecting material does not create material. It only
affects where copies are located. Library collection
development is a matter of "file organisation," concerned
with where copies of documents are to be located and for
how long. Collecting cannot be justified for its own sake but
only as a means for the role and mission of the library. The
role of library service is to facilitate access to documents
and, by extension, to provide service based on the
availability of documents. The mission of a library service
is to support the mission of the institution or population
served.
These are, however, general statements that need more detailed interpretation in each case. To justify the investment we ought to have a coherent, explicit, and convincing explanation of why we devote so much effort, money, and space to the assembling and refining of collections of documents, and of how the "value-added" benefit of collecting compares with the contributions of other claims on library resources, such as bibliographies and assistance to readers. These basic questions have been rather neglected and, perhaps, taken for granted. A start can be made by isolating the specific purposes that collections, and hence collecting, support:
Preservation role: Any document that is not collected
and preserved is likely to be lost, unavailable both now
and in the future. It is difficult to predict what might
be of interest to someone in the future. When in doubt
it is prudent to preserve non-renewable resources.
Dispensing role: The principal reason for most
investment in collection development is not
preservation but the need to provide convenient access
to materials that people want to see where they want
to see them. If someone asks to see a book, it is not
entirely satisfactory to answer that a copy exists and is
being carefully preserved in some foreign national
library.
The need is for a copy here and now. The difference
between the dispensing role and the preserving role
can be imagined by considering the difference between
the size of library collections as they are now and the
size of libraries as they would be if, nation-wide, only
two or three designated copies of each edition were
retained for preservation purposes and all other copies
were vaporised. Imagine how dramatically libraries'
space problems could be solved - and how detrimental
it would be for library service - if only two or three
preservation copies were retained. The difference is an
indication of the importance and high cost of the
dispensing role in relation to library materials on
paper.
Bibliographic role: The use of materials depends on
identifying and locating what exists. Bibliographies
and the bibliographic superstructure that is built up in
catalogs are created for this bibliographic role. There is
no reason why the original documents should not also
represent themselves, sitting on the shelves for all to
see the choice that exists. Browsing shelves of
documents arranged by subject order may not be an
entirely satisfactory guide to what exists, but it is one
way and, unlike the examination of catalogue records,
has an enormous advantage: the document itself is at
hand.
Symbolic role: The three roles already mentioned--preservation, dispensing, and bibliographic--do not seem sufficient, even taken together, to explain collection development behaviour adequately. Collections also have a symbolic role. Large collections, particularly of special materials, bring status and prestige whether the materials are used or not. The symbolic value of collections, and of buildings to house them, is perhaps more marked in the case of museums, but should not be ignored in the case of libraries.
If these are the four purposes of collections of library materials, we may ask how the change from paper to electronic media may change how we seek to fulfil these roles. Imagine an extraterrestrial visitor: to have reached Earth and to be able to send messages back, it must be familiar with sophisticated technology and telecommunications, but it might, however, be unfamiliar with, and very intrigued by, paper as a form of information technology. Paper is energy-efficient and fairly robust, but it does have two attributes that limit and dominate the way it is used.

First, paper is generally a solo technology: like a telescope, it is usually best used by only one person at a time. It is frustrating when two or more people try to use the same reference book simultaneously.
Second, paper is a localised medium: Paper can be read
only if the reader and the paper are in the same place at the
same time. True, one can travel to a library at any distance
or interlibrary loan can bring a copy of a document. Either
action involves inconvenience and delay in order to achieve
a situation in which you and the paper are in the same
place.
Electronic documents, in sharp contrast, can be used by
many people at the same time. Further, unlike documents
on paper, users don't even have to be where the database
is. Users do have to have a telecommunications connection
to the database, but they don't even need to know the
physical location of the database. Other media such as
microforms and clay tablets can be seen as inconvenient
variations on paper.

Electronic and Paper Collections


What does the distinction between localised media (e.g.
paper) and non-localised media signify for the
development of library collections when each role is
considered?
Preservation role: For the preservation of non-renewable resources, it remains prudent to retain two or more copies designated as archival copies, carefully stored at different locations under suitable conditions. The specific techniques for preservation vary with the differing physical media (paper, magnetic tape, microfilm, etc.), but a broadly comparable pattern emerges for the preservation of paper and of electronic documents.
Dispensing role: The substantial difference in transportability indicates a major change in the dispensing role: a much reduced premium on local storage compared with storage at a distance. In contrast to the localised media of paper and microform, local storage is no longer a necessary condition for convenient access with electronic collections.
The reader need not know or care where the disk-drives are physically located so long as records appear on the screen. There is, therefore, a fundamental change. To the extent to which materials capable of remote access are used, the historic necessity for local collections ceases to apply. Material will not need to be stored locally, will not be out on loan, and will be available wherever the user's workstation is located. Local storage becomes optional rather than necessary. Decisions about what to store locally will depend on several changing technical and economic factors. Since local collections can account for two-thirds of operating and space costs, there would appear, in theory, to be substantial potential scope for investment in remote access as an alternative.
Bibliographic role: It is not yet clear how access to machine-readable text will be provided. However, there is no obvious reason why bibliographic records would not also be stored and linked with the text. If this is the case, then the combination of on-line bibliographic data and on-line electronic documents would appear to have all of the advantages of on-line bibliographies and catalogs, combined with the advantages of having immediate access to all of the texts for searching, browsing, serendipity, scanning, and reading. This best of two worlds simply cannot be achieved with localised media on paper and microform.
Symbolic role: The symbolic and status-bringing role of large and impressive local collections of paper documents cannot be denied. Status and useful function have tended to coincide. The prestige of having extensive access to electronic documents is less clear, especially as access to electronic documents will probably be much more easily and much more equitably achievable than has been the case with paper documents, which have always been very unevenly distributed among groups and geographically. Perhaps the wisest course is to emphasise what is functional and hope that prestige will be achieved as a by-product or in other ways. The alternative approach--major investment in what is prestigious but no longer the most functional--becomes questionable.
Of these four roles, it is the dispensing role that stands out as being different when electronic documents are compared with paper documents. The difference is two-fold: it is the dispensing role that accounts for the great preponderance of libraries' operating costs and space needs in the Paper Library and in the Automated Library; and it is in the dispensing role that electronic documents promise to be particularly advantageous, because local storage, which so dominates the operating and space expenses with paper documents, ceases to be necessary.

So, in principle, it appears that it is the dispensing role in which there is the most to be gained and it is also the dispensing role in which change appears most feasible. How, and how far, such a change would be acceptable for the purposes of library users deserves very careful attention because the stakes are so high. It becomes important to remember that library users fall into two quite different cases with respect to the dispensing role.
The obvious case is when the choice is between use of local collections on paper or access to electronic documents. In practice, this is the limited case of the privileged few. For most people, for most documents, for most of the world, the effective choice is likely to be between remote access to collections of electronic documents or the costs and delays of obtaining paper documents on interlibrary loan, because so much material is not in conveniently local collections. This is the case even in those few countries that currently have the best library collections. Taking a global view, it seems difficult to imagine that the vast preponderance of cities that do not now have splendid public library collections and the vast majority of universities that do not now have excellent collections have much prospect of ever being able to achieve excellence by assembling sufficient local collections of paper documents.
For rural areas, smaller institutions, and poorer countries the prospect is hopeless. It would seem very foolish to expect any scenario involving an "either/or" dichotomy--only paper collections or only a paperless library. Some balance can be expected between selected materials on paper, presumably the more heavily-used and less volatile material, and selective recourse to electronic documents for much of the rest, as available. On such a view one might expect core collections of materials in relatively high use to be collected and held locally on paper, even if also available as electronic documents. Core collections, however, are not where the heavy costs arise.
The demand for library materials is very unevenly distributed over collections, and so most of the cost of acquisition, processing, and especially storage is attributable to the larger quantities of relatively little-used material. Just as the change from the Paper Library to the Automated Library, in conjunction with the rise of on-line bibliographies, changed our perspective on collecting and local collections, so also does the rise of the Electronic Library. Whereas the Paper Library and the Automated Library are unavoidably dominated by local collections, the effect of having electronic documents is to make local storage optional rather than necessary.

There is much in these issues that has to be studied. Relatively little attention seems to have been given to comparing the costs of maintaining paper collections with the costs of maintaining electronic collections. If hardware costs and computing costs continue to come down, then presumably the costs of storing electronic documents must also be trending downward relative to the costs of storing paper documents.
Similarly, studies will need to be made of the costs and difficulties of accessing electronic documents remotely compared with accessing both locally-held and remote paper documents. Here again the same assumptions about cost trends would suggest a long-term cost trend favourable to electronic documents. For both types of documents the underlying problem is an "owning versus borrowing" trade-off, but the costs, cost trends, and acceptability appear to be different and to be changing.
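As an illustration only, the owning-versus-borrowing trade-off can be expressed as a simple break-even comparison. The sketch below is not a model from the text: the function name, parameters, and cost figures are all invented assumptions.

```python
# Hypothetical sketch of an "owning versus borrowing" trade-off.
# All parameters and figures are illustrative assumptions.

def should_own(expected_uses_per_year: float,
               years_retained: float,
               purchase_and_processing_cost: float,
               storage_cost_per_year: float,
               cost_per_remote_access: float) -> bool:
    """Return True if owning a copy is expected to be cheaper than
    borrowing (or remotely accessing) it each time it is needed."""
    cost_of_owning = (purchase_and_processing_cost
                      + storage_cost_per_year * years_retained)
    cost_of_borrowing = (cost_per_remote_access
                         * expected_uses_per_year * years_retained)
    return cost_of_owning < cost_of_borrowing

# A heavily used title tends to favour ownership ...
print(should_own(12, 10, 60.0, 2.0, 15.0))   # True
# ... while a rarely used one favours borrowing or remote access.
print(should_own(0.2, 10, 60.0, 2.0, 15.0))  # False
```

The point of the sketch is only that the break-even position shifts as storage and access costs change, which is why the cost trends discussed above matter.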
Here again the issues are broader than technology,
costs, and the preferences of library users. Publishers,
booksellers, authors, and, probably, intellectual property
rights are also moving into a changed situation as electronic
documents permit, in principle, a significant change in
library collecting practices and in the use of library
materials. A changed "industry" and new policies are to be
expected.
Local storage (and local ownership) is no longer a necessary condition for convenient access; local storage becomes optional rather than necessary. Under what circumstances access to networked resources will replace local collections on paper remains to be seen; here we can only probe some consequences of such a replacement. The mere idea that access to networked electronic documents could substantially replace local collections raises intriguing questions about the purposes of local collections and the work of collection developers.
At the superficial level of process, what collection developers do is clear: they make a stream of decisions telling the library's technical services staff which items should be acquired and catalogued for the local collection and which should be discarded or not acquired. But the purpose, as distinguished from the process, is less clear. There are large literatures on collection development, on how new library-related technologies are evolving, and on the shift in emphasis from ownership to access. Important though these topics are, they are primarily concerned with process and can distract attention from examination of purpose.
There has been a significant shift away from viewing
technical services as being a grand apparatus for
establishing usable local collections and towards a notion
of technical services being the user's gateway to the
bibliographic universe. A comparable change in the
perception of collection development seems likely.

Electronic Resources and Networks

With materials on paper, the development of well-selected local collections dominates the quality of service. There are gradations of local availability, but there is a basic, binary distinction: what is held locally is accessible and what is not held locally is inaccessible. Recourse to interlibrary borrowing is an unsatisfying substitute for local holdings. Put simply, by the process of selection the collection developer is imposing a partitioning of the universe of library materials into two broad ranks: those that are to be made more accessible by being held locally; and those that are to be kept less accessible because not added to the local collection.
Structurally, the effect is the same as that of the compilation of a selective bibliography or of an online search: some items are selectively brought forward to the reader's attention--acquired, listed, retrieved, respectively--while the others, those not acquired, listed, or retrieved, still exist but are left in less-accessible obscurity. With networked, electronic resources, however, this binary distinction between local and non-local becomes less clear. In principle, all resources become equally accessible. The distinction between locally-held and not-locally-held loses significance and, in a sense, all collections can become locally-accessible collections. A consequence relevant to technical services is that the local catalogue, essentially a guide to what is locally-held, loses its past pre-eminence relative to union catalogs, remote catalogs, and bibliographies of network-accessible resources.
"In the library of the future there will be no library. This
declaration does not imply an end to library service or to
librarians. Some of us may prefer "bibliography" to
"navigation" as the term of choice and, obviously, some
repositories of electronic materials must be located
somewhere. What is placed in question is the future of the
local collections that have hitherto dominated library
service and have accounted for most of libraries'
expenditures once the full costs of selecting, acquiring,
processing, and housing local collections are properly
attributed to the development of local collections. Clearly
there will continue to be local collections of two kinds:
Materials on paper, microfilm, and other localised
media for which the reader and the document must be
in the same place; and
"Caches" of electronic documents that are used often
enough to justify keeping locally so long as demand
remains high.
The population of documents in the localised caches will be transient and transparent. In general, users need neither know nor care which documents are stored locally and which are not at any given time. Automatic algorithms can be designed, based on expected frequency of use, unit storage costs, and the cost of obtaining a copy from remote storage, to adjust the cache dynamically. It is a task for industrial engineers rather than subject specialists.

Site licenses will need to be negotiated, but that seems likely to become more like negotiating a blanket order than traditional title-by-title, copy-by-copy book selection. There are localised electronic media, notably CD-ROMs, that need to be selected and acquired like books or microfilms. But CD-ROMs can be put on networks and their contents can be stored in repositories. CD-ROMs seem a transitional technology or, at best, a temporary storage device.
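As a minimal sketch of the kind of automatic cache-adjustment algorithm described above, each document can be scored on expected use, storage cost, and refetch cost, with low-scoring documents evicted. The record fields, scoring rule, and figures are illustrative assumptions, not a prescription from the text.

```python
# Illustrative sketch of dynamic cache adjustment for a local store of
# electronic documents. The scoring rule and all figures are assumptions.

from dataclasses import dataclass

@dataclass
class CachedDocument:
    doc_id: str
    expected_uses_per_month: float  # estimated from past requests
    storage_cost_per_month: float   # cost of keeping it locally
    refetch_cost: float             # cost of obtaining it again remotely

def retention_value(doc: CachedDocument) -> float:
    # Benefit of keeping: avoided refetches; cost of keeping: storage.
    return (doc.expected_uses_per_month * doc.refetch_cost
            - doc.storage_cost_per_month)

def adjust_cache(cache: list[CachedDocument]) -> list[CachedDocument]:
    """Keep only documents whose retention value is positive."""
    return [doc for doc in cache if retention_value(doc) > 0]

cache = [
    CachedDocument("heavily-used-journal", 40.0, 1.0, 0.5),
    CachedDocument("rarely-used-report", 0.1, 1.0, 0.5),
]
print([doc.doc_id for doc in adjust_cache(cache)])
# ['heavily-used-journal']
```

This is exactly the sense in which the task is "industrial engineering": tuning a mechanical rule, not judging the worth of individual titles.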
As for the local collections in localised media (paper, microform, etc.), the indications are that they will be a diminishing portion of what is used, and certainly not what defines a library's ability to serve, as they were in the past. So the question remains: what will collection developers do as local collections diminish in significance relative to networked electronic resources? Will their professional lives be enriched by the assignment of other, different duties? The answer lies in the purposes of what they do now, in the distinction between demand and value. In traditional collection development, in building local collections of localised media, the single act of acquisition is the one response to both demand and value.
One procedure addresses both concerns and one cannot
know, by examining a book on the shelves, whether its
acquisition resulted from an expectation of demand, from
belief in its value, or from some combination. In an
electronic environment, however, considerations of
demand and of value diverge because they require different
courses of action. The logistics of catering to high demand,
in detail a matter of a hierarchy of caches, can be delegated.
It is a mechanical task and one could dispense with
collection developers if that were all they did.
But what of the concern for value in the electronic library environment, for the privileging of some books over others? If there is more to collection development than responding to demand, then the value-laden role of privileging some resources over others needs to be continued, unless the purpose of library service is to change. How is this other, remaining task of collection developers to be done? If it can no longer be done obscurely, combined with the logistical task of meeting demand by placing copies on the shelves, it will need to be addressed directly and separately. Collection developers and technical services staff have been more closely co-conspirators in achieving the library's mission than libraries' organisational charts have indicated. Acquisitions departments and cataloguers process only those items that the collection developers select.
Collection developers and technical services staff play complementary and interdependent roles in establishing an ordering of the universe of documents, in determining the relative accessibility of different documents for their local users. There is no clear reason why that purpose and that partnership should cease. As electronic resources multiply, the need for a convenient ordering, for differentiated accessibility, increases. The privileging of the better and, by default, the non-privileging of the rest remains a significant, needed service.
Technical services have been evolving away from being the grand apparatus that constructs the local collection and towards providing navigational tools to the universe of documents, electronic and non-electronic. But value judgements are still needed concerning which resources are most suitable for any given user group. It is contrary to common sense and to the central traditions of library service to make all material equally accessible. There is simply too much of it, and it would be unhelpful to make it all equally accessible. Some items are demanded more frequently than others; some may be regarded as more valuable than others. It has not been the purview of technical services staff to select which items should be privileged over others, but rather to implement that privileging for the documents designated by the collection developers.
What collection developers have done in the past is to select items for local acquisition. The purpose of that process is to manipulate the universe of documents so that some--on grounds of demand and/or value--are made more accessible than others. Some documents are made visually prominent by being placed on the local shelves and others are deliberately not. As paper documents on shelves cease to be the technological medium of choice, different procedures are needed for a different technological medium, but the objective remains.
The design of a gopher service provides a simple example. Not all items can be equally accessible at the same, highest hierarchical level. It makes for efficiency if items that will be looked for often are given a privileged place high in the hierarchy of gopher levels. Other items remain accessible, but can be left at deeper, less convenient levels of storage. If collection development is seen as deciding which items to privilege, then the need for those with that ability would appear to increase as local paper collections diminish relative to networked electronic collections--and the traditional partnership between collection developers and technical services staff should remain just as close.
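A minimal sketch of that menu-design idea, assuming expected lookup frequency is known: frequently sought items are placed at shallow menu levels, the rest at deeper ones. The thresholds and item names are invented for illustration.

```python
# Illustrative sketch: assign items to gopher-style menu depths by
# expected lookup frequency. Thresholds and data are arbitrary assumptions.

def assign_menu_level(lookups_per_month: float) -> int:
    """Return 1 for the top-level menu, larger numbers for deeper menus."""
    if lookups_per_month >= 100:
        return 1  # privileged: visible on the top-level menu
    if lookups_per_month >= 10:
        return 2
    return 3      # still accessible, but at a less convenient depth

items = {"campus map": 500, "course catalogue": 40, "1987 committee minutes": 1}
for name, frequency in items.items():
    print(name, "-> level", assign_menu_level(frequency))
```

Deciding *which* items deserve a shallow level on grounds of value, as opposed to measured demand, is precisely the judgement that cannot be mechanised.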
What collection developers will do in the future with the new technology can be expected to differ in various procedural ways from what was done in the past with the old technology:
- Hitherto the privileging of documents has been dominated by a binary division: items acquired for the local collection and those not acquired or not retained. In the environment of networked resources any such abrupt division seems improbable. A much finer gradation of degrees of accessibility and privileging seems likely.
- Hitherto all users of a given library have been supplied with one and the same collection. This "one-collection-for-all" approach has been technologically inevitable, but it is Procrustean rather than egalitarian. Different users have different needs. Users are unlikely to be equally well served by what the collection contains or by the way it is arranged. The popularity of branch and departmental libraries arises not only from geographical convenience but also from their being designed for a smaller group of users. With the new technology, different forms of access can be designed for different interest groups within the local population served.
- Because of the inherent localness of local collections, collection development work has been specific to each location and has resulted in massive geographical inequalities in library holdings. Library users with similar interests but located at different sites have received radically different service. With the new technology it may well be that the task can and will become more specific to topical areas than to locality. This would open new opportunities for cooperative efforts. Similar forms of access could be shared by those who have similar interests but who are at different locations.
- Because the evaluative, privileging role will no longer be combined with catering to demand, it will become a separate task and, therefore, a performance with greater visibility and accountability, as has already happened for cataloguers.
- The notion of a "materials budget" will evolve. Historically a component of the cost of making privileged documents more accessible, it will inevitably be deployed differently if the traditional purpose of library service is to be sustained in a changed environment.

What collection developers will do depends on how one regards what they do now. At the superficial, procedural level, it seems that there will be a much reduced need for them.
BIBLIOGRAPHIC ACCESS RECONSIDERED

Bibliography is concerned with the making of lists of books, articles, and other documents--by subject, by author, and by other attributes--and with the making of indexes to those lists. The term bibliography is used in several ways, to denote the study of books and the making of descriptions of books. Bibliographic access is perhaps the best available term for the whole apparatus of access to records of all kinds (textual, numerical, visual, musical, etc.), in all kinds of storage media. Bibliographic access includes three central concerns:
Identifying documents: Which documents exist that
might be of interest? The essence of bibliography is the
identification and enumeration of documents that
would be of interest. Which writings by some specified
author? Which articles about some subject? Which
books published in some time, place, or language? It is
a matter, on the one hand, of creating useful
descriptions of documents, and, on the other, of
identifying documents that fit any given description.
Locating documents: Bibliographies describe documents,
but they do not usually tell you where a copy can be
found, least of all where the nearest copy can be found.
It is catalogs that indicate where copies may be found.
During the nineteenth century catalogs became more
elaborate in their descriptions and came to look like
and, indeed, to be bibliographies of local holdings. The
differentiating characteristic of a catalogue is that it
indicates a location.
Physical access to material: Identifying and establishing the supposed location of a document is not the same as having a copy of the document in one's hands, close enough to read.
The components of this bibliographical universe are numerous as well as varied. An important feature of bibliography in this sense is that it is primarily concerned with works and editions of works rather than with individual copies of documents. Bibliographies are not ordinarily concerned with specific copies of an edition. Information about individual copies is usually included only in exceptional circumstances: one copy is somehow different, or may be the only extant copy known. For rare materials and early printing it is customary to note where individual copies can be found or which individual copy was inspected by the bibliographer. Nevertheless, as a general rule, bibliography deals with published editions rather than with individual copies of an edition.
Because bibliographies describe works rather than individual copies, they are of general interest to anyone who might benefit from knowing of the works that are listed. For this reason publication of, or at least widespread public access to, bibliographies is highly desirable. Bibliographies, especially continuing ones, lend themselves well to computer-based production, which reduces the tedium of the mechanical tasks of sorting, cumulating, updating, rearranging, and indexing a large number of individually brief records. It has become difficult to imagine the creation of a bibliography without using a computer, and the logical next step is to make the bibliography available on-line.
It is reasonable to expect the number of bibliographies
that are available in machine-readable form to increase and
for them to account for a growing proportion of all use of
bibliographies. It is also reasonable to expect that these
bibliographies will become available in more different
ways: accessible through commercial database services;
available as tapes that can be mounted at computer centers;
or available on optical digital disks, such as CD-ROMs,
attachable to microcomputers.
The next logical development would be to provide links
from the references in the bibliographies to libraries'
holdings records. If one were to find an interesting
reference to an article while searching Chemical Abstracts
on-line, for example, it would be an obvious amenity if one
could move automatically from the bibliographic reference
to a statement of local libraries' holdings of the periodical
concerned. This kind of service is beginning to be provided.
Even better, one would like to know whether that particular
volume is currently available and to be able to send a
request for a copy of it. Bibliography deals with published
works in a general fashion and is not ordinarily concerned
with individual copies of works. In contrast, library records
are, of necessity, very much concerned with individual
libraries, individual copies, and, for that matter, with
individual library users.
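To make the idea of such a reference-to-holdings link concrete, here is a minimal sketch, assuming each bibliographic reference carries a standard identifier (an ISSN, say) that can be matched against a local holdings file. The record layout, identifiers, and data are invented for illustration, not taken from any actual service.

```python
# Illustrative sketch: link a bibliographic reference to local holdings
# via a shared identifier (ISSN). All records and numbers are invented.

holdings_by_issn = {
    "1234-5678": {"library": "Central Library",
                  "call_number": "QD1 .A99",
                  "volumes_held": "v.1 (1907) - present"},
}

def holdings_for_reference(reference: dict) -> dict | None:
    """Return the local holdings statement for a reference, if any."""
    return holdings_by_issn.get(reference.get("issn"))

reference = {"title": "An interesting article",
             "journal": "Some Chemistry Journal",
             "issn": "1234-5678"}
print(holdings_for_reference(reference))
# {'library': 'Central Library', 'call_number': 'QD1 .A99', ...}
```

The design point is that the bibliographic side describes editions in general, while the holdings side describes particular copies; a shared identifier is what bridges the two.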
Library catalogs, as we currently know them, are composed of a combination of bibliographic records and of library holdings records, containing both general statements about editions of works and also specific statements about individual copies and their individual locations in particular libraries. One might even argue that, given the limitations of the technology of paper and of cardboard, the only practical way of achieving this linking of bibliographies and library records in the nineteenth century was to create an additional third set of records containing elements derived from each: the modern library catalogue.
Library catalogs vary considerably in format according to the technology in use: in book form; on cards; in microform; on-line. Further, if library catalogs are seen as a bridge between bibliographies and library records, it has to be recognised that this is a bridge between two moving and changing objects, as bibliographies and internal library procedures both evolve. Early library catalogs were inventories of what was on the shelves. The printed catalogue of 1620 of the Bodleian Library of Oxford University is regarded as significant because it listed books in author order regardless of where they were shelved. This, then, was the library catalogue as an author-ordered finding list of books.
The transformation of library cataloging to its present form came in the nineteenth century, when it was argued that simple author access was not enough and that a different, more sophisticated, and more elaborate approach was needed. In effect, the new library techniques of the mid and late nineteenth century can be viewed as building up, on top of simple finding lists, a superstructure of bibliographical access: complex subject headings, added entries, cross references, systematic shelf-arrangements, and so on. The form of display moved from catalogs in book form to catalogs in card form, which are easier to update, but the principal change was the local development of more elaborate access to the contents of the collection. Modern library catalogs are essentially as defined in the nineteenth century.
A catalogue includes an essential element that is
normally absent from bibliographies, the call number,
although this is in practice an incomplete and imperfect
reflection of the precise status of the library's holdings. To
determine the actual status it may also be necessary to refer
to library holdings records: to the circulation file for the best
information on what is where, to serials records to know
which pieces have arrived, to the "in process" file to know
what has arrived but has not yet been catalogued; and to
acquisitions files to know what is believed to be on its way.
Although the catalogue may show that the library has the
book, the book may have been lost.
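A minimal sketch of that status check, assuming four hypothetical files keyed by call number; the lookup order mirrors the description above, but the file structure and data are invented for illustration.

```python
# Illustrative sketch: determine a copy's actual status by consulting,
# in turn, the holdings files described above. All data are invented.

circulation_file = {"Z665 .B83": "checked out, due 12 May"}
serials_file = {}
in_process_file = {"QA76 .L55": "received, awaiting cataloguing"}
acquisitions_file = {"PN45 .E27": "on order"}

def actual_status(call_number: str) -> str:
    """The catalogue says the library has the item; these files say
    where it really is right now."""
    for file, label in ((circulation_file, "circulation"),
                        (serials_file, "serials"),
                        (in_process_file, "in process"),
                        (acquisitions_file, "acquisitions")):
        if call_number in file:
            return f"{label}: {file[call_number]}"
    return "presumed on shelf (or lost)"

print(actual_status("Z665 .B83"))  # circulation: checked out, due 12 May
print(actual_status("BX123 .A1"))  # presumed on shelf (or lost)
```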
The present set of relationships can be expressed as follows: records found in bibliographies may help one find corresponding records in catalogs, if present, and vice-versa. The catalogue usually indicates the official location of a copy of a document. But one may choose to (or need to) consult other library holdings records (acquisitions, circulation, serials) for more precise information
concerning actual copies of documents and their location.
For the purpose of a library catalogue, almost all of the data needed are bibliographical and would be common to any other library catalogue or bibliography that listed the same edition. The exception is the locational information: the particular call number and details of each copy, as needed. The locational data would not be the same as those found in other libraries' catalogs listing the same work.
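That division of a catalogue record into shared bibliographic elements and library-specific locational elements can be sketched as a simple data structure; the field names and the example record are invented for illustration.

```python
# Illustrative sketch: a catalogue record as shared bibliographic data
# plus library-specific locational data. Field names are invented.

from dataclasses import dataclass

@dataclass
class BibliographicRecord:
    # Common to every catalogue or bibliography listing this edition.
    author: str
    title: str
    edition: str
    year: int

@dataclass
class CatalogueRecord:
    # The bibliographic part could be copied from any other catalogue;
    # the locational part is unique to this library and this copy.
    bibliographic: BibliographicRecord
    call_number: str
    copy_number: int

record = CatalogueRecord(
    BibliographicRecord("A. Author", "An Example Book", "2nd ed.", 1999),
    call_number="Z678 .A88 1999",
    copy_number=1,
)
```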
INDEXING METHODS

Definition of "indexing" simply means pointing or


indicating. Two basic approaches are used for the analysis
of messages, texts, and documents: by human examination
and by machine algorithm. Humans examine documents
and texts in order to consider messages that texts represent,
plus features of texts and of documents in which texts are
recorded. Computers identify and compare components of
texts - the symbols that comprise texts - sometimes
consulting lexical, thesaural, discourse or other contextual
data to expand and characterise sets of textual components;
sometimes applying syntactic or pattern indexing
algorithms to identify larger units of text; and sometimes
calculating attributes for text components and documents
based on available data. These two approaches are often
called human indexing and automatic indexing.
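For illustration only (a minimal sketch of ours, not a method
described in the text), the automatic approach in its simplest
form derives index terms by counting the word frequencies of a
text after discarding common stop words; the stop-word list and
example sentence are invented:

    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "are"}  # illustrative list

    def index_terms(text, top_n=10):
        # Identify the symbols that comprise the text and count them.
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(w for w in words if w not in STOP_WORDS)
        return counts.most_common(top_n)

    print(index_terms("The library catalogue is a bridge between "
                      "bibliographies and the records of the library."))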
Both human and machine approaches are widely used.
Research comparing retrieval based on human versus
machine analysis and indexing tends to show that the two
approaches produce different results, but that users find
them, on balance, more or less equally effective. Similar
evidence comes from observing the behaviour of expert
searchers. When they have access to indexing based on both
approaches, they generally use both types of indexing,
preferring human analysis and indexing for some types of
searches and automatic machine analysis and indexing for
others.
Personal preferences also play a role. Some users prefer
one type of analysis and indexing or the other most or all
of the time. Increasingly, information retrieval (IR) databases are
designed to provide more than one indexing approach in
hopes of maximising the effective retrieval of useful
messages, texts, and documents. By offering multiple
approaches, it may be possible to take advantage of the
strengths and features of different approaches and also to
respond to the needs and preferences of users in a variety
of situations.
Research and experimentation on automatic analysis
and indexing of language text has been under way for
several decades, and there has been much progress and
growing use of automatic analysis and indexing of
language texts for retrieval. But the automatic analysis and
indexing of image and musical text has barely begun. This
is especially true for the topical messages of non-language
text, as opposed to non-topical features, such as colour,
texture, shape, and, for music, pitch, tempo, and pattern.
Whereas automatic analysis and indexing of language
and mathematical/ chemical text is now routine and
common, automatic analysis and indexing of image text is
usually experimental. IR databases that provide access to
most other types of text, especially images, rely for the most
part on human analysis and indexing of messages and their
texts and documents.

Automatic and Human Indexing


Research comparing the relative strengths and weaknesses
of these two basic approaches to analysis and indexing has
failed to convince die-hard opponents of the merits of either
approach, and in fact, the clarity of research results has been
disappointing. Most of the earlier experiments were limited
to relatively small collections, and the evaluation of search
results was usually based on judgments by persons other
than real users with real information needs or desires. Since
then, there have been increased efforts to put users in the
centre of information retrieval research, with the
recognition that user variables are as important, perhaps
more important, than any variation in analysis and
indexing methods for determining the effectiveness of
information retrieval (IR) systems and IR databases.
In all IR research, it has been very difficult to isolate
particular variables or differences in order to assess their
specific impact on overall performance. The two major
components of any information retrieval situation are the
user on the one hand and the IR system, including IR
databases, on the other hand. Here the focus is on the IR
system component. Quite apart from human versus
machine analysis and indexing of messages, texts, and
documents, other key variables tend to co-occur in varying
degrees with human or automatic analysis and indexing.
Nevertheless, unless these other variables are
accounted for, in addition to the type of analysis and
indexing (machine versus human), it cannot be clear which
variables have the major impact on the results of
comparisons. Thus, differences in retrieval results may be
due to differences in documentary unit size or extent of
indexable matter or exhaustivity of indexing or specificity
of indexing terms or index browsability or searching syntax
or displayed heading syntax or vocabulary management or
surrogation and surrogate display or interface design, in
addition to differences in analysis and indexing methods.
Vocabulary Management
This variable is closely related to specificity. Although there
is no necessary connection between type of analysis and
indexing on the one hand and vocabulary control or
management on the other, nevertheless, the provision of
references linking synonymous or equivalent terms,
pointing to related terms, and distinguishing among
ambiguous homographs tends to accompany human
analysis and indexing more commonly than automatic
analysis and indexing.
However, this type of vocabulary management is
increasingly common in automatic experimental systems
and more advanced publicly available systems. Closely
related to several of these key variables is the amount,
nature, and style of information provided to the user about
documentary units. For browsable displayed indexes, this
will be connected to the amount and style of information
provided in index headings, but also to subsequent
documentary unit records that are linked to index headings.
For machine matching systems, this variable relates to
the size and style of the documentary unit records provided
to the user for evaluation, ranging from very brief to very
lengthy. Newer methods of using visual displays to
characterise retrieved or relevant sets of messages have been
more closely tied to automatic analysis and indexing
techniques, but there is no inherent reason why they could
not also be used with human indexing in the context of
electronic IR database displays.
Because variables such as these have typically not been
separately analysed, it has been difficult, if not impossible,
to determine whether the results of particular IR systems
are due to automatic versus human analysis and indexing,
or to different documentary units, different levels of
indexable matter and exhaustivity, different types of
interface options provided, different levels of vocabulary
specificity, different types or levels of vocabulary
management, different types of surrogation, or to
combinations and interactions among these features.
Conflation of distinct variables continues to be a
problem in IR research. With the advent of full-text IR
databases, this comparison has progressed to "full-text
searching" versus "controlled vocabulary indexing". In
each of these examples, two different variables have been
conflated. It is possible to present controlled vocabulary
terms for searching based on either automatic or human
analysis, so the first of these comparisons should
appropriately focus on the presence or absence of
vocabulary management, separating that attribute from
automatic versus human analysis and indexing.
Similarly full-text searching has to do with exhaustivity
and indexable matter, so in a genuine comparison between
human versus machine analysis and indexing, or between
free-text terms versus controlled vocabulary, these
attributes should be as similar as possible. All the papers
in this anthology are valuable and useful, but they also
illustrate the continuing difficulty of isolating the many
different aspects of IR database design for assessing the
impact of each variable.
When searchers have a choice between automatic indexing
and human indexing as the basis for a search, they often
opt for automatic indexing, depending on a whole array of
other considerations, which Fidel explores. Again,
however, choosing automatic indexing means also
choosing, in most cases, a greatly expanded level of
exhaustivity, much larger indexable matter, much smaller
documentary units, a higher level of specificity, a much
larger indexing vocabulary, and little or no vocabulary
management. It also provides access to different types of
indexing syntax and searching options, which can be much
more flexible in certain situations. At the same time,
choosing automatic indexing usually limits a user to
electronic term-matching searches, as opposed to
browsable displays. Thus, when a searcher chooses
automatic indexing, it is not clear which features are the
most influential. These are not simple choices limited to
automatic versus human analysis and indexing.
Automatic analysis and indexing is also considerably
faster and cheaper than indexing based on human
intellectual analysis. Automatic methods can be applied to
enormous collections of messages where the volume of
texts and constant change, both within individual texts and
in the composition of collections, make human indexing
impractical, if not impossible.
The challenge for IR database designers is to determine,
for particular clientele, particular types of messages, texts
and documents, in particular subject areas and for
particular purposes, how expensive human analysis and
fast, cheap machine analysis can best be deployed to
maximise effective retrieval results at the lowest overall
cost.
2
Digitisation of Library Services

Libraries and archives are society's primary information
providers and were early users of the new digital
technology with respect to cataloguing and processing
management, and later for providing information on their
collections to the WWW community. Besides preserving and
providing access to 'born digital material' a great number
of archives and libraries nowadays have also turned to
creating digital surrogates from their existing resources. It
is for those libraries and archives that these guidelines have
been compiled.
Many libraries and archives would like to plan
digitisation projects but lack experience. There is a need for
a practical guide as a working tool for planning digitisation
projects. The reasons for implementing a digitisation
project, or more precisely for digital conversion of non-
digital source material, are varied and may well overlap.
The decision to digitise may be made in order:
- To increase access: this is the most obvious and primary
reason, where there is thought to be a high demand from
users and the library or archive has the desire to improve
access to a specific collection
- To improve services to an expanding user group by
providing enhanced access to the institution's resources
with respect to education and lifelong learning
- To reduce the handling and use of fragile or heavily
used original material and create a "back up" copy for
endangered material such as brittle books or documents
- To give the institution opportunities for the development
of its technical infrastructure and staff skill capacity
- To develop collaborative resources, sharing partnerships
with other institutions to create virtual collections and
increase worldwide access
- To seek partnerships with other institutions to capitalise
on the economic advantages of a shared approach
- To take advantage of financial opportunities, for example
the likelihood of securing funding to implement a
programme, or of a particular project being able to
generate significant income.
Since digitisation is both labour intensive and expensive it
is important to capture an image in a way that makes it
possible to use it to serve several needs. The key
components of a digital imaging project are:
- Selection policy
- Conversion
- Quality control programme
- Collection management
- Presentation
- Maintaining long-term access.
All these components are equally important - the chain is
no stronger than its weakest link. Digital technologies are
undergoing rapid and continuing development and many
issues are unresolved, giving rise to a delusive reliance on
the "wait-and-see" approach. The basis of a commitment to
going digital is an acknowledgement that the technology
will change and change often.
The crucial management decision is therefore less about
the "when", or the "whether" to begin. It is rather a
question of whether the institution can afford to ignore the
opportunity to reach wider audiences in a global
community, in a manner afforded by the technology to
improve access to and the preservation of cultural and
scholarly resources. Digitisation will be a costly exercise,
requiring detailed planning and the establishment of an
infrastructure to ensure continued access to the digital file.
Institutions in countries of the developing world
especially should consider whether the costs and time
involved will be commensurate with the benefits. Such
institutions should for example be prepared to resist
encouragement in the implementation of a digitisation
project by outside donor agencies, when analysis shows
that for example the use of microfilm would be adequate,
even preferable. Obviously, the user plays an important
role in the decision to begin a project, but which role is very
often hard to define. Indeed the specific demands of the
user may be difficult to know. In most cases there is a
supposed user's group, and it is the aim of the institution
to increase its services and expand its approach and
influence.
The user group may differ, depending on the type of
institution and the mission of the organisation. Institutions
of higher education fulfil faculty staff and students needs.
Public and national institutions must satisfy a large and
more diverse population. This influences not only selection
but also the forms of presentation and accessibility.
Digitisation is not preservation: digitisation is not cheaper,
safer or more reliable than microfilming. Unlike a frame of
high quality microfilm, a digital image is not a preservation
master.
The only way that digital reformatting contributes
positively to preservation is when the digital surrogate
reduces physical wear and tear on the original, or when the
files are written to computer output microfilm that meets
preservation standards for quality and longevity. A
digitisation project is therefore no replacement for a
preservation programme based on reformatting on
microfilm. This is in general true. But there may be specific
circumstances, for example in developing countries, that
can turn this notion on its head. If an institution with no
experience or facilities for preservation at all wants to
preserve a specific collection, it may decide to invest in
digital instead of microfilming equipment, thus avoiding
the high expenditure on microfilming cameras and
processors and realising that this digital equipment and the
developed staff skills will serve other purposes as well.
This shifting from the generally recommended method
of preservation microfilming into digitisation with its risks
in the long term is perhaps not the ideal solution for the
problem of nineteenth and twentieth century paper decay
but can serve as a practical way of providing protection to
certain documents. Digital technologies offer a new
preservation paradigm. They offer the opportunity of
preserving the original by providing access to the digital
surrogate; of separating the informational content from the
degradation of the physical medium.
In addition, digital technologies liberate preservation
management from the constraints of poor storage
environments typical of the tropical and sub-tropical
climates in which many developing countries are located.
Cost saving: digitisation does not result in cost savings for
collection management. A digital surrogate can never
replace the original item or artefact. If an institution wants
to save space by deaccessioning the brittle newspapers, it
would do better to create microfilm copies rather than
digital images.
The whole process, selection, scanning, creating
records etc. requires heavy expenditure and the long-term
maintenance of the digital assets has its own high costs. An
institution may wish to investigate the possibilities of cost
recovery by marketing digital copies. Preservation of digital
information is undoubtedly expensive and requires highly
skilled technical staff and equipment. Individual libraries
embarking on digital projects should seek co-operation
within regional, national and international agreements and
should look to conclude agreements with trusted
repositories. One decision is whether to use a digital process
which reproduces the page image, or to use OCR (optical
character recognition) or actual keying-in of the source text.
It is likely that users will want searchable texts, and that
means OCR or re-keying.
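For illustration (a sketch of ours, not part of these
guidelines): assuming the open-source Tesseract engine via the
pytesseract wrapper, with a hypothetical scanned page "page.tif",
the OCR route converts a page image into searchable text:

    from PIL import Image
    import pytesseract

    # Convert a scanned page image into machine-readable text (OCR).
    page = Image.open("page.tif")            # hypothetical scanned page
    text = pytesseract.image_to_string(page)
    with open("page.txt", "w", encoding="utf-8") as out:
        out.write(text)                      # searchable text stored alongside the image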
On the other hand, depending on the type of users and
the kind of text, many users will want to see the page
images as well, and experience a touch of the original. This
may lead to the conclusion to use both methods but in most
cases this would be cost prohibitive. Then the best way is
to choose page images. A further decision is whether to
produce digital files capable of handling every job
traditionally carried out by conventional photographic
services. Selection: it is important to see digitisation as a
series of choices where competing requirements and
demands have to be balanced.
The selection therefore has to be conducted in such a
way that it will assure that not only issues like the value
of the selected material and interest in its content are
considered but also demands concerning technical
feasibility, legal matters and institutional conditions. Issues
involved in the selection of material for digitisation will be
examined from two perspectives: 1) Principal reasons for
digitisation (to enhance access and/or preservation) 2)
Criteria for selection. As noted in the Introduction there can
be several reasons for increasing accessibility:
Enhancement of access to a defined stock of research
material
Creation of a single point of access to documentation
from different institutions concerning a special subject
Implementation of the "virtual re-unification" of
collections and holdings from a single original location
or creator now widely scattered
Support for democratic considerations by making
public records more widely accessible
Extending the availability of material in support of
educational and outreach projects
The key point is to evaluate the contribution that increased
access could make to a defined user community. If the
institution planning a digitisation project is a private one,
it is normal for it to focus on specific needs and to target
a specific user group. If however a public institution is
involved, it will probably have to satisfy a larger population
and more diverse demands. The way that it is intended to
use a digital image is of vital importance in shaping the
technical requirements.

When digital conversion deals with source materials
which are endangered or damaged, the purpose is, in the
first place, to create accurate reproductions of these
originals on a long-lasting medium and not to select
materials according to demand. These reproductions need
to satisfy both users of today and future potential users, and
must therefore both be of high quality and possess a
physical stability that can be maintained over time.
One method of selecting source materials for
preservation is to classify them into three categories:
Rare, unique or fragile documents, archives and other
objects of artifactual value that need to be retained in
their original form: Digital conversion can provide
high quality surrogates with quick and broad access
which in most cases will protect this kind of material
from handling. This can be difficult to achieve using
some kinds of microform.
Source material with an important intellectual but
relatively low artifactual value, highly used and
damaged or fragile: Digital images are normally good
replacements for serving immediate demands. If the
source materials are deteriorating and, therefore, need
to be replaced permanently, archives and libraries
sometimes prefer to produce microfilm for
preservation purposes and digital copies for access.
Mostly brittle source material of high intellectual but
low artifactual value and with a low level of use. This
is not material that will be of interest for digitisation
in the first place. If it is brittle material that needs to
be replaced by surrogate copies to allow use, then
microfilm is still the normal choice in many countries
being stable, cheap and easy to store.
In the future, when researchers discover this source
material and perhaps use it more frequently, there will
always be the possibility to digitise the microfilm. Many
institutions have not yet accepted digital technology as
being stable enough for long-term preservation. The
reasons are often that they feel the threat of technical
obsolescence of the digital medium and an uncertainty both
about the legal status of electronic documents and about the
future costs of preservation of such documents. While
waiting for the problem of digital longevity to be solved,
most institutions are creating archival images of what can
be called "preservation quality". That means that they:
- can be used for different purposes
- are created at a quality level that will minimise the
need for rescanning.
The fact that a surrogate has been created is certainly not
enough to justify disposal of the originals. Even to be
accepted as the text for consultation by the reader rather
than the original, the digital images must:
- have a guaranteed authenticity
- be a part of a preservation plan.
Disposal of original source documents after digital
conversion is sometimes used in records management
programmes but only for documents that have already been
appraised and scheduled for disposal, and which have been
digitised to facilitate heavy use during their intended life
time. It is useful when planning a digitisation project to
look at policies established by other institutions for their
own projects. Many of these are now available for
consultation on the Web.
An example is the Library of Congress, where the
selection for preservation digital reformatting is based on
value, use, characteristics of the original item, and
appropriateness of digital reproduction for use and access.
Regardless of the purpose for implementing a digitisation
project, the selection of source material will always be more
or less content driven. In fact, intellectual value is the basic
question in all kind of selection: does the content of this
material justify all the efforts, costs and other resources that
will be needed? Therefore, every digitisation project or
programme ought to have its own definitions of value
based on the goals it is trying to achieve.
During the last ten years scholars have started to build
up virtual collections of scanned documents, books,
museum artifacts etc. The selection is normally based on the
intellectual content of the material, but it could as well be
built on the physical appearance or on other factors like age
etc. The purposes of building virtual collections may differ.
It could for example be to re-unify scattered collections and
holdings or to enhance research by integrating different
source material that otherwise would have remained
separate items located in different parts of the world. The
possibilities of providing widespread access over the
Internet plays an important role here.
To make a digitisation project worthwhile requires a
certain minimum volume of information. Otherwise the
research value will be too low to attract enough either
planned or potential users. An important question is,
therefore, if a selection is being made based on content,
should all of a collection be included or only parts of it?
Normally the value of archival material, photographic
collections etc. is higher as aggregates rather than as single
parts taken out of context, but if individual documents or
objects have significant research value, even a few of them
can form a critical mass of information.
The level of demand is of course of great interest when
selecting source material for digitisation. If the purpose is
mainly to enhance access, the likelihood of significant use
of a digitised material will probably govern the selection
process. Involving scholars and other researchers in the
original decision is therefore a traditional selection
methodology. Sometimes an active user group for a specific
source material may be spread all over the world and
because of that it can be difficult to define or even detect.
Materials in special collections often run the risk of being
looked upon as little-used, which is not necessarily true
since a small specialist group can generate a great deal of
important research. To balance the demands of different
user groups many institutions have boards of scholars and
other researchers to help them select material that is most
urgent to digitise.
When an institution's digitising activities are being
developed from general proposals to specific projects
covering whole collections or types of documents or objects,
these advisory boards can be strategically important. For
cultural institutions starting their first digitising project, a
good rule of thumb is that selecting the most heavily used
parts of their collections will normally give the greatest
added value because it will satisfy the majority of the
people they try to serve. Selection of material for
digitisation will be affected both by its physical condition
and by the existing quality of the bibliographical
descriptions available for it.
Material which is fragile, damaged and in poor
condition may present too many risks of further damage
being caused by handling to allow it to be scanned without
special care, or some basic conservation treatment. This will
involve additional costs, and the institution will need to
consider whether other collections in better condition
should have priority, or whether the costs of preparation
and conservation should be built in to the costs of the
overall digitisation project.
Similarly, if the material being considered as a
candidate for digitisation lacks detailed cataloguing or
descriptive data, it is essential for future access to such
material to create such data, and it will therefore need to
be considered whether the necessary costs of doing this can
be included in the overall budget of the digitisation project.
A digital image is an "electronic photograph" mapped as
a set of picture elements (pixels) and arranged according to
a predefined ratio of columns and rows. The number of
pixels in a given array defines the resolution of the image.
Each pixel has a given tonal value depending on the level
of light reflecting from the source document to a charge-
coupled device (CCD) with light-sensitive diodes.
When exposed to light they create a proportional
electric charge, which through an analogue/digital
conversion generates a series of digital signals represented
in binary code. The smallest unit of data stored in a
computer is called a bit (binary digit). The number of bits
used to represent each pixel in an image determines the
number of colours or shades of grey that can be represented
in a digital image. This is called bit-depth. Digital images
are also known as bit-mapped images or raster images to
separate them from other types of electronic files such as
vector files in which graphic information is encoded as
mathematical formulas representing lines and curves.
Source documents are transformed to bit-mapped images
by a scanner or a digital camera.
During image capture these documents are "read" or
scanned at a predefined resolution and bit-depth. The
resulting digital files, containing the binary digits (bits) for
each pixel, are then formatted and tagged in a way that
makes it easy for a computer to store and retrieve them.
From these files the computer can produce analogue
representations for on-screen display or printing. Because
files with high-resolution images are very large it may be
necessary to reduce the file size (compression) to make
them more manageable both for the computer and the user.
When a source document has been scanned, all data is
converted to a particular file format for storage. There are a
number of widely used image formats on the market.
Some of them are meant both for storage and
compression. Image files also include technical information
stored in an area of the file called the image "header". The
goal of any digitisation programme should be to capture
and present in digital formats the significant informational
content contained in a single source document or in a
collection of such documents.
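As a minimal sketch (ours, assuming the Pillow library; the file
name is hypothetical), the format, pixel dimensions, colour mode
and other header information of an image file can be inspected
as follows:

    from PIL import Image

    img = Image.open("master_0001.tif")   # hypothetical master file
    print(img.format)   # e.g. "TIFF"
    print(img.size)     # (width, height) in pixels
    print(img.mode)     # e.g. "1" bitonal, "L" 8-bit greyscale, "RGB" 24-bit colour
    print(img.info)     # other data stored in the image header, e.g. dpi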
To capture the significant parts, the quality
assessments of the digital images have to be based on a
comparison between those digital images and the original
source documents that are to be converted, not on some
vaguely defined concept of what is good enough to serve
immediate needs. However, the solution is not to capture
an image at the highest quality possible, but to match the
conversion process to the informational content of the
original-no more and no less. At capture, consideration
has to be taken both of the technical processes involved in
digitisation and of the attributes of the source documents.
These attributes could be of varying dimensions and tonal
range.
Source documents can also be characterised by the
way in which they have been produced: by hand, by a
typewriter or printer, or by photographic or electronic
methods. The physical condition of the source documents
can affect the conversion in different ways. Fading text,
bleed-through of ink, burned pages and other kinds of
damage sometimes destroy the informational content but
more often set physical limitations on the possibilities of
catching information during a scan. Therefore, the need for
pre-scanning treatment of the source documents has to be
identified. Neglecting this can not only be a threat to the
documents themselves but can also limit the benefits and
results of digitisation and increase the cost.
Ordinary steps to prevent this are for example to carry
out preliminary elementary conservation treatment, and to
use book cradles for bound volumes, and routines to control
lighting and other environmental conditions during the
actual scanning. If the source documents have artifactual
value they will normally need to be examined by a
conservator before scanning. When the risks of damage to
the source documents are high and the documents are of
special value or in bad condition, it can sometimes be better
to scan from film intermediates instead of from the original
documents, if such film is available.
Image quality at capture can be defined as the
cumulative result of the scanning resolution, the bit depth
of the scanned image, the enhancement processes and
compression applied, the scanning device or technique
used, and the skill of the scanning operator. Resolution is
determined by the number of pixels used to present the
image, expressed in dots per inch (dpi) or pixels per inch
(ppi). Increasing the number of pixels used to capture an
image will result in a higher resolution and a greater ability
to delineate fine details, but just continuing to increase
resolution will not result in better quality, only in a larger
file size.
The key issue is to determine the point at which
sufficient resolution has been used to capture all significant
details in the source document. The physical size of a source
document is of importance when determining the
resolution. When the dimensions of the document increase,
the number of pixels needed to capture required details in
it will increase too, as well as the file size. Large files can
cause problems for users when viewing the images on a
screen or in sending them over networks, because the file
size has an important impact on the time it takes to display
an image.
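A rough calculation makes the point (an illustrative sketch of
ours; the figures are assumptions, not recommendations): the
uncompressed size of a scan is the pixel count times the bytes
per pixel.

    def raw_size_mb(width_in, height_in, ppi, bits_per_pixel):
        # Pixel count = (width x ppi) * (height x ppi); 8 bits per byte.
        pixels = (width_in * ppi) * (height_in * ppi)
        return pixels * bits_per_pixel / 8 / 1_000_000

    # An 8 x 10 inch page at 300 ppi in 24-bit colour:
    print(raw_size_mb(8, 10, 300, 24))   # about 21.6 MB before compression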
One way to decrease the file size is to decrease the
resolution. This is a critical decision, especially if the source
document has both a large physical size and a high level
of detail, which can be the case with oversized maps and
drawings. Bit depth is a measurement of the number of bits
used to define each pixel. The greater the bit depth used,
the greater the number of grey and colour tones that can
be represented. There are three kinds of scanning (digital
sampling):
- bitonal scanning, using one bit per pixel to represent
black or white
- greyscale scanning, using multiple bits per pixel to
represent shades of grey; the preferred level of greyscale
is 8 bits per pixel, and at this level the image displayed
can select from 256 different levels of grey
- colour scanning, using multiple bits per pixel to
represent colour; 24 bits per pixel is called true colour
level, and it makes possible a selection from 16.7
million colours.
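The tonal ranges quoted above follow directly from the bit
depth: the number of representable values is 2 raised to the
number of bits per pixel, as this small check (ours) shows:

    for bits in (1, 8, 24):
        print(bits, "bits per pixel ->", 2 ** bits, "possible values")
    # 1 -> 2 (bitonal), 8 -> 256 levels of grey, 24 -> 16,777,216 colours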
The choice of bit depth affects the possibility of capturing
both the physical appearance and the informational content
of a source document. Decisions about bit depth therefore
have to take into account whether the physical appearance,
or parts of it, have an added informational value and need
to be captured. This can be the case when the purpose of
the digitisation project is to produce facsimiles of the source
documents. Image enhancement processes can be used to
modify or improve image capture by changing size, colour,
contrast, and brightness, or to compare and analyse images
for characteristics that the human eye cannot perceive. This
has opened up many new fields of applications for image
processing, but the use of such processes raises concerns
about fidelity and authenticity to the original.
Image processing features include for example the use
of filters, tonal reproduction curves and colour
management tools. Compression is normally used to
reduce file size for processing, storage and transmission of
digital images. Methods used are for example to abbreviate
repeated information or eliminate information that the
human eye has difficulty in seeing. The quality of an image
can therefore be affected by the compression techniques
that are used and the level of compression applied.
Compression techniques can be either "loss less", which
means that a decompressed image will be identical to its
earlier state because no information is thrown away when
the file size is reduced, or "lossy" when the least significant
information is averaged or discarded in this process.
In general "loss less" compression is used for master
files and "lossy" compression techniques for access files. It
is important to be aware that images can respond to
compression in different ways. Particular kinds of visual
characteristics like subtle tonal variations may produce
unintended visual effects. Digital images reproduced from
photographic formats have a wide tonal range, commonly
resulting in large files.
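The "lossless" property can be demonstrated with Python's
built-in zlib module (a minimal sketch of ours; the file name
is hypothetical): the decompressed data is bit-for-bit identical
to the original, which is exactly what lossy schemes cannot
guarantee.

    import zlib

    with open("master_0001.tif", "rb") as f:   # hypothetical master file
        original = f.read()
    compressed = zlib.compress(original, level=9)
    restored = zlib.decompress(compressed)
    assert restored == original                # identical to its earlier state
    print(len(original), "->", len(compressed), "bytes")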
Another technique besides compression that can be
used to reduce file size is to limit the spatial dimension of
the digital image. This can be done when the image is
intended to be an archival reproduction rather than a
facsimile replacement of the original. The equipment used
and its performance has an important impact on the quality
of the image. Equipment from different manufacturers can
perform differently, even if it offers the same technical
capability. Operator judgement and care always have a
considerable impact on image quality. In the end it is
decisions taken by humans which decide what quality will
be achieved.
QUALITY CONTROL PROGRAMME

Quality control is an important component in every stage
of a digital imaging project. Without this activity it will not
be possible to guarantee the integrity and consistency of the
image files. Steps need to be taken to minimise variations
between different operators as well as between the different
scanning devices in use. Scanners must also be regularly
checked to verify accuracy and quality.
A quality control programme is needed both for in-
house projects and for projects where all arrangements or
parts of them are outsourced. An important difference is
that in a partly or totally outsourced project the quality
requirements often have to be formulated before a contract
is signed, due to its legally binding nature. In-house
projects can build up their quality control programmes step
by step as a part of their project activities. Although quality
control is a crucial factor to ensure the best results, there is
no standard way to ensure a certain image quality at
capture.
Different source documents require different scanning
processes, and this has to be considered when developing
a quality control programme. However, in most
programmes it is enough to set up a sampling plan covering
for example 10% of all images produced by each scanning
device during a certain time period (day, week, month). If
a specified percentage of the chosen images is found to be
incorrect then the whole batch will have to be subjected to
control.
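A sampling plan of this kind can be sketched as follows (our
illustration; the directory name and the 10% rate are
assumptions):

    import random
    from pathlib import Path

    # Draw a random 10% sample of the images produced in a period
    # and queue them for visual inspection.
    images = sorted(Path("scans/2024-05").glob("*.tif"))
    if images:
        sample = random.sample(images, max(1, len(images) // 10))
        for img in sample:
            print("inspect:", img)
    # If too many sampled images fail, subject the whole batch to control.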
A quality control programme always covers the master
files that are produced and in most cases will also cover
other outputs such as access files, microforms and paper
copies. The automated image evaluation tools that are
available today are normally not sufficient for materials
that are required for cultural and scientific purposes.
Therefore, visual quality evaluation has to be done:
- either on-screen or from print-outs, or
- based on a mix of on-screen evaluation and print-outs
(film or hard copies).
Technical limitations that can affect the evaluation must be
considered, beginning with the possibilities of getting good
quality printed hard copies of grey scale and colour images.
Before a scanner is bought, vendors should be required to
deliver measurable digital results from relevant digital
image quality evaluation tests. When a digital imaging
project is running, scanning quality control measures must
be set to enable operators to ensure that the scanning device
is operating within anticipated tolerances. Issues of main
concern in performance are: spatial resolution, tonal
reproduction, colour reproduction, noise, and artifacts
detection.
In projects which are scanning oversized material, such
as maps and plans, geometric accuracy is also an important
factor. A common definition of spatial resolution is the
ability to capture and reproduce spatial details. It covers
both input and output devices and that is probably one
reason why the concept of resolution is one of the most
misunderstood and misused technical specifications
applied to digitising equipment. Resolution is often
specified in terms of dpi (dots per inch). However, dpi
should normally be used only for printers, as "d" always
refers to printed dots (e.g. ink jet printers and laser
printers). For input resolution (i.e. scanners and digital
cameras) and on-screen resolution (i.e. monitors) pixels per
inch (ppi) normally should be used.
A pixel is in general a much smaller physical unit than
a dot. When a scanner is said to have a maximum
resolution of, for example, 600 dpi, it means in practice that
the scanner optically samples a maximum 600 pixels per
inch (ppi). But the optical sampling rate of a scanning
device only delineates the maximum possible (optical)
resolution in the direction of the extension of the CCD unit.
It will not guarantee that the scanner in reality can spatially
resolve details to the same degree that the optical sampling
rate would imply.
The reason is that the optical sampling rate of an input
device is only one component of the concept of resolution.
Other components of importance are for example the
quality, focal range and mechanical stability of the optical
system, the input/output bit-depth, the vibrations of the
source document and the CCD, and the level of image
processing applied to the image.
There are several methods for evaluating resolution.
Commonly used are the following:
Resolution targets, which were originally made for use
in micrographic and photographic industries. They are
normally used to measure the reproduction of details,
uniform capture of different parts of a source
document, image sharpness etc. The results can
sometimes be not fully trustworthy, but resolution
targets are still practical tools to use especially for
bitonal conversion.
The Modulation Transfer Function (MTF), in which the
spread of light in the imaging process (line spread
function) is measured. This is a more reliable and
objective way to evaluate how well details are
preserved, and best suits greyscale and colour systems.
Spatial Frequency Response (SFR), which means
measuring the ability of the scanner to transmit high-
frequency information by means of a specified transfer
function (in practice equivalent to MTF)
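For reference, a standard formulation of the MTF (ours, not
taken from these guidelines) is the normalised modulus of the
Fourier transform of the line spread function LSF(x), evaluated
at spatial frequency f:

    \mathrm{MTF}(f) = \frac{\left| \int_{-\infty}^{\infty} \mathrm{LSF}(x)\, e^{-2\pi i f x}\, dx \right|}
                           {\int_{-\infty}^{\infty} \mathrm{LSF}(x)\, dx}

Normalising by the zero-frequency value means MTF(0) = 1, and the
fall-off at higher frequencies measures how well fine detail is
preserved.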
OBJECTIVE OF DIGITAL CONVERSION

The cost of the technical infrastructure required for the
conversion is determined by the media selected. Bound
volumes may need to be scanned face-up on a planetary
scanner, loose documents on a flatbed scanner. Transparent
media, (slides, negatives) may be captured with a
transparency adaptor on a flatbed scanner, but optimal
image quality might be secured in the inclusion on the
budget of a film scanner. Included in the hardware cost
estimate is the maintenance contract to support maximum
production.
The objective of digital conversion for cultural heritage
institutions is authentic representation rather than image
enhancement for desktop publishing. Image capture
software is normally bundled with the capture device, and
subsequent image management can be effected with high
end products like Adobe PhotoShop, Corel PhotoPaint) or
demonstration packages on the Web (PaintShopPro). Once
a digitisation process has been chosen for the source
material selected, the cost per image can be analysed on the
following basis:
Source type: Turning pages and repositioning of bound
materials will take longer to scan than loose sheets; the
large pixel dimensions in scanning oversize maps or
newspapers slow production rates, and may need to be
outsourced where technical infrastructure is not
available.
Quantity: total volume of images to be scanned.
Process: direct scans or intermediates; OCR conversion
to ASCII text.
Standard: resolution, bit-depth and tonal range will
affect the resultant file size and, ultimately, the cost of
disk storage.
Cost per item: where resolution is constant, cost per item
is affected by the physical dimensions of the source material,
resulting in variations in file-size, and cost of disk
storage.
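These factors can be combined into a simplified per-image cost
model (an illustrative sketch of ours; all figures and rates
are invented):

    def cost_per_image(scan_minutes, staff_rate_per_hour,
                       file_size_mb, storage_cost_per_gb):
        labour = scan_minutes / 60 * staff_rate_per_hour
        storage = file_size_mb / 1000 * storage_cost_per_gb
        return labour + storage

    # A bound-volume page (slow handling) versus a loose sheet on a flatbed:
    print(cost_per_image(3.0, 20.0, 25.0, 0.50))   # bound page
    print(cost_per_image(0.5, 20.0, 25.0, 0.50))   # loose sheet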
The processes to make collections accessible, either in a
catalogue or on the Web are determined by the selection of
a metadata standard, based on the following factors:
Extent of existing collection-level description.
Need to modify metadata for various user audiences.
Compatibility to make the collection visible through a
Gateway.
The costs of metadata or indexing processes are
disproportionately high (60% of total cost), as they are
conducted by qualified information specialists, who often
need to be re-skilled in the use of new standards. Post
capture processing includes quality control against the
conversion standard selected and re-scanning where
necessary. This can either be conducted on each file, or at
specific capture intervals, to ensure consistency of image
quality.
The creation of smaller derivative files from master TIFF
files can be automated, to provide low-resolution images
for Web presentation.
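A minimal sketch of such automation (ours, assuming the Pillow
library; directory names and sizes are assumptions):

    from pathlib import Path
    from PIL import Image

    # Reduce each master TIFF to a low-resolution JPEG for Web presentation.
    # (Assumes the "web" output directory already exists.)
    for master in Path("masters").glob("*.tif"):
        img = Image.open(master).convert("RGB")   # JPEG stores RGB, not alpha
        img.thumbnail((1024, 1024))               # fit within 1024 px, keep aspect ratio
        img.save(Path("web") / (master.stem + ".jpg"), "JPEG", quality=75)

Digital archiving comprises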
electronic records management functions of providing
security, authenticity, and integrity for long-term
preservation and access. While many document
management systems will offer these features, it is
important that proprietary file formats are avoided at all
costs. Files should be stored in a standard file format
(TIFF, JPEG, ASCII text) that can be migrated to a new
platform as required, without loss of data and resultant
costs incurred to the library or archive. The identification
of a local agent to provide ongoing software support in
developing countries is important for cultural heritage
institutions.
Digital asset management is becoming of increasing
importance to the commercial sector, and the strength of the
growing market will positively affect pricing, and enhance
the availability of local support. The application of digital
technologies in providing open access to information
demands high levels of capacity in information technology.
Where this capacity is lacking in developing countries,
reliance on consultants must be calculated into the budget
at market-related prices. The implementation of a storage
system to manage document images should enable the
management of file relationships, audit trails, version
control, and disposition scheduling.
The selection of a suitable system requires some
investigation of commercial software products for budget
purposes. Software evaluation may be effectively
conducted by a specialised consultant, working in
conjunction with staff to identify the needs of the
institution. Apart from the many useful features that many
software packages offer, an additional point of budgetary
consideration is the license fee, usually an annual
commitment to maintenance and updating of the software.
The design of a user interface and management of the
delivery system is integral to access. Budgeting for software
is open, with solutions ranging from highly sophisticated
HTML editors (Dreamweaver, Front Page), to shareware
products available on the Web (Arachnophobia, Front Page
Express).
The budget considerations in managing the storage and
delivery system will include the software requirements
outlined above, systems administration functions of server
acquisition and maintenance, network infrastructure and
access control (firewall), backup hardware and media
(tapes, CDs etc.). Storage of backup copies and microfilm
masters in off-site low temperature and low humidity
storage is recommended for disaster recovery purposes.
Modest solutions to managing the storage and delivery
system can be applied in developing countries. One
solution can be found in hiring the services of a commercial
Internet Service Provider (ISP), rather than assuming the
technical challenge and ongoing costs of server
maintenance. Because of resource constraints, many
libraries and archives in developing countries tend to be
behind the digital technology curve.
Service providers in the education and qualification of
library and archive staff have been slow to inform students of
the new skills they will need to respond to the digital
environment. These include not only technical skills, but
proposal writing and project management skills applied to
the development of technical services. The successful
application digital technology is not a matter of hardware
or software, but a problem of access to opportunity, which
goes far beyond technology.
Directors of libraries and archives may fear that because
they do not understand the technical details of digitisation,
they cannot effectively plan for the implementation of
digitisation projects. It is more important for managers to
understand the impact of digitisation on the organisation
and its goals. Three main areas of consideration are change
management, capacity building, and in developing
countries, the social implications of digital technologies.
Opportunities for staff development in the implementation
and use of digital technologies require managerial support,
often less than enthusiastic when faced with the reality of
trimming budgets to support new initiatives.
Change is basically about people. It may be necessary
to analyse the problems of interaction within the
organisational culture for obstacles related to territoriality,
a lack of informed managerial support and fear of change
within the line management, including technophobic
barriers to technological innovation. These issues are often
underestimated. The functional units of organisation
within the institution may need to be deconstructed to
enable change by focusing less on procedures and more on
common goals of providing an information service. It is
inevitable that existing lines of authority and responsibility
will be relaxed.
The level of seniority that is age-related in the
traditional societies of developing countries has no place in
the digital arena, where individuals must be fearless of risk
and change, and be self-motivated in learning the limits
and opportunities of information technology and
communication. In the absence of formal training in
developing countries, managers can nevertheless provide
leadership in seeking aptitude in these areas to empower
the right people in the organisation. For example, a simple
manifestation of managerial support for changing
institutional cultures might be in making time available to
staff, who show an aptitude, to familiarise themselves with
computers.
Financial assistance in the form of institutional loans for
personal computers, modems etc. will serve the institution
by extending the learning curve beyond office hours, while
taking the threat of change out of the workplace. Even
when opportunities abound, people and organisations have
a natural aversion to change, especially where it is
perceived as daunting, complicated or costly. At the same
time there is a natural human tendency to desire what
others have. Capacity building is therefore effectively
achieved by forming partnerships with early adopters,
either institutions or individuals with experience in the use
of the technology, and who in their commitment to making
it work, ensure the transfer of skills and increase the
chances for a successful outcome of the project. The
development of partnerships with similar cultural heritage
institutions, or proposals to collaborate with experienced
institutions or individuals on joint initiatives, can leverage
human development beyond the seniority and gender
constraints of the particular institutional culture.
Formal training opportunities that might be available
include commercial training for the basic office
environment, or short courses offered by universities and
colleges, some even on-line, aimed at the successful
delivery of technology. Most institutions began operating on the
information highway by sending highly motivated
delegates to intensive training courses. In the developing
world, the training should be appropriate to the particular
needs of operating independently with limited IT support.
It has become clear, in providing grant-funded
digitisation training courses, that the acceptance of such
opportunities also bears with it a level of accountability.
Capacity building then becomes self-motivated, if the
individual is empowered to affect change in developing
digital technologies.
Intensive instruction in digitisation should assume a
basic level of IT competency in a Windows environment,
and aim instead to provide key skills for digitisation:
- image capture: to capture a digital image from a
physical object
- OCR (Optical Character Recognition): to convert imaged
text into machine-readable format
- markup languages: standard protocols for adding
metadata, e.g. HTML, XML
- metadata: standard schema of administrative, descriptive,
structural and preservation information, e.g. Dublin Core
(a minimal example follows this list)
- indexing and database technologies: to search and
retrieve digital resources
- intellectual property management: the risks and
responsibilities of disseminating electronic information
- user interface design: the interpretation of user
interactions with the data
- web technology: basic delivery mechanisms of digital
data via HTML, XML and the use of search engines
- project management: to achieve goals within a specific
timeframe.
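By way of illustration for the metadata item above (our sketch;
the element values are invented), a simple Dublin Core record
can be represented as follows:

    # A few of the fifteen Dublin Core elements for a scanned map.
    record = {
        "dc:title": "Map of Jaipur, hand coloured",
        "dc:creator": "Unknown",
        "dc:date": "1890",
        "dc:format": "image/tiff",
        "dc:identifier": "maps/0042",
        "dc:rights": "Public domain",
    }
    for element, value in record.items():
        print(element, "=", value)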
The goals and objectives of digital projects have to be
clearly identified and the implementation carefully
planned in order to attract grant funding. Whether digital
projects are to be ultimately outsourced or production
conducted in-house, the need to develop both technical and
managerial skills is essential for effective quality control.
An important component of capacity building in
developing countries is the opportunity provided to create
new job opportunities for the people of the country.
Partnerships that support human development are
preferable to those that offer quicker, and often cheaper off-
site conversion, but which ignore the social upliftment of
job creation. Human resource development is essentially
aimed at breaking the digital divide. The Internet provides
global information sharing, changing the way in which
users interact with information resources.
CHALLENGES OF INFORMATION REVOLUTION

The Networking Revolution is opening new "digital
opportunities" for developing countries. They can
significantly benefit from investments in modern
information infrastructure in a pro-competitive regulatory
environment, and leapfrog stages of development in terms
of networking roll-out. In one of the most significant
developments ever to emerge in the communications
industry, the Internet is redrawing the economy of
communication networks and creating incentives for the
expansion of connectivity at a dizzying pace.
The resulting "Networking Revolution" is seeing the
introduction of large-scale data communications networks
that promise broadband access and substantially lower
costs to all consumers. These developments are already
having a significant impact on the global economy.
Between 1995 and 1998, the global telecommunications
markets worldwide connected 200 million telephone lines,
263 million mobile subscribers, and 10 million leased lines.
And while only 15 million Internet connections were
made in 1991-1994, this number exploded to 88 million in
1995-1998, nearly a sixfold increase in network growth.
While it took the telephone close to 75 years to reach
50 million users, it took the World Wide Web (WWW) only
4 years to reach the same number. The forces of
technological advance and competition are driving this
infrastructure revolution and fostering the growth of a
global networked economy. When the first transatlantic
cables for telephony were laid in the 1950s, a minute of
voice communication cost US$2.44; by 1996, operating costs
per minute were just over a cent. The costs of many other
key networking components are also falling sharply. Some
of the main technological innovations driving this process
are: the move from copper to fibre optics in telecommunications
backbones. This has increased exponentially the
amount of information that can be transmitted over cables.
Transmission speeds that used to be clocked in kilobits
per second have now reached the terabits-per-second
range. More information is now carried at higher speed and
lower cost, making connections more affordable and
promoting content development. New mobile wireless
technologies have increased the ubiquity of communications
services, allowing users to stay connected
irrespective of location, while new fixed wireless
technologies have increased the capacity of the airwaves to
transmit information at very high data rates.
Cables and wireless systems can now be connected by
fast electronic switching components that cost a fraction of
the price of old analog switches, making possible real-time
networking through Internet and wireless communi-
cations. And perhaps most central to the explosion of the
Internet has been the huge increase in affordable
computing power made possible through inventions
including integrated circuits, miniaturisation, and large
data storage devices. The widespread adoption of digital
packet-switched technology has spurred a convergence of
voice, data, and multimedia applications, enabling all
communications traffic to be managed and delivered over
multipurpose platforms. Together, these changes have
made possible the emergence of new technological and
communication paradigms.
The Internet, personal computers, and wireless
telephony foster an increasingly dynamic network of
individuals, firms, schools, and governments commun-
icating and interacting with each other. And since the value
of a network increases as its number of users grows, by
accessing the global information network each additional
user not only benefits from a new ability to communicate
and trade, but also adds value to the rest of the connected
world. Digital technology allows significant savings to be
achieved in the provision of communication services
networks.
In turn, this is driving a change in the economics of
information and communications, allowing competition
between cable TV providers, cellular companies, and fixed-
line telephone systems over Internet service provision.
Continued technological advance will allow for both
greater competition and improved functionality of the
Internet, through real-time interactive communications,
faster data exchange rates, and the ability to build entirely
new service concepts and applications. Technological
advance has facilitated the contestability of incumbent
communications providers. This has in turn been further
stimulated by the policy drive toward liberalisation
(elimination of regulatory barriers to entry), competition
(particularly, via efforts to constrain abuses of market
power by incumbent companies) and privatisation that
over the last two decades has characterised the sector all
over the world.
Rather than a move toward laissez-faire, regulatory
reform in most countries has been characterised by a shift
in the focus of regulation from one of control of "natural
monopolies" toward the elimination of barriers among the
sectors of the communications field (to facilitate
convergence), re-regulation and support for public-private
partnerships. In the United States, government support for
design and early operation of the Internet was critical.
Equally critical for the Internet's success, however, was the
government's decision in the early 1990s to privatise
operation of the Internet backbone, while continuing reform
of the telecommunications marketplace to expand
competition.
At the same time, regulation of content and commerce
over the Internet has been kept to a minimum during this
period. Elsewhere, not only have many national
governments acted decisively to liberalise the
telecommunication sector, but also negotiated under the
World Trade Organisation (WTO) a groundbreaking effort
to reach a general agreement on basic telecommunications,
to further extend this process. A number of positive
feedback mechanisms, driven by technological change and
regulatory reform, further promote the dynamism of the
networking revolution. As the unit costs of many key
networking technologies fall, this further stimulates
increased demand and higher production volumes; the
evolution of cellular handsets being a good illustration. In
the information-carrying business, the decreasing profit
margin per bit carried implies the necessity to operate on
larger and larger volumes.
The Internet is complying, due to both a rapidly
evolving technology and greater and greater volumes of
information exchanged within a denser and denser
network of users. The 1990s were also characterised by the
emergence of new approaches to standardisation. Until
then, network standardisation was the preserve of highly
specialised technicians, who cajoled networking and IT
corporations towards laboriously negotiated standards.
The explosion of the Internet has entirely transformed the
dynamics of standardisation, making the quick
establishment of standards an essential prerequisite for the
continued dynamism of the market. Incentives to move
faster in establishing such standards via public-private
partnerships and industry negotiations are increasing
significantly.
In the developing world, the combined impact of
liberalisation and ownership reform has been substantial.
Telecommunication and networking possibilities are no
longer limited to narrowband wire line telephony and
analog wireless telephony. The range of choice is
expanding significantly, although still in a limited number
of (mostly urban) areas, by the successful deployment of
state-of-the-art mobile networks, broadband metropolitan
area networks and paging networks. And, of course, the
World Wide Web allows each connected user to access a
dynamic global information space.
The Internet is providing a platform for a global
marketplace, supporting, among other things, electronic
commerce, where as more suppliers and customers enter
the arena, the benefits of participation grow, and the
penalties for non-participation also increase. The global
economy has already been profoundly affected by these
developments. In 1999, global e-commerce transactions
amounted to US$150 billion. Continuing developments in
the ICT sector are expected to further compound the impact
of the networking revolution on world economic activity.
The remarkable reduction in the cost, and
enhancement of the functions, of communications networks,
made possible by the forces of change described above, are
fostering the emergence of new businesses with potential
to drive economic growth and to transform the social
environment in the developing world. However, these
businesses are often quite different in profile from the
traditional telecommunications service providers.
CHALLENGES OF INTERNET INFRASTRUCTURE

For Internet infrastructure providers and users, indicators
play an increasingly important role in underpinning Internet
self-governance and self-regulation. For each of the key
reasons why self-regulation is preferred by the
Internet community, information to inform that process is
essential. If one Internet Service Provider (ISP) does the
'wrong thing', in terms of its interaction with the Internet,
it can impair the network and service performance for all
ISPs. This can range from problems with day-to-day
network quality management right up to, in the worst case,
bringing traffic flows on the Internet to a standstill. Just as
policy makers have needed indicators for regulation in
traditional communication sectors, the Internet industry
needs infrastructure indicators for self-regulation.
For policy makers, familiarity with Internet
infrastructure indicators is important in a number of areas.
One aspect is the increasing number of regulatory issues
being placed before governments, not only in relatively
unfamiliar issues at the core of the Internet but also in areas
where the Internet is converging with other communication
platforms. Moreover, a better understanding of Internet
infrastructures is an important element underpinning
wider issues bearing on electronic commerce. They can
provide a better understanding of the challenges for the
private sector in upgrading infrastructure and of
comparative national performance. They also provide an
important input into a better understanding of how the
Internet is becoming more critical for overall economic and
social development in OECD countries.
The main criterion for inclusion of indicators in this
document is that the data were generated by network
surveys or by entities that play a role in administering core
Internet infrastructure. In respect of the first criterion, online
or electronic network measurements would qualify but
off-line surveys or e-mail-based surveys are not included.
The exclusions are not because such surveys may not
provide valuable information but rather because the off-
line survey methodologies used are generally well known.
In addition, their exclusion helps to narrow the scope of
this document to a manageable level and to focus on
indicators that often rely on the policy maker to bring
together data from different sources to generate the
indicator.
The second criterion covers data that are collected by
the managers of core infrastructure surrounding the
Domain Name System (DNS), such as second and third
level domain registration, Internet Protocol (IP) number
assignment, and Autonomous System Number (ASN)
assignment. Some useful Internet Websites on this subject,
with the summaries of attempts to draw together the results
of official and commercial surveys, are those of HeadCount
and NUA. How reliable the underlying results are for each
country depends on the individual surveys and there is, of
course, no harmonisation of methodologies.
At the same time some of the Internet's core
administrative entities, in some countries, are collecting
useful data from among their members. In this context a
leading example is KRNIC, the organisation which
administers domain names and IP addresses in Korea,
which surveys Korean Internet Service providers (ISPs) to
determine their number of business connections and dial-
up subscribers. KRNIC is then able to publish national
Internet subscriber statistics for Korea, along with data on
IP addresses and autonomous system numbers. As appropriate,
references are made to governance and regulatory issues where these
indicators are being used to inform debate or form a tool
used by industry for self-regulation.
Whereas surveys of hosts, servers and so forth are
undertaken by the Internet's technical community, tools are
available for policy makers to generate infrastructure
indicators. These include the use of traceroutes to provide
an indicator of market position and a better understanding
of traffic exchange in backbone networks. In addition,
search engines can be used to provide information on the
implementation of webcasting technologies and the
topography of Internet hyper-text links to leading
electronic commerce sites.
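By way of illustration, the sketch below shows how a policy analyst might generate a simple route-based indicator. It assumes a Unix-like system with the standard traceroute utility on the PATH, and the target hostname is purely illustrative:

```python
# Sketch: derive a crude backbone indicator from traceroute output.
# Assumes a Unix-like system with the standard `traceroute` utility;
# the target hostname is illustrative only.
import subprocess

def trace_hops(host: str) -> list[str]:
    """Run traceroute and return the addresses of responding hops."""
    result = subprocess.run(
        ["traceroute", "-n", "-q", "1", host],
        capture_output=True, text=True, timeout=120,
    )
    hops = []
    for line in result.stdout.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 2 and fields[1] != "*":
            hops.append(fields[1])  # numeric hop address (because of -n)
    return hops

path = trace_hops("example.org")
print(f"{len(path)} responding hops; transit addresses: {path}")
```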
By way of example, a matrix of all the hyper-text links
between domains for the OECD area is made available.
Accordingly, it is possible to see the emerging pathways of
electronic commerce between OECD countries. Finally,
readers may find the references to electronic glossaries at
the end of this document a useful aid in respect to Internet
terminology. The most common indicators used to measure
Internet development are the surveys of Internet hosts
undertaken by Network Wizards and RIPE (Réseaux IP
Européens).
The Network Wizards survey includes all Top Level
Domains (TLDs) and generic Top Level Domains (gTLDs)
and is undertaken every six months. The RIPE survey is
undertaken monthly but is limited to TLD registrations in
their service area. While both surveys are much appreciated
by the Internet community the results need to be qualified
and have several limitations. The first qualification that
needs to be made is that host data do not indicate the total
number of users who can access the Internet.
The second caveat is that these surveys do not reach
every host on the Internet, as access to some hosts is blocked
by company firewalls. Recognising the limitation of this
second factor, Network Wizards changed their
methodology for the survey undertaken in January 1998 to
enable access to a greater number of hosts. Notwithstanding
this change, surveys of Internet hosts may only be
interpreted as the minimum size of the 'public Internet', as
it is impossible to determine the number of users accessing
services via each host. The Netcraft Web Server Survey is
a survey of web server software usage on computers
connected to the Internet. Netcraft collect and collate as
many hostnames providing an http service as their survey
can find, and systematically poll each one with a
HyperText Transfer Protocol (HTTP) request for the server
name. A host name is the first part of a host's domain name.
In the July 1998 survey Netcraft received responses from
2,594,622 web servers. The growth rate for the first half of
1998 was 41 per cent. Some 96 per cent of these servers are
in the OECD area.
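The polling method described above can be approximated with a short sketch using only Python's standard library; the hostname is illustrative and the Server header returned, if any, varies from site to site:

```python
# Sketch: poll a web host for its server software, in the spirit of the
# Netcraft survey described above. The hostname is illustrative.
from http.client import HTTPConnection

def server_banner(host: str) -> str | None:
    """Send an HTTP HEAD request and return the Server response header."""
    conn = HTTPConnection(host, 80, timeout=10)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Server")
    finally:
        conn.close()

print(server_banner("example.org"))  # prints the server banner, or None
```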
By far the largest number of web servers are
under .com, which has 60 per cent of all web servers. As for
Internet hosts, it is possible to provide penetration by
domain and to weight this by the number of gTLD
registrations. The country providing the most responses, on
a per capita basis, is Denmark. This is because there are a
lot of small virtually hosted sites in Denmark. Netcraft
report that while this is a characteristic of many countries,
it is particularly so in Denmark and the Netherlands.
TeleDanmark and Cybercity operate two of the largest
virtual hosting sites in Denmark. Internet surveys of hosts
and servers provide one indicator of Internet development
and may be used as one potential indicator of comparative
Internet development between countries.
The main limitations are not reaching all hosts or
servers, and the structure of the domain name system being
such that there is no guarantee that all hosts under a
particular domain are located in a certain geographic
location. That being said, the OECD's observations, from a
series of traceroutes to Websites under TLDs, are that by far
the majority of hosts using TLDs are located in the country
concerned. The availability of gTLD registrations by
country presented the first possibility of redistributing
Internet hosts under domain names such as .com to
individual countries. The simplest option, used to
prepare this report, was to weight the number of hosts
under gTLDs according to the number of gTLD
registrations from a particular country.
In other words, if 5 per cent of the total gTLD
registrations are from a particular country then 5 per cent
of the total number of hosts surveyed under gTLDs are
reallocated to that country. This methodology could, no
doubt, be subject to a number of caveats. Nevertheless it
seems reasonable to assume that this approach gives a more
accurate distribution of Internet hosts, in OECD countries,
than allocating all hosts under gTLD registrations to the
United States. Other countries recording significant
increases at that time, albeit from smaller base numbers of
hosts, were Turkey, Spain, Luxembourg and France. All
these countries recorded a relatively large increase in the
number of hosts relative to the average OECD increase of
21 per cent. The countries for which this made very little
difference are those where users mainly rely on national
TLD registrations, such as Iceland, the Czech Republic,
New Zealand, Poland and Finland.
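The reallocation arithmetic described above is straightforward to express in code. The sketch below uses invented figures purely for illustration:

```python
# Sketch of the gTLD reallocation method described above: hosts counted
# under generic TLDs are redistributed to countries in proportion to each
# country's share of gTLD registrations. All figures are invented.
def reallocate_gtld_hosts(total_gtld_hosts: int,
                          registrations: dict[str, int]) -> dict[str, float]:
    """Return each country's imputed share of gTLD-registered hosts."""
    total_regs = sum(registrations.values())
    return {country: total_gtld_hosts * count / total_regs
            for country, count in registrations.items()}

# If 5 per cent of gTLD registrations come from a country, it is imputed
# 5 per cent of the hosts surveyed under gTLDs.
shares = reallocate_gtld_hosts(
    1_000_000, {"US": 80_000, "UK": 10_000, "CA": 5_000, "AU": 5_000})
print(shares)  # {'US': 800000.0, 'UK': 100000.0, 'CA': 50000.0, 'AU': 50000.0}
```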
The Netcraft Server surveys also provide one of the
best available indicators of the growth of electronic
commerce on the Internet. Whereas the best known search
engines only cover http sites, Netcraft also undertakes a
secure socket layer (SSL) survey. The SSL protocol was
developed by Netscape for encrypted transmission over
TCP/IP networks. It sets up a secure end-to-end link over
which http or any other application protocol can operate.
The most common application of SSL is https, for SSL-
encrypted http, which enables electronic commerce to take
place. In August 1998, Netcraft received responses from
more than 424,000 web sites using encryption. However
most of these responses are excluded, in terms of electronic
commerce web sites, because they do not have third party
certification. Sites without a third party certification are not
expected to be engaging in electronic commerce because of
the warning message that gets generated. The key element
for electronic commerce is third party certification with
a matching certificate. Netcraft say plausible reasons for the
large number of responses, where the name in the
certificate did not match the site's domain name, might
include web sites run from virtual hosting configurations
where the provider sets up all customers with https
services, with customers buying certificates when they start
to make use of the facilities.
Netcraft adds that sites where the certificate issuer is
not a known certificate authority, typically indicate that the
site has generated and signed its own certificate, which is
acceptable for prototyping, or where trust is not required
outside a limited group of people, such as a company, or
collaborative project. This is likely to be more commonplace
on internal networks than on externally visible Internet
sites. The major electronic commerce uses of secure server
software are for encrypted credit card transactions over the
Internet. The most common non-retail use of SSL is
subscription access to privileged information.
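The distinction drawn above, between sites with third-party certification and those with self-signed certificates, can be illustrated with a minimal sketch. It uses Python's standard ssl module; the hostname is illustrative, and a verification failure corresponds to the browser warning described above:

```python
# Sketch: check whether a site presents a third-party certificate that
# verifies against trusted authorities and matches its domain name, the
# property the SSL survey uses to identify likely electronic-commerce
# sites. The hostname is illustrative.
import socket
import ssl

def has_third_party_cert(host: str) -> bool:
    """True if the TLS handshake verifies and the name matches the host."""
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, 443), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                issuer = dict(x[0] for x in tls.getpeercert()["issuer"])
                print("Issued by:", issuer.get("organizationName"))
                return True
    except ssl.SSLCertVerificationError:
        # Self-signed or mismatched certificates fail verification here,
        # just as browsers raise the warning described above.
        return False

print(has_third_party_cert("example.org"))
```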
REFERENCES

Beagrie, N. and Greenstein, D., A strategic policy framework for creating and
preserving digital collections. JISC/NPO Studies on the Preservation
of Electronic Materials. eLib Supporting Study P3. London: South
Bank University, Library Information Technology Centre, 1998.
Bearman, D., "Optical media: their implications for archives and
museums", Archives and Museum Informatics Technical Report, 1 (1).
Pittsburgh, Pa.: Archives and Museum Informatics, 1987.
Conway, P., "Digitizing preservation", Library Journal, 1 February 1994,
42-45.
Feeney, M. (ed.), Digital culture: maximising the nation's investment: a
synthesis of JISC/NPO studies on the preservation of electronic
materials. London: National Preservation Office, 1999.
3
ICT Applications in Libraries

The adoption and use of Information and Communication
Technology (ICT) has resulted in the globalization of
information and knowledge resources. Bibliographic
databases, full-text documents, and digital library
collections are always available to users. Today, there is not
a single library or information center in India that is fully
automated. Some libraries are in the initial stages of the
automation and networking process. A few libraries have
CD-ROM access, but no initiative has been taken
to produce information products on CD. Some libraries
have an online connection and are providing external
resource sharing on a limited scale. Only a few specialized
libraries and information centers have started networking
or resource sharing or have used the telecommunication
system for data transfer.
Modern libraries are increasingly being redefined as
places to get unrestricted access to information in many
formats and from many sources. In addition to providing
materials, they also provide the services of specialists,
librarians, who are experts at finding and organizing
information and at interpreting information needs.
More recently, libraries are understood as extending
beyond the physical walls of a building, by including
material accessible by electronic means, and by providing
the assistance of librarians in navigating and analyzing
tremendous amounts of knowledge with a variety of digital
tools.
Before the computer age, this was accomplished by the
card catalog - a cabinet containing many drawers filled
with index cards that identified books and other materials.
In a large library, the card catalog often filled a large room.
The emergence of the Internet, however, has led to the
adoption of electronic catalog databases (often referred to
as "webcats" or as OPACs, for "online public access
catalog"), which allow users to search the library's holdings
from any location with Internet access. This style of catalog
maintenance is compatible with new types of libraries, such
as digital libraries and distributed libraries, as well as older
libraries that have been retrofitted.
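As an illustration of how such catalogs can be searched programmatically, the sketch below issues a query over SRU (Search/Retrieve via URL), one widely used web-service interface to OPACs. The endpoint URL is hypothetical; the request parameters follow the SRU 1.2 standard:

```python
# Sketch: query an online catalogue over SRU (Search/Retrieve via URL).
# The endpoint below is hypothetical; the parameters (operation, version,
# query, maximumRecords) follow the SRU 1.2 standard.
from urllib.parse import urlencode
from urllib.request import urlopen

SRU_ENDPOINT = "https://catalogue.example.org/sru"  # hypothetical endpoint

def search_catalogue(term: str, max_records: int = 5) -> str:
    """Return the raw XML response for a simple title search."""
    params = urlencode({
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": f'title="{term}"',  # CQL query syntax
        "maximumRecords": max_records,
    })
    with urlopen(f"{SRU_ENDPOINT}?{params}", timeout=10) as response:
        return response.read().decode("utf-8")

# print(search_catalogue("library automation"))  # needs a real SRU endpoint
```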
Electronic catalog databases are disfavored by some
who believe that the old card catalog system was both
easier to navigate and allowed retention of information, by
writing directly on the cards, that is lost in the electronic
systems. This argument is analogous to the debate over
paper books and e-books. While they have been accused of
precipitously throwing out valuable information in card
catalogs, most modern libraries have nonetheless made the
move to electronic catalog databases.
Large libraries may be scattered across multiple buildings
across a town, each having multiple floors, with multiple
rooms with many shelves. After finding a book in
the catalog, the user is forced to wade through another stack
of lists, manuals, and maps and then navigate through
erratic floor layouts to the real book. GPS coordinates might
help in this respect.
Basic tasks in library management include the planning
of acquisitions (which materials the library should acquire,
by purchase or otherwise), library classification of acquired
materials, preservation of materials (especially rare and
fragile archival materials such as manuscripts), the
deaccessioning of materials, patron borrowing of materials,
and developing and administering library computer
systems. More long-term issues include the planning of the
construction of new libraries or extensions to existing ones,
and the development and implementation of outreach
services and reading-enhancement services.
IMPLEMENTATION OF ICTS IN LIBRARIES

The successful implementation of ICTs in libraries requires
effective use of the involved political-administrative,
financial, human, technical and other resources, correct
selection of priorities and directions of activities, control
and coordination of all conducted work. This makes a
national strategy and a plan of activities in the sphere of
application and development of ICT highly necessary for
implementation of the policy. The state plays the key role
in creating favorable conditions for building the
information society, and its main activities include:
forming a legislative base and its regular moder-
nisation;
analysing and regulating the activities relating to
information technologies in the country;
creating the national and state information systems,
forming the information resources, and controlling activities of
the state institutions in this sphere;
creating an admissible environment for the new situation,
which will serve for attracting foreign and local investments
and for fair competition;
creating equal conditions for all participants, using
political, legal, economic and administrative
mechanisms to attract and involve wide layers of the
society in implementation of the strategy, and
coordinating their activities;
ensuring protection of civil rights and freedoms and
security of personal and private information;
creating for citizens opportunities of access to state
information resources;
ensuring the national information security;
mobilising financial resources required for impleme-
ntation of the strategy, providing governmental
support to the socially oriented and nationally important
projects and programs;
creating favorable conditions for producing national
ICT hardware and software products, and stimulating
their promotion at the world market;
creating favorable environment for private companies,
especially small and medium-sized companies acting
in the ICT sector;
creating a favorable environment for ICT usage in all
fields of economy;
using modern ICT in state administration and local
self-administration;
carrying out regular activities on forming electronic
government;
enhancing international cooperation for ensuring
national interests in ICT.
For instance, in Azerbaijan, the national strategy
determines the following key principles to ensure
effectiveness of ICT application and development and
create favorable and equal conditions for all participants:
ICT awareness - to gain public support, ensure effective
activities and facilitate implementation of ICT,
knowledge and information relating to ICT is made
broadly accessible to the population.
Transparency - all the activities are conducted openly;
rules and arrangements concerning activities are
disclosed to community through all means; public
discussions are conducted, the ideas of all parties are
listened to and taken into consideration.
Equality - irrespective of the position in society and type
of ownership, interests of all participants are
considered equally, and the social justice principle is carried
out.
Innovation - innovations of scientific and technical
progress are taken into consideration, and research
activities are supported.
Stepwise implementation - taking into account the ICT's
rapid growth and to ensure effective use of financial
resources, the implementation activities are conducted
in stages. Programs and projects are prepared taking
into account priorities and obtaining outputs in short
timeframes.
International cooperation - the country takes an active
part in preparation and implementation of the
international ICT projects, the strategy implementation
activities are tightly coordinated with the development
of the global information society.
"First leader" principle - the leaders of state
administration and local self-administration,
organisations and enterprises are interested in the
activities to implement the strategy and control this
process directly.
Nation-orientated principle - development of national
information resources, creation of Azerbaijani-
supporting software are considered priorities,
creation of information resources of the national
minorities is stimulated.
The key priorities of the National Strategy are to:
meet information requirements of citizens, ensure
comprehensive development of a person, raise
intellectual potential of the country;
create legal environment to ensure the transition to the
information society, conduct effective, transparent and
controllable state administration and self-
administration;
strengthen country's economic potential through ICT
application;
protect and popularise historical, literary and cultural
national heritage via ICT usage.
National Strategy determines the following key activity
directions:
ICT and Modernisation of Education, training national
staff in ICT and providing minimal ICT literacy in the
country;
development of the social spheres using ICT;
development of the telecommunication infrastructure;
formation and development of the electronic
government;
creating and developing the legislative base relating to
the informatisation;
formation and development of the electronic economy;
formation and development of national information
resources;
strengthening scientific, technical and production
potential in the ICT sector;
ensuring national information security and personal
and private data protection.
Development of Telecommunication Infrastructures:
creating and developing the modern telecom-
munication infrastructure, new telecommunication
networks;
development of the data networks in the country,
including Internet, eliminating differences in this
sphere between rural and urban areas;
conducting the liberalisation policy in telecomm-
unication and creating the environment for fair
competition, formation of the national level data
operators and creating favorable conditions for their
activities;
conducting flexible tariff policy in telecommunication
services.
Information Security:
ensuring the national security in information
exchange, and the struggle against electronic crimes;
taking into account the national interests, creating
conditions to ensure rights of citizens and
organisations to safely obtain and use electronic
information;
creating an environment providing citizens'
information security.
The complexity of the forthcoming work, the large financial
resources required and the rapid growth of ICT stipulate
stage-wise implementation of the national ICT strategy in
Azerbaijan, whose economy is in the transition period.
As the National Strategy determines the main activity
directions and their content conceptually, the action plans
in every direction - national ICT programs - are developed
and concrete actions are conducted through projects. The
successful implementation of the strategy depends directly
on its correct management. This process is coordinated,
controlled and governed by state institutions conducting
the Government policy in ICT usage and development.
To ensure implementation of the unified policy in the
international relations of the Republic of Azerbaijan with
other countries and international organisations in
implementation of this strategy the activities of participants
are coordinated by the Ministry of Foreign Affairs. The
implementation of information security activities is
coordinated by the Ministry of National Security, because
of the information security being one of the main factors
influencing the national security and country's
development. The strategy is implemented on the scientific
basis and conducted under democratic principles.
The systematic and permanent activities, the support
of broad masses, the national dialogue achievements, the
open discussions and consultation, the impartial
consideration of all proposals, and participation of all parties
in project development ensure the strategy's successful
implementation. In order to determine the level of progress
of the country towards the information society, regular
monitoring of the ICT usage level is conducted.
With this purpose the national success indicators are
developed on the basis of world practice. Regular
monitoring of the national ICT usage level and its assessment
is one of the main objectives of the Government policy.
Successful implementation of the National Strategy, full
achievement of its goals and objectives require large
financial resources. This problem can be solved by
mobilisation of Azerbaijan's internal State finances and
attracting foreign investments. In key administration and
socially oriented spheres ICT applications may be financed
by the Government.
To achieve long-term stable progress in this sphere, an
effective financial policy meeting modern requirements
should be developed. In general, implementation of the
strategy is financed from the following sources:
resources in the state budget allocated for ICT;
various local government and non-government funds;
resources of central and local administrations, state
organisations allocated for ICT;
foreign and local investments, purpose loans;
financial and technical support of international and
foreign organisations, grants.
A favorable environment for the transition to the infor-
mation society, which will meet the interests of government,
social institutions, NGOs, private companies, and the overall
society and its members, will be established as a result of
implementation of the strategy, and the following strategic
results will be achieved over a 10-year period:
The Republic of Azerbaijan will be among the leading
countries of Europe in the sphere of ICT usage, will be
in the position of the leader in South Caucasus and
Transcaspian regions in the ICT sector.
Effective, transparent and controllable state
administration and local self-administration will be
conducted, participation of broad layers of the citizens
in administration process will be ensured.
Easy access to information resources and services will
be provided for citizens.
The country's economy will experience a considerable
growth as a result of ICT applications, considerable
progress will be achieved in eliminating poverty and
unemployment.
Favorable legal environment, intellectual potential,
telecommunication infrastructure and national
electronic information environment will be established
for the transition to the information society.
Internet segment of Azerbaijan will be developed,
national information security will be ensured, the
Republic of Azerbaijan will be successfully integrated
into the international information society.
National information resources will be developed, the
Azerbaijani language will be widely used in national
information exchange, ICT will be broadly used to
protect and popularise the historical and cultural heritage
of the Azerbaijani people and other peoples living in
the country.
Irrespective of ownership, all organisations and
enterprises will use ICT infrastructure, a telecommun-
ication market with fair competition will be formed
and function effectively, country will directly
participate in world's electronic commerce and
business processes.
The ICT share of the GDP (gross domestic product) will be
considerably increased.
The relations between the government and citizens will
permanently grow in the ICT epoch. The world experience
shows that there has never been the complete harmony in
relations between government and civil society. But the
successful movement towards the information society is
only possible on the basis of mutual respect, mutual
workmanlike activity and collaboration between the
government and citizens.
INTERNET ACCESS STRATEGY FOR PUBLIC LIBRARY

The Internet is used to provide access to electronic
information and recreation resources from around the
world to enhance and supplement the wealth of
information available in the ACT Government's public
library hard-copy print, electronic and audio collections. As
part of its information role the Library develops guides to
the Internet which assist the community to locate the
information they require and specifically to provide access
to :
ACT Government and Community Information which
is relevant to residents in our community;
Commonwealth Government information;
the catalogues of other libraries, such as The National
Library of Australia, other ACT Libraries, Universities
and public libraries throughout Australia; and
extend the range of information resources available to
the library user.
To facilitate public access to the Internet, Public Access
terminals are freely available at all branches of the Library.
In providing free access to the information available on the
Internet it is recognised that the satisfaction of a person's
information needs must be independent of an ability to pay.
Each member of the Australian community has an equal
right of access to information services and that such
freedom of access is essential to the democratic process and
to the social well-being of the Australian community.
Costs associated with the printing of information
(within copyright) available over the 'net' or for access to
information or sites which require payment of specific fees
is the responsibility of the user. The ability to download
information to a removable disc is not provided. Access to
the removable disc drives is not allowed to ensure that
unauthorised access to the operating system and/or
network cannot occur.
Internet Mail services are not specifically supplied.
However, users are made aware of the various free mail
services operating over the Internet. Users of the Public
Access Internet Service are requested to abide by the
following 'Code of Use'. The ACT Government is not
responsible for the material on the Internet and cannot
guarantee the authority or accuracy of any of the
information found on it, nor can it accept responsibility for
any material it contains which may be considered offensive
by some users. Internet terminals cannot be used for any
activities of an illegal or fraudulent nature, including such
activities as defined under the Australian Commonwealth
Government Telecommunications Act 1989, or other
applicable Territory and Commonwealth laws.
Printing facilities are available at a cost of 10c per page.
Users must abide by copyright laws when using material
on the Internet. Specifically users should note section 40 of
the Copyright Act which refers to fair dealing. To provide
a comprehensive Public Access Internet Service to the
Canberra community, terminals are provided at all libraries
with additional terminals for Seniors' priority use provided
at the Woden Library. This has enabled the number of
Public Access Internet Terminals to be increased to twenty-
five, one for every 12,000 residents. In 1998/99 the number
of public library access points is to be increased to thirty-
one, one for every 10,000 residents. Library staff are also
provided with access to the Internet to assist customers to
locate specific information. All terminals located on
Information Counters have access to the Internet. Specific
terminals for Seniors' priority use are available at the
Woden Library.
The conditions of use are the same as for the Public
Access terminals available to the general public, with the
added requirement that Seniors have priority use. As such,
proof of the user's age may be requested. This service is
provided to encourage use of the Internet by this section of
the community. Introductory training for Seniors is
provided at the Woden Library through an arrangement
with an outside agency. Terminals provided for Seniors'
use are made available to other members of the public if
they are not required for use by members of the Seniors'
community. Public library services at Tuggeranong and
Erindale are provided in conjunction with the Department
of Education & Training as joint-use facilities for both the
public and students. In addition to the terminals provided
by the Library additional terminals may also be provided
by the two Colleges. This arrangement is on the
understanding that the College meets the costs associated
with the extended service.
Provision has been made on the Library network for
up to 4 additional Internet terminals at each joint-use
Library. Public access to the terminals funded by the
Colleges will be negotiated on a case by case basis. Basic
introductory training in using the Internet is to be provided
to users of the service on request. Such training would
include: how to input an Internet address; how to identify
and use links to other sites and documents; and basic
enquiry searching. Such training would usually take less
than 10 minutes. Brochures providing basic information on
the use of the Internet are also to be provided. Specific free
training is currently provided to Seniors through an
arrangement with an outside agency. It is planned to utilise
the terminals to provide more in-depth Internet training
using the services of outside training agencies. The
provision of public access to the Internet is an integral part
of the Library's role to provide free access to information.
Future decisions on the expansion of the network will
be made by evaluating the costs and benefits of providing
access to information resources through additional Internet
access points to those associated with the purchase of
resources for the Library's 'hard-copy' print, audio and
electronic collections. In addition to the funding provided
by Government attempts are made on a regular basis to
obtain donations and sponsorship of the services provided
by the Library. This has been done and will continue to be
done for the existing Public Library Internet Access Service.
In addition to providing access from its physical service
points throughout Canberra, ACT Information & Libraries
plans to provide direct access to its resources via the
Internet. Access to the 650,000 items listed on the Library
catalogue is to be made available to the community from
their home or office computer. Currently access is available
through a separate dial-up service. This service provides
users with full access to the Library Catalogue.
Access to the catalogue by ACT Government staff as
part of the Government's Intranet services was
implemented in 1998. It is planned to provide public access
to the Library catalogue through the Internet during 1998/
99. A project is underway to provide access to a range of
documents from the ACT Heritage Library on the Internet.
As an extension to the project of making the Library
catalogue available over the Internet, it is planned to
provide access to a catalogue of ACT Government publicat-
ions. A service is being developed to enable requests for
information to be submitted via the Internet. The pilot
service is linked to the ACT Heritage Library and at the
conclusion of this trial the benefit of extending this service
to the public library service will be evaluated. The
Australian Library and Information Association, believing
that freedom can be protected in a democratic society only
if its citizens have access to information and ideas through
books and other sources of information, affirms the
following principles as basic and distinctive of the
obligations and responsibilities of the librarian:
A primary purpose of a library service is to provide
information through books and other media on all
matters which are appropriate to the library concerned.
A librarian must protect the essential confidential
relationship which exists between a library user and
the library.
The functions of the librarian include: to promote the
use of materials in the librarian's care; to ensure that
the resources of the library are adequate to its purpose;
to obtain additional information from outside sources
to meet the needs of readers; to cater for interest in all
relevant facets of knowledge, literature and
contemporary issues, including those of a controversial
nature; but neither to promote nor suppress particular
ideas and beliefs.
A librarian, while recognising that powers of
censorship exist and are legally vested in state and
federal governments, should resist attempts by
individuals or organised groups within the community
to determine what library materials are to be, or are not
to be, available to the users of the library.
A librarian should not exercise censorship in the
selection of materials by rejecting on moral, political,
racial or religious grounds alone material which is
otherwise relevant to the purpose of the library and
meets the standards, such as historical importance,
intellectual integrity, effectiveness of expression or
accuracy of information which are required by the
library concerned. Material should not be rejected on
the grounds that its content is controversial or likely to
offend some sections of the library's community.
A librarian should uphold the right of all Australians
to have access to library services and materials and
should not discriminate against users on the grounds
of age, sex, race, religion, national origin, disability,
economic condition, individual lifestyle or political or
social views.
A librarian must obey the laws relating to books and
libraries, but if the laws or their administration conflict
with the principles put forward in this statement, the
librarian should be free to move for the amendment of
these laws.
The Australian Library and Information Association asserts
that each member of the Australian community has an
equal right of access to public library and information
services regardless of age, race, gender, religion, natio-
nality, language, social or economic status. Such freedom
of access is essential to the democratic process and to the
social well-being of the Australian community. The
satisfaction of a person's information needs must be
independent of an ability to pay. Libraries and information
services established to serve the general public should,
therefore, provide core services to all members of the
library's clientele without direct charge to the individual.
Freedom, prosperity and the development of society and of
individuals are fundamental human values. They will only
be attained through the ability of well-informed citizens to
exercise their democratic rights and to play an active role
in society. Constructive participation and the development
of democracy depend on satisfactory education as well as
on free and unlimited access to knowledge, thought, culture
and information.
The public library, the local gateway to knowledge,
provides a basic condition for lifelong learning,
independent decision-making and cultural development of
the individual and social groups. This Manifesto proclaims
UNESCO'S belief in the public library as a living force for
education, culture and information, and as an essential
agent for the fostering of peace and spiritual welfare
through the minds of men and women. UNESCO therefore
encourages national and local governments to support and
actively engage in the development of public libraries. The
public library is the local centre of information, making all
kinds of knowledge and information readily available to its
users. The services of the public library are provided on the
basis of equality of access for all, regardless of age, race, sex,
religion, nationality, language or social status.
Specific services and materials must be provided for
those users who cannot, for whatever reason, use the
regular services and materials, for example linguistic mino-
rities, people with disabilities or people in hospital or
prison. All age groups must find material relevant to their
needs. Collections and services have to include all types of
appropriate media and modern technologies as well as
traditional materials. High quality and relevance to local
needs and conditions are fundamental. Material must
reflect current trends and the evolution of society, as well
as the memory of human endeavour and imagination.
EXPANDING ACCESS TO ICT

A main aim of the People's Network was to connect all
public libraries to the internet, as part of a government
commitment to provide all UK citizens with the
opportunity to use and benefit from online services. The
basic building block for realising this goal is adequate
connectivity. Connectivity comprises at least three elem-
ents: telecommunications infrastructure; the availability of
adequate compatible hardware and software; and access to
needed technical support. Levels of investment in ICT
infrastructure and hardware prior to the launch of the
People's Network varied greatly from one library authority
to another, as well as from one home country to another,
resulting in wide disparities in public library connectivity
and services. A main goal in connecting all public libraries
to the internet was thus about 'levelling the playing field'.
In addition to rolling out ICT infrastructure and hardware
across the UK public library system, PN also sought to
ensure a common platform and package of ICT services, no
matter how small or remote the library.
Two years after the launch of the People's Network,
both these goals have been very largely achieved. A citizen
anywhere in the UK can now go to any public library
service point and be confident of gaining access to the
internet, available free or at low cost. Even small rural
communities are being reached through mobile phones
using wireless networking to provide internet access. More
than this, the user can go from one library to another and .
expect to find the same platform and basic package of
services. I

This universality of physical access and consistency in
standards represents both a remarkable technical achiev-
ement as well as an organisational feat in implementing
such a large scale infrastructure initiative within the
scheduled timescale and budget. At a progra-mme level as
well as at the level of the individual library service, it has
involved entering into partnerships, and managing
relationships, with a number of different stakeholders,
including telecom providers, local authorities and
corporate IT.
The People's Network has been successfully rolled-out
across the UK, connecting all library authorities to the
internet. Over 4,000 public libraries provide broadband
access to the internet and other online services. This
amounts to a huge leap in connectivity from just 1% in 1995
to almost 100% of branch library coverage in just short of
a decade. At the start of the People's Network in 1999, only
a minority of library authorities could be described as 'early
adopters' of ICT. Three lacked even the most basic of
electronic management systems and were using old card
indexes.
The People's Network has added in excess of 30,000
computer terminals to the public library system. The central
libraries of some large metropolitan library authorities now
have as many as 100 terminals, whilst at the other extreme
some small branch libraries have just two terminals because
of limited space. This expansion in hardware capacity,
coupled with increased opening hours in many libraries to
make ICT more accessible, is now providing over 68.5
million hours of potential internet use a year, across the UK.
Hardware includes more than terminals.
The People's Network has provided a cluster of
networked technology and facilities that include
principally scanners and printers. In many library auth-
orities, videoconferencing facilities were also incorporated
into the technology package. Other hardware found in
around half the library authorities includes assistive and
adaptive technology appropriate to older users and those
with disabilities. These include keyboard and trackball
mouse alternatives, height adjustable workstations and lap
trays, sound cards and headphones and large 22" screens.
Monitoring returns indicate that adaptive technologies
are commonly restricted to one workstation/ terminal point
within a library and not all branch libraries are kitted out.
Specific hardware is also found in individual library
services, reflecting local population characteristics and
requirements. An example is multi-language keyboards
and software. The technical requirements for the People's
Network specified a minimum 2MB connection. Although
small libraries were not expected to need this capacity in
the short term, the infrastructure allowed for the likelihood
that web-based resources would become increasingly
broadband dependent. Bandwidth is now impressive:
while averages do not tell the whole story, libraries
reported a mean of 23MB connections.
Nearly all People's Network terminals are providing
access to a suite of applications, including Microsoft
Explorer, Word, Excel, Access and PowerPoint. There was
no requirement on libraries to extend this basic package,
although some library authorities have done so in response
to demand from their more sophisticated ICT users, or as
library staff have themselves increased their knowledge
and skills of what software is on offer.
Proprietary software available across the network of
some library services includes desk-top publishing
packages, desk-top digital imaging such as Photoshop, web
page design programmes such as Dreamweaver or
FrontPage. For people who have particular disabilities
which inhibit their use of ICT, access to adaptive software
is especially important, and particularly so if income levels
are low.
The same is true of those with low levels of literacy. A
good proportion of libraries have catered for users whose
access to ICT is impaired by a physical or learning
disability, although, as with the hardware, software may
only be available in one library or on one terminal per
library. Different software packages have been installed to
support blind or partially sighted users, and people with
learning difficulties such as dyslexia or low levels of
literacy.
Library services have traditionally used mobile libraries
as a way of reaching users remote from library service
points or physically unable to access the library. Many of
the case study libraries had mobile libraries, but not all
offered access to PN services. PN funding was for
static library access points only and so any equipment
available in mobile libraries was funded from other
sources. Just one or two had equipped the mobile library
van with full ICT access, including disability access to the
vehicle. Another, serving remote communities in Scotland
where internet access is not possible, was looking at
packages that would enable staff to download complete
web sites on a non-networked PC.
Other libraries similarly provided access to a PC, but no
network access. Laptops were the other way of reaching
physically and socially isolated groups, although again
these did not always provide real time access to the internet
or networked resources. Library services with advanced
library management systems were better placed to provide
users in community settings with access to on-line
catalogues and services, using high-spec laptops.
Physical access involves more than rolling out the
technology. Once in place, it is important that the system
works reliably, that the hardware and software are
relatively trouble free, and that back up technical support
is readily available when needed. Case studies provide a
window for looking at the realities on the ground of the
internet access afforded by PN.
In the first round of field visits, libraries had many tales
about hiccoughs in rolling out the infrastructure and
problems with bedding down the hardware and setting up
appropriate management systems for PN to ensure its
smooth running. By the time of the second round visit,
these had mainly dissipated. Not surprisingly though,
given the scale of PN, individual libraries have encountered
various obstacles to keeping systems running and
providing reliable open access to the terminals and the
internet for library users. Here, we give a flavour of these:
A major source of technical difficulty was the
peripheral hardware (printers and scanners) rather
than the computers, which for the most part were
technically robust. One county library service
described its decision to go for low-end scanners as a
mistake - the PN coordinator had come round to the
view that photocopying was a more cost-effective
alternative. Several libraries reported systemic failures
with print management software. Because printing is
networked, a local problem on one printer often
impacted on other printers, clogging up the entire
system. Such problems could be exacerbated by lack of
inter-operability between hardware and software in
different libraries.
Viruses were a problem for several libraries,
occasionally requiring the whole system to be closed
down, in one case for several weeks during the busy
summer period. This caused the staff endless problems,
but made them realise how embedded PN had become
and what a useful asset it is.
Systems failure was a periodic occurrence, in one
library service taking all PN terminals out of action for
five weeks. This particular library service also had a
high failure rate on its computers, eventually dealt with
by the supplier replacing the hardware.
Not all library services had fully installed PN hardware
and software. In one case, the terminals only provided
access to email and internet. The computers didn't
come with MS Office already installed and the library
service had waited more than a year for corporate IT
to load them and give them a common 'cyber cafe' look.
Delays were also experienced in installing video-
conferencing hardware.
Providing free internet access for all in a public space
raised particular concerns about information abuse and
the censorship of illegal or inappropriate material
which could be displayed and downloaded by users.
Open information access also created potential issues
of liability, culpability and accountability for the local
authority and library service. Firewalling and filtering
were the common ways of dealing with these issues,
although a heavy hand could restrict users from
accessing legitimate sites and was a common cause of
user complaint. Several libraries had installed software
that allowed them to reboot the machines clean.
Securing adequate bandwidth was a problem for two
library services.
For many libraries, the main software deficiency was
the lack of an automated booking system. Instead, staff
were spending many hours manually logging users
into the system or using a paper based system.
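
The core logic that such an automated booking system would replace manual logging with is modest. The following is a minimal sketch in Python, assuming hour-long sessions and a fixed number of terminals per branch; the class and field names are illustrative only and are not drawn from any actual PN system.

```python
# Minimal sketch of automated session booking; all names are
# hypothetical, not taken from any actual PN system.
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BookingSheet:
    terminals: int                                # terminals in the branch
    bookings: dict = field(default_factory=dict)  # slot -> list of user ids

    def book(self, user: str, slot: datetime) -> bool:
        """Record a session if the user is new to the slot and a terminal is free."""
        users = self.bookings.setdefault(slot, [])
        if user in users or len(users) >= self.terminals:
            return False                          # double booking, or slot full
        users.append(user)
        return True


sheet = BookingSheet(terminals=4)
slot = datetime(2004, 6, 1, 14, 0)                # the 2 p.m. session
print(sheet.book("reader-001", slot))             # True: terminal allocated
print(sheet.book("reader-001", slot))             # False: no double booking
```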
The experience of technical support varied widely across
library services, reflecting such things as relations with
corporate IT, the appointment of staff to dedicated posts for
frontline ICT support and the existence of service level
agreements with suppliers. Some libraries had no funding
for frontline support and were dependent on a central ICT
support desk; others could call on the services of an in-
library ICT support team or officer. One county library
service had included a high level support agreement as part
of its PFI contract, with a rapid response time and penalties
for failure to deliver within targets.
An aspect of physical access to PN computers and
online services is the way they are grouped and located
within the library setting, and the social and cultural
ambience of the library more generally. Some libraries had
located PN terminals in a separate learning centre or suite,
away from book stacks and other services; others
distributed small banks of computers in different parts of
the library, often zoned areas for particular user groups
such as teenagers; still others integrated them with the
physical stock wherever space allowed.
Physical space could sometimes dictate the
arrangement; other times the library made a conscious
choice about location. In one library for example, a con-
scious decision had been made about where to locate the
terminals as a result of earlier experience of working
alongside adult education. In this particular case, the PCs
were integrated with the physical stock, in groupings that
were sufficient to ensure viable numbers for adult educ-
ation courses. Another library which had chosen to locate
all its computers in a large learning centre alongside the
reference section of the library was reconsidering the
appropriateness of this choice.

Users of ICT

The users of ICT in public libraries represent the ultimate
test-bed or litmus-test of the connectivity afforded by PN,
and their experiences of its accessibility, reliability and
comparative advantage, over other modes of ICT access.
Focus groups with users explored some of these aspects,
and users' experience and overall satisfaction with the
'hard' end of PN. Functioning effectively in the inform-
ation age of the 21st century is not just a matter of
computer or technological competence. It also calls for a
new kind of information literacy or intelligence attuned
to the digital world and the changing nature of what
counts for knowledge, and for what we have termed civic
literacy.
The 'C' in ICT represents a new communitarian
capability of technology that can strengthen a host of civic
purposes and can be harnessed to other kinds of learning.
Today's citizens thus need to be competent and confident
in at least three different facets of Information and
Communication Technology:
The technical side of handling technology and
exploiting its multiple uses
The information side of dealing with the chaotic nature
of the internet, including discovering, evaluating and
making sense of information encountered in a web
environment
The civic side of familiarity with ICT as a tool for
participative citizenship
The main thrust of the efforts of public libraries in
developing citizens' capability and confidence in using ICT
has been on the first facet. Some library services have paid
attention to the second, but the third remains a largely
neglected area. A main purpose of the People's Network
was to reach out to citizens whose lack of awareness and
familiarity with ICT placed them at risk of being disenfra-
nchised from the mainstream of society. Beyond helping
citizens to achieve a basic threshold in ICT competence,
public libraries were also expected to play an important role
in helping learners along a 'learning pathway', developing
and building on their basic level ICT skills to take
advantage of the full range of learning opportunities on
offer. Public libraries have taken on this new area of work
with enthusiasm, generally in partnership with other
agencies.

ICT Competence

Library staff have generally viewed technical competence
as a cumulative hierarchy, with four main building blocks.
A novice user needs to master one before moving on to
the next. So awareness and familiarisation are followed
by simple use of online and computer applications, and
lead on to more structured basic, intermediate and
advanced training. Many libraries have provided
opportunities for ICT skill development at these different
levels, in partnership with other agencies. These have
commonly included the adult education service of the
local authority, local colleges or other ICT training
providers. Libraries funded as UK online centres were in
a position to fund one or more training positions which
were filled by non-librarians with ICT training expertise.
Other libraries, especially some of the smaller branch
libraries, offered training and support at the lower levels
only.
The citizens or users who were attracted to the library
as an environment where they felt comfortable either to
explore ICT for the first time or to develop their ICT
competence and confidence, were not a homogenous
group: they varied widely in terms of awareness,
motivation, purpose and confidence. A substantial prop-
ortion of ICT learners had no specific purpose in mind and
were only vaguely aware of its potential uses. Thus, the
kinds of support, guidance and ICT training provided by
the library or learning centre could be very important in
shaping users' experiences, expectations and ultimate use
of the technology.
In one community based library, for example, a survey
was undertaken to gather information on the interests of
local people in ICT. It was framed only in training terms,
asking respondents to indicate what level of training they
would like the library to provide. The majority of libraries,
however, offered a more limited repertoire of ICT training
and support focused on awareness and familiarisation,
with more tenuous links to advanced training through
signposting to other providers.

Complementary Strategies
There are two main strategies taken by libraries: one, a skill
based or vertical strategy; the other, a use based or
horizontal strategy. The skill based vertical strategy was
aimed at moving people up the technical competence
hierarchy through a deliberate programme of instruction
and development. Success was measured by progression
from novice to basic to intermediate to advanced levels, and
often reflected in numbers completing certificated courses.
Developing ICT capability was very much its own end, or
a route to other organised learning, or into employment.
The emphasis here was on generic ICT competences,
developed outside their immediate context of use.
The use based horizontal strategy was aimed more at
helping people put their newly acquired computer skills to
good use, and to extend the number of activities or
purposes involving the use of ICT.
In the process of 'learning by doing' in themed sessions,
assisted by various kinds of tutor and peer support,
individuals were also able to develop their ICT competence
and confidence, but in less formalised ways. It might
involve developing a higher level of skill on a particular
piece of software to accomplish some learning project or
task, or extending beyond a text activity such as email to
learning how to scan, download and send photos to family
and friends; or finding out more about online family history
resources and how to use search tools to locate specific
information.
In this strategy, ICT was primarily seen as a means or
a tool to add value to some other activity or to accomplish
a learner's purpose. A number of drivers or pressures pushed
libraries towards a skill based strategy.
A key one was the preoccupation with measurable
outputs and evidence of ICT progression, reinforced by
funding agencies. The targets commonly included
completion of formal training, or numbers of formal
training sessions.
Another driver was a consequence of the close
partnership between the library service and the adult
education service, local college or Workers' Educational
Association (WEA) in the provision of ICT training. These
agencies tended to have a strong preference for
instructional modes of teaching ICT, leading to formal
qualifications and tied into progression routes. Other
pressures came from the external environment.
One library service, with a strong commitment to an
informal learning and community learning approach,
found little support from its regional Learning and Skills
Council: "The Learning and Skills Council have not been
a good partner, in terms of starting up any new initiatives
or furthering existing projects. They were not interested in
informal learning or community based learning initiatives
for older members - basically not interested in working
with libraries period. They were only interested in accr-
edited or formal learning and not even interested in
developing links, which lead to pathway development in
libraries - from informal to formal learning. A community
based outcomes approach is anathema to the LSC."

Civic Literacy

One of the defining features of ICT is its potential for
communication between people, whether one to one as in
email, or one to many and many to many involving the
use of other tools (bulletin boards, list-serves, chat rooms,
real-time collaboration). Where the medium is supported
by videolinks, the range and depth of potential exchange
is even greater. The new technologies can be used in a
variety of ways to strengthen a civic agenda:
E-democracy or e-cultural forums on local and national
issues and debates.
Dialogue between local groups and communities and
local government planners and service providers.
Creation of information and knowledge sharing
groups.
Creation of 'communities of practice'.
Active engagement in civic activities using ICT requires not
just awareness of the possibilities, but also the skills of
using different networking and communication tools. It
also calls for specific kinds of competence and confidence
in online communication practices. The use of comm-
unication tools in e-learning has highlighted the need for
a set of ground rules and conventions about appropriate
discourse, issues of confidentiality, norms for involving
group members etc. The nature of online text-based
communications is very different from the oral and visual
communicative practices of 'hard to reach' and 'at risk'
learners.
Apart from ICT awareness and familiarisation (tasters)
sessions introducing users to email, library services were
doing very little to support the development of civic literacy
as a distinct kind of ICT competence. This finding reflects
the low exploitation by libraries of the communitarian,
networking and knowledge sharing dimensions of the new
technologies although there are some emergent
developments in this area around reading in particular.
In their own ICT training, even where opportunities for
collaborative online learning were part of the course design,
library staff made very little use of virtual learning
environments and communication tools. One or two
libraries came at civic literacy from another angle, with
sessions for parents on 'safe surfing' and the use of chat
rooms.

Digital Citizenship
There are different ways in which libraries are enabling
citizens to access networked information resources and to
exploit the communication potential of the new technologies.
These relate to:
Helping users discover and retrieve electronic
information relevant to their needs and interests.
Organising, presenting and creating new online
content, especially local content.
Facilitating interactive learning environments and
creating virtual learning communities.
There are four examples of an ICT service, as illustrations
of how ICT has been used to enhance, improve or add
significant new value for the user. Next, we look at ICT-
enabled services generally and at the different ways
in which libraries have incorporated ICT information
resources and tools into services and service delivery, as
well as the different organisational and social support
mechanisms which have emerged to support the service
and the user.
In keeping with the democratising intent of the public
library service, we see the conditions for effective use as
involving users not only as consumers but also as
producers of information and knowledge, and as active
participants in dialogue about what applications or uses
would be most beneficial in particular local contexts.
Libraries were providing or developing different
discovery tools or navigational and finding aids intended
to help users more easily discover, locate and retrieve
networked information and other knowledge resources.
These were of two main kinds:
Those to do with the content, organisation or
accessibility of websites.
Those relating to online catalogues and advanced
search and retrieval functions.
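
Catalogue search functions of this second kind are commonly exposed through the SRU (Search/Retrieve via URL) protocol. The sketch below is a minimal illustration rather than the interface of any particular library system: the endpoint URL is hypothetical, while the request parameters follow the SRU 1.2 specification.

```python
# Hedged sketch of an SRU catalogue query; the endpoint is
# hypothetical, the parameters follow the SRU 1.2 specification.
from urllib.parse import urlencode
from urllib.request import urlopen

SRU_ENDPOINT = "http://opac.example.org/sru"  # hypothetical catalogue server

params = urlencode({
    "operation": "searchRetrieve",
    "version": "1.2",
    "query": 'dc.title = "family history"',   # CQL query syntax
    "maximumRecords": "10",
})

with urlopen(SRU_ENDPOINT + "?" + params) as response:
    print(response.read()[:500])              # XML searchRetrieveResponse
```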
In many case study libraries, but by no means all, library
staff were placing finding aids which were to do with
content, organisation or accessibility on the library's
website, signposting users to information and other
resources. Examples of these included:
Signposts or hyperlinks from the library home page to
useful, new and interesting websites. These might be
sites of general or specific interest, sites with top-rating
hits or sites hosting databases of interest to particular
user groups.
Web pages with content or links relevant to a particular
group stored in a database. For example, legal rights
materials relevant to asylum seekers, or family and
local history sources and resources.
A 'walled garden' of sites and resources suitable for
children and linked to the curriculum, available via the
library's portal. A homework club might sometimes
have its own dedicated site or web pages.
Portals to authenticated or validated information
resources in specific topic areas such as health and
well-being, with browsing and retrieval functions.
Websites with a specialist interface, providing a range
of information content and interactive services aimed
at different client groups such as children, businesses,
teenagers, readers.
Advanced search and browsing facilities on the council
interface for finding information about council
services, including library services.
Libraries were at different stages in creating an electronic
portal or gateway to all library services and catalogues,
with links to other services of interest to library users.
A few had interactive online catalogues with advanced
service functions up and running; others were in the
process of implementing or changing over to an advanced
library management system; and a few were still in the
throes of mounting a static version of the library catalogue
onto the PN network, accessible from any Internet-PC. The
more advanced systems allow users to:
Remotely access the full range of library services,
including the ability to renew or reserve books and
other stock, and to check the user's own borrower
record.
Hire a meeting room or book internet sessions and
classes.
Access an electronic database of community
information, tailored to the library.
Post an inquiry to the 'Ask a Librarian' service.
Use the electronic portal to access the digital resources
of the library and partner organisations.
Find information helpful to selecting reading material
e.g. 'top ten reads' of the month.
There were instances of library authorities providing
additional services on the back of catalogue-based resource
discovery. Kent Library Service, for example, was piloting
a revenue-based online document search and delivery
service based on its genealogy and local history resources,
using an online secure payment system. Its service attracted
worldwide usage.
In Northern Ireland, library services acting together are
exploring the feasibility of a genealogy related service
based on library resources, involving a partnership
between the Service Provider and other complementary
bodies on a revenue-sharing basis.
4
Features of Digital Library

Digital libraries are in some ways very different from traditional
libraries, yet in others they are remarkably similar. People do not
change because new technology is invented. They still
create information that has to be organised, stored, and
distributed. They still need to find information that others
have created, and use it for study, reference, or
entertainment. However, the form in which the information
is expressed and the methods that are used to manage it are
greatly influenced by technology and this creates change.
Every year, the quantity and variety of collections available
in digital form grows, while the supporting technology
continues to improve steadily.
Cumulatively, these changes are stimulating
fundamental alterations in how people create information
and how they use it. To understand these forces requires
an understanding of the people who are developing the
libraries. Technology has dictated the pace at which digital
libraries have been able to develop, but the manner in
which the technology is used depends upon people. Two
important communities are the source of much of this
innovation. One group is the information professionals.
They include librarians, publishers, and a wide range of
information providers, such as indexing and abstracting
services.
The other community contains the computer science
researchers and their offspring, the Internet developers.
Until recently, these two communities had disappointingly
little interaction; even now it is commonplace to find a
computer scientist who knows nothing of the basic tools of
librarianship, or a librarian whose concepts of information
retrieval are years out of date. Over the past few years,
however, there has been much more collaboration and
understanding. Partly this is a consequence of digital
libraries becoming a recognised field for research, but an
even more important factor is greater involvement from the
users themselves. Low-cost equipment and simple software
have made electronic information directly available to
everybody. Authors no longer need the services of a
publisher to distribute their works. Readers can have direct
access to information without going through an
intermediary.
Many exciting developments come from academic or
professional groups who develop digital libraries for their
own needs. Medicine has a long tradition of creative
developments; the pioneering legal information systems
were developed by lawyers for lawyers; the Web was
initially developed by physicists, for their own use.
Technology influences the economic and social aspects of
information, and vice versa. The technology of digital
libraries is developing fast and so are the financial,
organisational, and social frameworks.
The various groups that are developing digital libraries
bring different social conventions and different attitudes to
money. Publishers and libraries have a long tradition of
managing physical objects, notably books, but also maps,
photographs, sound recordings and other artifacts. They
evolved economic and legal frameworks that are based on
buying and selling these objects. Their natural instinct is to
transfer to digital libraries the concepts that have served
them well for physical artifacts. Computer scientists and
scientific users, such as physicists, have a different
tradition. Their interest in digital information began in the
days when computers were very expensive. Only a few
well-funded researchers had computers on the first
networks. They exchanged information informally and
openly with colleagues, without payment. The networks
have grown, but the tradition of open information remains.
The economic framework that is developing for digital
libraries shows a mixture of these two approaches. Some
digital libraries mimic traditional publishing by requiring
a form of payment before users may access the collections
and use the services. Other digital libraries use a different
economic model. Their material is provided with open
access to everybody. The costs of creating and distributing
the information are borne by the producer, not the user of
the information. The fundamental reason for building
digital libraries is a belief that they will provide better
delivery of information than was possible in the past.
Traditional libraries are a fundamental part of society,
but they are not perfect. Enthusiasts for digital libraries
point out that computers and networks have already
changed the ways in which people communicate with each
other. In some disciplines, they argue, a professional or
scholar is better served by sitting at a personal computer
connected to a communications network than by making a
visit to a library. Information that was previously available
only to the professional is now directly available to all.
From a personal computer, the user is able to consult
materials that are stored on computers around the world.
Conversely, all but the most diehard enthusiasts recognise
that printed documents are so much part of civilisation that
their dominant role cannot change except gradually. While
some important uses of printing may be replaced by
electronic information, not everybody considers a large-
scale movement to electronic information desirable, even if
it is technically, economically, and legally feasible. Here are
some of the potential benefits of digital libraries.
Digital library brings the library to the user: To use a library
requires access. Traditional methods require that the user
goes to the library. In a university, the walk to a library
takes a few minutes, but not many people are members of
universities or have a nearby library. Many engineers or
physicians carry out their work with depressingly poor
access to the latest information. A digital library brings the
information to the user's desk, either at work or at home,
making it easier to use and hence increasing its usage. With
a digital library on the desk top, a user need never visit a
library building. The library is wherever there is a personal
computer and a network connection.
Computer power is used for searching and browsing:
Computing power can be used to find information. Paper
documents are convenient to read, but finding information
that is stored on paper can be difficult. Despite the myriad
of secondary tools and the skill of reference librarians, using
a large library can be a tough challenge. A claim that used
to be made for traditional libraries is that they stimulate
serendipity, because readers stumble across unexpected
items of value. The truth is that libraries are full of useful
materials that readers discover only by accident. In most
aspects, computer systems are already better than manual
methods for finding information. They are not as good as
everybody would like, but they are good and improving
steadily. Computers are particularly useful for reference
work that involves repeated leaps from one source of
information to another.
Information can be shared: Libraries and archives contain
much information that is unique. Placing digital
information on a network makes it available to everybody.
Many digital libraries or electronic publications are
maintained at a single central site, perhaps with a few
duplicate copies strategically placed around the world. This
is a vast improvement over expensive physical duplication
of little used material, or the inconvenience of unique
material that is inaccessible without travelling to the
location where it is stored.
Information is easier to keep current: Much important
information needs to be brought up to date continually.
Printed materials are awkward to update, since the entire
document must be reprinted; all copies of the old version
must be tracked down and replaced. Keeping information
current is much less of a problem when the definitive
version is in digital format and stored on a central
computer. Many libraries provide online the text of
reference works, such as directories or encyclopedias.
Whenever revisions are received from the publisher, they
are installed on the library's computer. The new versions
are available immediately. The Library of Congress has an
online collection, called Thomas, that contains the latest
drafts of all legislation currently before the U.S. Congress;
it changes continually.
Information is always available: The doors of the digital
library never close; a recent study at a British university
found that about half the usage of a library's digital
collections was at hours when the library buildings were
closed. Materials are never checked out to other readers,
mis-shelved or stolen; they are never in an off-campus
warehouse. The scope of the collections expands beyond
the walls of the library. Private papers in an office or the
collections of a library on the other side of the world are
as easy to use as materials in the local library. Digital
libraries are not perfect. Computer systems can fail and
networks may be slow or unreliable, but, compared with a
traditional library, information is much more likely to be
available when and where the user wants it.
New forms of information become possible: Most of what is
stored in a conventional library is printed on paper, yet
print is not always the best way to record and disseminate
information. A database may be the best way to store
census data, so that it can be analysed by computer; satellite
data can be rendered in many different ways; a
mathematics library can store mathematical expressions,
not as ink marks on paper but as computer symbols to be
manipulated by programs such as Mathematica or Maple.
Even when the formats are similar, materials that are
created explicitly for the digital world are not the same as
materials originally designed for paper or other media.
Words that are spoken have a different impact from words
that are written, and online textual materials are subtly
different from either the spoken or printed word. Good
authors use words differently when they write for different
media and users find new ways to use the information.
Materials created for the digital world can have a vitality
that is lacking in material that has been mechanically
converted to digital formats, just as a feature film never
looks quite right when shown on television.
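
As a small illustration of the point, an expression stored symbolically is something programs can check, simplify and typeset, not merely display. The sketch below uses the open-source SymPy library purely as a stand-in for the Mathematica and Maple systems named above.

```python
# Mathematics stored as manipulable symbols rather than ink marks;
# SymPy stands in here for the Mathematica/Maple systems in the text.
import sympy

x = sympy.symbols("x")
expression = sympy.sin(x) ** 2 + sympy.cos(x) ** 2   # stored symbolically

print(sympy.simplify(expression))                    # 1 -- machine-checkable
print(sympy.latex(sympy.integrate(sympy.sin(x), x))) # typeset for display
```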
Each of the benefits described above can be seen in
existing digital libraries. There is another group of potential
benefits, which have not yet been demonstrated, but hold
tantalising prospects. The hope is that digital libraries will
develop from static repositories of immutable objects to
provide a wide range of services that allow collaboration
and exchange of ideas. The technology of digital libraries
is closely related to the technology used in fields such as
electronic mail and teleconferencing, which have
historically had little relationship to libraries. The potential
for convergence between these fields is exciting.
COST OF DIGITAL LIBRARIES

The final potential benefit of digital libraries is cost. This
is a topic about which there has been a notable lack of
hard data, but some of the underlying facts are clear.
Conventional libraries are expensive. They occupy
expensive buildings on prime sites. Big libraries employ
hundreds of people - well-educated, though poorly paid.
Libraries never have enough money to acquire and
process all the materials they desire. Publishing is also
expensive. Converting to electronic publishing adds new
expenses. In order to recover the costs of developing new
products, publishers sometimes even charge more for a
digital version than the printed equivalent.
Today's digital libraries are also expensive, initially
more expensive. However, digital libraries are made from
components that are declining rapidly in price. As the cost
of the underlying technology continues to fall, digital
libraries become steadily less expensive. In particular, the
costs of distribution and storage of digital information
declines. The reduction in cost will not be uniform. Some
things are already cheaper by computer than by traditional
methods. Other costs will not decline at the same rate or
may even increase. Overall, however, there is a great
opportunity to lower the costs of publishing and libraries.
Lower long-term costs are not necessarily good news
for existing libraries and publishers. In the short term, the
pressure to support traditional media alongside new digital
collections is a heavy burden on budgets. Because people
and organisations appreciate the benefits of online access
and online publishing, they are prepared to spend an
increasing amount of their money on computing, networks,
and digital information. Most of this money, however, is
going not to traditional libraries, but to new areas:
computers and networks, Web sites and Webmasters.
Publishers face difficulties because the normal pricing
model of selling individual items does not fit the cost
structure of electronic publishing. Much of the cost of
conventional publishing is in the production and
distribution of individual copies of books, photographs,
video tapes, or other artifacts. Digital information is
different. The fixed cost of creating the information and
mounting it on a computer may be substantial, but the cost
of using it is almost zero. Because the marginal cost is
negligible, much of the information on the networks has
been made openly available, with no access restrictions.
Not everything on the world's networks is freely available,
but a great deal is open to everybody, undermining revenue
for the publishers.
The first serious attempts to store library information on
computers date from the late 1960s. These early attempts
faced serious technical barriers, including the high cost of
computers, terse user interfaces, and the lack of networks.
Because storage was expensive, the first applications were
in areas where financial benefits could be gained from
storing comparatively small volumes of data online.
An early success was the work of the Library of
Congress in developing a format for Machine-Readable
Cataloguing (MARC) in the late 1960s. The MARC format
was used by the Online Computer Library Center (OCLC)
to share catalog records among many libraries. This
resulted in large savings in costs for libraries.
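
MARC remains the format in which catalogue records are exchanged, and open-source tools read it directly. Below is a minimal sketch using the pymarc library; the file name is hypothetical, and field 245 is the title statement defined by the MARC bibliographic standard.

```python
# Sketch of reading shared MARC catalogue records with the
# open-source pymarc library; the file name is hypothetical.
from pymarc import MARCReader

with open("catalogue_records.mrc", "rb") as handle:
    for record in MARCReader(handle):
        if record is not None:                    # None marks a bad record
            for field in record.get_fields("245"):  # 245 = title statement
                print(field)
```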
Early information services, such as shared cataloguing,
legal information systems, and the National Library of
Medicine's Medline service, used the technology that
existed when they were developed.
Small quantities of information were mounted on a
large central computer. Users sat at a dedicated terminal,
connected by a low-speed communications link, which was
either a telephone line or a special purpose network. These
systems required a trained user who would accept a cryptic
user interface in return for faster searching than could be
carried out manually and access to information that was not
available locally. Such systems were no threat to the printed
document.
All that could be displayed was unformatted text,
usually in a fixed spaced font, without diagrams,
mathematics, or the graphic quality that is essential for easy
reading. When these weaknesses were added to the
inherent defects of early computer screens - poor contrast
and low resolution - it is hardly surprising that most
people were convinced that users would never willingly
read from a screen.
The past thirty years have steadily eroded these
technical barriers. During the early 1990s, a series of
technical developments took place that removed the last
fundamental barriers to building digital libraries. Some of
this technology is still rough and ready, but low-cost
computing has stimulated an explosion of online
information services. Four technical areas stand out as
being particularly important to digital libraries.
Electronic Storage is Becoming Cheaper than Paper:
Large libraries are painfully expensive for even the richest
organisations. Buildings are about a quarter of the total cost
of most libraries. Behind the collections of many great
libraries are huge, elderly buildings, with poor
environmental control. Even when money is available,
space for expansion is often hard to find in the center of a
busy city or on a university campus.
The costs of constructing new buildings and
maintaining old ones to store printed books and other
artifacts will only increase with time, but electronic storage
costs decrease by at least 30 percent per annum. In 1987, we
began work on a digital library at Carnegie Mellon
University, known as the Mercury library. The collections
were stored on computers, each with ten gigabytes of disk
storage. In 1987, the list price of these computers was about
$120,000.
In 1997, a much more powerful computer with the same
storage cost about $4,000. In ten years, the price was
reduced by about 97 percent. Moreover, there is every
reason to believe that by 2007 the equipment will be
reduced in price by another 97 percent. Ten years ago, the
cost of storing documents on CD-ROM was already less
than the cost of books in libraries. Today, storing most
forms of information on computers is much cheaper than
storing artifacts in a library. Ten years ago, equipment costs
were a major barrier to digital libraries.
Today, they are much lower, though still noticeable,
particularly for storing large objects such as digitised
videos, extensive collections of images, or high-fidelity
sound recordings. In ten years time, equipment that is too
expensive to buy today will be so cheap that the price will
rarely be a factor in decision making.
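
The figures quoted above are mutually consistent, as a two-line check shows: a 30 percent annual decline in storage cost compounds to roughly the 97 percent reduction per decade cited for the Mercury equipment.

```python
# Check that the chapter's figures agree: a 30% annual decline in
# storage cost compounds to roughly a 97% reduction over ten years.
price_1987 = 120_000                      # list price quoted in the text
decade_factor = (1 - 0.30) ** 10          # ten years of 30% annual decline
print(round(decade_factor, 3))            # 0.028, i.e. about 97% cheaper
print(round(price_1987 * decade_factor))  # ~3390, close to the $4,000 quoted
```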
Personal Computer Displays are Becoming More
Pleasant to Use: Storage cost is not the only factor.
Otherwise libraries would have standardised on microfilm
years ago. Until recently, few people were happy to read
from a computer. The quality of the representation of
documents on the screen was too poor. The usual procedure
was to print a paper copy.
Recently, however, major advances have been made in
the quality of computer displays, in the fonts which are
displayed on them, and in the software that is used to
manipulate and render information.
People are beginning to read directly from computer
screens, particularly materials that were designed for
computer display, such as Web pages. The best
computer displays are still quite expensive, but every
year they get cheaper and better. It will be a long time
before computers match the convenience of books for
general reading, but the high-resolution displays to be
seen in research laboratories are very impressive indeed.
Most users of digital libraries have a mixed style of
working, with only part of the materials that they use in
digital form. Users still print materials from the digital
library and read the printed version, but every year more
people are reading more materials directly from the
screen.
The growth of the Internet over the past few years
has been phenomenal. Telecommunications companies
compete to provide local and long distance Internet
service across the United States; international links reach
almost every country in the world; every sizable company
has its internal network; universities have built campus
networks; individuals can purchase low-cost, dial-up
services for their homes. The coverage is not universal.
Even in the U.S. there are many gaps and some countries
are not yet connected at all, but in many countries of the
world it is easier to receive information over the Internet
than to acquire printed books and journals by orthodox
methods.
COMPUTERS AND NETWORKS

The emergence of the Internet as a flexible, low-cost,
world-wide network has been one of the key factors that
has led to the growth of digital libraries. Figure 1 shows
some of the computers that are used in digital libraries.
The computers have three main functions: to help users
interact with the library, to store collections of materials,
and to provide services.
In the terminology of computing, anybody who
interacts with a computer is called a user or computer
user. This is a broad term that covers creators, library
users, information professionals, and anybody else
who accesses the computer. To access a digital library,
users normally use personal computers. These
computers are given the general name clients.
Sometimes, clients may interact with a digital library
without a human user involved, such as the robots
that automatically index library collections, and
sensors that gather data, such as information about the
weather, and supply it to digital libraries.

[Figure 1. Computers in digital libraries: users, repositories,
location systems and search systems]

The next major group of computers in digital libraries
are repositories which store collections of information
and provide access to them. An archive is a repository
that is organised for long-term preservation of
materials.
The figure shows two typical services which are
provided by digital libraries: location systems and
search systems. Search systems provide catalogs,
indexes, and other services to help users find
information. Location systems are used to identify and
locate information.
In some circumstances there may be other computers
that sit between the clients and computers that store
information. These are not shown in the figure. Mirrors
and caches store duplicate copies of information, for
faster performance and reliability. The distinction
between them is that mirrors replicate large sets of
information, while caches store recently used
information only. Proxies and gateways provide
bridges between different types of computer system.
They are particularly useful in reconciling systems that
have conflicting technical specifications.
The generic term server is used to describe any computer
other than the user's personal computer. A single server
may provide several of the functions, perhaps acting as a
repository, search system, and location system. Conversely,
individual functions can be distributed across many
servers. For example, the domain name system, which is a
locator system for computers on the Internet, is a single,
integrated service that runs on thousands of separate
servers. In computing terminology, a distributed system is
a group of computers that work as a team to provide
services to users.
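
The domain name system is worth pausing on, because it shows how invisible a well-built distributed location system can be: one library-style lookup is answered cooperatively by thousands of servers. A one-call illustration using Python's standard library follows; any public hostname would serve as the example.

```python
# One call into the domain name system, the distributed location
# system cited above; thousands of servers cooperate to answer it.
import socket

print(socket.gethostbyname("www.loc.gov"))  # hostname -> network address
```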
Digital libraries are some of the most complex and
ambitious distributed systems ever built. The personal
computers that users have on their desks have to exchange
messages with the server computers; these computers are
of every known type, managed by thousands of different
organisations, running software that ranges from state-of-
the-art to antiquated. The term interoperability refers to the
task of building coherent services for users, when the
individual components are technically different and
managed by different organisations. Some people argue
that all technical problems in digital libraries are aspects of
this one problem, interoperability. This is probably an
overstatement, but it is certainly true that interoperability
is a fundamental challenge in all aspects of digital libraries.
If digital technology is so splendid, what is stopping
every library immediately becoming entirely digital? Part
of the answer is that the technology of digital libraries is
still immature, but the challenge is much more than
technology. An equal challenge is the ability of individuals
and organisations to devise ways that use technology
effectively, to absorb the inevitable changes, and to create
the required social frameworks. The world of information
is like a huge machine with many participants each
contributing their experience, expertise, and resources. To
make fundamental changes in the system requires inter-
related shifts in the economic, social and legal relationships
amongst these parties.
Digital libraries depend on people and cannot be
introduced faster than people and organisations can adapt.
This applies equally to the creators, users, and the
professionals who support them. The relationships
amongst these groups are changing. With digital libraries,
readers are more likely to go directly to information,
without visiting a library building or having any contact
with a professional intermediary. Authors carry out more
of the preparation of a manuscript. Professionals need new
skills and new training to support these new relationships.
Some of these skills are absorbed through experience,
while others can be taught. Since librarians have a career
path based around schools of librarianship, these schools
are adapting their curriculum, but it will be many years
before the changes work through the system. The traditions
of hundreds of years go deep. The general wisdom is that,
except in a few specialised areas, digital libraries and
conventional collections are going to coexist for the
foreseeable future.
Institutional libraries will maintain large collections of
traditional materials in parallel with their digital services,
while publishers will continue to have large markets for
their existing products. This does not imply that the
organisations need not change, as new services extend the
old. The full deployment of digital libraries will require
extensive reallocation of money, with funds moving from
the areas where savings are made to the areas that incur
increased cost. Within an institution, such reallocations are
painful to achieve, though they will eventually take place,
but some of the changes are on a larger scale.
When a new and old technology compete, the new
technology is never an exact match. Typically, the new has
some features that are not in the old, but lacks some basic
characteristics of the old. Therefore the old and new usually
exist alongside. However, the spectacular and continuing
decline in the cost of computing with the corresponding
increase in capabilities sometimes leads to complete
substitution. Word processors were such an improvement
that they supplanted typewriters in barely ten years. Card
catalogs in libraries are on the same track.
In 1980, only a handful of libraries could afford an
online catalog. Twenty years later, a card catalog is
becoming a historic curiosity in American libraries. In some
specialised areas, digital libraries may completely replace
conventional library materials. Since established
organisations have difficulties changing rapidly, many
exciting developments in digital libraries have been
introduced by new organisations. New organisations can
begin afresh, but older organisations are faced with the
problems of maintaining old services while introducing the
new. The likely effect of digital libraries will be a massive
transfer of money from traditional suppliers of information
to new information entrepreneurs and to the computing
industry.
Naturally, existing organisations will try hard to
discourage any change in which their importance
diminishes, but the economic relationships between the
various parties are already changing. Some important
organisations will undoubtedly shrink in size or even go
out of business. Predicting these changes is made
particularly difficult by uncertainties about the finances of
digital libraries and electronic publishing, and by the need
for the legal system to adapt. Eventually, the pressures of
the marketplace will establish a new order. At some stage,
the market will have settled down sufficiently for the legal
rules to be clarified. Until then, economic and legal
uncertainties are annoying, though they have not proved
to be serious barriers to progress. Overall, there appear to
be no insurmountable barriers to digital libraries and
electronic publishing.
Technical, economic, social, and legal challenges abound,
but they are being overcome steadily.
FUTURE OF DIGITAL LIBRARIES

Libraries see an expanding base of users and, in many
instances, changing service demands. There is ample
evidence that when libraries make quality content available
through the Web, its use increases and it reaches more
people within the institution. In addition, the user base for
any individual library has expanded to include users
beyond the institution. This has been achieved through
more efficient resource-sharing arrangements as well as by
targeting new user groups for digitised content and e-
services.
In many cases, openly accessible digitised collections on
the Web are also being used heavily by people from beyond
the educational world. Despite press accounts to the
contrary, research libraries report that use of the library's
physical facility remains heavy, especially for collaborative
learning and research activities and for access to computers
and information technologies. As a result, many libraries
have moved to provide 24/5 access to the library buildings.
Many libraries see declining use of some traditional
services, such as use of some print materials, circulation
transactions, and reference transactions. However, the data
confirm that the demands for interlibrary borrowing and
for user education are steadily increasing. These trends
signal a shift in the behaviour of information users that is
being closely monitored. Research on the information-
seeking behaviour of people is growing almost as fast as the
Web itself. An overwhelming number of college students
reported that the Internet, rather than the library, is the
primary site of their information searches.
"Many students are likely to use information found on
search engines and various Web sites as research
materials.... A great challenge for today's colleges is how to
teach students search techniques that will get them to the
information they want and how to evaluate it.... While few
universities require college students to take courses on
information seeking, many include a session on it during
freshman orientation meetings." To better understand how
usage patterns are changing, the Digital Library Federation
(DLF) and Council on Library and Information Resources
(CLIR) commissioned Outsell, Inc., to conduct a large-scale
study of over 3,000 undergraduates, graduate students, and
faculty members from a wide range of academic
institutions. The results revealed that the highest
percentage of resources used were print materials, which
students ranked higher than library and non-library
electronic sources in four out of five factors.
The Internet-based sources scored high only on the
factor of convenience, while print materials scored high on
the factors that make a difference in the quality of research
and learning: generating the information for which the
student is looking, the usefulness of the material, its
reliability, and the availability of assistance. There are
differences across disciplines in terms of how information
is used and to what extent new technologies are being
embraced; libraries are recognising these differences as they
offer new services.
There are also differences in the pace of change in
different institutions; institutional cultures that allow, for
example, some flexibility in personnel policies and
practices as well as in budgetary reallocations, are more
accommodating of experimentation and change. In this
rapidly changing environment, libraries must continue
monitoring trends, assessing their performance by seeking
feedback from current users, and anticipating the needs of
future users. To meet and anticipate these needs of the
academic and research communities, libraries are
collaborating as never before, forming partnerships both
within and outside their own institutions, often sharing
control in order to effect critical change. While
circumstances will vary from library to library, the results
of the task force survey demonstrate many ways that
libraries are responding to user expectations.
Libraries are expanding the amount and variety of high-
quality information resources that are directly available to
academic and research users via the Web. They are also
expanding the definition of collections to include "born-
digital" content that is neither owned nor licensed by the
library. The varied efforts toward this goal may be
characterised as follows: changing collection development
policies to emphasize the acquisition of electronic
resources, engaging in digitising and electronic publishing
projects, and assuming responsibility for managing and
servicing born-digital content that resides outside the
domain of the library. Even in the electronic and networked
environment, the economic model continues to feature the
library as the central agency on campus that buys and/or
manages information resources on behalf of the institution.
Libraries have shifted the focus of their collection
development policies to the acquisition of more electronic
content, much of it via consortia.
While aggregate data documenting the quantity of e-
resources currently being made available by libraries has
proven elusive, there is data on the spending trends. Over
the last decade the average percentage of a research
library's materials budget that is spent on electronic
resources has grown from 4% to 16%. One hundred six ARL
university libraries report spending more than $132 million
on electronic resources in 2000-01. The vast majority of that
was spent on electronic serials and subscription services,
expenditures which have increased sharply, from just $11
million in 1994-95 to more than $117 million today. To
support this increased spending on electronic content,
libraries have reallocated resources from the purchase of
print.
The extent of such reallocations will vary depending on
institutional user expectations and financial circumstances.
Many libraries have adopted a policy of adding the e-
version of journals when available and are showcasing
titles of e-journals available to users via the Web by making
this a staff priority and by applying software management
tools for titles included in aggregated electronic databases.
The processes for selecting, budgeting, and acquiring
electronic materials are continually changing and greatly
differ from those for the selection and management of print.
The print process is orderly: discrete amounts of money are
allocated by discipline, the marketplace is fairly
predictable, and materials are selected and ordered using
established procedures. By contrast, the processes for
selection and management of electronic resources are
chaotic.
The migration from print to electronic varies in speed
and extent by discipline; electronic products are
interdisciplinary and expensive, giving rise to selection by
committee; projections for future funding are guesswork;
and archiving and content control are problematic. Legal
and negotiation skills are now mandatory. To complicate
matters, decisions are often made through a consortium.
The process for acquiring electronic resources turns the
traditional acquisitions and user service model topsy-turvy.
In building electronic collections, libraries must also
constantly respond to changing publisher behaviour. The
most profound influence on a library's collection
management and access strategies has been the extraord-
inary price increases for scholarly journals, combined with
publishers' use of licensing to define the terms under which
a library may make the content available and to whom. It
is reasonable to speculate that these phenomena, driven by
some of the larger publishers but now employed widely,
make up one of the forces that prompted libraries to blend
the previously distinct operations of collection
management and access.
Some libraries have fully embraced initiatives
stimulated or endorsed by SPARC (the Scholarly
Publishing and Academic Resources Coalition) and other
affordable publishing venues and are using acquisitions
funds as investments in the future of scholarly comm-
unication. These libraries focus acquisitions on publications
from scholarly societies and less expensive publishers,
nonprofit and for-profit, that provide high-quality titles at
affordable prices. These less expensive titles tend to be the
most highly ranked by faculty but labor intensive to obtain,
since each publisher has only a few titles and often lacks
sales and technical staff.
These libraries try to consistently view collection
expenditures as investments, and to seek out publishers
likely to contribute to a sustainable future for scholarly
information. Libraries are also supporting open access
projects that experiment with alternatives to the current
subscription-based funding model or the current journals-
based publishing model for scholarly communication.
These approaches are seen as those of a good citizen,
especially in an institution whose needs for funding include
many urgent priorities in addition to library needs. These
libraries are investing in initiatives that may help solve the
long-term problem of high prices for journals.

Electronic Publishing
Most libraries are also pursuing a proactive strategy to
increase user access to quality information resources on the
Web by digitising materials and collaborating with others
to publish digital collections on the Web. For example,
libraries are:
Creating new electronic collections by digitally
reformatting existing collections in the library and
institution, including the collections of senior faculty
and retired faculty
Collaborating with other libraries, academic depar-
tments, scholarly and historical societies, museums,
and others to build virtual collections from originals
that are geographically dispersed
Giving greater emphasis to developing and showca-
sing special collections of manuscripts and rare books,
foreign-language resources, images, music, maps and
geospatial data, numeric data sets, and making them
accessible to discovery via the Web and in other
learning environments
Supporting faculty's e-publishing efforts by consulting
and by offering technological infrastructure to develop
the next generation "journal" and to create Web-based
research tools and resources for teaching
Partnering with university presses and other
publishers to develop new publishing models
Partnering with faculty departments and/or societies
to mount discipline-based and/or institutional e-print
servers
Libraries have expanded the traditional view and definition
of collections so that the concept no longer equates with
those materials that the library "owns". The boundaries
have expanded far beyond the print collections on site or
the electronic files mounted locally to include electronic
materials licensed or managed by the library and materials
available through consortia. Increasingly libraries are
taking responsibility for born-digital collections and
developing tools for their management and use.
In a growing number of cases, a library's collection
also includes resources that reside outside the domain of
the library but for which the library takes some respons-
ibility for managing and servicing. An example of the latter
is the movement by libraries to support the development
of institution-wide knowledge management systems,
where an institution's intellectual capital is centralised,
preserved, and made accessible.
In a university, this might include collections of digital
material created by faculty, research staff, and students,
e.g., research collaboratives, graduate student theses,
electronic records of the university's administrative offices,
courseware content, and streaming audio and video
resources generated from courses, conferences, and other
campus activities. The content of print collections continues
to play a critical role in libraries but that role is changing,
as is the way that libraries are managing print collections.
Less frequently used print resources are being relocated to
storage with paging/retrieval services; more convenient
resource sharing makes book collections accessible to wider
audiences.
Instead of describing collections as "those things
owned", a better definition may be "information resources
for which the library invests financial resources-directly
or indirectly-to manage, service, or preserve on behalf of
library users, regardless of the location of content."
"Collections" now include resources owned by the library
and those accessed in remote locations; the norm is now an
interdependent mix of ownership and access, with the
location of the material increasingly irrelevant to users.
Libraries are increasingly providing users with services and
tools that enhance access to electronic resources and
support an integrated approach to discovering and
receiving this content-along with library services--in
classrooms, on the "desktop", and on PDAs. In many cases,
libraries deliver unrestricted content and services to users
far beyond those affiliated with the institution. Libraries are
pursuing this in many ways, including offering new
services, adapting the role of the online catalog,
experimenting with the Open Archives Initiative (OAI)
metadata harvesting protocol, and writing or adapting
software tools.

Online Catalogues
It is clear that the role of the online catalog has changed.
That catalog has become one of many databases available
to users; libraries are linking to and from the catalog to
integrate all of these resources. In recognition of this
broader role for the library catalog, some libraries are
considering modifying cataloguing to favour timely access
to a wider variety of formats. One proposal called for
reallocating funds that are devoted to describing books and
journals to materials that are proportionately underrep-
resented in today's catalogs, such as films, music,
photographs, and digital objects. Libraries are also
reconsidering their current efforts to collect and catalog free
Web resources, concluding that these labor-intensive
activities can be avoided by perfecting nascent machine
harvesting and cataloging techniques.
A small number of libraries and library organisations
are participating in experiments funded by The Mellon
Foundation to test the application of harvesting and search
engine technologies. Using the recently developed OAI
metadata harvesting protocol, these libraries are delivering
information from the "hidden Web" not normally found by
Internet search engines and from databases with retrieval
formats that present special processing or presentation
problems.
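To make the mechanics concrete: an OAI-PMH harvest is, at bottom, an HTTP request built from the protocol's standard verbs. The following minimal sketch in Python issues a ListRecords request with the oai_dc metadata prefix and prints the Dublin Core titles returned; the base URL is a placeholder rather than a real repository, and a production harvester would also have to follow the protocol's resumption tokens for large result sets.

    from urllib.request import urlopen
    from urllib.parse import urlencode
    import xml.etree.ElementTree as ET

    # Hypothetical repository endpoint; substitute a real OAI-PMH base URL.
    BASE_URL = "https://example.org/oai"

    # ListRecords with the oai_dc prefix is the protocol's basic harvest call.
    params = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urlopen(BASE_URL + "?" + params) as response:
        tree = ET.parse(response)

    # Titles appear as namespaced Dublin Core elements in the response.
    for title in tree.iter("{http://purl.org/dc/elements/1.1/}title"):
        print(title.text)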
Several libraries are working with vendors to adapt
existing portal software into multifunctional products with
features and services desired by users in research
communities. Other examples of library involvement in
development of software applications are ILL management
systems, instructional tutorials, management of content
and services for digital libraries, and institutional
repositories. Although where information resides matters
less to the user, the library as place-the physical entity-
remains more important than ever and performs a host of
functions vital to learning and research. Much research
library space is busier than ever before.
Library facilities are being reconfigured to provide
space for collaborative learning and research. There are
classrooms and media labs where faculty and librarians
may interact in providing student learning experiences,
group study spaces, and community spaces where students
can meet to discuss ideas. The transformation of library
space has also become an opportunity to attract new
academic collaborations such as writing studios and
academic skills tutoring.
Library space is now seen as learning space on an equal
footing with classrooms and laboratories. Libraries are
being renovated to expand e-access and foster community
by providing electronic classrooms, wireless data networks,
offering laptops, and expanding library hours. Libraries are
establishing spaces called "information commons" where
library reference services are offered jointly with
information technology support. At the same time, libraries
are responding to decreasing use of in-person reference
service by combining service points and shifting resources
to online reference service and online tutorials. To make the
best use of prime real estate, libraries are adopting new
approaches to managing large print collections by using
storage centers with delivery services for less frequently
used materials and engaging in cooperative approaches to
long-term preservation copy retention.
The branch or departmental library remains valuable
but its role too is changing. While access to information and
library services is far less geographically based, branch and
departmental libraries still play a role in development of
community and serve as sites for collaboration. Yet, in the
Internet world there are opportunities to rethink the role of
multiple libraries and their configuration within an
institution. One library reported establishing a program to
replicate the opportunities for personalised services and
contacts that characterise branch library service without the
cost of creating additional branches.
Some of the same opportunities for reorganisation may
present themselves when libraries work in collaboration
with other libraries outside their institutional boundaries.
Libraries are active participants in building awareness
among researchers, faculty, and students of uses of high-
quality content and information technology in teaching and
research. Curriculum review and changing expectations for
teaching faculty present opportunities for libraries to
contribute expertise and resources.
Libraries provide classrooms; training and consulting in
finding and evaluating information; and assistance with
creating electronic theses and dissertations, displaying and
visualising data, publishing journals on the Web, and using
geographic information systems (GIS) and remote sensing.
Recognising that disciplines are adopting new technology
at different rates, librarians are:
Working with faculty to integrate information literacy
skills and digital content into the course curricula
Expanding the number of people reached by offering
basic information literacy and fluency training through
online tutorials
Assisting researchers in acquiring, accessing,
displaying, and visualising spatial and numeric data by
providing training in use of software and partnering
with faculty to provide classroom instruction in its use
Developing systems and instructional tools for
pedagogical use of digital resources, including images
Managing campus instructional technology initiatives

Making Organisational Changes within the Library to
Innovate and Improve Services
Libraries are embracing change-being willing to change
what libraries do and how it is done-and, as a result, are
reorganising their operations and re-deploying staff to
respond to the new environment. Examples of recent
organisational change in ARL libraries include:
Many libraries report having instituted an integrated
approach to collection management, bringing together
the activities of building, maintaining, and providing
access to library resources in all formats, both in-house
and remote, both acquired and locally created.
Many libraries also report taking interdisciplinary
approaches, training library staff to perform across
traditional roles and to build bridges between the
culture of the library and that of the faculty and
information technology. An example is the creation of
information commons jointly staffed by the library and
information technology, often developing services in
collaboration with faculty.
Others report organisational redesigns to achieve a
system-wide view of services to coordinate communi-
cations with all library users and to deploy staff more
flexibly.
Libraries are experimenting rigorously with
organisational transformation reflecting strategies for
team-based decision making, learning organisations,
and quality management.

Scholarly Communication Issues
Libraries are assuming leadership within their institutions
to advocate for enduring institution-wide knowledge
management policies and programs that contribute to a
robust system of scholarly communication.
There are specific types of innovation, however, that
seem to be related to library or institution size. There is
anecdotal evidence that library size has been influential in
the development of digital library programs; libraries with
more mature digital library programs tend to be libraries
with larger collections and more staff members. On the
other hand, smaller institutional settings may encourage
innovative collaborative activities, either because of the
financial necessity to join forces to make something happen
or because of the potential in smaller communities for more
interaction across campus units. There is also anecdotal
evidence that the institutional climate for change and the
active support of groups and key individuals within the
institution play a central role in the library's orientation and
opportunities to innovate; to the extent that the size of the
institution impacts this climate, size may be a
contributing-if not driving-factor.
Research libraries' ability and desire to collaborate with
other research libraries, with non-university libraries, or
with other organisations and industries will determine the
success of many library programs. Motivations for
collaborations vary greatly and include leadership, political
support, the lure of available funding, and the desire to
transform the library, to move it into the digital age. An
increasingly important collaboration is consortial buying of
print and electronic content and services.
Consortial buying and licensing for access to electronic
resources have become central in every research library's
arsenal of strategies for acquiring information more
economically than could be accomplished by acting alone.
Many libraries find user-initiated consortial borrowing
combined with rapid delivery a successful avenue for
making their users' access to books as easy as access to
electronic resources. There is evidence that fast delivery of
books within a consortium has increased book use,
especially by undergraduates.
Consortia also offer accompanying services that greatly
expedite the discovery and delivery of non-book content for
users affiliated with libraries participating in the
consortium; some consortia are centrally managing image
data banks, electronic theses and dissertations, and other
digital materials.
Consortial structures vary and reflect the political
framework of the state or region in which they operate and
the institutions they represent. In some cases, a state
legislature appropriates funds centrally to support
consortial activities on behalf of libraries; in others,
individual libraries pay to a common fund.
External funding facilitates the process of individual
institutions' making the adjustments and compromises that
consortia require and tends to support consortial
relationships among libraries of varying sizes and
characteristics. Consortia funded solely by the participating
libraries tend to be formed of libraries of similar size and
characteristics. Consortia are on the front line of
controversies between libraries and publishers about
pricing, the terms of use defined by licenses, and publisher
practices that make it difficult for a library to cost-
effectively tailor its purchases from the publishers' lists or
to reduce the total amount of money spent with that
publisher.
The tendency of consortia to acquire or license
electronic resources from the larger, commercial publishers
raises concerns that content from scholarly societies or
other smaller publishers will be overlooked. Research
library staff are actively working with their partners in
consortia to assess the benefits and challenges of consortial
licensing and to influence publisher behaviour in favour of
scholarly communication needs.
A closely related factor to collaboration is the degree to
which the organisations involved in a joint effort are willing
to share control, at both the intra- and the inter-institutional
levels. This sharing of control has implications for many
activities such as archiving; consortial buying; expanding
online resources; collaborative efforts with faculty, inform-
ation technology groups, instructional technology centers;
and cooperative publishing projects.
Libraries are promoting team operations to advance
collaborative projects and to encourage a convergence of
different organisational cultures where control is truly
shared. Many research libraries are willing to establish new
partnerships and share control in order to innovate.
Research libraries' ability to recognise their strengths and
their willingness to leverage those strengths influences the
success of library integration into the research institution.
Libraries have a special kind of space on campus, space
that is seen as politically neutral and representative of the
intellectual commons--public-good space. This nature of
space gives libraries leverage to attract and nurture collab-
oration with other parts of the institution. Libraries also
have human talent they may deploy for the institution.
Research library staffs have a wide range of expertise that
can support new approaches to meeting user and
institutional needs.
Many libraries are leveraging their space and human
resources to the advantage of the institution, for example,
in creating an information commons, establishing an insti-
tutional repository, or providing campus management of
copyright clearances. A major factor in innovation is the
degree to which research libraries and their institutions are
willing and able to embrace change. The willingness of the
library leadership (at all levels) to change its vision of what
a library is and does, and to transform library operations
to reflect that vision, obviously influences innovation.
However, the culture of the institution itself also influences
how the library engages and supports change.
Many research libraries find themselves in the
impossible position of being expected by some disciplines
to implement systemic change while continuing to be
responsive to the needs of other disciplines that have
slower rates of adoption of new technology. During this
period of transition, this is a major factor influencing how
libraries implement change and how that change is
received within the institution. The degree to which people
in the institution exhibit "interdisciplinarity" or
demonstrate effectiveness in working with the various cultures
of the institution-library, faculty, and information
technology cultures-is another important factor in
innovation. Success in collaboration depends on some
degree of convergence of these cultures.
Convergence or integration of cultures is being
approached in several different ways. Examples include
convergence of cultures in shared space; convergence of
technical and bibliographic expertise; and convergence of
human capital to consolidate efforts, encourage coope-
ration, and provide institution-wide access to the latest
equipment, software, and support as well as a wide and
deep collection of high-quality content. Experience
reported in the task force survey suggests there is a need
to support greater cultural integration. Some libraries
pursue this goal by establishing cross-cultural task forces,
developing cross-training and mentoring programs for
effective performance across and outside traditional roles,
adopting position classification schemes that reflect the
changes required of staff, providing opportunities for staff
to cross boundaries, and by giving people time to work
through the process of cultural convergence.
REFERENCES

Bearman, D., "Optical media: their implications for archives and
museums", Archives and Museum Informatics Technical Report, 1 (1),
Pittsburgh, Pa.: Archives and Museum Informatics, 1987.
Conway, P., "Digitizing preservation", Library Journal, 1 February 1994,
42-45.
5
Digital Information Resources

For many years a range of software has been available
specifically aimed at information storage and retrieval of
text-based information. Examples of such software include
BRS/Search, CAIRS, CDS/ISIS, Cardbox Plus, HeadFast,
IdeaList and InMagic. CDS/ISIS is very widely used all
over the world as it is available free of charge to nonprofit
organisations in UNESCO Member States and exists in a
number of language versions.
A regular feature on CDS/ISIS is included in
Information Development. One recent example given of the
use of CDS/ISIS is the DRAIN (Drainage Information
System) project which aims to coordinate information from
various relevant research organisations in Egypt, France,
India, Mexico, Pakistan and Uzbekistan involved in
irrigation and drainage research. For many years
text-retrieval software was used to process bibliographic
data, but recent developments in storage technologies have
meant that this software can now be used also for full-text
retrieval purposes.
The producers of this software have continued to
develop their products and many now run under Windows,
can deal with graphical data as well as text data and can
be used when creating local CDROMs. The directory by
Wood and Moore provides details of about 100 such
packages. A special category of text-retrieval software is
personal bibliographic software which may be used by
academic researchers and which offers pre-defined data
structures and pre-defined output structures (to comply
with bibliographic styles adopted by organisations such as
the American National Standards Institute (ANSI)) as well
as standard facilities such as Boolean searching and batch
importing of records.
An introductory essay by David Bearman in a
directory of about eighty software packages for use in
archives and museums notes that the 'problem with
archives software has historically been that the market is
too small and diffuse to support a range of products'.
However, he points out that the Internet provides an
exciting domain for archivists with the possibility of setting
up World Wide Web servers of archives holdings which
might include images and sound information as well as
document delivery services.
GEOGRAPHIC INFORMATION SYSTEMS (GIS)

GIS software comprises tools for the collection, analysis,
retrieval and display of spatial information. Technologies
that integrate the management and analysis of this type of
data are being used in a variety of ways for environmental
studies, global change research, transportation planning,
urban planning, marine studies, and so on. As with
bibliographic data, there is a need to share resources
worldwide and the MARC format is being investigated as
one possible solution. One of the projects-Project
Alexandria (not to be confused with the plan for the new
library of the same name at Alexandria in Egypt)-being
funded in the United States as part of the National Science
Foundation (NSF) digital libraries programme, is
developing a system to access spatial data in distributed
databases.
Traditionally the information that can be stored and
retrieved in computer systems used for library and
information work has been structured into databases. The
growth of databases has been rapid. There is a wide range
of organisations involved in publishing databases; these
include traditional academic publishers (Oxford University
Press, Elsevier), learned societies (Institute of Physics,
Institute of Electrical Engineers), commercial companies
(Institute for Scientific Information, Derwent), the
computer industry (Microsoft) and the entertainment
industry (Sega, Disney, Nintendo). Work on quality issues
of publicly available databases has been undertaken by
various online user groups, and Armstrong describes the
concept of 'database labelling', which would provide the
potential user with basic information about the database
and the extent to which the information contained could be
trusted.
As well as the traditional online search services (such
as Dialog, DataStar, STN and ESA-IRS) there are now many
others ways in which libraries and information centres
provide access to publicly available databases for their
users. During the 1980s more specialist services appeared
and the existing services broadened their scope. FT Profile,
for instance, specialises in the provision of full-text online
information tailored for the business community. OCLC
entered the online search service field in late 1991 with
FirstSearch, a service designed particularly for end-users.
In the United Kingdom the Joint Information Systems
Committee (JISC) of the funding bodies for higher
education has organised the establishment of centralised
databases for the academic community. By charging a
university a fixed annual fee, searching becomes 'free at
point of use' for researchers, teaching staff and students.
The rise in the use of the Internet has been accompanied by
the development of commercial, consumer-oriented online
services. Examples include America Online, Prodigy,
Compuserve, Genie and Delphi which are popular with
end-users as they cover a range of information such as
news, general health matters, encyclopedias, business
information and magazines.
GROWTH OF CURRENT ALERTING SERVICES-INSTANT ARTICLE
SUPPLY (CAS-IAS)

CAS-IAS provide access to the table of contents of several
thousand current journals and also provide means of
transmitting requested articles. By making use of CAS-IAS
services some libraries are moving from a just-in-case mode
of operation with respect to serials holdings to a just-in-
time mode. Examples of CAS-IAS include OCLC
FirstSearch (with the Article 1st and Contents 1st
databases), UnCover (now owned by Knight-Ridder, which
also owns Dialog and DataStar) and Inside Information
(from the British Library).
The early online search systems such as Dialog,
DataStar and ESA-IRS relied heavily on command
languages for carrying out online searches, and this helps
to explain why these systems were mainly searched by
specially trained intermediaries. With the developments of
OPACs and CDROMs in the 1980s, an interface was needed
that would be fairly intuitive for searchers and require
no special training. The technique adopted was to provide
the searcher with a menu of options on the screen so that
an appropriate option could be selected which would then
lead to another set of options or to the data. On the positive
side, menus can be self-explanatory, easy for the novice
searcher and give a structure to the search. However, they
may be slow to work through and irritating for frequent
searchers.
With the major move to a Windows environment,
many producers of search systems are developing their
software to work in Graphical User Interface (GUI) mode.
When a screen of potentially clickable items is presented it
is not always obvious to the novice searcher where to go
next in this two-dimensional environment and so there are
many aspects of screen design that need to be borne in mind
by the interface developer. Shneiderman outlines eight
golden rules for any designer of a search interface.
Paraphrased these are: strive for consistency, enable
frequent users to take shortcuts, offer informative feedback,
give action sequences a logical structure, offer simple error
handling, permit easy reversal of actions taken, make the
user feel in control and reduce short-term memory load.
Most retrieval software is based on the user combining
chosen terms or phrases using the standard Boolean
operators AND, OR, NOT in the search statement. Over the
years various alternative techniques have been developed
by researchers; these are referred to in the literature as best
match, nearest neighbour, probabilistic retrieval, fuzzy sets,
relevance ranking or ranked output. Some of these ideas are
now appearing in commercial services such as Personal
Library Software's Personal Librarian, Dialog's TARGET,
and FREESTYLE developed for the LEXIS legal service and
the NEXIS news service. Evaluating the performance of
various information retrieval systems has been a topic of
interest for many researchers over the years. A major
initiative, known as the Text Retrieval Conference (TREC),
involves a number of research groups from around the
world testing their information retrieval techniques on the
same databases of full-text items.
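The set logic behind the Boolean operators can be made concrete with a toy inverted index; in the sketch below (terms and document numbers are invented for illustration) AND, OR and NOT map directly onto set intersection, union and difference.

    # Toy inverted index: each term maps to the set of documents containing it.
    index = {
        "library": {1, 2, 4},
        "digital": {2, 3, 4},
        "archives": {1, 3},
    }

    # Boolean operators correspond to set operations on the posting sets.
    print(index["library"] & index["digital"])   # AND -> {2, 4}
    print(index["library"] | index["archives"])  # OR  -> {1, 2, 3, 4}
    print(index["library"] - index["archives"])  # NOT -> {2, 4}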
There are many potential problems related to the use
of information technology in archive, library and
information work, and these can result in useless,
expensive and inappropriate systems. There have been
examples of libraries where much time, effort and money
is spent on the latest software and hardware but little time
has been spent on revising work practices or ensuring that
workstations are ergonomically designed for the people
who use them. Only about 10% of the literature on library
automation covers human aspects, whereas about 80% of
the problems that arise in automation projects are thought
to be caused by human or organisational matters. It is most
important to consider people at all levels when setting up
any form of computer system in a library.
The real needs of the users must be taken into account
in the design; library staff and computer staff need to know
enough about each others' areas of expertise so that they
can communicate properly; systems librarians and network
managers need to have suitable job descriptions so that they
do not become too 'technostressed'; workstations need to
be suitably designed for their likely users; users need to
have realistic expectations of the new system; and library
staff need to be kept informed in an appropriate way.
Traditional organisation structures may need to be adapted
with the introduction of new systems.
In many countries where legislation is in place to cover
the health and safety aspects of, say, working with VDU
(Visual Display Unit) screens, managers are being forced to
think about the human aspects of automation. There is
much to be done in making sure that human factors are
considered at all stages in the design of a computer system
for use in libraries or information units. The impact on
people of electronic libraries (IMPEL) is one of the
supporting studies to the e-Lib project.
All librarians and information workers must be able to
cope with the vast quantity of electronic information now
available as well as to advise their users how to cope with
it. Del Castillo reports on the use of information technology
in libraries in the Philippines, and notes that problems are
due to lack of know-how, lack of direction, lack of funds
and a weak telecommunication infrastructure. Some
manufacturers make their products available on the world
market and have invested efforts in translating the interface
dialogue of their systems into various languages. TINLIB,
for instance, is available in about twenty-five different
languages and is used in libraries on five continents.
ALEPH, from Israel, can handle several different scripts
within a single record and has been used in many East
European countries, including the Czech Republic,
Hungary, Slovakia and Romania.
QUANTITIES OF INFORMATION

Library, information and archival work generally deals
with very large quantities of information. Regardless of
whether information sources are in printed or electronic
formats, space is always a key issue. Mass storage is
required to meet:
The need for a large-volume digital storage system for
archival management.
The need to provide users with immediate access to the
rapidly growing volume of data and information that
is stored in digital information systems and is likely to
be distributed on optical media in the future.
The need to provide users with access to multimedia
information quickly and interactively through the
integration of technologies.
The need to transfer a large volume of data and/or files
from one system to another.
Traditionally libraries have used conventional media like
film, microfilm and microfiche to store information
materials, but they are bulky and rather expensive. With the
advent of computer and optical technologies, mass storage
has shifted mostly to electronic media. There are several
different technologies available for mass storage on
magnetic tapes, high-density floppy disks, portable hard
disks with a capacity of over 2 GB, and optical disks. But
it is optical media that are the primary ones for mass
storage. Because of this, the following section will explore
further the different types of optical media.
The various types of optical media offer different
storage densities, media formats, transfer rates, capabilities
and compatibility among commercial vendors' products. In
the last decade alone, a flood of new media and
applications-CDROM, laser videodisks, write-once and
read-many devices, erasable disks, to name just a few-
have been introduced, promoted and utilised. There is a
wide range of optical alternatives available to provide the
highest application flexibility to end-users.
Optical media can be grouped into three major
categories:
Read-only media.
Write-once and read-many.
Erasable.
Under each of these major categories, a multitude of optical
storage media can be found, and more detailed information
on each of these optical media is widely available. All are
essential for multimedia application developments.
To enter the interactive multimedia world, a minimum
equipment configuration should be more than the bare
minimum described earlier. It should consist of the
following components:
A computer system with a minimum of 4 MB of RAM.
A 350 MB hard disk drive.
A 14.4 kbps modem (fax modem would be preferable).
A double-speed CDROM drive.
A portable videotape recorder.
A fixed videotape recorder capable of being connected
to a computer output either directly or through an
appropriate AV card inserted into one of the bus slots
in the computer.
A television monitor for use during taping and
playback.
A scanner.
Additional hardware in the form of an LCD display
panel or LCD projection system is highly recommended.
The minimum software configuration for using multimedia
products is rather low, since most products have plug-and-
play capabilities with very few requirements other than the
installation processes. However, the following are varying
levels of software requirements for producing simple
multimedia applications:
A basic editing software package, such as those
available from Adobe, Avid, Radius and others.
An intermediate-level software system that would
include all of the above plus a free-standing audio
editing software package, a two-dimensional
modelling or rendering software package and a
graphic/titling package such as Adobe Photoshop.
An advanced-level software system that would include
all the above plus an advanced-level three-dimensional
modelling or rendering software package, and an
authorware package for output to hard, floppy disk or
to read/write compact disc.
The equipment cost varies greatly from one model to
another, and from one configuration to another. Thus it is
best to check with the vendors for current price
information. However, it is safe to estimate that a PC
Pentium multimedia system can be acquired from US$1,500
to $4,000, and a Macintosh Power Mac from US$2,000 to
$5,500, depending on the system model, RAM size, hard
disk size, and connected peripherals. Whenever possible,
efforts should be made to acquire a system with as large a
RAM and hard disk storage capacity as possible. The cost
of software also varies greatly, ranging from less than $100
to over $1,000. However, powerful software like Adobe
Photoshop costs about $600 and Macromedia's Director
about $900.
Abundant multimedia tools are available for
creating multimedia applications. Yet how one goes about
developing multimedia depends on the nature of the
application and how it will be viewed and used. Although
there is no multimedia development formula, the process
does follow a series of basic steps. These steps include:
Concept.
Content and interface.
Product.
Planning and design is always the most important
component of any development, regardless of whether it is
technology-related or not. Usually at least half to two-thirds
of project effort is devoted to this phase. In other words, the
better a project is planned and designed, the more likely it
is that it will be successful, effective, efficient and useful.
For a multimedia application, after the idea is conceived
and a conceptual framework developed, the planning
process will have to go into the more minute details of plans
and design, so that these will lead to the successful
implementation of the application development. It is
important to stress the importance of project design. This
includes both the application design and interface design.
Remember that an application can deal with 'goldmine'
source materials which are rich, relevant and essential, but
if the presentation is not well thought out and the
interactive feature of multimedia technology is not fully
utilised, then the richness of the available knowledge base
will not be fully exploited. On the other hand, even if the
presentation of the multimedia application is well designed
conceptually, it can still fail if the interface design is either
poor or uninviting/confusing to the user. Few would be
willing or able to visit the 'goldmine'.
In designing multimedia applications, it is essential to
realise that the process of linking multimedia information
in a hyperweb environment can be both confusing and
disorienting. Beyond the many demanding technical
elements that allow multimedia to come together, there is
a sense of transcendentalness that occurs during the
production process. While combining massive amounts of
information, one commonly observes coincidences and
encounters baffling developments. Great care must be
taken to separate the intelligent realities from the illusions.
Integrating the complex threads of interactive electronic
communication requires an emphasis on the relationship
between the designer/producers and their reader/
listener/user/viewer. New access paths to source material
and new procedures for protection and licensing must be
developed. Industry standards must emerge to facilitate
diversity and universal connectivity.
After the planning, one has to determine what kind of
information is to be included and published, and then
prepare and process this information. Information sources
can include all formats-textual, still images and motion
videos, sounds and animation. This step involves
information-gathering and preparation, and electronic
management. The former determines the format to be
chosen for inclusion, and the latter considers how to turn
all the desired information to electronic forms and also how
to manage them.
The existing text information can be available in both
printed and electronic forms. For the printed information,
all three popular ways of conversion to an electronic
format-keyboarding, imaging (scanning), and optical
character recognition-will be used. For the electronic files,
once the delivery platform is decided, electronic text files
will have to be converted for the chosen platform.
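As an illustration of the optical character recognition route, the open-source Tesseract engine can be driven from Python through the pytesseract wrapper; the packages named are real, but the file name is hypothetical and the Tesseract engine itself must be installed separately.

    # OCR sketch: convert a scanned page image into machine-readable text.
    from PIL import Image
    import pytesseract

    page = Image.open("scanned_page.tif")      # hypothetical scanned image
    text = pytesseract.image_to_string(page)   # run character recognition
    print(text)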
Hardcopy images will have to be scanned and stored
in acceptable format for multimedia applications. The most
popular format is JPEG (Joint Photographic Experts Group),
but PICT (a Macintosh graphics file format, closely related
to PCT and commonly supported in Macintosh format) and
TIFF (Tagged Image File Format) are also popular with
multimedia application software. When multimedia is
moving closely with Internet and World Wide Web applic-
ations, GIF and JPEG with a very high rate of compression
are preferred.
Scanned images often need to be processed and
enhanced by the use of software like Adobe Photoshop.
For video, using video-capturing software via a
video-capturing board, one can convert video sources from
television, video recorder and video camera to digital video
and save them in popular formats such as QuickTime
movies (in both platforms), or AVI (Audio-Video Inter-
leaved) for PC applications. The standard for digital video
is MPEG (Moving Picture Experts Group). Again, like the
scanned images, captured digital videos will have to be
edited by the use of software tools like Macromedia's
Director, Adobe's Premiere or Avid's VideoShop.
Through the use of sound recording software and a
sound recorder, sound sources from tapes, cassettes and
video can be converted to digital sound files, which can also
be manipulated and enhanced by means of existing tools.
Non-text files consume a large amount of storage space
(for example, one colour image at screen resolution can
easily take up about 1 MB of disk space or more); thus the
issue of size becomes very significant, and hence comp-
ression and decompression. Compression is a widely
employed technique to reduce the size of large files without
appreciably changing the way a viewer sees the images or
digital videos or hears the sounds. Once compressed, the
file must be decompressed before it can be used. Compr-
ession and decompression can be accomplished by software
alone or through the use of a combination of software and
hardware. Take image as an example: compression
software analyses an image and finds ways to store the
same amount of information using less storage space.
Compression hardware usually consists of a ROM chip
with built-in compression routines for faster operation, or
a coprocessor chip that shares the computing load with the
computer's main processor.
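The arithmetic behind such storage figures is straightforward; the following back-of-the-envelope sketch assumes an illustrative 640 x 480 screen resolution and 24-bit colour.

    # Storage needed for one uncompressed screen-resolution colour image.
    width, height = 640, 480    # an illustrative screen resolution
    bytes_per_pixel = 3         # 24-bit colour: one byte each for R, G and B

    raw_size = width * height * bytes_per_pixel
    print(raw_size)             # 921600 bytes, roughly 0.9 MB
    print(raw_size // 10)       # about 92 KB after 10:1 lossy compression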
There are different levels of software compression:
Lossless compression: no information lost through the
compression process. In this way, the file size is
generally not reduced much.
Lossy compression: through the compression, some
information is lost. This will reduce the file size more
dramatically than the lossless one.
The most common method for compressing images is called
JPEG, which is a standard way of reducing image file size
by discarding information that cannot be detected
easily by the human eye. In compressing digital video,
the standard is MPEG. MPEG is an industry standard for
moving images that uses interframe compression (or frame
differencing) as well as compression within frames. There
are different MPEG standards, such as MPEG I, which
optimises for data rates in the 1 to 1.5 Mbit/s range (the
common transfer rate of CDROM drives and T-1
communications links), and MPEG II, which optimises for
data rates above about 5 Mbit/s.
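A minimal sketch of the lossy trade-off using the Pillow imaging library (the file names and quality settings are illustrative): lowering the JPEG quality discards more information in exchange for a smaller file, while a lossless format such as PNG preserves every pixel at the cost of weaker compression.

    from PIL import Image

    img = Image.open("scan.tif")  # hypothetical source image

    # Lossy JPEG: the quality parameter trades fidelity for file size.
    img.save("scan_q85.jpg", "JPEG", quality=85)  # mild loss, larger file
    img.save("scan_q30.jpg", "JPEG", quality=30)  # heavy loss, smaller file

    # Lossless alternative: PNG keeps every pixel but compresses less.
    img.save("scan.png", "PNG")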
There are many compelling reasons for using
multimedia for education, training, information delivery,
business, entertainment, etc. First of all, the power of
pictures is enormous. Only recently, with the advent of
multimedia technologies, have we been able to tap the
undeniable power of visual images and other non-textual
information sources. But equally appealing for multimedia
technology is the power of interactivity-a concept
extended from hypertext as discussed in the introductory
section. Through the ages, information has been presented
and absorbed in a linear fashion. Interactive multimedia
brings the incredible freedom to explore a subject area with
fast links to related topics.
REUSABLE MODEL COMPONENTS OF A DIGITAL LIBRARY

Large-scale digital library (DL) designers and engineers
face significant technical challenges in providing adequate
response time to users. Substantial advances in hardware
technology make the CPU faster and faster every year.
However, I/O hardware technology has not kept up. I/O
speed continues to be the main hardware performance
bottleneck. Better designs and better tuned configurations
may help meet this performance challenge. Building digital
libraries with excellent performance requires design
evaluation and what-if analysis for which simulation is a
very powerful tool.
Although developing a simulation model of a complex
DL is an onerous task, recent advances in object-oriented
visual simulation enable component-based model
development with almost no programming. Component-
based development is becoming increasingly important.
The NIST advanced technology program on component-
based software lists numerous benefits of component-based
development. Such benefits can be realised in visual
simulation of digital libraries by way of using the Visual
Simulation Environment (VSE). The VSE technology has
been developed over the last decade at Virginia Tech and
Orca Computer, Inc. under $1.2 million research funding
from the U.S. Navy. VSE enables discrete-event, domain-
independent, object-oriented, picture-based, component-
based, visual simulation model development and execution
for solving complex problems.
NCSTRL (pronounced "ancestral") is a distributed
digital library of computer science technical reports from
universities and institutions throughout the world. Log
data collected at three NCSTRL servers, namely, Cornell,
Virginia Tech and UCLA, were used to probabilistically
characterise the query inter-generation times for a user
population. Data on query inter-generation times was
collected over a period of three days. We used the Expert
Fit software product to fit a probability distribution to the
collected data. The log data collected at the three servers
also was studied in an effort to probabilistically characterise
the server response times to queries. However, log entries
did not contain sufficiently precise data that could be used
for this characterisation.
Therefore, the triangular probability distribution was
used to characterise this stochastic phenomenon in the
absence of sufficient data. The transmission time of requests
from one server to another was determined by examining
log entries. By collecting performance data at different
times of the day, we characterised the network delay at
different servers. Broadly, these delays were classified into
two categories: normal and heavy. The library contains nine
reusable model components. Each component is described
below. Each instantiated component or an object in VSE is
given a unique identification called object reference. This
reference is used for message passing.
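The triangular characterisation mentioned above is simple to reproduce: Python's standard library can draw illustrative response times from a triangular distribution (the minimum, maximum and mode below are invented values, not the study's fitted parameters).

    import random

    # Triangular(low, high, mode) as a stand-in response-time model.
    response_times = [random.triangular(0.2, 6.0, 1.0) for _ in range(5)]
    print(["%.2f s" % t for t in response_times])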

Dienst Server
This component models the behavior of a Dienst server that
receives a user query, searches its own repository of
technical reports, and simultaneously sends the query to all
other Dienst and MIS servers in its region. If some server
does not respond within the timeout limit, the Dienst server
then sends the query to its region's backup server, if any.
When all contacted servers respond with query
results, the Dienst server returns the merged results to the
user. This component belongs to the VSE class DienstServer
and exhibits the following behavior (methods).
replicationStarted: sends the reference (identity) of the
Dienst server to the region it is contained in at
simulation start-up. The region collects all the Dienst
server references within it and sends the list to each
server.
receiveServerList: returns a list of references of all Dienst
servers within a particular region.
dynObjArrived: indicates the arrival of a query
submitted by the user. If the server status in the model
is found to be operational, a "searchFor" message is
sent to all other Dienst servers, the MIS server within
the region and the system CIS server (if any). The
server also sets the number of replies expected to the
query. It schedules a "timeOut" event for that query,
whereupon it checks if all expected responses for that
query are received.
timeOut: checks if all the expected responses to a
particular query are received. If not, it accesses the
Backup server of the region (if any). It schedules
another "timeOut", whereupon a check is made to see
if the Backup server has responded within the time
limit.
backupServerTimeOut: checks if the Backup server of the
region responded to the query. If not, the Dienst server
sends all the responses it has been able to collect so far
to the user and informs the user of the servers that
failed to respond within the time limit.
searchFor: is sent to a Dienst server by another Dienst
or Lite server within the region as part of a distributed
search for the query received by the other server. If the
server receiving the message is up, it schedules the
arrival of a response to the query at the sending server
after a time which is the sum of the search time and the
network delay between the two servers.
replyToQueryArrived: is sent to a Dienst server by all
other Dienst servers, the MIS server within the region
and the system CIS server (if any) in response to a
search request made by the server. It indicates the
arrival of a reply to the search from the remote server.
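The distributed search and timeout behaviour described above can also be sketched outside VSE as a small discrete-event simulation. The fragment below uses the open-source SimPy library; the server names, the triangular search-time parameters and the five-second timeout are illustrative assumptions, not values from the NCSTRL study.

    import random
    import simpy

    TIMEOUT = 5.0  # seconds a server waits for peer responses (assumed)

    def remote_search(env, name, results):
        # A peer server: search plus network delay, triangularly distributed.
        yield env.timeout(random.triangular(0.5, 8.0, 2.0))
        results.append(name)

    def dienst_query(env, peers):
        # Broadcast the query, then merge whatever arrives before the timeout.
        results = []
        jobs = [env.process(remote_search(env, p, results)) for p in peers]
        yield simpy.AllOf(env, jobs) | env.timeout(TIMEOUT)
        print(f"t={env.now:.1f} merged results from: {results}")

    env = simpy.Environment()
    env.process(dienst_query(env, ["Cornell", "VirginiaTech", "UCLA"]))
    env.run()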
Lite Server
This component models all of the Dienst server's behavior
except that it does not perform a search on its own since
it does not possess a repository of technical reports. It
belongs to the VSE class LiteSite and exhibits the following
behavior (methods).
replicationStarted: sends the reference (identity) of the
Lite server to the region it is contained in. The region
collects all the Lite server references within itself and
sends them to each server.
serversWithinRegionAre: returns a list of references of all
Dienst, MIS, Backup (if any), and CIS (if any) servers
within a particular region.
dynObjArrived: indicates the arrival of a query
submitted by the user. The server checks if it is
operational by generating a uniform random number
between 0 and 1 and comparing it with the probability
of its going down, and if so, it then sends a "searchFor"
message to all Dienst servers, the MIS server within the
region and the system CIS server. It also sets the
number of replies expected to the query. It schedules
a "timeOut" event for that query, whereupon it checks
if all expected responses for that query have been
received.
timeOut: checks if all the expected responses to a
particular query have been received. If not, it accesses
the backup server of the region (if any). It schedules
another "timeOut", whereupon a check is made to see
if the backup server has responded within the time
limit.
backupServerTimeOut: checks if the backup server of the
region has responded to the query. If not, the Lite
server sends all the responses it has been able to collect
so far to the user and informs the user of the servers
that failed to respond within the time limit.
replyToQueryArrived: This message is sent to a Lite
server by all the Dienst servers, the MIS server within
the region and the system CIS server (if any) in
response to a search request made by the Lite server.
It indicates the arrival of a reply to the search from the
remote server.

Merged Index Server (MIS)
This component models a Merged Index server which
possesses the indices of the Dienst servers of the NCSTRL
regions other than its own. It is accessed by the Dienst and
Lite servers in its region. It performs a search on its indices
and returns the result to the requesting server. It belongs
to the VSE class MISServer and exhibits the following
behavior.
replicationStarted: sends the reference (identity) of the
MIS server to the region it is contained in. The region
then sends the MIS server reference to all other servers
within the region.
serversWithinRegionAre: provides a list of references of
all Dienst servers within a particular region.
allMISServersAre: provides a list of references of all MIS
servers in the system.
searchFor: is sent to the MIS server by a Dienst or Lite
server within the region as part of a distributed search
to a query received by the Dienst or Lite server. If the
MIS server is up, it schedules the arrival of a response
to the query at the sending server after a time, which
is the sum of the search time and the network delay
between the two servers.
Central Index Server (CIS)
This component models a Central Index server which
possesses the indices of all the Lite servers. It is accessed
by the Dienst and Lite servers. It belongs to the VSE class
BackUp and exhibits the following behavior (methods).
replicationStarted: sends the reference (identity) of the
CIS to the region it is contained in at simulation start-
up.
searchFor: is sent to the CIS server by a Dienst or Lite
server within the region as part of a distributed search
to a query received by the Dienst or Lite server. If the
CIS server is up, it schedules the arrival of a response
to the query at the sending server after a time, which
is the sum of the search time and the network delay
between the two servers.
Backup Server
This component models the behavior of a Backup server
which possesses the indices of all the Dienst and MIS
servers in its region. It is accessed by a Dienst or Lite server
when the server times out on a query. The Backup server
conducts the search over its indices and returns the results
to the requesting server. It belongs to the VSE class BackUp
and exhibits the following behavior (methods).
replicationStarted: sends the reference (identity) of the
Backup server to the region it is contained in. The
region then sends the Backup server reference to all
other servers within the region.
serversWithinRegionAre: provides a list of references of
all Dienst servers within a particular region.
searchFor: is sent to the backup server by a Dienst or Lite
server within the region if it times out on a response
to a search from one or more Dienst or MIS or CIS
server in the system. If the backup server is up, it
schedules the arrival of a response to the query at the
sending server after a time, which is the sum of the
search time and the network delay between the two
servers.
ELECTRONIC INFORMATION RESOURCES

An "electronic resource" is defined as any work encoded
and made available for access through the use of a
computer. It includes both online data and electronic data
in physical formats (e.g., CD-ROM). To avoid confusion
with these terms as used in the copyright process, online
will refer to intangible works; physical to a tangible work.
The term "acquire" refers to any electronic resource, online
or physical, which the Library receives through its various,
typical acquisitions processes, or which the Library
provides access to through official contractual, licensed, or
other agreements; any of these electronic resources may or
may not be owned by or housed at the Library. "Collect"
refers to electronic resources owned by the Library and
selected for the permanent collection, including works
created by the Library.
It may also include works stored elsewhere for which
the Library has permanent ownership rights. "Link" refers
to pointers from the Library's internet resources or
bibliographic records to the Library and non-Library
electronic resources, created and maintained by Library
staff for a variety of purposes; "link" is not an act of
acquiring, and electronic resources linked do not
necessarily constitute an acquisition by the Library. The
Library collects materials in many formats to support its
universal collections. This policy is intentionally general in
order not to restrict the collecting of needed materials and
to allow the Library to make these resources available as
technology changes. It is the Library's policy with electronic
works, as with all others, to obtain them through copyright
unless they are not subject to deposit under sections 407 or
408 of the copyright law.
Selection of works for the collection depends on the
subject of the item as defined by the collections policy
statement for the subject of the work, regardless of its
format. Formats include home pages, Web sites, or internet
sites required to support research in the subject covered.
The Recommending Officer responsible for the subject,
language, or geographic area of the electronic resources is
responsible for recommending these materials. Electronic
editions of audio-visual materials, prints, photographs,
maps, or related items are also covered by the Collections
Policy Statements for their appropriate formats. The criteria
used to evaluate electronic resources do not greatly differ
from those used for books or materials in other formats. As
with more traditional formats, the cost of the work and the
requirements of serving, cataloging, storing, and
preserving must be considered in the decision.
The Library selects electronic works for its permanent
collections which rank high on the following list of criteria:
usefulness in serving the current or future informational
needs of Congress and researchers, reputation of the
information provider, amount of unique information
provided, scholarly content, currency of the information,
frequency of updating, and ease of access. Priority is given
to resources containing information not otherwise
available. The Library will facilitate access to various kinds
of electronic resources which it does not maintain in its own
collections by means of bookmarked links, special catalog
records, or index systems. The Library will not consider
these resources part of its permanent collections and will
not archive them.
AUTOMATED INFORMATION SERVICES

The term "automated digital library" can be used to
describe a digital library where all tasks are carried out
automatically. Computer programs substitute for the
intellectually demanding tasks that are traditionally carried
out by skilled professionals. These tasks include selection,
cataloguing and indexing, seeking for information,
reference services, and so on. The common theme is that
these activities require considerable mental activity, the
type of activity that people are skilled at and computers
find difficult. Automated digital libraries should not be
confused with library automation, which uses computing
to reduce routine tasks in conventional libraries.
In some ways, digital libraries are very different from
traditional libraries, yet in others they are remarkably
similar. People do not change because new technology is
invented. They still create information that has to be
organised, stored, and distributed. They still need to find
information that others have created, and use it for study,
reference, or entertainment. However, the form in which
the information is expressed and the methods that are used
to manage it are greatly influenced by technology and this
creates change. Every year, the quantity and variety of
collections available in digital form grows, while the
supporting technology continues to improve steadily.
Cumulatively, these changes are stimulating
fundamental alterations in how people create information
and how they use it. To understand these forces requires
an understanding of the people who are developing the
libraries. Technology has dictated the pace at which digital
libraries have been able to develop, but the manner in
which the technology is used depends upon people. Two
important communities are the source of much of this
innovation. One group is the information professionals.
They include librarians, publishers, and a wide range of
information providers, such as indexing and abstracting
services.
The other community contains the computer science
researchers and their offspring, the Internet developers.
Until recently, these two communities had disappointingly
little interaction; even now it is commonplace to find a
computer scientist who knows nothing of the basic tools of
librarianship, or a librarian whose concepts of information
retrieval are years out of date. Over the past few years,
however, there has been much more collaboration and
understanding. Partly this is a consequence of digital
libraries becoming a recognised field for research, but an
even more important factor is greater involvement from the
users themselves.
Low-cost equipment and simple software have made
electronic information directly available to everybody.
Authors no longer need the services of a publisher to
distribute their works. Readers can have direct access to
information without going through an intermediary. Many
exciting developments come from academic or professional
groups who develop digital libraries for their own needs.
Medicine has a long tradition of creative developments; the
pioneering legal information systems were developed by
lawyers for lawyers; the web was initially developed by
physicists, for their own use.
Technology influences the economic and social aspects
of information, and vice versa. The technology of digital
libraries is developing fast and so are the financial,
organisational, and social frameworks. The various groups
that are developing digital libraries bring different social
conventions and different attitudes to money. Publishers
and libraries have a long tradition of managing physical
objects, notably books, but also maps, photographs, sound
recordings and other artifacts. They evolved economic and
legal frameworks that are based on buying and selling these
objects.
Their natural instinct is to transfer to digital libraries
the concepts that have served them well for physical
artifacts. Computer scientists and scientific users, such as
physicists, have a different tradition. Their interest in
digital information began in the days when computers were
very expensive. Only a few well-funded researchers had
computers on the first networks. They exchanged
information informally and openly with colleagues,
without payment. The networks have grown, but the
tradition of open information remains. The economic
framework that is developing for digital libraries shows a
mixture of these two approaches.
Some digital libraries mimic traditional publishing by
requiring a form of payment before users may access the
collections and use the services. Other digital libraries use
a different economic model. Their material is provided with
open access to everybody. The costs of creating and
distributing the information are borne by the producer, not
the user of the information. Traditional libraries are a
fundamental part of society, but they are not perfect. Can
we do better? Enthusiasts for digital libraries point out that
computers and networks have already changed the ways in
which people communicate with each other.
In some disciplines, they argue, a professional or
scholar is better served by sitting at a personal computer
connected to a communications network than by making a
visit to a library. Information that was previously available
only to the professional is now directly available to all.
From a personal computer, the user is able to consult
materials that are stored on computers around the world.
Conversely, all but the most diehard enthusiasts recognise
that printed documents are so much part of civilisation that
their dominant role cannot change except gradually.

Digital Libraries Benefits


The final potential benefit of digital libraries is cost. This
is a topic about which there has been a notable lack of
hard data, but some of the underlying facts are clear.
Conventional libraries are expensive. They occupy
expensive buildings on prime sites. Big libraries employ
hundreds of people-well-educated, though poorly paid.
Libraries never have enough money to acquire and
process all the materials they desire. Publishing is also
expensive. Converting to electronic publishing adds new
expenses. In order to recover the costs of developing new
products, publishers sometimes even charge more for a
digital version than the printed equivalent.
Today's digital libraries are also expensive, and initially
may be more expensive than their traditional counterparts.
However, digital libraries are made from
components that are declining rapidly in price. As the cost
of the underlying technology continues to fall, digital
libraries become steadily less expensive. In particular, the
costs of distribution and storage of digital information
declines. The reduction in cost will not be uniform. Some
things are already cheaper by computer than by traditional
methods. Other costs will not decline at the same rate or
may even increase. Overall, however, there is a great
opportunity to lower the costs of publishing and libraries.
Lower long-term costs are not necessarily good news
for existing libraries and publishers. In the short term, the
pressure to support traditional media alongside new digital
collections is a heavy burden on budgets. Because people
and organisations appreciate the benefits of online access
and online publishing, they are prepared to spend an
increasing amount of their money on computing, networks,
and digital information. Most of this money, however, is
going not to traditional libraries, but to new areas:
computers and networks, web sites and webmasters.
Publishers face difficulties because the normal pricing
model of selling individual items does not fit the cost
structure of electronic publishing. Much of the cost of
conventional publishing is in the production and
distribution of individual copies of books, photographs,
video tapes, or other artifacts.
Digital information is different. The fixed cost of
creating the information and mounting it on a computer
may be substantial, but the cost of using it is almost zero.
Because the marginal cost is negligible, much of the
information on the networks has been made openly
available, with no access restrictions. Not everything on the
world's networks is freely available, but a great deal is open
to everybody, undermining revenue for the publishers.
These pressures are inevitably changing the economic
decisions that are made by authors, users, publishers, and
libraries. Early information services, such as shared
cataloguing, legal information systems, and the National
Library of Medicine's Medline service, used the technology
that existed when they were developed. Small quantities of
information were mounted on a large central computer.
Users sat at a dedicated terminal, connected by a low-speed
communications link, which was either a telephone line or
a special purpose network. These systems required a
trained user who would accept a cryptic user interface in
return for faster searching than could be carried out
manually and access to information that was not available
locally.
Such systems were no threat to the printed document.
All that could be displayed was unformatted text, usually
in a fixed spaced font, without diagrams, mathematics, or
the graphic quality that is essential for easy reading. When
these weaknesses were added to the inherent defects of
early computer screens-poor contrast and low
resolution-it is hardly surprising that most people were
convinced that users would never willingly read from a
screen. The past thirty years have steadily eroded these
technical barriers. During the early 1990s, a series of
technical developments took place that removed the last
fundamental barriers to building digital libraries.
6
Multimedia Systems in Libraries

The proper concept of a digital library seems hard to
completely understand and evades definitional consensus.
Different views and perspectives have led to a myriad of
differing definitions. Digital libraries integrate findings
from disciplines such as hypertext, information retrieval,
multimedia services, database management, and human-
computer interaction. The need to accommodate all these
characteristics complicates the understanding of the
underlying concepts and functionalities of digital libraries,
thus making it difficult and expensive to construct new
digital library systems. Designers of digital libraries are
most often library technical staff with little or no formal
training in software engineering, or computer scientists with little
background in the research findings about information
retrieval or hypertext.
Thus, digital library systems are usually built from
scratch using home-grown architectures that do not benefit
from digital library and software design experience.
Wasted effort and poor interoperability can therefore
ensue, raising the costs of digital libraries and risking the
fluidity of information assets in the future. The broad and
deep requirements of digital libraries demand new models
and theories in 'order to understand better the complex
interactions among its several components. However,
though the necessity for such an underlying theory has long
been perceived and advocated, little if any progress has
been made towards a formal model or theory for digital
libraries. Formal mathematical models strengthen common
practice. Their lack leads to diverging efforts and has made
interoperability one of the most important problems faced
by the field.
As a matter of fact, it is not surprising that most of the
disciplines related to digital libraries have formal models
that have steered them well: programming languages,
relational databases, hypertext, multimedia, and
information retrieval. There are five formalisms-streams,
structures, spaces, scenarios, and societies (5S)-as a
framework for providing theoretical and practical
unification of digital libraries. These formalisms are
important for making sense of complexity and can
ultimately serve as an aid for designers, implementers, and
evaluators of digital libraries. These abstractions work with
other known and derived definitions to yield a formal,
rigorous model of digital libraries. The discussion is organised
as follows. Formal models and theories are crucial to
specify and understand clearly and unambiguously the
characteristics, structure, and behaviour of complex
information systems.
A formal model abstracts the general characteristics
and common features of a set of systems developed for
similar problems, and explains their structures and
processes. Furthermore, formal models for information
systems can be used as a tool for the design of a real system
while providing a precise specification of requirements
against which the implementation can be compared for
correctness. Thus, most of the current classes of information
systems have some underlying formal model.
Databases: The relational model is the most established
formal model for databases. In this model, entities of
the real world are modelled as fixed sequences of
attribute values, called relations, which are actually
subsets of the Cartesian product of the domain set of
each attribute. Normalisation encompasses a set of
operations designed to divide complex relations into
ones with simple domains. A specific algebra describes
operations in this model; the relational calculus
provides an alternative formulation. Object-oriented
databases have become the focus of much current
database research and development efforts due to the
limitations of the relational model regarding more
complex applications, like multimedia and geographic
information systems. One of the first developments of
a formal framework for object databases explicitly
divided them into a structural object model and a
behavioural layer. The structural model is described as
a directed graph whose nodes can be simple values or
other abstract objects. Higher-order logic constructs
describe the behaviours of the classes, methods, and
inheritance.
Information retrieval: Formal models for information
retrieval (IR) have arisen since the mid-sixties to
undergird a primarily empirical approach. The three
classic models in information retrieval are called
boolean, vector, and probabilistic. The boolean model
is set theoretic and is the basis for many IR systems.
Documents are seen as sets of terms. A query is
represented as a logical expression built on terms and
some combination of the logical operators and, or, and
NOT. Searching is carried out by returning documents
that have combinations of terms satisfying the logical
constraints of the query, with supplementary facilities
that allow proximity and truncation searching.
The boolean model, in its pure form, has inherent
limitations for searching textual documents: boolean
queries are notoriously difficult to write and modify; they
allow little control over the size of the result set; and, more
importantly, they disallow the production of ranked
output. In the vector space model, terms used to index
documents are considered as coordinates of a
multidimensional space. Retrieval involves ranking the
document vectors with respect to the query based
on some similarity function such as the cosine measure
between vectors; a minimal sketch of such ranking appears
after this list.
Hypertext and Multimedia: Hypertext systems provide
the foundations for a complementary activity of
retrieval called browsing, characterised by the lack of
clearly defined objectives and whose purpose can
change during the interaction with the system.
Hypertext systems have increasingly incorporated
multimedia content, leading thus to the concept of
"hypermedia systems". Multimedia and hypertext/
hypermedia information systems also have received
formal treatments. Lucarella and Zanzi present a
formal graph-based object model of a multimedia
database and describe both the underlying theory and
the design and implementation of the Multimedia
Object Retrieval Environment (MORE) system. The
Dexter model for hypertext abstracts the dynamic and
run-time aspects of hypermedia systems as well as the
data storage layer. Nevertheless the Dexter model has
been shown not to be sufficient to handle the so-called
"hypermedia-in-the-Iarge," required when systems are
integrated and expanded through time. The Dexter
model also is not sufficient for multimectia, and was
extended and implemented as the Amsterdam model
by adding temporal logics, events, and activations.
These models are important to digital libraries because
the DL itself, especially in "browse" mode, can be
presented as a large hypertext of nodes that are the
documents contained by the library.
Digital Libraries: Surprisingly, formal models for digital
libraries are missing from the literature, and one could
conjecture that this is due to the previously argued
complexity of the field.
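To make the vector space model just described concrete, the
following minimal Python sketch ranks two invented documents
against a query using raw term frequencies and the cosine
measure; real retrieval systems would add weighting schemes
such as tf-idf, stemming, and index structures.

```python
import math
from collections import Counter

def cosine(query_terms, doc_terms):
    """Cosine similarity between two bags of terms, using raw
    term frequencies as the vector coordinates."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in set(q) & set(d))
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

docs = {
    "d1": "digital library metadata indexing".split(),
    "d2": "relational database model".split(),
}
query = "digital metadata".split()

# Rank documents by decreasing similarity to the query.
ranked = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
print(ranked)  # ['d1', 'd2']
```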
Streams are sequences of elements of an arbitrary type. In
this sense, they can model both static content, as textual
material, and dynamic content, as in a temporal
presentation of a digital video or time and positional data
for a moving object. A dynamic stream represents an
information flow-a sequence of messages encoded by the
sender and communicated using a transmission channel,
possibly distorted by noise, to a receiver whose goal is to
reconstruct the sender's messages and interpret message
semantics. Dynamic streams facilitate communication in
digital libraries, and are thus important for representing
whatever communications take place in the digital library.
Examples of dynamic streams and their applications
include video-on-demand, filtering and routing of streams
of news, and transmission of messages.
Typically, a dynamic stream is understood through its
temporal nature. A dynamic stream can then be interpreted
as an indefinite sequence of clock times and values that can
be used to define a stream algebra, allowing operations on
diverse kinds of multimedia streams. The synchronisation
of streams can be specified with Petri Nets or other
approaches. In the static case, a stream corresponds to the
information content of an entity and is interpreted as a
sequence of basic elements, typically of the same type. Types
of stream include text, video, and audio. The type of the
stream defines its semantics and area of application.
Statically viewed, any text representation can be seen as
a stream of characters, so that text documents, like scientific
articles and books, can be considered structured streams.
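As a rough illustration of the stream abstraction described
above, the sketch below models a stream as a typed sequence in
Python; the class and the sample data are invented for
exposition and are not the formal 5S constructs.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, TypeVar

T = TypeVar("T")

@dataclass
class Stream(Generic[T]):
    """A stream: a sequence of elements of one arbitrary type."""
    elements: Sequence[T]

# A static stream: the information content of a text entity,
# interpreted as a sequence of characters.
text_stream = Stream(tuple("digital library"))

# A dynamic stream: (clock time, value) pairs, e.g. frames of a
# video or messages flowing from sender to receiver over time.
video_stream: Stream[Tuple[float, str]] = Stream(
    [(0.0, "frame-0"), (0.04, "frame-1"), (0.08, "frame-2")]
)
print(len(text_stream.elements), len(video_stream.elements))
```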
A structure specifies the way in which parts of a whole
are arranged or organised. In digital libraries, structures can
represent hypertexts, taxonomies, system connections, user
relationships, containment, data flows, and work flows, to
cite a few. Books, for example, can be structured logically
into chapters, sections, subsections, and paragraphs; or
physically into cover, pages, line groups, and lines.
Structuring orients readers within a document's
information space. Indeed structured documents often rely
on markup languages. Relational and object-oriented
databases impose strict structures on data, typically using
tables or graphs as units of structuring.
Indexing in information retrieval systems by a manual
or automated process serves not only to improve
performance but also to cluster and/or classify documents
to support future requests, generating an organisational
structure for the document space. With the increase in the
heterogeneity of material continually being added to digital
libraries, we find that much of this material is "semistructured"
or "unstructured". Such "semistructured data" refers to
data that may have some structure, where the structure is
not as rigid, regular, or complete as the structure used by
structured documents or traditional database management
systems.
Query languages and algorithms can extract structure
from these data. Although most of those efforts have a
"data-centric" view of semi-structured data, recent work
with a more "document-centric" view has emerged. In
general, human and natural language processing routines
can expend considerable effort to unlock the interwoven
structures found in texts at syntactic, semantic, pragmatic,
and discourse levels. A space is any set of objects together
with operations on those objects that obey certain rules.
Despite the generality of this definition, spaces are
extremely important mathematical constructs.
The operations and rules associated with a space define
its properties. For example, in mathematics, affine, linear,
metric, and topological spaces define the basis for algebra
and analysis. Spaces can be generalised into "feature
spaces," sometimes used with image as well as document
collections and suitable for clustering or probabilistic
retrieval. Document spaces are a key concept in those
theories. Human understanding is captured in conceptual
spaces. Various spaces or subspaces can handle metadata
like author and date, or relationships like citation-based
links. Multimedia systems must represent real as well as
synthetic spaces in one or several dimensions, limited by
some presentational space and transformed to other spaces
to facilitate processing such as compression.
Many of the synthetic spaces represented in virtual
reality systems are analogs to real spaces, or to information
spaces of various types. Digital libraries may model
traditional libraries by using virtual reality spaces or
environments. Also spaces for computer-supported
cooperative work provide a context for virtual meetings
and collaborations. Again, spaces are distinguished by the
operations on their elements. Digital libraries can use many
types of spaces for indexing, visualising, and other services
that they perform. The most prominent of these for digital
libraries are measurable spaces, measure spaces,
probability spaces, vector spaces, and topological spaces. A
scenario is a story that describes possible ways to use a
system to accomplish some function that the user desires.
Scenarios are useful as part of the process of designing
information systems.
Scenarios can be used to describe external system
behaviour from the user's point of view; provide guidelines
to build a cost-effective prototype; or help to validate, infer
and support requirements specifications and provide
acceptance criteria for testing. Developers can quickly grasp
the potentials and complexities of digital libraries through
scenarios. Scenarios tell what happens to the streams, in the
spaces, and through the structures. Scenarios help us
visualise the spaces, by setting up streams from views of
structures. Thus, taken together the scenarios describe
services, activities, tasks and operations and those
ultimately specify the functionalities of a digital library.
User scenarios describe one or more users engaged in some
meaningful activity with an existing or envisioned system.
This approach has been used as a design model for
hypermedia applications. Human information needs, and
the processes of satisfying them in the context of digital
libraries, are well suited to description with scenarios,
including these key types: fact-finding, learning, gathering,
and exploring.
An event denotes a transition or change between
states, such as executing a command in a program. Scenarios
specify sequences of events, which involve actions that
modify states of a computation and influence the
occurrence and outcome of future events. From this it is
easy to see how data flow and work flow in digital libraries
and elsewhere can be modelled using scenarios. A society
is a set of entities and activities and the relationships
between them. The entities are hardware, software, and
wetware that are somehow related to the digital library. The
activities are what the entities have done, do, and expect
to do with each other. The relationships make connections
between and among the entities and activities of the society.
Societies are necessary to describe the context of use
of a digital library, since societies are the reason why
libraries are built and maintained. In this sense, digital
libraries are used for collecting, preserving and sharing
information artifacts between society members. For
example, digital libraries help to grow the relationship
between library patrons (society members) and the
information they seek. A society is the highest-level
component of the library, as a digital library exists to serve
the information needs of its societies. Cognitive Models for
Information Retrieval, for example, focus on users'
information-seeking behaviour and on the ways in which
IR systems are used in operational environments. In digital
libraries, specific human societies include patrons, authors,
publishers, editors, maintainers, developers, and the
library. There are also societies of learners and teachers.
In a human society, people have roles, purposes, and
relationships. Societies follow certain rules and their
members play different roles-participants, managers,
leaders, contributors, or users. Members have activities and
relationships. During their activities, society members have
created information artifacts-art, history, images, data-that
can be managed by the library. Societies are holistic-
substantially more than the sums of their constituents and
the relationships between them. Several societal issues arise
when we consider them in the digital library context. These
include policies for information use, reuse, privacy,
ownership, intellectual property rights, access
management, security, etc. Therefore, societal governance
is a fundamental concern in digital libraries.
Language barriers are also an essential concern in
information systems and internationalisation of online
materials is a big issue in digital libraries, given their
globally distributed nature. Economics, a critical societal
concern, is also key for digital libraries. Collections that
were "born electronic" are cheaper to house and maintain,
while scanning paper documents to be used online can be
prohibitively expensive. Internet access is widely available
and is inexpensive. Online materials are seeing more use,
including from exceedingly remote locations. With
circulation costs of electronic materials very low, digital
delivery makes sense. However, it brings the problem of
long-term storage and preservation such that the myriad of
information now being produced can be accessible to future
generations. Modelling a society to the extent that it can be
predicted reliably is not possible; its sensitivity to its
inputs gives it a chaotic nature.
However, understanding certain aspects of societies as
sets of entities, activities, and relationships can be
beneficial. In sum, a society is composed of individuals.
Individuals have an intrinsic identity and are grouped into
communities indirectly by way of descriptions that apply
to all members of a community. Individuals are related to
each other through relationships. Relationships specify
interconnections and communications among individuals.
We approach the problem by constructively defining a
"core" or "minimal" digital library, i.e., the minimal set
of components that make a digital library, without which
a system or application cannot be considered a digital library.
Each component is formally defined in terms of a 5S-
based construct or as combinations or compositions of two
or more of them. The set-oriented and functional
mathematical formal basis of 5S allows us to precisely define
those components as functional compositions or set-
based combinations of the formal Ss. Informally, a digital
library is a managed collection of information with
associated services involving communities where
information is stored in digital formats and accessible over
a network. Information in digital libraries is manifest in
terms of digital objects, which can contain textual or
multimedia content (e.g., images, audio, video), and
metadata. Metadata have been informally defined as data
about other data.
Although the distinction between data and metadata
often depends on the context, metadata commonly appear
in a structured way, covering different categories of
information about a digital object. The most common kind
of metadata is descriptive metadata, which include
catalogues, indexes and other summary information used
to describe objects in a digital library.
Another common characteristic of digital objects and
metadata is the presence of some internal structure, which
can be explicitly represented and explored to provide better
DL services. Basic services provided by digital libraries are
indexing, searching, and browsing. Those services can be
tailored to the different communities depending on their
roles, for example, creators of material, librarians, patrons,
etc. In the following, we formally define those concepts of
metadata (structural and descriptive), digital objects,
collection, catalogue, repository, indexing, searching, and
browsing services, and finally digital library.
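Before turning to those definitions, a loose sketch may help
fix ideas about how the enumerated components relate. The
Python fragment below is an invented illustration of a
"minimal" digital library (repository, collection, digital
objects with metadata, and indexing, searching, and browsing
services); it is not the formal 5S model itself, and all names
are assumptions for exposition.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DigitalObject:
    """A stream of content plus its descriptive metadata."""
    handle: str
    content: str                      # static stream (e.g. text)
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class Collection:
    """A named set of digital objects."""
    name: str
    objects: List[DigitalObject] = field(default_factory=list)

class MinimalDigitalLibrary:
    """Repository plus catalogue, with the basic indexing,
    searching, and browsing services named in the text."""
    def __init__(self) -> None:
        self.repository: Dict[str, Collection] = {}

    def index(self, coll: Collection) -> None:
        self.repository[coll.name] = coll

    def search(self, term: str) -> List[str]:
        return [o.handle for c in self.repository.values()
                for o in c.objects if term in o.content]

    def browse(self, name: str) -> List[str]:
        return [o.handle for o in self.repository[name].objects]

dl = MinimalDigitalLibrary()
dl.index(Collection("demo", [DigitalObject("obj-1", "digital library history")]))
print(dl.search("library"))   # ['obj-1']
print(dl.browse("demo"))      # ['obj-1']
```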
DIGITAL METADATA

Many new kinds of metadata are possible in a digital
library. Three examples are dynamically generated indexes,
personalised structures over library elements, and
annotations. Dynamically generated indexes may have
relatively short life-spans compared to the long-lived
indexes of the physical library. One example of
personalised structures is user- or group-specific sets of
hypertext links over some set of library elements.
Annotations are virtual modifications of data objects by
patrons-these modifications exist separately from the data
but may be always displayed with the data for a particular
user or group, thereby effecting a "virtual" modification.
A problem with new digital library metadata is that
much of it is personal, and thus may be stored separately
from the data over which it applies, leading to possible
consistency errors. If many users build structure over
certain data in a library, and that data changes, what should
be done with all of the metadata that is in some way
invalidated by this change? This is certainly a problem in
the physical library. Because most physical library
metadata resides in the library itself, however, it may be
easier to modify the metadata to reflect any changes in data.
With personal digital library metadata, all such copies of
metadata may not be known. To what degree is the digital
library system responsible for propagating changes to
patrons with metadata that relies on the changed material?
How can this propagation be effected?
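One hedged way to picture the consistency problem just raised
is to store annotations apart from the objects they describe,
with a version stamp that reveals when an object has changed
underneath its metadata; everything in the sketch below is
invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Annotation:
    """A patron's virtual modification, stored apart from the data."""
    object_id: str
    object_version: int   # version the note was written against
    note: str

library_data: Dict[str, Dict] = {
    "doc-42": {"version": 1, "text": "original text"},
}
annotations: List[Annotation] = [
    Annotation("doc-42", 1, "see also chapter 3"),
]

# The object changes; personal metadata held elsewhere is now stale.
library_data["doc-42"]["version"] = 2

stale = [a for a in annotations
         if a.object_version != library_data[a.object_id]["version"]]
print(len(stale), "annotation(s) possibly invalidated")  # 1 ...
```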
The digital library allows new processes not found in
the physical library. Specifically, processes such as full-text
searching, personalising presentations, and retrieving by
agents are new digital library processes. Full-text searching
refers to querying a full-text index. Personalising
presentations involves access control issues as well as tailored
screen layouts. Retrieving by agents involves programs that
search data autonomously and report findings to users. One
problematic aspect of these new processes is that they
involve computation that may access large amounts of
library data or metadata.
A central problem is how to distribute the
computation needed to maintain these processes. For
example, how much of the computation involved in
personalising presentation of information should be done by the
server and how much should be done by the client? If such
processes are computationally expensive, how can this load
be fairly distributed? What is the optimal mix of client /
server communication, server-side computation, and client-
side computation for effecting these processes?
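Full-text searching of the kind mentioned above typically rests
on an inverted index. The minimal sketch below, with invented
documents, shows the core mapping from terms to documents; it
leaves aside stemming, ranking, and the client/server
distribution questions raised in the text.

```python
from collections import defaultdict
from typing import Dict, Set

def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build a minimal inverted index: term -> ids of the documents
    containing it. Real systems add stemming, positions, weights."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_index({
    "d1": "metadata for digital objects",
    "d2": "searching digital collections",
})
print(sorted(index["digital"]))  # ['d1', 'd2']
```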
DIGITAL LIBRARY SYSTEM ARCHITECTURE

Conceptually, a digital library system may be thought of as
mediating certain kinds of interactions among people and
computing systems. Publishing in the digital library is not
strictly a relationship between publisher, librarian, and the
digital library server. Patron needs, budgetary constraints,
limitations of library computing resources, and a number
of other factors may be involved.
Any robust digital library system should provide
support for these complex relationships. The client and
server computing systems may each be further subdivided.
Each may be thought of as consisting of three parts: the
back-end, the "middle-end", and the front-end. Both the
back-end and the front-end of a system define interfaces
between the system itself and some external entity. A
system front-end normally provides services to external
clients, while the back-end is provided with services from
external servers. The middle-end provides some
intermediate mapping between the front- and back-ends.
CLASSICAL DIGITAL LIBRARY MODEL (CDLM)

The growing number of networked information resources
and services offers unprecedented opportunities for
delivering high quality information to the computer
desktop of a wide range of individuals. However, currently
there is a reliance on a database retrieval model, in which
end users use keywords to search large collections of
automatically indexed resources in order to find needed
information. An alternative is the classical digital library
model (CDLM), which is derived from traditional practices
of library and information science (LIS) professionals.
These practices include the expert critical selection and
organisation of the networked information resources for
a local population of users and the integration of advanced
information retrieval tools, such as databases and the
Internet into these networked collections. To evaluate this
model, LIS professionals and users involved with primary
care medicine were asked to respond to a series of questions
comparing their experiences with a digital library
developed for the primary care population to their
experiences with use of the Internet in general. There is
much current research on digital libraries. However, there
is little current work that explicitly investigates how to
maintain the various roles of LIS professionals so that they
can leverage network innovations to effectively manage
information collections for their constituencies.
The CDLM examines what professional services can be
delivered to the desktop computers of users. The strength
of the CDLM is that it enables LIS professionals to extend
their practices to the digital library environment, where
networked delivery of digital content does not require
library users to be physically present in a library. The effect
is to leverage the capabilities of a profession that has been
concerned with the challenges of information management
for over a century. The CDLM is a service-oriented model
developed so that the best digital resources appropriate for
users, and the needs arising from their usual processes, can
be integrated into digital libraries that are managed by LIS
professionals who have appropriate local intelligence.
The scope of the CDLM encompasses the digital
libraries' development and management from needs
assessment, information selection and organisation to
access and use. Under the CDLM, users do not have to
browse, navigate or search large and diverse information
spaces without the assistance of LIS professionals who are
able to preposition selected quality resources and organise
them with their needs in mind. The goal of the CDLM is
to provide a digital library environment within which a
typical user can autonomously browse for documents
relevant to specific information needs.
There are three principal aspects of this goal:
wayfinding, collection development, and organisation of
resource collections. These aims must be achievable within
an implementation framework that is manageable on a
large scale, and capable of performance monitoring and
enhancement. Wayfinding is a concern from the physical
library model. The premise of library wayfinding is that
users who are able to help themselves are much more likely
to be return visitors. This is the central concern of the
classical digital library model: How should a digital library
be designed so that users are able to help themselves with
minimal guidance and thus revisit on a regular basis?
WEB-BASED RESOURCES

Many websites now do something-they enable visitor
details to be captured, online orders to be taken and
personalised information to be displayed based on user
profiles. Thus, some websites can be considered to be more
like software applications than a set of publications. Each
form of web-based activity poses a different set of
challenges for those who seek to keep and preserve records
of them over time. In its most basic form, a website may be
nothing more than a collection of static documents sitting
in folders on a server and tied together with hyperlinks.
These documents share a common address-the domain
name in the universal resource locator (URL), such as
'naa.gov.au'. A static website maps URLs directly to file
system locations. The only interactivity provided by static
sites is in the links which enable movement from one
document to another or from one part of the site to another.
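That direct mapping can be sketched in a few lines; the
document root and URL below are hypothetical, and a real web
server would add access control, MIME typing, and error
handling.

```python
from pathlib import Path
from urllib.parse import urlparse

DOC_ROOT = Path("/var/www/naa.gov.au")   # hypothetical document root

def static_lookup(url: str) -> Path:
    """Map a URL on a static site directly to a file system
    location, the way a simple web server would."""
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        path += "index.html"              # directory default page
    return DOC_ROOT / path.lstrip("/")

print(static_lookup("http://naa.gov.au/recordkeeping/web_records.html"))
# /var/www/naa.gov.au/recordkeeping/web_records.html
```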
Many websites utilise forms to collect information
such as comments and requests, from visitors. While these
sites are still largely static publication mechanisms,
agencies keeping records of such sites must also pay
attention to:
- the information provided when the visitor fills in the
form (usually stored in a 'back end' information
system);
- the form itself; and
- the human readable source code of the script or
program which enables the form's functionality.
Websites are sometimes used as front ends, or user
interfaces, for accessing an organisation's database(s). Site
users search prepared lists or put together their own
searches which, in turn, query the content of a database.
The information returned from these queries is displayed
as an HTML (hypertext mark-up language) document to
the user. In many cases, documents exist as objects in a
database. Each document will have its own unique
identifier, usually reflected in the URL. This means that a
user can bookmark the particular document and return to
it later without reconstructing the original search query
(provided the document has not been deleted from the
database). Even if the site's main or top-level pages are
static, dynamic data access websites raise some additional
issues for agency recordkeepers.
Not all users 'see' the same website. At designated
levels, the pages displayed on users' browsers are based on
what they ask for; therefore user queries are an integral part
of generating the website and may need to be captured.
Information contained in databases behind the site may be
continually changing. An increasing number of websites
are being built which generate all of the pages 'on the fly'.
This means that the component parts of each individual
page-its content, structure and presentation-are
generated dynamically using a combination of databases
and style sheets based on:
- a stored set of user preferences;
- a stored set of access profiles;
- a user query; and/or
- the capabilities of the user's browser.
In these situations, the website does not exist in any single
or easily capturable form. Each user sees a different 'site'
based on their stored preferences and access rights, current
needs, and the capabilities or limitations of the technology
they are using. Although the end result for the user might
be a set of static pages, the processes which build the pages
involve the use of a number of software tools. This is the
point at which websites become more like software
applications than electronic publications.
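As a toy illustration of pages generated 'on the fly', the
sketch below assembles a page from a stored user preference and
a query; the profile fields, content store, and template are
invented and stand in for the combination of databases and
style sheets described above.

```python
from string import Template
from typing import Dict

# Hypothetical stored profile and content store; on a dynamic
# site each page is assembled per request, not read from disk.
profiles: Dict[str, Dict[str, str]] = {
    "user-7": {"language": "en", "style": "large-print"},
}
content = {"welcome": "Recordkeeping guidance for agencies."}

PAGE = Template('<html><body class="$style"><p>$body</p></body></html>')

def render(user_id: str, query: str) -> str:
    """Build a page 'on the fly' from a stored user profile
    and the content the query asks for."""
    prefs = profiles[user_id]
    return PAGE.substitute(style=prefs["style"], body=content[query])

print(render("user-7", "welcome"))
```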
Agencies need to consider how to archive dynamically
generated web resources in a fully functional state. The
major issue these sites raise for agency recordkeepers is the
need to choose whether to use an object-based or an event-
based approach to keeping records of web resources and
activities. That is, an agency needs to determine whether it
wishes to focus on keeping records of:
- the individual transactions between clients (users) and
servers (agencies); or
- the objects that comprise the content of the site at any
given time.
Following are the fundamentals of good web-based
recordkeeping:
- Take a systematic approach
- Assign and document responsibilities
- Determine requirements for records
- Apply metadata
- Capture records into a recordkeeping system
In keeping records of web-based activity, there are certain
fundamental procedures that all agencies should observe.
These rules are not unique to web-based recordkeeping.
They are commonsense approaches which organisations, as
a matter of course, should implement as part of their
regimes for managing information resources-whether
these are web resources, electronic or paper-based records,
or data in legacy systems.
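By way of a hedged example of 'apply metadata' and 'capture
records into a recordkeeping system', the sketch below captures
an object-based record of a page at a point in time; the field
names are indicative only and do not represent any prescribed
recordkeeping metadata standard.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class WebRecord:
    """An object-based capture: the page content at a point in
    time, plus recordkeeping metadata applied to it."""
    url: str
    captured_at: str
    checksum: str
    responsible_agency: str

def capture(url: str, content: bytes, agency: str) -> WebRecord:
    return WebRecord(
        url=url,
        captured_at=datetime.now(timezone.utc).isoformat(),
        checksum=hashlib.sha256(content).hexdigest(),
        responsible_agency=agency,
    )

record = capture("http://naa.gov.au/", b"<html>...</html>", "NAA")
print(record.checksum[:12], record.captured_at)
```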

Metadata Creation and Cataloguing


The promise of digital libraries implies the possibility of
disseminating materials and information far beyond what
has ever been imagined. Early digital library efforts, such
as the Library of Congress' National Digital Library
Program and the projects sponsored by the Digital Library
Initiative phases I and II in the United States, showcase digital
facsimiles of unique documents and artifacts previously
available only to curators and scholars. In Japan, the
National Diet Library, Kyoto University Library, the
University of Tsukuba, and the University of Library and
Information Science are actively planning to publish digital
content on the World Wide Web. One could view "digital
information organisation" as having two facets:
- The creation of cataloguing information to enable
searching, discovery, and retrieval of information in
digital format.
- Accomplishing this task with methods that scale to
effectively handle quantities of data exponentially
larger than libraries have ever done.
A key issue impacting the wide dissemination of digital
information is the scalability of providing information (metadata) to
structure and enable searching, navigation, and
presentation of online documents.
On the surface, provision of metadata to accompany digital
objects does not seem difficult. Roughly speaking, many
people think that all that must be done is to take existing
cataloguing information, convert it to the appropriate
format, and link it to the digital images. The process is not
that simple due to several factors. First of all, the conversion
of a physical artifact implies not just putting information
into a new format but the concomitant goal to display the
information in a logical way. To do that, information in
addition to the content must be produced or extracted to
enable the structure and display of the data. If existing
schemes for classification and indexing are used, human
intellectual capital is necessary at some point in the process
to apply thesaurus terms and enable other access points.
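As a toy illustration of converting existing cataloguing
information into a format that can accompany digital images,
the sketch below maps a few invented catalogue fields onto
Dublin Core-style elements and links the result to an image
file; the record, field names, and mapping are assumptions for
exposition.

```python
# Map a few fields from an existing catalogue record onto
# Dublin Core-style elements and attach a link to the image.
# Field names and values here are invented for illustration.
legacy_record = {
    "main_entry": "Ralhan, Punit",
    "title_statement": "Advancement in Library and Information Science",
    "publication_year": "2009",
}

DC_MAP = {
    "main_entry": "dc:creator",
    "title_statement": "dc:title",
    "publication_year": "dc:date",
}

metadata = {DC_MAP[k]: v for k, v in legacy_record.items()}
metadata["dc:identifier"] = "images/item-0001.tif"   # link to facsimile
for element, value in sorted(metadata.items()):
    print(f"{element}: {value}")
```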
7
Digital Information Preservation

In general, civilised nations want to preserve the written
evidence of their history, literature and cultural
development in the broadest sense because it promotes in
their population a sense of integration and self-
identification. In a few cases, however, the cultural heritage
stored in archives or libraries is deliberately jeopardized for
political reasons, for example, to wipe out the traditions of
ethnic or other social groups. The main source of risk is that
conservation, as part of a political duty to posterity, is
neglected and the necessary resources in the way of staff,
buildings and technical equipment are not provided.
Though part of the cultural heritage, archives,
manuscripts and print do not survive on their own; it takes
political will to safeguard and protect this cultural
inheritance and to ensure that it is constantly supplemented
by contemporary documents of lasting value. Like an
empty house, property that is not managed and cared for
by trained archivists and librarians, and not made
accessible to researchers and the public at large, will
deteriorate and disappear. For this reason many countries
have legislation and regulations governing the protection,
conservation and use of archival property. In many cases,
it is also a legal requirement that at least one copy of every
book or printed work be kept in an 'archive library'.
Both endogenous and exogenous factors threaten the
continued existence of archives, manuscripts and print.
Endogenous (mostly chemical) sources of damage are
intrinsic to the information medium itself, the way it is
produced or even the material used to record the text.
Exogenous sources are physical phenomena acting on the
media or text from outside. Damage caused by the
combined action of endogenous and exogenous factors is
by no means rare. Globally, though, there is no doubt that
endogenous deterioration is the worst source of damage to
paper.
The steep increase in paper demand since the mid-
nineteenth century and the related growth in industrial
paper production called for new technologies and caused
a revolution in paper manufacture. Two primary factors
affected the quality of paper: acid sizing with a mixture of
alum and rosin, and the addition of cheap mechanical wood
pulp to rag-based or cellulose fibre pulp. Acid paper or
paper containing mechanical wood pulp ages visibly and
quickly, goes yellow or brown and becomes fragile and
brittle. Papers containing both acid and mechanical pulp
lose their colour within a few decades, become brittle and
crumble under the slightest mechanical load.
In books and documents affected by endogenous
deterioration the decay is gradual but unstoppable. The
process can be likened to a fire, smouldering slowly and
unnoticed in the storerooms, secretly destroying cultural
property. Acid or mechanical pulp-based papers thus carry
within them the seed of their own deterioration, as indeed
do many modern information media such as nitrocellulose
film. Another endogenous source of damage stems from
certain inks, including the deterioration caused by ink
erosion. For example, the iron gall ink still in general use
in law offices in the nineteenth century, which is wash-
proof and, in particular, bleach-proof, causes corrosion
aggravated by damp, even eating through strong rag paper
and leaving sharply etched holes where previously there
were letters or characters. The degree and rate of
endogenous deterioration are greatly affected by
exogenous factors such as temperature, humidity and
oxidizing or acid-forming gases.
If high relative humidity and heat are both present, and
the air quality is bad with a high sulphur dioxide content,
destructive reactions such as the weakening of the paper
are greatly accelerated. Unfortunately, paper deterioration
and its consequences are not the only threat to property
held in archives and libraries. Brittle and limp papers,
fragile leather and parchment, splits, holes, breaks and
deformations in binding are today part of the almost
normal picture in the case of many old books and archives.
Dirt and discoloration, marks on texts, paper turning
yellow or brown and text bleaching, if they go beyond the
natural effects of ageing, are alarming warning signs of the
creeping decay of cultural property. A large amount of
serious damage, however, is of human origin. Obviously,
when cultural property is picked up and handled, human
incompetence and thoughtlessness are major causes of
damage.
The users of archives and the organisers of exhibitions,
not to mention many negligent archival and library staff,
need to be constantly reminded that the books and archives
they deal with are not consumer goods intended to last one
or two generations. In our efforts to use books, manuscripts
and archives in the most comfortable and rational manner,
we often fail to take the necessary care. Splits, folds, spots
of fat, ballpoint pen marks, bleached ink and damaged or
deformed book spines are to be deplored, and there is no
reason consciously to cause such damage.
The same applies to well-meant but unprofessional
repairs with adhesive tape which generally do more harm
than good. Lastly the display of archives, manuscripts and
printed works in exhibitions also has its dark side. Long
periods under the strong lighting and severe mechanical
loading required for their attractive presentation, as well as
the damage often caused during transportation, leave traces
on the exhibits that cannot be removed. The damage
already caused-or yet to be caused-by endogenous paper
deterioration is on a global scale but, quantitatively, it can
be neither calculated nor even estimated. All properties in
archives and libraries since the mid-nineteenth century are
susceptible to endogenous paper deterioration.
Permanent paper is found to an increasing extent since
the 1980s, particularly in North America, Australia and
central and northern Europe, but no substantial use is yet
being made of it for paperwork, books and other documents
of lasting value. Surveys of European and North American
archives and libraries suggest that at least 60% of the items
stored in public archives are potentially subject to
endogenous damage. In the case of 20%, the damage is
already evident or so imminent that the items can no longer
be used and conservation measures will be needed to save
them from final destruction. These surveys enable certain
inferences to be drawn: in countries with hot, moist
climates the endogenous deterioration will advance more
quickly if the archive and library storerooms are not air-
conditioned; and environments with a high toxic gas
content also accelerate deterioration.
These inferences are supported by reports from libraries
and archives all over the world. The poor state of materials
is often thought to result from bad storage conditions rather
than the inevitable deterioration of acid paper, and
endogenous factors due to acid and mechanical pulp are
not fully appreciated as the underlying causes of damage.
It is even more difficult to reach any general conclusions
regarding the amount of damage caused to archives,
manuscripts or printed materials by exogenous factors. This
very much depends on the way the materials have been
stored over the centuries, armed hostilities and natural
disasters, frequency of use and, last but not least, the
resources that have been or are to be used for their
protection and conservation, or even simply their cleaning.
As an illustration, however, some orders of magnitude for
the amount of damage can be obtained from one survey
carried out in central Europe and including eleven scientific
libraries and six state archives.
All over the world efforts are being made to halt the
creeping deterioration threatening to destroy a substantial
part of our cultural heritage, the written works handed
down over the last century and a half. Many national and
international organisations are actively working on the
conservation of the threatened property. Today there is no
difficulty in producing permanent paper that is free from
the factors that have caused paper deterioration in the past,
that is, free of acids or acid-generating substances and of
mechanical pulp (lignin). Such papers (neutral or slightly
alkaline) are made from cellulose fibre; it is also possible to
obtain the latter from a non-chlorine bleach process. This
kind of cellulose can be derived from lean wood, thinning
or chipwood.
As a protection against acid in the environment
permanent papers are given a coating of at least 2% calcium
carbonate. The international standards for permanent
paper are laid down in ISO 9706. Incidentally, the
production of such paper is advantageous both
economically and environmentally, and is in tune with
current paper production trends. Permanent papers should
be no more costly than less durable papers of comparable
quality. While the use of permanent information media and
writing materials is designed to prevent future endogenous
deterioration, other preventive measures serve to avoid or
limit damage to objects already stored in archives and
libraries and even to extend the life of documents affected
by endogenous deterioration.
The objective of these preventive measures is to create
an environment for the objects under threat in which ageing
processes are checked and exogenous sources of damage
kept, as far as possible, at a distance. This is achieved by
appropriate air-conditioning for the archive and library
holdings in secure and properly equipped storerooms, the
first rule of thumb being to have everything as cool and dry
as possible, and the second to avoid sharp changes in
temperature and humidity. Provision should also be made
for adequate frequency of air change. Lengthy exposure to
light, and in particular daylight or artificial light with a high
ultraviolet content, should be avoided.
Keeping both the storeroom and the articles
themselves clean is another effective measure. All property,
excepting books with their binding intact, needs non-acid
protective packing using materials meeting the criteria of
ISO 9706. An international standard on the above-
mentioned preventive measures is under preparation.
Under the heading of preventive measures, of course, must
also come effective conservation management which
ensures that proper care is taken in the removal and return
of books and archives, their transport and in particular their
use in reading rooms. Care is also required for items on
display: technical and organisational measures must be
taken to protect them against damage and wear.
Timely protective filming of endangered archives or
manuscripts is one of the most effective and yet most
economic measures of all. Restoration of materials in
archives and libraries is by no means the preferred
objective. The primary aim must be to prevent damage by
employing the above preventive measures. But if damaged
cultural property has to be repaired it needs to be done
professionally. For the restoration of archive and library
contents, specialists have, with the help of scientists, drawn
up a number of principles designed to retain as far as
possible the original substance and appearance, and to
avoid clumsy renovation or reconstruction.
For historians and literary specialists, and also for
researchers in the fields of binding and codices,
inconspicuous external and formal marks are significant.
They are often the only pointers to an object's origin, its
legal validity or specific history, and should not be
destroyed or disturbed by the craftsman. Only identical or
similar materials to the originals should be used. The work
must be reversible so that the object's state prior to the work
can be restored at any time. The object's appearance must
be retained. What is done must be recognisable and the
restorer must explain fully what has been done and how
it has been done.
Lastly, the work must be described in writing and if
necessary documented by photographic means. With the
help of this documentation, future generations of restorers
and scientists will be able to reconstruct what has been
done with total clarity. To deal with dirt or spots that cannot
be removed in this way, paper should be carefully
moistened to dissolve the dirt or else washed in the normal
way. Restorers do not like to use bleach or other chemicals
as solvents. Water is little short of miraculous for the
restoration and conservation of paper. Water cleans,
washes out harmful residues and can be used to introduce
buffer substances to protect paper from endogenous and
exogenous acid attack. Common media like black printing
ink and iron gall ink survive wet cleaning without any
problem.
Other inks, such as Indian ink and stamping inks, have
to be fixed before washing. After washing, papers are given
fresh strength by sizing and then dried and pressed. Water
treatment can be made particularly efficient using a
programmable transport system to time the immersion in
a treatment bath of large quantities of paper suffering from
the same kind of damage. The moistening of water-
sensitive papers such as glassine or papers printed with
sensitive inks has to be done under careful control. Only
moisture in the form of vapour, not liquid, can be allowed
to act on the sensitive object.
The wonder-material that produces this effect is the
micro-fibre Gore-Tex, well-known in the weatherproof
clothing industry. For books and binding, where costly
breakdown into individual leaves before treatment is to be
avoided, de-acidification under spray is used. In this
technique the buffer substance is applied to the paper in a
very fine spray. Splits in paper are made good with wheat
paste, where necessary using Japanese paper, a very thin
and transparent but strong handmade paper. Holes and
damage to edges and elsewhere can also be patched up by
hand with wheat paste and torn Japanese or stronger
handmade paper, but the more elegant and at the same time
more rational technique is leaf casting, in which holes and
other defects are made good with fibre deposited from a
suspension of pulp in water. Sophisticated equipment is
available to the restorer for this technique and gives
excellent results.
Another efficient method for repairing holes in paper
or strengthening brittle or fragile papers is paper-splitting.
With this technique, already in use in the mid-nineteenth
century, damaged paper only a fraction of a millimetre
thick is split in two so that back and front become two
separate leaves. Next a very thin but strong paper is glued
in between to form a support. Additives can be included
in the adhesive in the form of alkali buffer solutions. In this
way the strength of the paper is restored or even increased
with no change to the original surface, back or front. The
structure of the paper is also unchanged and even
watermarks remain visible. Paper-splitting, therefore, has
clear advantages over use of lamination or Japanese paper,
which restorers, for good reason, prefer not to employ if at
all possible. The treatment of parchment manuscripts,
records or binding is difficult because the reaction of this
intrinsically permanent animal skin to moisture and heat is
more acute and unpredictable than that of paper which is
more homogeneous.
Smoothing and stretching distorted parchment, here
too using controlled wetting, calls for much experience and
patience. When repairing splits or holes in parchment,
restorers prefer flexible bonding to gluing over the whole
area because old and new parchments behave differently
and unpredictably. In such cases sewing techniques are
used in which special types of stitching ensure that the
repaired parchment can stretch unequally without
warping. Small defects or holes in parchment can be
corrected by a method similar to leaf casting, with a
parchment fibre suspension.
Seals made from beeswax and additives, used to
certify the validity of official records and contracts, are
frequently dirty, but in many cases small or large fragments
also are broken off, possibly as a result of mishandling
when in use. The main treatments in seal restoration
include the manual replacement of damaged parts with
pure beeswax, the repair of damaged edges and the making
good of breaks in the impression. Fragile seals are dealt
with in a conservation unit where tiny cracks and channels
are closed under a vacuum after careful heating.
Finally, the restoration of bindings calls for
demanding manual work specific to each repair. A
thorough knowledge of the history of binding materials
and techniques in use since the Middle Ages is necessary.
If the repair is more than a matter of gluing down a loose
spine or reconstructing a book clasp that has gone astray,
the volume has to be patiently stripped down to its different
constituent parts. The cover has to be removed from the
book block and the block treated on its own. The book's
condition is carefully noted so that the assembling and
binding techniques used centuries ago can be reconstituted
with perfect accuracy. The covers are repaired separately.
This often involves dealing with valuable old wood boards.
Given the immeasurable quantity of damaged
archives, manuscripts and books, and the high cost of
repair, restoration and conservation, methods have to be
efficient and, wherever possible, machinery should be
used. After more than twenty years of research and
development, industrial methods can now be employed to
control endogenous paper deterioration. 'Mass de-
acidification' allows books and bound volumes of official
papers to be treated in large batches. The alkali-buffer
process extends the remaining life of acid paper by a factor
of three. However, since mass de-acidification is not so far
associated with any increase in strength worthy of the
name, it is only suitable for relatively new papers whose
strength has not yet suffered any serious diminution. Up
to now only one mass conservation method is available for
improving paper strength: a wet process that fixes inks,
washes out the products of decomposition, provides an
alkali buffer and adds a coating of size. Unfortunately this
machine method is only suitable for treating individual
pages one by one.
In the restoration field, incredible though this may
sound, paper splitting is the only mechanised and
automated method. The German Library in Leipzig has a
machine that copes with the difficult task of paper-splitting,
glues in the strengthening paper, presses and dries the page
and will soon, it is hoped, also automate sheet separation
and trimming. Along with the preservation of originals, the
conversion of damaged or endangered documents is
another conservation process. The specific nature of
archives, manuscripts and unique printed works calls for
graphic conversion and not simply a coded transfer of the
text.
The graphic conversion of endangered archive or
library objects to substitute media for preservation
purposes and/or as a permanent replacement for decaying
documents requires systems that can ensure an optimum
quality of reproduction, long-term durability of the
conversion medium and a high level of cost-efficiency.
Microfilming has become the most usual method to meet
all these criteria. It is a highly economic and at the same
time efficient conservation method.
Used at first for preservation purposes, it is a life-
extending measure saving endangered books or documents
from wear. The originals stay in the optimum climatic
conditions of a safe storeroom while the user is provided
with photographic reproductions. If this does no more than
avoid damage due to use on one single occasion, the cost
of microfilming, totalling considerably less than that of
restoration and conservation, is already recovered. But in
many cases microfilm can also fully take the place of
endangered originals.
Microfilm is an economic storage medium that meets
not only the high demands of scientific research regarding
quality of reproduction but also the strict durability
requirements of a conservation strategy. With modern
equipment, most of it semiautomatic, large quantities of
endangered books or archives can be recorded in black and
white or colour in a relatively short time.
Copies of these films can be made for users in
practically any number. In this way the microfilming of
unique historic documents helps not only to protect the
originals but also to improve accessibility for researchers or
interested members of the public. Microfilming still retains
its place despite the appearance of new digital media.
Obviously the object of conversion must be to replace
problematical information carriers such as brittle paper by
more reliable media, not ones that cause yet more problems
as time goes by. Compared with electronic image storage,
microfilm, using a technique invented over 150 years ago,
offers the advantage that the information in analogue form
is continuously accessible to the human eye. In principle
microfilm systems are not likely to undergo any basic
technical change and are compatible with the new digital
systems. Film can be scanned more efficiently than the
photographed originals.
So in an information world in which nothing ages as
rapidly as high-tech systems, microfilm is the ideal
upwardly-compatible storage medium for the long term. To
that extent microfilming endangered documents is still the
right answer. To interrupt microfilming projects and switch
to long-term digital storage for conservation purposes
would be short-sighted. Image digitising of endangered
archive material giving maximum quality of reproduction,
as required in particular for scientific research, is not yet
possible at acceptable cost: the necessarily high storage
capacity requirements of such image systems will continue
to incur relatively high costs for processing, storage and
distribution.
The permanent accessibility required cannot be
ensured either by digital storage media with their limited
life or by the long-term availability of compatible systems
on which to view them. The hardware and software
components of electronic image storage systems are hardly
standardised and could be affected by rapid technological
change. The innovation cycles of the hardware will have a
shorter life than those of the optical-electronic storage
media, and little heed will be taken of the archival need for
long-term technological compatibility between emerging
generations of systems. Responsible use of digitised storage
must therefore make provision for continuous conversion
of machine-readable data to keep abreast of hardware/
software innovation cycles, and for the cost involved in so
doing.
These are incomparably higher than the costs of
microfilm systems, which require only relatively simple
long-term storage facilities and equipment. The high-
quality digitisation of the huge quantities of data necessary
if all materials under threat of deterioration are to be
included is, for cost reasons alone, inconceivable. It would
also be uneconomic given documents' relatively low
frequency of consultation, because they would probably
have to be converted several times before there was any
demand for them. Users' acceptance of microfilm, or
archive and library materials reformatted in other ways,
depends mainly on quality of reproduction, readability and
representation of halftones and colours, layout and the
quality of the accompanying user documentation.
Microfilming is highly developed. High-quality,
semiautomatic microfilm cameras and automated
developing equipment are available that guarantee results
in conformity with the international standards (ISO 6199)
which are very comprehensive in this field. Readers and
reader-printers are available for rollfilm and microfiches
that are very luminous and of high optical quality. The
conversion of damaged or endangered archive and library
materials can only be an effective conservation measure if
the use of microfilm is promoted by the provision of the
necessary facilities in libraries and archives. With the
economic viability and guaranteed future of microfilm as
a storage technique safely secured, however, archive offices
and libraries should not reject the digital world.
Effective damage prevention measures need to be
taken as early as possible, and the same applies to
preservation microfilming, which should be done while the
graphic information is still complete. Preventive measures,
however, are also an essential preliminary before taking
steps to prolong the life of endangered objects or to repair
damage that has actually occurred, whatever their cost. The
major investment involved is only justifiable if the objects
are afterwards kept in an environment contributing to their
permanent conservation.
The first decision, whether to keep originals or to
convert, is technical. The central consideration is the
intrinsic value of the objects as determined by their formal
external characteristics, which cannot be retained in image
form. Ar.:hives, manuscripts and printed works that are
intrinsically valuable must in every case by kept it). their
original form. In other cases graphic conversion is generally
a much less costly alternative. -The low cost of conversion,
particularly in view of the normal shortage of resources and
the problem of numbers, argues for filming the maximum
quantity possible so that as large as possible a share of
resources then can be devoted to the far more costly
preservation of intrinsically valuable cultural objects.
INTERNET IN DIGITAL PRESERVATION

Today, we need again to take a fresh and holistic look at
what we are trying to do, to examine the nature of all types
of resources that we acquire, describe, organise, arrange,
preserve, access, select, and use. We need to understand
characteristics of users, their abilities, inquiries, and
information tasks, retrieval models, and operational
standards. Finally, we need to educate current and future
professionals about the evolving nature of virtual public
service components. We all read papers on important issues
of digital libraries, on declining uses of cumbersome
traditional reference services, on sharply increased uses of
convenient search engines, on early experiences of
collaborative digital reference services, and on uses of
various chat technologies. Consumers will be increasingly
remote rather than in-house, diverse in their capabilities
and needs, and will have high expectations from 24/7 real-
time virtual reference (similar to the ones in banking,
automated gas stations, food services, and other self-help
industries).
This area has been mainly studied by the cataloguing
community and at a generic level only. In addition, we need
better linkages between different types of resources. At one
level, there is the 856 field, electronic location and access,
between a MARC catalogue record and networked
documents. It links a bibliographic record that is displayed
on WebOpacs with full text documents.
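By way of illustration, the sketch below uses the pymarc
library (in its pre-5.0 list-of-strings subfield style; the
Field constructor changed in later releases) to attach such
a field to a record. The URL and public note are invented
for the example:

    # Attach an 856 (electronic location and access) field to a
    # MARC record with pymarc; URL and note are invented.
    from pymarc import Record, Field

    record = Record()
    record.add_field(Field(
        tag="856",
        indicators=["4", "0"],  # access via HTTP; link is to the resource
        subfields=["u", "http://www.example.org/ejournal/",
                   "z", "Connect to the full text"],
    ))
    print(record)  # tagged display, e.g. =856  40$uhttp://...$zConnect ...
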
Other issues will be related to attaching tags for quality
assurance, especially to fluid digital-born documents.
Examples include authenticity, provenance, permanency,
methodological integrity-reliability and validity. Another
area that will be equally important to consider is the
capability to use seamless languages both by the reference
provider and the consumer in various question
negotiations and answering settings. The word seamless
here means that clients need a series of gentle
transformations of search languages on the Web, ranging
from a pictorial representation of a "thing" for the
elementary school child, to conceptual maps and DDC for
the high school student. More than ever before, we need a
search language that would ensure consistency, accuracy,
precision, and negotiation power between the remote
parties.
Controlled languages typically include subject
headings, classifications, names, and unified titles. This
area has been traditionally studied by cataloguers and
classification researchers; it has recently been studied by
others and renamed into knowledge organisation tools,
ontologies, and meta languages. Research is needed to
investigate level of compatibility between and among
languages for different user groups. Here, we have not
touched upon communication languages that are needed
for disadvantaged users if we are to provide equitable
CDRS for all potential clients.
Search engines are likely to index more popular and
US sites. With billions of digital documents out there, even
with embedded tiny networked sensors and actuators, it
will be mandatory to delineate reference tasks to be
performed by trained and experienced people and others
to be computer mediated at different levels of intervention.
According to the new library act, the virtual library is THE
new item, especially for the public libraries, where the use
of information technology until recently was not so well
established. So the government has initiated several
projects in order to promote the public libraries as local
information centres, and to give the library staff experience
in using the internet to communicate with the end users.
In October 2000 a new national service to the public
was opened: bibliotek.dk, financed by the government. It
is an online catalogue where you can search the collections
of all the Danish libraries, public as well as research:
books, CDs, articles, CD-ROMs, videos, reviews,
talking books and research reports. You can order what you
want and collect it at the library of your preference. All you
need is internet access and a borrower's number for the
library where you want to collect. You can order what you
wish from your home PC or from any internet access. If
your own library has not got what you need, your library
can get it from another library in the interlending network.
This database is based on the national professional online
catalogue DanBib, which still serves as the national
bibliographic interlending tool.
The principal goal is to simplify the complexity of a
professional database for the benefit of the non-
professional user. For example a title should only be shown
once though the libraries may have different principles of
registration. The public libraries use the national
classification system, while the research libraries use
Dewey. Search facilities will be extended, and the interface
to the end user will go on being developed. A future
intention is that it will be possible to be told immediately
if there is any waiting time. Via e-mail you can get
information about the latest links. Also the children's
library is a net guide for children and young adults.
The guide gives access to links on the internet which
have been chosen according to the same guidelines as those
the librarians normally apply when selecting material for
children and young adults. Global information for ethnic
minorities is offered at finfo.dk. It is a global guide with
links to the countries particularly relevant to ethnic
minorities in Denmark. You can also get general
information about the rights, obligations and possibilities
of ethnic minorities in the Danish society. In addition most
public Danish information according to the national
information policy is to be found on the official web sites
and there is a tendency to publish online versions instead
of printed ones.
Regarding reference material the DEF-project has
initiated projects for research libraries and for public
libraries to get licenses to databases and reference material
with relevance for the local users. This of course is a
challenge to the reference librarians and time has come to
consider whether we can quit the past and the good old
reference sources. However there is no doubt that time has
come to improve the librarians' competencies in being
"virtual librarians". Skills in handling desktop facilities,
downloading and file handling are still not "common
knowledge" among middle-aged librarians.
Also it is obvious that the public libraries have
competitors in delivering information via the internet, and
that people often help themselves searching and finding via
the most popular search engines. For the users it must make
a difference to ask a librarian: you must get correct and
sufficient information on time. The Danish libraries have
skilled workers and competence to deliver information
even to specific groups of users-but this is often a secret
to the public. An explanation in many libraries will be that
we have not got the time, the experience and other
resources for marketing our services.
The public library is well established in people's
minds and our statistics show that we are busy and that the
number of users is increasing, especially when it comes to
the education area. So it is easy to forget that there actually
is a much bigger market for information in the local area
among professionals as well as amateurs: business people,
lawyers, doctors, people looking for jobs, genealogists and
people dealing with local history. Through the years the
libraries have built up competencies and resources to serve
the local areas with proper information; in the near future
these investments will be turned into virtual services
delivered from small as well as big public libraries; the
only tool you need is access to the internet.
In many aspects the reference work in the Danish
libraries must go back to basics if the library service is to
make a difference for the user. And marketing must be a
priority. It is necessary that the service offered is accessible:
there must be someone there to answer the phone and chat.
For many years we have kept on taking up new tasks-now
is the time to look back and find out if services are no longer
needed. It is obvious that we cannot go on compiling new
activities and services-we simply have not got the
resources for that. We have noticed that the users' needs
are changing. Many users still come to the library for
leisure, but the number of students and people searching
for lifelong learning is growing and their demands are
becoming more sophisticated.
The quick reference service will change into
information management and document delivery, and the
librarian will have to deal with more complicated
subjects on an expert basis. Users want access to the library
virtually as well as physically 24 hours round the clock.
Having the reference sources online makes them
invisible, especially to the users frequenting the physical
library. Treaties, artifacts of social institutions, are
negotiated in order to respond collectively to issues that
states have been unable to resolve without institutional
support. However, the multilateral treaty-making process
is not always productive.
Take the Moon Treaty for example, the fifth in a series
of U.N. outer space treaties which after nine years of
negotiations finally culminated in consensus at the U.N.
Committee on the Peaceful Uses of Outer Space (COPUOS)
before being sent to the General Assembly. It entered into
force in 1984 but has since only attained five signatures and
nine ratifications. There are many treaties similar to the
Moon Treaty that are currently gathering dust on
bureaucratic bookshelves. Simply put, the present system
has not always been effective.
Despite such problems as fraud, terrorism, and
security breaches, the World Wide Web is not powerless in
such a setting as the U.N. Office of Outer Space Affairs
(OOSA) web site demonstrates. Inaugurated in March 1994
as the first official U.N. presence on the WWW, the OOSA
web site disseminates on-line documents and maintains
archives to help promote space technology and space law
as an integral part of sustainable development efforts in
developing countries where the Internet is becoming more
readily available. One of the most compelling roles of the
Internet in multilateral treaty-making is its capacity-
building function which links diplomats, institutions, and
practices to enable states to achieve their objectives. Priority
areas for capacity-building on the Internet are:
Information dissemination and access: The Internet fosters
creation of political and administrative forums which
allow negotiating parties to understand, formulate, and
share their own interests and concerns. It also
stimulates a multiplier effect by harnessing domestic
opinion and facilitating the linkage of policies with
issues. Moreover, uncertainty regarding other parties'
positions on issues will no longer restrain otherwise
willing countries from helping to bring multilateral
treaties to fruition.
Alliances, domestic and international networks, and bilateral
and multilateral partnerships through negotiation: Even
micro-states or coalitions of weaker states can share a
flexible forum to more vigorously participate in
agenda-setting and bargaining rather than be
overwhelmed by current consensus "bullying."
Dynamics: Once treaties are in force, the role of the
Internet is dynamic- allowing multilateral treaties to be
monitored as well as sustaining and documenting the
levels of international cooperation on a continuing
basis. The U.N. Office of Outer Space Affairs deserves
credit for creating a web site which enhances multilateral
treaty-making.
Although the Moon Treaty was negotiated before the
advent of the Internet, it is worth examining some of the
reasons why the youngest of the outer space treaties has
suffered such low levels of state ratification and how the
Internet could have been instrumental in helping the treaty
acquire universal acceptance. The following reasons have
been offered for the failure of many developing countries
to ratify the Moon Treaty:
- lack of knowledge of the benefits of space technology
- lack of sharing and transfer of space technology with
  spacefaring nations
- lack of general involvement in space activities, and
- lack of domestic regulatory mechanisms to ensure
  compliance with treaties.
In addition, current research indicates that the ambiguity
of terminology concerning property rights, the common
heritage of mankind, and a future legal regime to govern
exploitation of the moon "as such exploitation is about to
become feasible" have hindered states' ratification process.
One of the key requirements in this regard is capacity-
building and the recognition that OOSA's web site is more
than the sum of its individual components. Its horizontal
networks across agencies and its vertical networks from
individuals to communities disseminate valuable
information which facilitates cooperation for the mutual
benefit of all countries. The OOSA web site builds capacity
by striving to improve the use of space science and
technology for the economic and social development of all
nations.
The Opening Page also lists agendas of committee
meetings, current regional space workshops, indexes of on-
line reports of the Legal Subcommittee, the Scientific and
Technical Subcommittee, and international space
symposiums. The web site is also user-friendly. It should
be noted that the site's PDF and HTML files are available
in all the official languages of the United Nations: Arabic,
Chinese, English, French, Russian, and Spanish. However,
it is not uncommon to find documents in other languages
such as German, Norwegian, or Portuguese. The
availability of important documents in different languages
is laudable considering the astronomical costs of
translating the colossal amount of written text. In addition
to these documents, direct links are also available for
accessing the original outer space treaties on the United
Nations web site.
The What's New page announces upcoming
events including committee meetings, summits, and
workshops. Because the web site has a global focus and is
built on the premise of improving communications and
information access, it requires a broad and holistic view of
capacity development ranging from space applications for
disaster management to youth programs and space
education. The Programme on Space Applications (PSA)
page helps address two of the reasons why the Moon Treaty
has failed to achieve universal acceptance:
- lack of knowledge of the benefits of space technology,
  and
- lack of sharing and transfer of space technology with
  spacefaring nations.
Since its creation the PSA has made substantial progress in
furthering the knowledge and experience of space
applications around the world. The OOSA web site helps
to promote education, research and development support,
and technical advisory service. In sum, the mission of the
PSA web pages is to enhance the understanding and
subsequent use of space technology for peaceful purposes
in general, and for national development, in particular, in
response to expressed needs of different geographic regions
of the world. The site also builds capacity by promoting
greater cooperation and exchange of actual experiences in
space science and technology; for example:
- Development of a fellowship program for in-depth
  training of scientists, technologists and applications
  specialists
- Organisation of seminars on advanced space
  applications and new systems
- Stimulation of the growth of indigenous nuclei and an
  autonomous technological base in space technology in
  developing countries
- Dissemination of information on advanced
  technologies and applications.
One of the reasons offered for the low level of signature and
ratification of the Moon Treaty was the lack of states'
general involvement in space activities. The OOSA web site
helps attract states' attention by linking space technology
with domestic issues in a number of ways. For example, in
1999 the Third Conference on the Exploration and Peaceful
Uses of Outer Space (UNISPACE III) was organised with
two main goals:
- To promote the use of space technology in solving
  problems of a regional and global nature; and
- To further strengthen the capability of Member States,
  particularly developing countries, in the use of space-
  related technologies for economic, social and cultural
  development.
To meet these goals, the OOSA web site announces
symposiums and workshops to attract domestic
interest.
Moreover, the UNISPACE III pages allow access to valuable
background papers, General Assembly resolutions,
international treaties, and other documents; for example:
- Disaster Prediction, Warning and Mitigation
- Management of Earth Resources
- Space Communications and Applications
During World Space Week in 2001 Secretary-General Kofi
Annan addressed the United Nations: "Space technology
has produced tools that are transforming weather
forecasting, environmental protection, humanitarian
assistance, education, medicine, agriculture and a wide
range of other activities.
And of course, a fascination with space leads many
young people to pursue careers in science and technology,
helping developing countries in particular to build up their
human resources, improve their technological base and
enhance their prospects for development." By promoting
World Space Week, the OOSA web site helps the
organisation to:
- educate people around the world about the benefits
  they receive from space,
- encourage greater use of space for sustainable
  economic development,
- demonstrate public support for space programs,
- excite children about learning and their future,
- promote institutions around the world that are
  involved in space, and
- foster international cooperation in space outreach and
  education.
Other events promoted by the OOSA web site are essay
contests, summer camps, and student internships. For
example, the OOSA web site provides links to UNESCO
pages which sponsor an annual student essay contest.
The contest is open to all high school students between
15 and 18 years old and girls are encouraged to participate.
The OOSA site also provides links to the International
Institute of Spatial Information Technology (IISIT) of the
People's Republic of China, the Institute of Remote Sensing
and GIS of Peking University and UNESCO (Beijing Office)
which hold summer camps in Peking. The objective is to
enhance the knowledge, understanding and interest of high
school students in aviation and aerospace.
NETWORKED INFORMATION

Many universities are now implementing institutional
authentication and access management systems to support
specific application and security requirements; typically
this work is being overseen by the institution's information
technologists and it is often motivated by policy
imperatives and priorities that are very different from those
of the library. Because authentication and access
management represents a common infrastructure that will
ultimately need to serve many purposes, it is essential that
libraries and information technology groups establish a
dialogue now to ensure that the systems which are
ultimately deployed honour the complete set of campus
community requirements. Organisational readiness within
the content provider community is also a barrier to the use
of advanced technology-based approaches; content site
operators are no more able than libraries to obtain
commercial turnkey solutions.
But as libraries offer a growing portfolio of network-
based information resources, they face immediate and
pragmatic needs for access management. In developing
strategies, it is important to recognise that technical choices
about access management have policy implications. A
library negotiates a license agreement on behalf of its
patron community-for example, the faculty, staff, and
students of a university. This license agreement gives
members of the community the right to use some network-
based resource-perhaps a commercially offered electronic
encyclopedia or scholarly journal at a publisher website, or
a specialised research database at another university that is
part of a resource-sharing agreement. Users connect to the
site using web browsers running on their personal
workstations or on public workstations.
As the site hosting the licensed resource processes
these requests for web pages, it must determine whether
the requesting user is in fact a member of the appropriate
user community. If so, he or she is given access; if not,
access is refused. Obviously, this basic scenario can be
elaborated upon-for example, there is a need for finer-
grained systems that can be used to allow only registrants
in a specific multi-institutional course access to electronic
reserve materials provided by the library at one of the
institutions.
These more elaborate situations raise additional
problems both in terms of scale and system design and are
useful to have in mind to understand the extent of the
requirements; however, we will concentrate on the basic
scenario here. Access management needs to be routine and
easy to implement; once a contract is signed, lengthy
technical negotiations between institution and content
supplier should not be necessary before users can have
access.
In a world of networked information resources, access
management needs to be a basic part of the infrastructure,
and must not become a barrier to institutional decisions to
change or add resource providers. While it may be a
complex administrative question to determine the
membership of the patron community in a world of
extension students, affiliated medical professionals,
visiting scholars, and others who may blur the boundaries,
libraries have a great deal of experience in devising these
policies.
In the physical world of circulating materials, the
"technical" problem is dealt with through the presentation
of a library card indicating that the holder is indeed an
authorised borrower; this library card is issued upon the
presentation of policy-specified documents that identify the
user and prove that he or she meets the policy criteria for
authorised community membership. Historically, many
computer systems employed user IDs and passwords.
When a user ID was first defined, it would have certain
rights and privileges associated with it. The user would
then supply a password upon demand to demonstrate his
or her rights to use a given ID.
Leaving aside technical problems associated with the
use of passwords on an insecure network and the possible
technical remedies, there is a more basic architectural issue.
We are in a world where users may routinely want access
to many different network resources controlled by many
different organisations, such as publishers; but with the
ability to follow links on the Web it may not even be clear
to the user which publisher owns what resource.
Users will not be able to remember and manage a large
number of different user IDs and passwords issued by
different publishers; further, each publisher will need some
cumbersome procedure to validate the user as a community
member prior to issuing a user ID and initial password.
This does not scale up in a practical way to a world filled
with electronic information resources. The library might
issue the user a single ID and password for all licensed
external network resources, and then transmit lists of these
IDs and passwords to each content supplier so that the
supplier can use the list to validate users.
However, there are architectural problems here, as
well: a very large number of publishers would need to be
notified every time the list of user IDs and passwords
changes, with inevitable timeliness and synchronisation
problems. Also, each publisher takes on an enormous
responsibility for protecting the security of the ID/
password list; a security breach at any publisher will mean
a security breach at every publisher doing business with the
library for networked resource access, an unacceptable
liability. Here the networked information scale-up means
that too many independent parties must rely on each other
to maintain security, and that the cost of accurately
maintaining a synchronised common-access management
database will be very high. As a stopgap measure, many
institutions are currently using electronic "place", the user's
source IP network address, as a substitute for other methods
of demonstrating proof of community membership. If the
user's connection request came from a network that
belonged to the university, it is assumed to be from a
community member.
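A minimal sketch of this source-address test, written with
Python's standard ipaddress module; the campus network
ranges shown are reserved documentation addresses, standing
in for an institution's real address space:

    # Is the request coming from a campus network?
    import ipaddress

    CAMPUS_NETWORKS = [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ]

    def is_community_member(source_ip):
        """Treat any on-campus source address as a community member."""
        address = ipaddress.ip_address(source_ip)
        return any(address in network for network in CAMPUS_NETWORKS)

    print(is_community_member("192.0.2.17"))   # True: campus workstation
    print(is_community_member("203.0.113.5"))  # False: e.g. a home ISP
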
Network ownership does not change too frequently,
so maintaining a list of valid network numbers and making
this available to publishers is a tractable administrative
burden. Further, since the list of network numbers, unlike
IDs and passwords, doesn't need to be kept secret, there is
little interdependence. This approach works well and,
indeed, has the virtues of simplicity and transparency to the
user, as long as all users come to the resource providers
through the campus or library network.
However, in an era when many institutions are
discontinuing dial-up modem pools in favour of
commercial Internet service providers, and when new
access technologies to the home, such as cable television
modems for Internet access, are beginning to deploy, a great
deal of legitimate user access now takes place from sources
other than the campus or library network. With growing
needs to support distance learners, part-time students who
may do academic work from their place of business, and
people who want to exploit the "anytime, anywhere"
promise of networked information resources from their
homes, limiting access by source IP network address
disenfranchises more users every day.
For most universities, for example, it is clearly no
longer acceptable to tell community members they can only
access networked information resources from on-campus
workstations.
Two general approaches are emerging to address the
access management problem. The first is the use of proxies.
Here, an institution develops an internal authentication
system and uses it to validate user access to a special
machine on the institutional network, called a proxy. Once
a user is validated, he or she can ask the proxy to contact
an external resource; in some variations, the proxy mediates
the user's entire interaction with the external resource,
while in other variations the proxy drops out of the
interaction after making the introduction.
From the resource operator's point of view, it is only
necessary to know that the institution's proxy server can be
trusted to pre-validate all users before contacting the
external resource host.
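The decision step at the heart of such a proxy can be
sketched as follows; the session store and the table of
licensed hosts are simplified stand-ins for real
institutional systems:

    # Pre-validation step of a hypothetical institutional proxy.
    ACTIVE_SESSIONS = {"sess-123": "patron-42"}  # issued at local login
    LICENSED_HOSTS = {"ejournal.example.org", "encyclopedia.example.com"}

    def authorise(session_id, target_host):
        """Forward only requests from logged-in users to licensed hosts."""
        user = ACTIVE_SESSIONS.get(session_id)
        return user is not None and target_host in LICENSED_HOSTS

    print(authorise("sess-123", "ejournal.example.org"))  # True: forwarded
    print(authorise("sess-999", "ejournal.example.org"))  # False: refused
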
Proxies have a number of advantages: authentication
is an internal matter to the institution and external service
providers need not be concerned with the details of how
this is accomplished; at least in theory, the institution has
complete flexibility in deciding what resources a user can
have access to through the proxy and when; and the proxy
can act as a central control point for the institution in access
management.
On the negative side, the proxy is a high-impact
central point of vulnerability for outages or capacity
problems (though, of course, it's possible to have multiple
proxy machines), and configuration management in the
proxy can become extremely complex and labour-intensive,
particularly if not every valid proxy user has access to all
resources. Further, proxies do not eliminate the need for an
authentication system; they only isolate its scope to the
proxy and the members of the institutional community.
The other approach is based on credentials. The basic
idea here is that the institution issues each user the
electronic analog of a community ID card. The user, or the
user's browser, presents these credentials upon demand to
any resource provider that requests them; the resource
provider can then, through a fast electronic transaction,
validate the credentials with the issuing institution.
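One way to picture this fast electronic transaction is a
signed token that the resource provider checks against the
issuing institution. The sketch below compresses that round
trip into a single keyed-hash (HMAC) check; the key and the
token layout are invented for illustration:

    # Credential validation as a keyed-hash check. In practice the
    # check would be a call to the institution's validation service.
    import hashlib
    import hmac

    INSTITUTION_KEY = b"demo key held by the institution"

    def issue_credential(member_id):
        tag = hmac.new(INSTITUTION_KEY, member_id.encode(), hashlib.sha256)
        return member_id + ":" + tag.hexdigest()

    def validate_credential(token):
        member_id, _, presented = token.partition(":")
        expected = hmac.new(INSTITUTION_KEY, member_id.encode(),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(presented, expected)

    credential = issue_credential("patron-42")
    print(validate_credential(credential))             # True
    print(validate_credential("patron-42:forged-tag")) # False
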
The validation process shares with proxies the
vulnerability to outages or capacity problems on the part
of institutional systems that verify credentials, though these
vulnerabilities are more circumscribed. A compromise of
the credential-verification system may be more serious than
compromise of a proxy: proxy compromise usually means
that unauthorised users get access to resources for the
period during which the proxy is compromised, while a
compromise in the credentialing system may well mean
that new credentials need to be issued to the entire
authorised user community.
The major practical difficulties with the credentials-
based approaches, however, involve technical problems,
standards, cost, and software integration. The simplest
credentials-based approach would be to have the
institution just issue the user a user ID and password for
external resources, and to have external resource providers
validate the ID/password pair with the institution.
This reduces, but doesn't eliminate, the
interdependence among external resource providers in the
maintenance of security; further, standards don't exist for
such a validation process and no off-the-shelf software
supports it. The industry is moving towards a technology
based on public key cryptographic certificates (X.509) in off-
the-shelf software, and, at least in theory, this should work
well for personal machines.
X.509 removes the interdependent security issue
because credentials are computed for each use from
information that is held only by the user and never directly
transferred to the content provider. The problems with this
approach include integration with browsers, the mechanics
of issuing and distributing these electronic cryptographic
certificates to users, the cost and complexity of acquiring
and operating the infrastructure for managing and
validating public key certificates, and government
regulation of the export and use of cryptography in various
countries, which causes problems in an increasingly
international world of scholarly resources.
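The property that credentials are computed afresh for each
use, from information held only by the user, is the essence
of public-key authentication. The sketch below illustrates
the idea with an Ed25519 key pair from the third-party
cryptography package, standing in for a full X.509
certificate exchange:

    # Challenge-response with a key pair: the private key never
    # leaves the user; the provider verifies with the public key.
    import os
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import ed25519

    private_key = ed25519.Ed25519PrivateKey.generate()  # held by the user
    public_key = private_key.public_key()  # carried in the certificate

    challenge = os.urandom(32)               # fresh nonce from the provider
    signature = private_key.sign(challenge)  # computed for this use only

    try:
        public_key.verify(signature, challenge)
        print("credential accepted")
    except InvalidSignature:
        print("credential rejected")
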
All of these problems with cryptographic certificates
are slowly but steadily getting better (except, perhaps, for
the government regulation issues), but, at least today,
implementation of such an approach is an enormous
challenge. Finally, it is important to note that X.509 was
really designed to support applications such as electronic
commerce.
The development of any access management strategy
raises policy issues in areas such as privacy, accountability,
and the collection of management data. It is important to
recognise that libraries must decide whether to address
these issues through legal means, through technical means,
or by a combination of the two. Library experience in other
contexts offers some insights. Libraries have aggressively
championed and defended the privacy of their patrons.
They have done this through both legal means-by
developing privacy policies and by requiring a legal
subpoena as a condition for divulging records-and also by
technical means by not keeping historical circulation data
on an individual basis, which then limits the amount of
information that they can be compelled to disclose under
any circumstances. Patron privacy has been such an
important value to libraries that they have, by and large,
used a dual technical/legal strategy to provide their
patrons with the strongest possible protection.
In the electronic environment, it is easy for a publisher
to track the use of content in great detail-what material is
being viewed, for how long, and how often. Depending on
how the access management system is structured,
particularly when credential-based approaches are
employed, the publisher may be able to correlate this usage
information to a specific individual by name, to a long-term
"pseudonymous" identity that the publisher can link to an
institution but not to an actual individual within that
institution's community, or simply to a transient and
anonymous member of the institution's user community.
Clearly, the license contract between institution and
publisher can speak to the collection, retention, use, and
disclosure of such usage data on a policy basis, but libraries
and patrons may find it desirable to limit the ability of the
publisher to collect information by the design of the access
management system on a technical basis, as well.
One important point needs to be made about user
privacy that underscores the need for contractual
constraints even in conjunction with an access management
system that provides some level of anonymity: often users
will make their actual identities known to content providers
for other reasons, independent of the access management
system, such as to take advantage of email-based current
awareness services or personalisation options in a user
interface.
The access management system is not the only way in
which privacy can be compromised, or bargained away for
increased function or convenience. A license agreement
represents a commitment on the part of the licensee
institution to honour the terms of the license, and to educate
members of its community about their obligations under
the license. The publisher and library share a need for some
level of accountability by community members: if a single
user accesses publisher content hundreds of times a day
from three continents, it's likely that something is wrong;
perhaps that user doesn't understand his or her obligations,
or perhaps credentials have been compromised.
There is a need for the publisher and the library, acting
together, to be able to investigate such situations
effectively, and, if need be, to block access by specific
individuals who seem to be violating the terms of a license
agreement.
But, in order to do this effectively, a publisher needs
to be able to at least provide an institution with enough
information to permit the individual in question to be
identified; note that this does not necessarily mean that the
publisher can directly identify the individual in question,
but only that the publisher can provide the institution with
enough information to identify the user.
The need for accountability contradicts, to some
extent, the mandate to design anonymity into an access
management system, and argues for a pseudonymous
approach. This can be achieved with credential-based
approaches, but is more difficult with proxy-based
architectures. Finally, there is the issue of management
data.
Electronic information resources promise libraries
much more accurate and detailed data about what content
is actually being used and how often-though even at this
level libraries may want to make contractual stipulations to
protect patron privacy; for instance, it is not clear how
many universities would be comfortable having a list of all
the articles read by members of their community posted on
the Web every week. But greater problems arise when
libraries want to have these usage statistics, at whatever
level of aggregation, demographically faceted-for example,
to drive internal cost allocation processes within the
library's institution.
The simplest solutions are often to pass demographic
attributes to the publisher along with identities or
pseudonyms and to get the publisher to do the work of
generating management data for the library-but this path
can rapidly compromise the privacy of pseudonymous
users by making them more identifiable and, if actual
identities are used, it makes the privacy problem even more
acute by raising the stakes on the amount of information
disclosed.
Demographically faceted usage data offers a good
illustration of the scaling issues that we face in the move
to network-based electronic information. If a library were
dealing with just a single publisher, it would be reasonable
to have the publisher return detailed (transactional) usage
data with pseudonyms to the library.
The library could then look up the pseudonyms to
obtain demographic data and summarise the transaction
data into management information. Here privacy is
protected by library policies and does not depend on the
publisher, who knows only pseudonyms. But this is
intractable with hundreds of publishers, each supplying
transaction data at different time intervals and in different
formats.
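A library-side sketch of this pseudonym lookup follows; the
derivation of pseudonyms as a keyed hash of the patron
identifier, and the demographic table itself, are
assumptions made for illustration:

    # Resolve publisher-supplied pseudonymous transactions to
    # demographic facets; only the library holds the key.
    import hashlib
    import hmac
    from collections import Counter

    LIBRARY_KEY = b"secret held only by the library"
    DEMOGRAPHICS = {"patron-42": "faculty", "patron-7": "undergraduate"}

    def pseudonym(patron_id):
        tag = hmac.new(LIBRARY_KEY, patron_id.encode(), hashlib.sha256)
        return tag.hexdigest()[:12]

    FACET_BY_PSEUDONYM = {pseudonym(p): f for p, f in DEMOGRAPHICS.items()}

    # Transactions as a publisher might return them: (pseudonym, item).
    transactions = [(pseudonym("patron-42"), "article-1"),
                    (pseudonym("patron-7"), "article-1")]

    usage = Counter(FACET_BY_PSEUDONYM[p] for p, _ in transactions)
    print(usage)  # Counter({'faculty': 1, 'undergraduate': 1})
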
8
Classification in Digital Libraries

In most databases, including catalogues on the Web, the
searcher may find it difficult to comprehend the
organizational structure that has been imposed upon the
materials. This is not due simply to the often exotic
notations of a scheme or to the surface characteristics of the
classificatory data. Rather, the problem is often a product
of a lack of match between the structure imposed upon the
retrieval system by the classification scheme and the user's
individual knowledge structures and search strategies.
Classification research has responded to this problem
by collecting the terminology of individual users and
compiling the results to generate larger, broader, and, it is
hoped, more successful sets of access points for users-i.e.,
if we design an end-user thesaurus, that should do the trick.
Because failure of the classificatory structure to support
user access is generally interpreted as a mechanical
question of matching between different individual
knowledge structures-i.e., among those of the searcher,
the author, and the librarian as mediator-the underlying
relationship between user access and the collective
knowledge structures that are the basis for knowledge
production has not been widely recognized. From the
perspective of the sociology of science, Star has argued that
the Turing test, which is intended to measure the degree to
which an expert system is able to perform as a human
expert in its interaction with individual users, should be
replaced by a "Durkheim test," where the system is
evaluated on its ability to support the goals of a specific
community of users.
Star points out that scientific work is not all one piece
but is distributed and heterogeneous, with differing
viewpoints emerging only to be reconciled within the
existing knowledge base. In her view, information systems
should not be designed simply to represent consensus but
to accommodate the dissent that can be expected to appear
among the various communities participating in their use.
To this end, she brings forward the concept of boundary
objects as a method for resolving problems of heterogeneity
in knowledge production and use or, in terms of library and
information science (LIS), problems of variation or
inconsistency in the representations by information
producers, information mediators, and information users.
Hjorland argues for a philosophical and sociological
orientation for classification research. In his view, the
problem of the searcher's uncertainty is a function of
relative task uncertainty in the user's problem domain.
Because information searching takes place within a
particular social framework-e.g., an academic discipline-
task uncertainty in searching is often the result of the
relative task uncertainty within the discipline itself.
Albrechtsen and Hjorland have earlier shown how such
task uncertainty within knowledge domains may be a
function of various social factors involved in the production
of knowledge, such as the degree of interdisciplinarity or
maturity within a domain. Such uncertainties will not only
be manifest in the searchers' difficulty in formulating
queries for IR-systems but will also be inscribed in the
relative plasticity and variety of the concepts and
terminology applied within the domains.
Classification research has too often neglected such
broader social backgrounds that inform information
searching and knowledge organization and has relied,
more or less implicitly, on either a one-size-fits-all
paradigm (rationalism) or on the accumulation of data
about user behaviour (empiricism). While the rationalist
approach argues that we just need to get everyone to
understand this, the empiricist counters that we just need
to get more data about users and proceeds to collect more
or less meaningful sets of "facts" on the individual user's
relative success measured as the number of "hits" resulting
from a series of search queries.
The different approaches to classification research and
practice fall into two broad epistemological categories:
Rationalism/Empiricism on the one side and Historicism/
Social Constructivism on the other. Both rationalism and
empiricism are based on assumptions regarding the nature
of truth and the objectivity of knowledge. From the
empiricist approach, knowledge is reduced to sensory
observations or facts. In classification research, empiricism
is the prevalent epistemology in bottom-up thesaurus
construction based either on user warrant or on
terminology warrant, particularly when the process lacks
grounding in a theory of knowledge.
In contrast, rationalism strives to reduce knowledge to
an all-embracing structure of concepts that is intended to
be universally comprehensive. It is, for example, the
epistemological foundation for Ranganathan's notion of
universal facets. Rationalism is also closely related to more
sociopolitical actions undertaken by a particular agency or
from a specific disciplinary viewpoint-i.e., actions which
are intended to impose one view of knowledge on all
research and practice within that domain.
In a paper discussing the role of dialogue in the
development of classificatory structures, Jacob and
Albrechtsen have shown how the American Psychiatric
Association's construction of DSM-IV, the international
classification for mental disorders, used dialogue to create
a device for marginalizing and eliminating the viewpoints
of competing professions such as psychology. In short, both
empiricist and rationalist approaches to classification are
primarily looking for invariant structures that can be
imposed on encyclopedic knowledge or data compiled
from local observations.
In contrast to these more formalized structure-seeking
approaches to classification, social constructivism, or
historicism, offers a view of knowledge as a product of
historical, cultural, and social factors, where the
fundamental divisions and the fundamental concepts are
products of the divisions of scientific/cultural/social
labour in knowledge domains. According to a social
constructivist epistemology, the concepts and the structures
are inseparable in a classification system, and hence the
schemes must reflect the development, variety, plasticity,
and use of both within a particular knowledge domain. This
implies that scheme designers are not primarily looking for
ways to impose one single structure on knowledge,
including one set of all-embracing facets.
Rather, the designers should operate as "epistemic
engineers," attempting to articulate and represent the
dynamics of knowledge in such a way that the searcher can
proceed from the topic of his initial query to other related
perspectives on the same topic or to related materials
within the same knowledge domain. In this manner,
epistemic engineering of classificatory schemes can provide
for multidimensional classification schemes where the
concepts are represented in a variety of different conceptual
structures, functioning to articulate the multiple discourses
performed in different domains. In the role of epistemic
engineer, then, the scheme designer operates as an active
participant in the process of knowledge production and
mediation.
Such involvement on the part of the classificationist is
particularly evident in areas of interdisciplinary research
that engage participation from many different professions.
The HIV/AIDS vocabulary, developed by Huber and
Gillaspy, provides an illustrative example of such
involvement on the part of the scheme designers. This
system, which was not intended as a classification per se
but as a mediating vocabulary, was developed to support
dialogue between the different communities involved with
the HIV/AIDS epidemic, including clinical and medical
researchers, practitioners of alternative medicine,
nutritionists, psychotherapists and other professionals, as
well as those individuals who are either living with the
disorder themselves or are caring for someone who has
contracted the disease.
The HIV/AIDS vocabulary is built on a theory of
knowledge generation that explicitly eschews the standard
life cycle for knowledge production in medicine-a
knowledge cycle that proceeds in a top-down fashion from
theory developed at universities and other research
institutions, to applied clinical research, to daily clinical
application. Rather, according to the epistemological view
driving the HIV/AIDS vocabulary, research in lived
experience must necessarily feed into basic clinical
research.
Accordingly, this scheme was not developed solely as
a tool for retrieval of information in the database of the local
community, but as a tool for facilitating communication
both within and across diverse interest groups, from the so-
called layman to the cloistered scientist. In its role as
communicative facilitator, the scheme is also hospitable to
adaptations and extensions as an indexing language in local
contexts. For instance, specific drug names are not
articulated in the scheme but are left to local instantiations
of the indexing language. In Star's terms, the HIV/AIDS
scheme serves as a boundary object precisely because it
supports cooperation and common understandings among
the various interest groups touched by this epidemic.
CLASSIFICATION IN DIVERSE INFORMATION ECOLOGIES

American anthropologists Nardi and O'Day have introduced
the concept of a "diverse information ecology" to describe
the sociotechnical network of heterogeneous materials,
people, and practices that constitutes a modern library:
What we learned in the library suggests the possibility
of a socio-technical synthesis, an opportunity to design
an information ecology that integrates and
interconnects clients, human agents and software
agents in intelligent ways congenial to extending
information access to, potentially, all of humanity. As
we design the global information infrastructure, the
ultimate goal should be to design an ecology, not to
design technology.
Because information ecologies are situated within human
practice, they are dynamic and constantly changing. An
information ecology cannot be controlled by anyone single
agency but evolves through the collaboration of
heterogeneous socio-technical networks, whose elements
strive constantly to achieve coherence and wholeness. The
notion of an information ecology also implies a collective
view of information systems as striving to meet
heterogeneous community goals rather than the goals of a
single agency or individual.
In their study of two research libraries in software
companies in the United States, Nardi and O'Day explored
how the work practices and expertise of librarians can serve
as a model for the design of computerized information
services. They found that librarians are exemplary agents
who evince particular expertise not only in communicating
with users but also in searching for information. These two
skills are closely interrelated in that the librarian's search
strategy tends to evolve in collaboration with the user's
project.
Nardi and O'Day propose to extend this working
relationship between the librarian and the user to the
collaborative design of information ecologies. In an
information ecology, a classification system should
function as a boundary object, supporting coherence and a
common identity across the different actors involved. In its
role as boundary object, a classification would be weakly
structured in common use, while remaining open to
adaptation in individual communities.
Across diverse information ecologies, classification
schemes would function as discursive arenas or public
domains for communication and production of knowledge
by all communities involved. This approach to the
development of classification schemes also implies that the
task of constructing such a scheme would no longer be
invisible work. This view of classification systems is in line
with the concept of "coordination mechanisms" in
distributed collaborative work, as put forward by Schmidt
and Simone. More importantly, the understanding and
appreciation of classification schemes as boundary objects
and discursive arenas, in cooperation with heterogeneous
user groups and technology, engages the library as a
facilitator of connections and ensures its continuing
participation as an active contributor in the general process
of knowledge production.
In a recent European study of public libraries in the
information society, it was demonstrated that public
libraries have progressed through three distinct stages,
evolving from manual paper-based services, via the
automated library, to the current phenomenon of the
electronic multimedia library. This progression should not
be understood to imply that the current status of libraries
has been driven entirely by technology. Rather, the
electronic multimedia library must be understood from a
more integrated socio-technical point of view, where the
various actors, including librarians, computer suppliers,
and researchers in computing and information science,
constitute a heterogeneous network of agencies that bring
certain technologies to the foreground while marginalizing
others. In the recent development and use of
communication technology, for example, there is a
convergence of hitherto separate, even disparate, media
and activities.
This is apparent in the development and application
of Web technology, which integrates text-based materials,
graphic illustrations, and audio materials with interactive
features such as online conferences and e-mail. It is
characteristic of this development that the technology is not
only plastic and customizable to almost any context of use,
rather like a boundary object, but is constantly renegotiated
and redeveloped through such use.
In the recent past, manual paper-based libraries
focused on collection building. Intermediaries, or
librarians, served both as collection builders and as agents
controlling and interpreting the order of the libraries.
Classification systems were frequently standardized in
order to support interlibrary cooperation with the result
that classification research was itself dominated by the
development of universal schemes which could be adopted
by central agencies to control the organization of
knowledge across libraries.
As a result of such standardization, classification
became invisible work performed without regard to the
needs of the local community of users. Because
maintenance and development of these classification
schemes was often based on literary warrant, reflecting
only those subjects represented in large national collections,
they can be interpreted as imposing an implicitly empiricist
view of knowledge. There was, then, at this stage in the
library evolution, a mix of rationalist and empiricist
epistemologies underlying classification, research and
development. The role of the librarian as intermediary was
challenged during the 1980s by the development of online
retrieval systems and, in particular, by the introduction of
online public access catalogues (OPACs) for end-user
searching.
During this decade, classification research was
dominated by work on thesauri and indexing systems.
There were numerous experiments with automated
indexing, including the application of text analysis
techniques developed in computational linguistics. OPAC
development was often based on studying users,
sometimes in naturalistic settings, but generally without
prior analysis of their different social worlds or the
functional role of libraries in knowledge production and
mediation. Research in information retrieval systems was
very much oriented by a mechanistic conception of human
competence in information searching, indexing, and
classification, thereby neglecting the variety and
heterogeneity with which human agents (both librarians
and users), information sources, and technology interact in
different settings.
During the 1990s, the library has increasingly switched
its service emphasis from building and guarding the
collection or offering users access to the collection through
the local OPAC to providing local access to global
information resources available on the World Wide Web.
This represents a shift from a closed to an open system. In
some European public libraries, for example, traditionally
introverted and bureaucratic organizations have migrated
toward a project-oriented culture, where librarians and
users cooperate on the development of new services, using
the interactive affordances of Web technology and the
Internet. In general, such projects have not involved the
library schools in Europe, the traditional research
communities in the library and information sciences.
Close cooperation between libraries and the
community of LIS researchers in Europe has yet to be
manifested. In the United States, communities of LIS
researchers have come together in workshops and research
projects related to the social informatics of what are called
"digital libraries" but could equally well be termed
"electronic libraries". In this research area, major topics
include how knowledge is structured in digital libraries,
including cataloguing and classification, and how digital
libraries are used-i.e., how knowledge is produced,
communicated, applied, and recycled in distributed social
worlds. Research methods comprise ethnographic studies
of communication and knowledge production in (digital)
libraries as well as comprehensive sociological studies of
professional classification schemes in medicine and
nursing. Thus it seems apparent that classification research
is gradually evincing a more sociological and historical
orientation.
Ballerup Public Library is a medium-sized Danish
library on the outskirts of Copenhagen. There is, in this
library, a tradition of direct collaboration between the
librarians and their users. Until recently, a majority of the
librarians regarded themselves as cultural workers-as
intermediaries between collection and user, very much in
line with the traditional perspective described above for
libraries in the manual stage. In 1995, the library started a
new project called Database 2001.
This project, which was evaluated by Albrechtsen,
involved the development of an enriched multimedia
catalog on the Web. In addition to the evaluation
researcher, the project group for Database 2001 included six
librarians with different areas of expertise: several in the
group were experienced intermediaries and online
searchers, while others were specialists in catalog design
and in the management of the library's technological
resources. However, none of the librarians had experience
with Web design or Internet browsing.
During the development of Database 2001, the project
group collaborated with user groups and colleagues in the
library to identify different kinds of materials, including
books, musical recordings on CD, CD-ROMs, and
audiotapes of books. Text, pictures, and sound were
selected as enrichment for the database, the idea being to
emulate a kind of virtual library on the Web. The menus
were designed as graphical layers of icons representing
both user groups and the kinds of materials available.
The subject icons in Database 2001, which represent
the subject content of materials in the database, went
through several iterations. In addition, the interface
designed for browsing the menus was customized for both
children and adults. The librarians arranged evaluation
sessions with users who represented different user
communities and their evaluations were very positive;
users with different interests were able to use the icon-
based interface for browsing in the database even though
they had very different interests and different goals for
searching.
In the database, documents were indexed using
standard call numbers from the Danish variant of the
Dewey Decimal Classification (DDC). Even though
indexing by class number would take advantage of the
hierarchical structure of DDC and thus would be
potentially useful for browsing by users, the librarians
knew from their practice as intermediaries that users found
it very difficult to understand the standard classification.
They experimented with a more pragmatic and much more
weakly structured classification which could reflect the
kinds of questions actually posed to library staff by the
different user groups. For example, for subject browsing by
children, they worked with the seven categories listed
below and designed a unique icon to be used on the Web
site:
computers;
astronomy, nature, animals, environment;
first love, star signs, being young today;
horses;
excitement, humor;
fantasy, science fiction; and
books that are easy to read.
From a semantic or disciplinary point of view, the
separation of subjects like animals and horses would
appear to be "incorrect" or "illogical." For the children,
however, this classification worked very well. Category 2
(astronomy, nature, animals, environment) was intended
for a broad group of interests, including fact literature,
whereas category 4 (horses) was intended, in particular, for
girls interested in novels about horses. There is, in
Denmark, a special research tradition within children's
librarianship, based on Wanting's research on how children
ask questions in libraries, that advocates mediating
literature according to the different user interests of
children. Pejtersen has also studied children's use of
libraries in Denmark and their communication with
librarians.
In her development of the Book House system in the
1980s, Pejtersen used a collaborative prototyping approach,
engaging librarians, information scientists, and users in
Danish public and school libraries, and subsequently
designed a special interface of subject icons for browsing
of the Book House system by children. Database 2001 took
advantage of both of these research approaches to
children's information searching. The Book House is a
retrieval system for fiction and is based on a general
conceptual model that seeks to surround users with an
adequate resource space within which to situate their own
search spaces.
The design involves multidimensional representations
of different kinds of user needs, search strategies, and
literary paradigms as well as authorial intentions. This
multidimensional structure for subject access is intended to
match the different levels of user interest. The system
interface is constructed around the metaphor of a "house
of books," guiding the users through the rooms of a library
where they can browse the collection. Users can also switch
between different search strategies, including analytical
search in the multidimensional database structure,
visualized as icons for each dimension, and browsing of
subjects, visualized as icons in a picture gallery. The design
of these icons involved classification experiments using
both word association experiments and evaluations of
suggested icons in Danish public libraries.
The icons for browsing subjects in the Book House and
in Database 2001 serve similar functions-to provide the
users with an overview of the subjects included in the
databases. Because the Book House system builds on the
central design metaphor of rooms in a library, it provides
a single uniform interface. Database 2001, in contrast, is
realized as a mixture of interfaces that include the Web
layer of icons, designed by the librarians; a more or less
standard search client offering conventional text-based
searching; and a database structured according to a
standard cataloguing format that uses traditional call
numbers to represent the subject content of the documents.
While the Book House is a general system for fiction
retrieval, which in its present form cannot be customized
by individual libraries to support the idiosyncratic needs of
specific user communities, Database 2001 is a localized
experiment with system design and classification drawing
upon a range of technologies that reflect the heterogeneity
of tools used in today's libraries, from conventional
customizable applications such as the closed systems of the
database and the search client to the open systems
supported by interactive Web technologies.
AUTOMATIC CLASSIFICATION

Classification is the process by which a classificatory
system is constructed. All classifications, even the most
general, are carried out for some more or less explicit
"special purpose" or set of purposes which should
influence the choice of classification method and the results
obtained. The purpose may be to group the documents in
such a way that retrieval will be faster or alternatively it
may be to construct a thesaurus automatically. Whatever
the purpose the 'goodness' of the classification can finally
only be measured by its performance during retrieval.
There are two main areas of application of
classification methods in information retrieval:
- keyword clustering;
- document clustering.
The first area is very well dealt with in a recent book by
Sparck Jones. Document clustering, although
recommended forcibly by Salton and his co-workers, has
had very little impact. One possible reason is that the details
of Salton's work on document clustering became
submerged under the welter of experiments performed on
the SMART system. Another is possibly that as the early
enthusiasm for clustering waned, the realisation dawned
that significant experiments in this area required quantities
of expensive data and large amounts of computer time.
Good and Fairthorne were amongst the first to recommend
that automatic classification might prove useful in
document retrieval.
A clear statement of what is implied by document
clustering was made early on by R. M. Hayes: 'We define
the organisation as the grouping together of items which
are then handled as a unit and lose, to that extent, their
Oassification in Digital Libraries 223

individual identities. In other words, classification of a


document into a classification slot, to all intents and
purposes identifies the document with that slot. Thereafter,
it and other documents in the slot are treated as identical
until they are examined individually. It would appear,
therefore, that documents are grouped because they are in
some sense related to each other; but more basically, they
are grouped because they are likely to be wanted together,
and logical relationship is the means of measuring this
likelihood.' In the main, people have achieved the 'logical
organisation' in two different ways.
Firstly, through direct classification of the documents,
and secondly via the intermediate calculation of a measure
of closeness between documents. The first approach has
proved theoretically to be intractable so that any
experimental test results cannot be considered to be
reliable. The second approach to classification is fairly well
documented now, and above all, there are some forceful
arguments recommending it in a particular form. It is this
approach which is to be emphasised here.
The efficiency of document clustering has been
emphasised by Salton, who says: 'Clearly in practice it is not
possible to match each analysed document with each
analysed search request because the time consumed by
such operation would be excessive. Various solutions have
been proposed to reduce the number of needed
comparisons between information items and requests. A
particularly promising one generates groups of related
documents, using an automatic document matching
procedure. A representative document group vector is then
chosen for each document group, and a search request is
initially checked against all the group vectors only.
Thereafter, the request is checked against only those
individual documents where group vectors show a high
score with the request.'
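Salton's two-stage scheme translates directly into code. The
following Java sketch is only an illustration of the idea, not the
SMART implementation: the term-weight maps, the cosine matching
function, and the THRESHOLD cut-off are all assumed names and values.

    import java.util.*;

    // Minimal sketch of Salton's two-stage cluster search. A document,
    // query or group representative is a map of term -> weight; cosine
    // and THRESHOLD are assumed choices, not taken from SMART.
    class ClusterSearch {
        static final double THRESHOLD = 0.2; // assumed group-level cut-off

        static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0, na = 0, nb = 0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                Double w = b.get(e.getKey());
                if (w != null) dot += e.getValue() * w;
                na += e.getValue() * e.getValue();
            }
            for (double w : b.values()) nb += w * w;
            return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
        }

        // Stage 1: match the query against each group vector only.
        // Stage 2: score individual documents inside promising groups.
        static List<Map<String, Double>> search(Map<String, Double> query,
                Map<Map<String, Double>, List<Map<String, Double>>> groups) {
            List<Map<String, Double>> hits = new ArrayList<>();
            for (Map.Entry<Map<String, Double>, List<Map<String, Double>>> g
                    : groups.entrySet()) {
                if (cosine(query, g.getKey()) < THRESHOLD) continue;
                for (Map<String, Double> doc : g.getValue())
                    if (cosine(query, doc) > 0) hits.add(doc);
            }
            return hits;
        }
    }

Only groups whose representative vector clears the threshold have
their member documents scored individually, which is precisely the
saving Salton describes.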
Methods of Automatic Classification

Let us start with a description of the kind of data for which
classification methods are appropriate. The data consists of
objects and their corresponding descriptions. The objects
may be documents, keywords, hand written characters, or
species (in the last case the objects themselves are classes
as opposed to individuals). The descriptors come under
various names depending on their structure:
multi-state attributes (e.g. colour)
binary-state (e.g. keywords)
numerical (e.g. hardness scale, or weighted keywords)
probability distributions.
The fourth category of descriptors is applicable when the
objects are classes. For example, the leaf width of a species
of plants may be described by a normal distribution of a
certain mean and variance. It is in an attempt to summarise
and simplify this kind of data that classification methods
are used. Some excellent surveys of classification methods
now exist, to name but a few, Ball, Cormack and Dorofeyuk.
In fact, methods of classification are now so numerous, that
Good has found it necessary to give a classification of
classification. Sparck Jones has provided a very clear
intuitive breakdown of classification methods in terms of
some general characteristics of the resulting classificatory
system. In what follows the primitive notion of 'property'
will mean feature of an object. We quote:
Relation between properties and classes
- monothetic
- polythetic
Relation between objects and classes
- exclusive
- overlapping
Relation between classes and classes
- ordered
- unordered
The first category has been explored thoroughly by
numerical taxonomists. An early statement of the
distinction between monothetic and polythetic is given by
Beckner: 'A class is ordinarily defined by reference to a set
of properties which are both necessary and sufficient (by
stipulation) for membership in the class. It is possible,
however, to define a group K in terms of a set G of
properties f1, f2, ..., fn in a different manner. Suppose we
have an aggregate of individuals such that
each one possesses a large (but unspecified) number of
the properties in G;
each f in G is possessed by a large number of these
individuals; and
no f in G is possessed by every individual in the
aggregate.
The first sentence of Beckner's statement refers to the
classical Aristotelian definition of a class, which is now
termed monothetic. The second part defines polythetic. To
illustrate the basic distinction consider the following
example of 8 individuals (1-8) and 8 properties (A-H). The
possession of a property is indicated by a plus sign. The
individuals 1-4 constitute a polythetic group each
individual possessing three out of four of the properties
A,B,C,D. The other 4 individuals can be split into two
monothetic classes {5,6} and {7,8}. The distinction between
monothetic and polythetic is a particularly easy one to
make providing the properties are of a simple kind, e.g.
binary-state attributes. When the properties are more
complex the definitions are rather more difficult to apply,
and in any case are rather arbitrary.
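For binary-state attributes the distinction can be stated
mechanically. The Java sketch below tests Beckner's three
conditions for a polythetic group; since Beckner leaves "large"
unquantified, the fraction parameter here is an assumption.

    // Sketch of Beckner's three conditions over binary-state attributes.
    // has[i][j] is true when individual i possesses property j; Beckner
    // leaves "large" unquantified, so the fraction is an assumption.
    class Polythetic {
        static boolean isPolythetic(boolean[][] has, double large) {
            int n = has.length, m = has[0].length;
            for (boolean[] row : has) {              // condition 1: each
                int count = 0;                       // individual has many
                for (boolean p : row) if (p) count++;
                if (count < large * m) return false;
            }
            for (int j = 0; j < m; j++) {
                int count = 0;
                for (int i = 0; i < n; i++) if (has[i][j]) count++;
                if (count < large * n) return false; // condition 2: each
                                                     // property is common
                if (count == n) return false;        // condition 3: none is
                                                     // held by every member
            }
            return true;
        }
    }

A monothetic class would instead require one fixed set of properties
possessed by every member.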
The distinction between overlapping and exclusive is
important both from a theoretical and practical point of
view. Many classification methods can be viewed as
data simplification methods. In the process of classification
information is discarded so that the members of one class
are indistinguishable. It is in an attempt to minimise the
amount of information thrown away, or to put it
differently, to have a classification which is in some sense
'closest' to the original data, that overlapping classes are
allowed.
Unfortunately this plays havoc with the efficiency of
implementation for a particular application. A compromise
can be adopted in which the classification methods
generates overlapping classes in the first instance and is
finally 'tidied up' to give exclusive classes. An example of
an ordered classification is a hierarchy. The classes are
ordered by inclusion, e.g. the classes at one level are nested
in the classes at the next level. To give a simple example
of unordered classification is more difficult.
Unordered classes generally crop up in automatic
thesaurus construction. The classes sought for a thesaurus
are those which satisfy certain homogeneity and isolation
conditions but in general cannot be simply related to each
other. For certain applications ordering is irrelevant,
whereas for others such as document clustering it is of vital
importance. The ordering enables efficient search strategies
to be devised. The discussion about classification has been
purposely vague up to this point. Although the breakdown
scheme discussed gives some insight into classification
methods, like all categorisations it isolates some ideal types;
but any particular instance will often fall between
categories or be a member of a large proportion of
categories.
AUTOMATIC CLASSIFICATION OF WEB RESOURCES

The advantages of document clustering and classification
over keyword based indexes have been debated in
Information Retrieval (IR) research for quite some time.
Good, Fairthorne and Salton were discussing the merits of
automatically and logically organising electronic
documents into groups in the early 1960's. Documents that
share the same frequently occurring keywords and
concepts are usually relevant to the same queries.
Clustering such documents together enables them to be
retrieved together more easily and helps to avoid the
retrieval of irrelevant unrelated information. Another
advantage is that classification usually enables browsing
through a hierarchy of logically organised
information which is often considered a more intuitive
process than constructing a query string.
Keyword based indexes usually manage to find
documents that contain specified keywords but find it
difficult to simultaneously identify documents that share
the same concepts. Indexes are however comparatively
simple to construct. Automatically analysing documents
for index terms is far easier than assigning a document to
an appropriate classification group automatically.
Consequently, classification is usually associated with
human defined metadata or catalogue entries. The
evolution of automated World Wide Web search engines
from manually maintained classified lists and directories
has further demonstrated the strengths and weaknesses of
these two approaches. Alta Vista is fully automated, fully
indexed and offers wide Web coverage but is it as accurate
and intuitive as Yahoo? The tendency of automated search
engines to inundate users with irrelevant results has
prompted reconsideration of the merits of classification.
Alta Vista's large corpus is the product of fully
automated components: a robot that finds and retrieves
new resources, an indexer that automatically indexes the
full text of every document and a retrieval mechanism that
handles user queries. Yahoo's accuracy is the product of
manual classification and manually defined metadata. Its
intuitiveness comes from the ability to browse a well
structured classification hierarchy. The combination of
automation and classification has the potential to provide
an accurate, intuitive, comprehensive classified search
engine. This is the aim of WWLib.
Automated tools for locating information on the Web
are almost universally keyword indexes. Classified tools
like Yahoo and Galaxy require some degree of manual
input - usually in specifying the appropriate category and
other metadata. Many automated search engines have
deployed traditional IR indexing strategies and retrieval
mechanisms but very few have experimented with
automatic classification.
IR approaches to automatic classification involve
teaching systems to recognise documents belonging to
particular classification groups. This can be done by
manually classifying a set of documents and then
presenting them to the system as examples of documents
that belong to each classification. The system then builds
class representatives each of which consists of common
terms occurring in the documents known to belong to a
particular classification group. When the system
subsequently encounters new documents it measures the
similarity between the document and the class
representatives. Each time a new document is classified it
is used to modify the class representative to include its most
commonly occurring keywords.
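The training loop just described can be sketched in a few lines of
Java. Everything here is illustrative rather than taken from any
particular IR system: a class representative is a bag of term
counts, similarity is a simple weighted overlap, and each newly
classified document is folded back into the winning representative.

    import java.util.*;

    // Illustrative class representative for the training approach: term
    // counts accumulate from example documents, new documents are scored
    // by weighted overlap and then folded back in. All names are
    // assumptions, not any particular system's API.
    class ClassRepresentative {
        final Map<String, Integer> terms = new HashMap<>();

        // Fold a classified document's terms into the representative.
        void absorb(Collection<String> docTerms) {
            for (String t : docTerms) terms.merge(t, 1, Integer::sum);
        }

        // Score a new document: terms the representative has seen often
        // contribute more to the similarity.
        int similarity(Collection<String> docTerms) {
            int score = 0;
            for (String t : docTerms) score += terms.getOrDefault(t, 0);
            return score;
        }
    }

A new document is assigned to whichever representative scores
highest and is then passed to absorb(), so each representative
drifts toward the vocabulary actually encountered.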
The Taxonomy And Path Enhanced Retrieval (TAPER)
system uses similar methods to classify documents
according to a hierarchical classification scheme. IR
techniques are used to extract signatures from documents
based on significant terms and these are then compared
with signatures representing each node of the classification
hierarchy. Each node has a different context specific stop
word list that is applied to the document signature as it is
filtered down through the hierarchy. When a user queries
the TAPER system, they are initially presented with a list
of topic paths, rather than documents; this helps to focus
the query to the most relevant areas of the classification
hierarchy where subsequent relevant documents will be
clustered.
This approach to automatic classification works in
situations where just the outline of a classification scheme
is defined and the rest is generated by the system.
Classifying documents according to a standard
classification scheme such as DDC requires more manual
input when defining the classification hierarchy. The
advantages of using a well known standard classification
scheme are considerable:
DDC is a universal classification scheme covering all
subject areas and geographically global information
Users who are accustomed to using a library will find
the classification system familiar
DDC has multilingual scope which will become
increasingly important as the volume of information in
other languages grows on the Web.
Another project using DDC for automatic classification is
the Scorpion project of the Online Computer Library Centre
(OCLC). Their system combines library science with IR
techniques to enable automatic subject assignment using
DDC as a knowledge base. Documents are used as queries
to a database of manually defined DDC information. The
result from such a query identifies the subject matter of the
document.
The manually defined DDC information is maintained
by OCLC using an electronic Editorial Support System
(ESS). Scorpion uses ESS records to build its knowledge
base. These are the same records that are used to produce
the printed version of DDC and the DDC 21 CD-ROM,
Dewey for Windows.

WWLib

The original version of WWLib relied to a large degree on
manual maintenance and as such can best be described as
a classified directory that was organised according to DDC.
Like most classified directories it offered the option to
browse the classification hierarchy or to enter a query
string. The use of DDC to organise WWLib evolved from
the notion that Library Science has a lot to offer the chaotic
task of information resource discovery on the Web.
Classification schemes like DDC have been responsible for
the organisation of vast amounts of information for
decades. There are two main concepts that search engines
have inherited from Library Science:
Metadata - information describing each resource
including a description and keyword index terms
comparable to a traditional library catalogue entry;
Classification - a structured framework for clustering
documents that share the same subject.
The experimental WWLib emphasised, from a very early
stage, the need for automation. It has become necessary for
any system calling itself a search engine to incorporate a
robot or spider which automatically collects information by
retrieving documents and analysing them for embedded
URLs. This and other automated components such as an
automatic indexer and an automatic classifier were
required. The decision to maintain the classified nature of
WWLib as it evolves into an automated search engine is
quite unusual. Historically, manually maintained
directories are classified and automated search engines are
just huge indexes.
An outline design of the new automated WWLib
identifies the automated components and their
responsibilities. There are six automated components:
A Spider that automatically retrieves documents from
the Web;
An Indexer that receives Web pages from the spider,
stores a local copy, assigns to it a unique accession
number and generates a new metadata template. It also
distributes local copies to the Analyser, Classifier and
Builder and adds subsequent metadata generated by
the Classifier and the Builder to the assigned metadata
template;
An Analyser that analyses pages, provided by the
indexer for embedded hyperlinks to other documents.
If found, URLs are passed to the indexer where they
are evaluated to check that they are pointing to
locations in the UK (this process will be documented
elsewhere), before being passed to the Spider;
A Classifier that analyses pages provided by the indexer
and generates DDC classmarks;
A Builder that analyses pages provided by the indexer
and outputs metadata which is stored by the indexer
in the document's metadata template and is also used
to build the index database that will be used to quickly
associate keywords with document accession numbers;
A Searcher that accepts query strings from the user, uses
them to interrogate the index database built by the
builder, uses the resulting accession numbers to
retrieve the appropriate metadata templates and local
document copies and then uses all this information to
generate detailed results, ranked according to
relevance to the original query.
One of the reasons for deciding on such a componentised
architecture was to allow for components to be distributed
over a network if necessary. It is intended that most of the
components will run as daemons and respond to each
other's requests via a defined protocol.

Classifier
Previous experimentation with automatic classification was
carried out during the development of the original WWLib.
This original classifier compared text in each document
with entries in a DDC thesaurus file. The thesaurus entries
consisted of the DDC classmark and accompanying header
text, e.g.: 641.568 Cooking for special occasions Including
Christmas. Terms within each document were assigned
weights according to word frequency and according to
where they occurred in the document.
Terms occurring in <H1> level headings, for example,
were found to be particularly useful in determining the
subject of a document and therefore terms found within this
element were given greater weight. This classifier achieved
some degree of success. As a simple test of its efficiency 100
documents were manually classified and these were then
compared with the results of the automatic classifier over
the same 100 documents. 40% accuracy was achieved.
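The weighting idea behind this original classifier can be sketched
as follows; the particular weights (4 for titles, 3 for <H1>
headings, and so on) are assumptions, since the text reports only
that heading terms were given greater weight.

    // Sketch of location-sensitive weighting as described for the
    // original classifier. Only the preference for headings is stated
    // in the text; the particular weights here are assumptions.
    class TermWeighter {
        static int weight(String enclosingTag) {
            if (enclosingTag == null) return 1;             // body text
            if (enclosingTag.equalsIgnoreCase("TITLE")) return 4;
            if (enclosingTag.equalsIgnoreCase("H1")) return 3;
            if (enclosingTag.matches("(?i)H[2-6]")) return 2;
            return 1;
        }
    }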
For the new version of the classifier, it was decided
that a much more detailed thesaurus with a long list of
keywords and synonyms for each classmark was required.
These lists are referred to as class representatives. It was
also decided that, more use would be made of the
hierarchical nature of DDC. Rather than ploughing through
a DDC thesaurus from start to finish considering every
possible classmark, it was decided that the classifier would
begin by matching documents against very broad class
representatives (containing keywords and synonyms)
representing each of the ten DDC classes at the top of the
hierarchy.
The matching process would then proceed recursively
down through the subclasses of those DDC classes that
were found to have a significant match score with the
document, or a significant measure of similarity.
The TAPER system used customised stop lists to filter
documents at each node of the classification hierarchy. The
new WWLib classifier achieves a similar filtering effect
using customised class representatives at each node.
Consider, for illustration, a fragment of the hierarchy in
which the higher level contains the animal class and the
lower level contains three of its subclasses; within each of
these classes the class representative holds many keywords.

Table 1. The ten DDC classes


000 Generalities
100 Philosophy, paranormal phenomena, psychology
200 Religion
300 Social sciences
400 Language
500 Natural sciences and mathematics
600 Technology (Applied sciences)
700 The arts, Fine and decorative arts
800 Literature (Belles-lettres) and rhetoric
900 Geography, history, and auxiliary disciplines

If a document matched against the animal class is found to
have a significant measure of similarity it would
subsequently be matched against the three subclasses.
Although very simplified, this example shows how the
classification hierarchy can aid the classification process.
Documents matched against broad lists of keywords are
then filtered through sub-classes with more detailed
focused terms.
Notice the presence of the word litter in the cat class
representative which is obviously ambiguous. The presence
of this word in a higher node of the hierarchy could cause
a document about a litter of kittens to be wrongly classified.
However, it is unlikely that a page about rubbish/garbage
would obtain a significant score when matched against the
animal class and so would proceed no further down this
branch of the hierarchy.
The classifier is based on an object oriented design.
DDC objects each inherit the same basic structure from a
generic Dewey class and build their own list of weighted
keyword objects forming the class representative, their own
list of subclasses which are DDC objects in themselves and
their own classmark object. Documents to be classified are
represented as document objects which comprise a list of
weighted keywords, identical in structure to the DDC class
representative. The result of the classification process is that
the document obtains a list of appropriate classmark objects
each comprising a DDC classmark and accompanying
header text.
Each classmark object stores and maintains a
numerical code and a textual label representing a particular
instance of a DDC classmark. It also stores an integer value,
score, that represents the measure of similarity between the
associated DDC object and a given document.
The constructor method takes as its parameters two
strings, the first representing the numerical code which is
subsequently stored in the cmark instance variable and the
second the label which is then stored in the classmarkLabel
instance variable. The method isEqual returns true if the
classmark object passed as a parameter is equal to the
current one and the getLabel and getClassmark methods
are there for retrieving information from the object. The
methods setScore, getScore, and isGreater are used to store
and compare measures of similarity between the associated
DDC object and a given document, as a result of the
classification process.
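Gathering this behaviour into code gives a class along the
following lines. This is a sketch reconstructed from the
description above, not the WWLib source; equality by numerical
code is an assumption.

    // Sketch of the classmark object as described above. Only
    // behaviour stated in the text is included; the equality
    // semantics are an assumption.
    class Classmark {
        private final String cmark;          // numerical DDC code
        private final String classmarkLabel; // accompanying header text
        private int score;                   // similarity with a document

        Classmark(String code, String label) {
            this.cmark = code;
            this.classmarkLabel = label;
        }

        boolean isEqual(Classmark other) { return cmark.equals(other.cmark); }
        String getClassmark() { return cmark; }
        String getLabel() { return classmarkLabel; }

        void setScore(int s) { score = s; }
        int getScore() { return score; }
        boolean isGreater(Classmark other) { return score > other.score; }
    }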
Each keyword object stores and maintains a keyword
string, a score (or weight) associated with it and an integer
representing the keyword's position (within a document).
The constructor method takes as its parameters the word,
score and position. Like the classmark object, the keyword
object has an isEqual method for comparing two keywords
and a getKeyword, getScore and getPosition method for
retrieving information.
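A corresponding sketch of the keyword object, again reconstructed
from the description rather than taken from the WWLib source:

    // Sketch of the keyword object described above.
    class Keyword {
        private final String word;
        private final int score;    // weight associated with the term
        private final int position; // position within the document

        Keyword(String word, int score, int position) {
            this.word = word;
            this.score = score;
            this.position = position;
        }

        boolean isEqual(Keyword other) { return word.equals(other.word); }
        String getKeyword() { return word; }
        int getScore() { return score; }
        int getPosition() { return position; }
    }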
Each DDC object inherits the abstract class Dewey
which defines an object that stores and maintains a list of
keywords, a list of subclasses and a classmark. The methods
(sketched after the list below) enable each DDC object that
inherits them to:
specify its own classmark using setClassmark
retrieve that classmark when required using
getClassmark
collate its own list of keywords (class representative)
using addKeywords and trimKeywords
collate its own list of subclasses using addSubclass and
trimSubclasses
retrieve the total number of keywords using getTotal
retrieve keywords consecutively using getNextKeyword
and hasMoreKeywords
retrieve subclasses consecutively using getNextSubclass
and hasMoreSubclasses
specify if there are no subclasses using noSubclasses
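Put together, the abstract Dewey class might look like the
following sketch, which builds on the Classmark and Keyword
sketches above. The single-pass iterator bookkeeping is an
assumption; only the method names are taken from the text.

    import java.util.*;

    // Sketch of the abstract Dewey class from the description above.
    // The position indices for the iterator-style methods are an
    // assumption; the method names are those given in the text.
    abstract class Dewey {
        private final ArrayList<Keyword> keywords = new ArrayList<>();
        private final ArrayList<Dewey> subclasses = new ArrayList<>();
        private Classmark classmark;
        private int kPos = 0, sPos = 0;

        protected void setClassmark(Classmark c) { classmark = c; }
        Classmark getClassmark() { return classmark; }

        protected void addKeywords(Keyword k) { keywords.add(k); }
        protected void trimKeywords() { keywords.trimToSize(); }

        protected void addSubclass(Dewey d) { subclasses.add(d); }
        protected void trimSubclasses() { subclasses.trimToSize(); }

        int getTotal() { return keywords.size(); }
        Keyword getNextKeyword() { return keywords.get(kPos++); }
        boolean hasMoreKeywords() { return kPos < keywords.size(); }
        Dewey getNextSubclass() { return subclasses.get(sPos++); }
        boolean hasMoreSubclasses() { return sPos < subclasses.size(); }
        boolean noSubclasses() { return subclasses.isEmpty(); }
    }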
Each document object stores and maintains a list of
keywords representing the document and a list of
classmarks, assigned to the document as a result of the
classification process.
The document object has two constructor methods;
one takes in three parameters-a dummy, a URL and a
document accession number-and opens a DataInputStream
to the remote URL; the other takes in two
parameters-local filename and accession number-and
opens a DataInputStream to a local file. In either case the
methods open the document, extract all the words from it
and build the keywords vector. This is done using the two
private methods noHTML, which strips the HTML tags out
of the document, and doIndexing which identifies all the
remaining words and stores them as keyword objects with
a weight, score and position in the keywords vector. Words
occurring in titles and headings are given greater weight.
All words are stored and those occurring frequently are
automatically given more weight by appearing more often
in the vector.
The methods getNextKeyword, hasMoreKeywords
and resetKeywords allow the contents of the keywords list
to be retrieved and compared with DDC class
representatives. The addClassmarks method is used by the
classification process to assign classmarks that are found to
be relevant to the document. The getClassmarks method
retrieves the highest scoring classmark objects and outputs
their numerical codes and labels as a string. getTotal
retrieves the total number of keywords and getAccession
retrieves the document accession number.
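A sketch of the document object follows, built on the Keyword and
Classmark sketches above. The HTML stripping and the uniform weight
of 1 per occurrence are deliberate simplifications of the noHTML
and doIndexing methods described in the text.

    import java.io.*;
    import java.net.URL;
    import java.util.*;

    // Sketch of the document object described above (Java 9+ for
    // readAllBytes). The real WWLib methods are not reproduced here.
    class Document {
        private final List<Keyword> keywords = new ArrayList<>();
        private final List<Classmark> classmarks = new ArrayList<>();
        private final String accession;
        private int kPos = 0;

        Document(String dummy, URL url, String accession) throws IOException {
            this.accession = accession;
            try (DataInputStream in = new DataInputStream(url.openStream())) {
                doIndexing(noHTML(new String(in.readAllBytes())));
            }
        }

        Document(String filename, String accession) throws IOException {
            this.accession = accession;
            try (DataInputStream in =
                    new DataInputStream(new FileInputStream(filename))) {
                doIndexing(noHTML(new String(in.readAllBytes())));
            }
        }

        private static String noHTML(String html) {
            return html.replaceAll("<[^>]*>", " "); // strip HTML tags
        }

        private void doIndexing(String text) {
            int position = 0; // uniform weight of 1 per occurrence (assumed)
            for (String word : text.toLowerCase().split("\\W+"))
                if (!word.isEmpty()) keywords.add(new Keyword(word, 1, position++));
        }

        Keyword getNextKeyword() { return keywords.get(kPos++); }
        boolean hasMoreKeywords() { return kPos < keywords.size(); }
        void resetKeywords() { kPos = 0; }

        void addClassmarks(Classmark c) { classmarks.add(c); }
        int getTotal() { return keywords.size(); }
        String getAccession() { return accession; }
    }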
The classification process is carried out by a classify
object that takes as its parameter a document object and
compares its keywords with, initially, the class
representatives of the ten top DOC classes. For each DOC
class, the matched keyword scores are added to the
matched keyword scores from the document and if the
resulting value is significant the classify object continues to
recursively compare the document with that DOC object's
subclasses.
The measure of significance was an important and yet
difficult issue. Initially, very simple measures were
introduced where scores over a certain set value were
considered significant. The classifier worked for short
documents but long documents with a wider range of
vocabulary acquired significant matches with some quite
bizarre DDC classes resulting in a wide range of classmarks,
some accurate and some... not! The measure of significance
must take into account the length of the document and the
length of the DDC class representative.
The constructor method takes a document object as its
parameter which is assigned to the doc instance variable.
It then calls the private method proceed ten times passing
a different DDC classification object as its parameter each
time.
The proceed method takes a DDC object as its
parameter. It calls the private method score which
compares the class representative of the object with the
document and calculates a total score based on matched
keywords and associated weights. The score, together with
the total length of the class representative and the total
length of the document are then passed as parameters to
the private method significant. The significant method
calculates Dice's coefficient and returns true if the value is
significant. If significant returns true and the DDC object
has (more) subclasses, the proceed method calls itself
recursively on each of them. Otherwise the DDC object's
classmark is added to the document object. If significant
returns false no further action is taken.
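Assembling the pieces, the recursive pass can be sketched as below,
using the Dewey, Keyword, Classmark and Document sketches above.
The 0.1 cut-off on Dice's coefficient, computed as 2 x score /
(repLength + docLength), is an assumption, as is passing the ten
top-level classes in as a parameter; the text does not state the
threshold actually used.

    import java.util.*;

    // Sketch of the classify object's recursive pass. Each Dewey node
    // is visited at most once, so the single-pass keyword iterator of
    // the Dewey sketch is adequate here.
    class Classify {
        private static final double CUTOFF = 0.1; // assumed threshold

        Classify(Document doc, List<Dewey> topTenClasses) {
            for (Dewey top : topTenClasses) proceed(doc, top);
        }

        private void proceed(Document doc, Dewey ddc) {
            int s = score(doc, ddc);
            if (!significant(s, ddc.getTotal(), doc.getTotal())) return;
            if (ddc.noSubclasses()) {
                doc.addClassmarks(ddc.getClassmark()); // leaf: assign classmark
            } else {
                while (ddc.hasMoreSubclasses())
                    proceed(doc, ddc.getNextSubclass());
            }
        }

        // Sum the weights of keywords shared by the document and the
        // class representative; adding both weights is a simple choice.
        private int score(Document doc, Dewey ddc) {
            int total = 0;
            while (ddc.hasMoreKeywords()) {
                Keyword k = ddc.getNextKeyword();
                doc.resetKeywords();
                while (doc.hasMoreKeywords()) {
                    Keyword d = doc.getNextKeyword();
                    if (d.isEqual(k)) total += d.getScore() + k.getScore();
                }
            }
            return total;
        }

        private boolean significant(int score, int repLength, int docLength) {
            return 2.0 * score / (repLength + docLength) >= CUTOFF;
        }
    }

Note how a node's subclasses are only visited when the node itself
clears the threshold, which is the filtering effect described
earlier.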
The ace object ties the whole system together. It
accepts arguments from the command line and then creates
a new document object using the given URL or filename,
creates a new classify object using the new document object
as its parameter and then outputs the resulting document
classmarks. The decision to use Java for the implementation
was based on a number of issues:
An object oriented system is clearly appropriate
because of the hierarchical nature of the system and the
similarities between data objects in documents
(weighted keywords) and data objects in the class
representatives;
The distributed nature of the new WWLib architecture
means that the new classifier needs to be able to run
as a daemon and communicate over a network which
is simple to achieve in Java;
The program needs to have the ability to multithread
- process a number of documents at once - which again
is very simple in Java;
Although a more efficient object oriented system, in
terms of speed, could be achieved in C++, speed is not
vital for the classifier;
The lack of memory management problems when
writing Java programs means that more time can be
spent on the problem of automatic classification;
Streaming documents directly from a URL, for testing
purposes, is useful and simple to achieve in Java;
Should the classifier ever need to be moved to a different
platform, this should not cause a problem.
Initially, the classifier has been implemented for
experimental purposes using just the top ten DDC classes
and their ten sub-divisions - the top 100 classes. Coding
each Dewey object, although structural information is
inherited from the abstract Dewey class, is a laborious
process because as many keywords and synonyms need to
be added to form the class representatives as possible. It is
believed that the initial burden of laborious manual input
will pay off if accurate automatic classification can be
achieved. The DDC objects will only require further
adjustments when new versions of DDC are released (about
once every 20 years). Rather than relying on manual input
as each new document is encountered, as the original
WWLib did and Yahoo does, the new WWLib will have
stocked up enough manually defined class representatives
to enable documents to be handled automatically.

Automated Classification Tools


Automated classification has been described in a multitude
of ways, depending on the point of view of the author (e.g.,
academician, vendor). A recent excellent discussion is
provided in Katherine C. Adams' article "Word
Wranglers", which includes a description of the tools
available, their features and the techniques used for
clustering information. Classification is the process by
which information, whether in document or data form, is
clustered together to make it easier for the user to find it.
For instance, while classifying a collection of documents
about food, you might want to cluster documents by
subject-e.g., those that discuss "methods of cooking," those
that discuss "kitchen tools"-depending on how useful your
audience would find that type of classification.
In a nutshell, classification assists people who are:
Browsing: For example, a user may browse through a
classified structure (e.g., Men's Clothing: Shirts: T-
Shirts) to find relevant information-this structure
reflects clusters created by classification.
Searching: For example, a user searching for "cars"
retrieves all information that is clustered around this
topic. The search engine retrieves these clusters and
returns these results to the user.
Managing: For example, an editor looking for the
appropriate place to put a piece of information can use
a classified structure to help suggest places. The
resulting placement of information forms clusters.
The end result of classification is structures that are
commonly known as ontologies, taxonomies, hierarchies,
controlled vocabularies or thesauri-depending on the
discipline involved. A superb article by Dagobert Soergel
reflects on the use of these names and notes that one of the
main features of any classification structure is to support
more efficient and relevant information retrieval.
Automated classification is the
process by which technology is used to create clusters.
Some of the most popular vendors that have marketed
stand-alone automated classification tools or add-ons to
their search or content management tools are Autonomy,
Inktomi, Interwoven, Mohomine, Semio and Verity. These
tools cluster information by using one of the following
techniques:
Statistical clustering: Employs algorithms to cluster
information. Popular methods include term co-
occurrence analysis and neural networks. This
technique is dependent on the information in the
collection-it classifies using the terms in the collection
exclusively. Vendors using this technique are
Autonomy, Interwoven, Mohomine and Semio.
Rules-based clustering: Necessitates the creation of IF-
THEN statements that define the clusters of
information. This technique builds a classification
structure that can be reviewed and edited manually,
while populated automatically. It isn't dependent on
the information in the collection-a new collection could
use the same statements; a toy rule of this kind is
sketched after this list. Vendors using this technique
are Inktomi and Verity.
An additional technique, designed for improving the above
techniques, is:
Training: Compares to-be-classified information with
previously well-classified information, and collects
these together in clusters. Training can happen at the
beginning of the classification process or iteratively
throughout. Vendors using this technique are
Autonomy, Inktomi and Mohomine.
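As an illustration of the rules-based technique, and reusing the
food example from earlier in this section, a rule set reduces to
ordinary conditionals; the trigger terms and cluster names below
are hypothetical.

    // Toy IF-THEN rules in the spirit of rules-based clustering; the
    // terms and cluster names are hypothetical.
    class FoodRules {
        static String classify(String text) {
            String t = text.toLowerCase();
            if (t.contains("oven") || t.contains("grill") || t.contains("simmer"))
                return "Methods of cooking";
            if (t.contains("whisk") || t.contains("skillet") || t.contains("spatula"))
                return "Kitchen tools";
            return "Unclassified"; // no rule fired
        }
    }

Because the rules live outside the collection, the same statements
can be reviewed, edited and reapplied to a new collection, which is
the property noted above.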
A number of automated classification tool vendors, pundits
and those working in this field have noted that fully
automated classification of information is not the complete
answer. They state that a semi-automated solution is more
appropriate. Peter Morville puts it most succinctly in his
ACIA "Little Blue Folders" article: "The key to success in
designing information architecture solutions for really
large web sites and intranets is to intelligently combine
manual AND automated approaches."
Good indexers and cataloguers know the difficulties of
developing appropriate clusters of information. Often it
isn't enough to just pull out terms that reside in documents;
concepts that aren't stated verbatim within a document also
need to be reflected in the classification. Computers are not
able to understand the meaning in documents or the
context surrounding them. As stated by Nancy Mulvany,
"computers are capable of automatically manipulating the
text of a book in a variety of ways. Yet the computer is
incapable of exercising the type of judgment and
interpretation applied by experienced indexers."
The use of humans in the classification process runs
the gamut from minimal involvement to full-fledged
participation. Humans can be involved with automated
classification tools by:
Creating the classification: Developing all or part of the
structure to be used during the automated
classification process. For instance, manually creating
the top-level clusters of the classification structure and
having the tool automatically create the bottom-level
clusters. This structure can be newly created or
tweaked from out-of-the-box versions, and should be
predicated on a controlled vocabulary or thesaurus.
Developing the classification rules: Manually creating and
editing the rules that govern the clustering of
information. These rules should also be predicated on
a controlled vocabulary or thesaurus.
Training the collection process: Moving information from
one cluster to another, when necessary, so that the tool
learns what information should exist in specific
clusters.
Implementing suggestions: Reviewing options provided
by the tool for classifying information, analyzing these
and using them or choosing other options to best
cluster the information.
Throughout the automated classification process, humans
should be:
Performing quality control: Reviewing clusters and the
information in the clusters after the tool has done its
work, and making any changes. Hopefully, the tool
also allows humans to perform some of the above listed
functions (e.g., creating the classification, training the
tool), in order to reduce the need for extensive manual
quality control.
Testing the classification: Designing methods for testing
the usability (e.g., appropriateness, ease of use) of the
resulting classification structure, running tests and
analyzing the results. Analysis should provide you
with recommendations for improvement to the
classification structure and possibly improvements to
the classification process.
Here's where the users matter the most. You've developed
a system that adequately classifies the information you
have in your information space, based on the resources and
capabilities of your company. You've determined the mix
of automated and manual classification and have chosen an
appropriate tool.
The next step is to identify what the users want. Most
users are interested in directly relevant information. Each
of the automated classification techniques noted above
claims to give users the most relevant information. But if
you really want to ensure that users are retrieving what
they need, you should be using a controlled vocabulary or
thesaurus.
A controlled vocabulary (CV) is a list of descriptive
terms that indicates which terms are preferred and which
are variants of the preferred terms. A thesaurus is a special
kind of controlled vocabulary that also indicates which
terms are broader, narrower and related to the preferred
terms. For instance, a controlled vocabulary on food might
indicate that "crackers" is the preferred term and "crisps"
is a variant term.
A thesaurus might also indicate that "cheese" is a
related term and that "melba toast" is a narrower term.
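In place of a diagram, the following Python sketch models
these relationships in code. The field names reflect common
controlled-vocabulary practice (use-for variants, broader,
narrower and related terms); the entry values and the lookup
function are illustrative assumptions, not a standard API.

from dataclasses import dataclass, field

@dataclass
class ThesaurusEntry:
    preferred: str                    # the term users are steered to
    variants: list = field(default_factory=list)  # "use for" terms
    broader: list = field(default_factory=list)   # BT relationships
    narrower: list = field(default_factory=list)  # NT relationships
    related: list = field(default_factory=list)   # RT, "see also"

crackers = ThesaurusEntry(
    preferred="crackers",
    variants=["crisps"],         # "crisps" USE "crackers"
    narrower=["melba toast"],
    related=["cheese"],          # a "see also" cross-reference
)

def resolve(term, entries):
    # Map a variant term to its preferred form (basic CV lookup).
    for entry in entries:
        if term == entry.preferred or term in entry.variants:
            return entry.preferred
    return term

print(resolve("crisps", [crackers]))  # -> "crackers"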
Developing CVs involves issues that may not be
obvious initially. Some of these are:
Creating cross-references or double-posting: It's often
useful to provide "see" references and "see also"
references among clusters, or to selectively place
information in more than one cluster. This makes the
relationships among terms obvious and makes the
resulting classification more usable.
Building more than one CV: To reach your primary users
and to reflect all your content, you may need to develop
multiple types of CVs. Examples include those
organized by user role, topic, action or task and
geography.
Building CVs in more than one language: If your collection
contains considerable amounts of information in
multiple languages, you should have a CV for each
language. This often involves more than simple
translation of terms, since concepts, jargon and
relevancy of terms differ among languages.
Building a thesaurus: If you have the resources, a
thesaurus can greatly help your users and your
business. You will be offering users navigational
assistance, which can help them generate more
relevant results. Managing a classification structure
that includes more relationships among terms can be
a very powerful avenue for translating your business
strategy, e.g., promoting products using broader
clusters.
We have discussed the need for manual involvement in
your classification process and the need for controlled
vocabularies in that process. What we have not yet
discussed is the testing of these variables, and sharing of the
results with the wider community. Situations exist in which
testing can occur, but inevitably there are limitations to
doing this kind of testing. These situations include:
Partnering with an automated classification tool
vendor so you can run pilot tests with actual data using
their tool. The limitation is the partnership agreement
and how that affects your company's strategy.
Obtaining a trial version of a tool to run against actual
or contrived data. The limitation is that the trial version
may limit you to a certain number of documents, not
allow the integration of controlled vocabularies or not
allow you to customize the classification process.
Contracting with a vendor to do a full test of their tool,
which usually involves the assistance of consulting
engineers from the vendor. The limitation is the added
expense of paying for the consultants just to see
whether you want to use the tool.
There is a greater opportunity out there: an opportunity to
provide the community with real data that has been tested
in a variety of ways using a variety of automated
classification tools, with results that can be shared and
compared without sharing proprietary vendor information
with the community.
The Text Retrieval Conferences (TREC) project provides
an excellent example of a collaboratory testbed. Each year,
the project provides data sets that interested parties can use
to run against new technology to see where that technology
succeeds and fails. It's not a giant leap to assume that
something like this could be done to test automated
classification tools. Since TREC's environment is set up for
new technology and not established technology, it may not
be the best venue for the proposed testing.
In order for use of the testbed to be most effective, the
community needs to enforce:
Description of the information in the testbed. What
type of information are you testing (e.g., engineering,
religious studies)? Is it mostly text or records-based? Is
it highly conceptual content or fact-based data?
Information in multiple languages. This will assist in
reaching the largest community of testers.
Selection of controlled vocabularies to run against
information. Multiple types of CVs are a necessity.
Multiple language CVs will be necessary for
information in multiple languages.
Ability to perform the test in a multitude of ways. For
instance, running text-based English language content
with multiple topic domain CVs using three selected
tools, or running records-based French and English
language data without a CV through five selected tools.
The testbed would benefit both the automated classification
vendors and those interested in using the tools. It can assist
in:
Selling the need for the tool to the leadership.
Quantifiable effects are easier for upper management
to buy into.
Providing a way for content managers to compare tools
and see what types of tools work best for their types
of data and resources.
Comparing manual and automated methods of
classification. The level of human involvement could
be tested under a variety of conditions.
Getting feedback from users. Testing with users can
provide subjective and objective data on which tools
work in which environments.
Selling the tool to you. Vendors with quantifiable data
on their tool's performance can use this to market the
tool.
COLON CLASSIFICATION ON THE WORLD WIDE WEB

The World Wide Web is an Internet system that distributes
graphical, hyperlinked information, based on the hypertext
transfer protocol (HTTP). The Web is the global hypertext
system providing access to documents written in Hypertext
Markup Language (HTML), which allows their contents to
be interlinked, locally and remotely. The Web
was designed in 1989 by Tim Berners-Lee at the European
Organization for Nuclear Research (CERN) in Geneva.
The information revolution not only supplies the
technological horsepower that drives the Web, but fuels an
unprecedented demand for storing, organizing,
disseminating, and accessing information. If information is
the currency of the knowledge-based economy, the Web
will be the bank where it is invested. It is a very powerful
added value of the Web that users can access resources
online electronically, that for whatever reason are not in the
traditional paper-based collections.
The Web provides materials and makes them online
accessible, so they can be used. This is the real difference
between the Web and libraries. Therefore, webmasters
build web collections not for vanity but for use.
The Web is interested in its cybercitizens (users) using
its resources for all sorts of reasons: education, creative
recreation, social justice, democratic freedoms,
improvement of the economy and business, support for
literacy, life long learning cultural enrichment, etc. The
outcome of this use is the betterment of the individual and
the community in which we live -the social, cultural,
economic and environmental well being of our world. So
the Web must recognize and meet the information needs of
the users, and provide broad-based services.
There have been numerous attempts to establish order
and apply organization to the chaos of the Internet and the
World Wide Web. Anyone who has conducted a search on
the Web will immediately realize the difficulty in retrieving
relevant results. In evaluating all the possible methods to
achieve the task of organizing the Web, perhaps one should
consider the Colon Classification as a candidate.
One of the major criticisms of the Colon Classification
is its lengthy notation, which is of low practicality in terms
of putting the call number on the spine label of a book, or
requiring the user to write down or memorize the whole
notation to locate the particular item in the library.
However, this problem disappears when we consider the
environment of the web, since we do not have to worry
about the "physical" location of a document.
Perhaps, the concept of analytico-synthetic
classification has already been applied to the Web. An
interesting article by Glassel has compared the nature of
Ranganathan's Colon Classification to Yahoo!, a popular
search directory on the web. Both systems are based on
combining facets to facilitate searching and maximize the
number of relevant results.
"One important advantage that virtual collections
such as Yahoo! have over the print environment, in terms
of notation schemes and their citation order (the order in
which the facets are put together), is that the order of the
facets in a string doesn't have to be set in stone.
An electronic resource isn't limited to a single physical
location. In a library, a book is only supposed to live in one
place on a shelf. In the digital world, what is to stop us from
classifying a resource in multiple places within a hierarchy?
Nothing!"
Web directories like Yahoo! are an example of a
concept-based system. An indexer reviews the document
and assigns appropriate subject terms to describe it. The
implementation of the Colon Classification scheme on the
web may be useful in that it provides the website creator,
the indexer, and the user a common language to describe
and identify the content of the page.
Since the Colon Classification is very accommodating to
new concepts and new composite subjects, it is quite
appropriate for the fast-growing web environment. As with
online retrieval systems, the advantage of a faceted scheme
is that it is more attentive to the user's need. Every query
is unique and comes from a specific perspective. What is
relevant to one user is different from another.
Although Ranganathan was not the inventor of facet
analysis, he is credited as the first to "systematize and
formalize the theory". It is said that Ranganthan's idea of
a faceted classification scheme is inspired by a Lego-type
toy set. Seeing that the salesperson can build different toys
just by combining the same pieces in a different way, he
builds his classification scheme by this analogy.
The Colon Classification, just as other classification
schemes, starts with a number of main classes, which
represent the fields of knowledge. Each class is then
analyzed and broken down into its basic elements, grouped
together by common attributes, called facets.
Upon examining all the facets, Ranganathan notices
that there are five main groups into which the facets fall,
and he calls these the fundamental categories, represented
by the mnemonic PMEST in an order of decreasing
concreteness.
Personality
- can be understood as the primary facet.
- the most prominent attribute
Matter
- physical material
Energy
- action
Space
- location
Time
- time period
There are also facets that are common to all the classes.
These are called common isolates. Examples include form
and language. The same facet can be used more than once.
Notations, such as numbers and letters, are used to
represent the facets, while punctuation marks are used to
indicate the nature and type of the following facets. The
classifier's job, therefore, is to combine the available terms
that are appropriate in describing the information package
in hand.
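The following Python sketch shows how a call number might be
assembled mechanically from PMEST facets. The facet-indicator
punctuation (comma, semicolon, colon, dot and apostrophe for
Personality, Matter, Energy, Space and Time) follows common
descriptions of the Colon Classification; the main class and
facet values in the example are purely illustrative.

FACET_INDICATORS = {
    "personality": ",",
    "matter": ";",
    "energy": ":",
    "space": ".",
    "time": "'",
}
PMEST_ORDER = ["personality", "matter", "energy", "space", "time"]

def colon_number(main_class, facets):
    # Append each supplied facet in decreasing concreteness,
    # preceded by its indicator punctuation.
    notation = main_class
    for facet in PMEST_ORDER:
        if facet in facets:
            notation += FACET_INDICATORS[facet] + facets[facet]
    return notation

print(colon_number("L", {"personality": "45", "matter": "42",
                         "energy": "6", "space": "44",
                         "time": "N5"}))
# -> L,45;42:6.44'N5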
FIVE LAWS OF LIBRARY SCIENCE

Shiyali Ramamrita Ranganathan was considered the father
of Library Science in India. He developed what has been
widely accepted as the definitive statement of ideal library
service. His Five Laws of Library Science is a classic of library
science literature, as fresh today as it was in 1931.
These brief statements remain as valid, in substance if
not in expression, today as when they were promulgated,
concisely representing the ideal service and organizational
philosophy of most libraries today:
Books are for use.
Every reader his or her book.
Every book its reader.
Save the time of the reader.
The Library is a growing organism.
Although these statements might seem self-evident today,
they certainly were not to librarians in the early part of the
20th century. The democratic library tradition we currently
enjoy had arisen in America and England only in the latter
part of the nineteenth century. For Ranganathan and his
followers, the five laws were a first step toward putting
library work on a scientific basis, providing general
principles from which all library practices could be
deduced. In 1992, James R. Rettig posited a Sixth Law, an
extension of Ranganathan's laws. He conceived of that Sixth
Law, "Every reader his freedom", as applicable only to a
particular type of service.
New information and communication technologies
suggest that the scope of Ranganathan's laws may
appropriately be extended to the Web. Nowadays the same
five laws are discussed and reused in many different
contexts. Since 1992, the 100th anniversary of
Ranganathan's birth, several modern scholars of library
science have attempted to update his five laws, or they
reworded them for other purposes.
'Book, reader, and library' are the basic elements of
Ranganathan's laws. Even if we replace these keywords
with other elements, Ranganathan's laws still work very
well. Based on Ranganathan's laws, several researchers
have presented different principles and laws.
For instance, "Five new laws of librarianship" by Michael
Gorman; "Principles of distance education" by Sanjaya
Mishra; "Five laws of the software library" by Mentor Cana;
"Five laws of children's librarianship" by Virginia A. Walter;
"Five laws of web connectivity" by Lennart Bjornebom; and
"Five laws of diversity/affirmative action" by Tracie D. Hall.
Gorman's laws are the most famous. He has
reinterpreted Ranganathan's laws in the context of today's
library and its likely future. Michael Gorman has given us
his five new laws of librarianship:
Libraries serve humanity.
Respect all forms by which knowledge is
communicated.
Use technology intelligently to enhance service.
Protect free access to knowledge; and
Honor the past and create the future.
Gorman believes that S.R. Ranganathan invented the term
'library science' and beautifully demonstrates how his
laws are applicable to the future issues and challenges that
librarians will face. Gorman's laws are not a revision of
Ranganathan's laws, but another completely separate set,
from the point of view of a librarian practicing in a
technological society.
Furthermore, based on Ranganathan's laws, Jim
Thompson, in protest against the commercialization of
library services, revised Ranganathan's laws to the
following statements:
Books are for profit.
Every reader his bill.
Every copy its bill.
Take the cash of the reader.
The library is a groaning organism.
Whether one looks to Ranganathan's original Five Laws of
Library Science or to anyone of the many new
interpretations of them, one central idea is immediately
clear: Libraries and the Web exist to serve people's
information needs.
9
Trends in Information Archiving

The pace of technology evolution is causing some hardware
and software systems to become obsolete in a matter of a
few years, and these changes can put severe pressure on the
ability of the related data structures or formats to continue
effective representation of the full information desired.
Because much of the supporting information necessary to
preserve this information is more easily available or only
available at the time when the original information is
produced, these organisations need to be active participants
in the long-term preservation effort, and they need to
follow the principles espoused in this Open Archival
Information System (OAIS) reference model to ensure that
the information can be preserved for the Long Term.
Participation in these efforts will minimise the lifecycle
costs and enable effective long-term preservation of the
information. The explosion of computer processing power
and digital media has resulted in many systems where the
Producer role and the archive role are the responsibility of
the same entity. These systems, which are sometimes
known as Active Archives, should subscribe to the goals of
Long Term Preservation discussed in this document. The
design process must realise that some of the Long Term
Preservation activities may conflict with the goals of rapid
production and dissemination of products to Consumers.
The designers and architects of such systems should
document the solutions that have been reached. A major
purpose of this reference model is to facilitate a much wider
understanding of what is required to preserve and access
information for the Long Term. To avoid confusion with
simple bit storage functions, the reference model defines
an OAIS which performs a long-term information
preservation and access function. An OAIS archive is one
that intends to preserve information for access and use by
a Designated Community. It includes archives that have to
keep up with steady input streams of information as well
as those that experience primarily aperiodic inputs. It
includes archives that provide a wide variety of
sophisticated access services as well as those that support
only the simplest types of requests. The OAIS model
recognises the already highly distributed nature of digital
information holdings and the need for local
implementations of effective policies and procedures
supporting information preservation.
This allows, in principle, a wide variety of
organisational arrangements, including various roles for
traditional archives, in achieving this preservation. It is
expected that organisations attempting to preserve
information will find that using OAIS terms and concepts
will assist them in achieving their information preservation
goals. Management is the role played by those who set
overall OAIS policy as one component in a broader policy
domain. In other words, Management control of the OAIS
is only one of Management's responsibilities.
Management is not involved in day-to-day archive
operations. Other OAIS archives may establish particular
agreements among
themselves consistent with Management and OAIS needs.
Some archives may interact with a particular archive for a
variety of reasons and with varying degrees of formalism
for any pre-arranged agreements. One OAIS may take the
role of Producer to another OAIS; an example is when the
responsibility for preserving a type of information is to be
moved to this other archive.
One OAIS may take the role of Consumer to another
OAIS; an example is when the first OAIS decides to rely on
the other OAIS for a type of information it seldom needs
and chooses not to preserve locally. Such reliance should
have some formal basis that includes the requirement for
communication between the archives of any policy changes
that might affect this reliance. A person, or system, can be
said to have a Knowledge Base, which allows them to
understand received information. For example, a person
who has a Knowledge Base that includes an understanding
of English will be able to read, and understand, an English
text. Information is defined as any type of knowledge that
can be exchanged, and this information is always expressed
by some type of data. The information in a hardcopy book
is typically expressed by the observable characters which,
when they are combined with a knowledge of the language
used, are converted to more meaningful information.
If the recipient does not already include English in its
Knowledge Base, then the English text (the data) needs to
be accompanied by English dictionary and grammar
information in a form that is understandable using the
recipient's Knowledge Base. Similarly, the information
stored within a CD-ROM file is expressed by the bits (the
data) it contains which, when they are combined with the
Representation Information for those bits, are converted to
more meaningful information as long as the Representation
Information is understandable using the recipient's
Knowledge Base.
Assume the bits represent an ASCII table of numbers
giving the coordinates of a location on the Earth measured
in degrees latitude and East longitude. The Representation
Information will typically include the definition of ASCII
together with descriptions of the format of the numbers and
their locations in the file, their definitions as latitude and
longitude, and the definition of their units as degrees. It
may also include additional meaning that is assigned to the
table. For digital information, this means the OAIS must
clearly identify the bits and the Representation Information
that applies to those bits.
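The following Python sketch illustrates the point: the bits
are meaningless until the structural part of the
Representation Information (character encoding, record
layout) and the semantic part (units, conventions) are
applied to them. The field names and sample values are
illustrative assumptions.

raw_bits = b"28.6139,77.2090\n19.0760,72.8777\n"  # the Data Object

representation_info = {
    "encoding": "ascii",                         # structure
    "record_layout": ["latitude", "longitude"],  # structure
    "units": "degrees",                          # semantics
    "longitude_convention": "East positive",     # semantics
}

def interpret(data, rep):
    # Turn the bit sequence into meaningful information.
    text = data.decode(rep["encoding"])
    for line in text.splitlines():
        values = dict(zip(rep["record_layout"],
                          (float(v) for v in line.split(","))))
        print(values["latitude"], "latitude,",
              values["longitude"], rep["longitude_convention"],
              "longitude, in", rep["units"])

interpret(raw_bits, representation_info)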
This required transparency to the bit level is a
distinguishing feature of digital information preservation,
and it runs counter to object-oriented concepts which try to
hide these implementation issues. This presents a
significant challenge to the preservation of digital
information. As a further complication, the recursive nature
of Representation Information, which typically is
composed of its own data and other Representation
Information, typically leads to a network of Representation
Information objects. Since a key purpose of an OAIS is to
preserve information for a Designated Community, the
OAIS must understand the Knowledge Base of its
Designated Community to understand the minimum
Representation Information that must be maintained.
The OAIS should then make a decision between
maintaining the minimum Representation Information
needed for its Designated Community, or maintaining a
larger amount of Representation Information that may
allow understanding by a larger Consumer community
with a less specialised Knowledge Base. Over time,
evolution of the Designated Community's Knowledge Base
may require updates to the Representation Information to
ensure continued understanding. As a practical matter,
software is used to access the Information Object, and it will
incorporate some understanding of the network of
Representation Information objects involved. However,
this software should not be used as rationale for avoiding
identifying and gathering readily understandable
Representation Information that defines the Information
Object, because it is harder to preserve working software
than to preserve information in digital or hardcopy forms.
The OAIS reference model emphasizes the
preservation of information content. As digital technology
evolves, multimedia technology and the dependency on
complex interplay between the data and presentation
technologies will lead some organisations to require that
the look and feel of the original presentation of the
information be preserved. This type of preservation
requirement may necessitate that the software programs
and interfaces used to access the data be preserved. This
problem may be further complicated by the proprietary
nature of some of the software. Various techniques for
preserving the look and feel of information access are
currently the subject of research and prototyping. These
techniques, which include hardware level emulation,
emulation of various common service APIs, and the
development of a virtual machine, investigate the
preservation of the original bit stream and software across
technology.
Though the OAIS reference model does not focus on
these emerging techniques, it should provide architectural
basis for the prototyping and comparison of these
techniques. An Information Package is a conceptual
container of two types of information called Content
Information and Preservation Description Information
(PDI). The Content Information and PDI are viewed as
being encapsulated and identifiable by the Packaging
Information. The resulting package is viewed as being
discoverable by virtue of the Descriptive Information.
The Content Information is that information which is
the original target of preservation. It consists of the Content
Data Object (Physical Object or Digital Object, i.e., bits) and
its associated Representation Information needed to make
the Content Data Object understandable to the Designated
Community. The Content Data Object may be an image provided
as the bit content of one CD-ROM file together with other
files, on the same CD-ROM, that contain Representation
Information. Only after the Content Information has been
clearly defined can an assessment of the Preservation
Description Information be made.
The Preservation Description Information applies to
the Content Information and is needed to preserve the
Content Information, to ensure it is clearly identified, and
to understand the environment in which the Content
Information was created. The Preservation Description
Information is divided into four types of preserving
information called Provenance, Context, Reference, and
Fixity. Briefly, they are the following:
Provenance describes the source of the Content
Information, who has had custody of it since its
origination, and its history.
Context describes how the Content Information relates
to other information outside the Information Package.
For example, it would describe why the Content
Information was produced, and it may include a
description of how it relates to another Content
Information object that is available.
Reference provides one or more identifiers, or systems
of identifiers, by which the Content Information may
be uniquely identified. Examples include an ISBN
number for a book, or a set of attributes that distinguish
one instance of Content Information from another.
Fixity provides a wrapper, or protective shield, that
protects the Content Information from undocumented
alteration. For example, it may involve a checksum
over the Content Information of a digital Information
Package, as sketched below.
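A minimal Python sketch of Fixity checking follows. SHA-256
is chosen here purely for illustration; the OAIS model does
not prescribe a particular algorithm.

import hashlib

content = b"the bits of the Content Data Object"
fixity = {"algorithm": "sha256",
          "digest": hashlib.sha256(content).hexdigest()}

def verify_fixity(data, fixity):
    # Recompute the digest and compare with the stored value.
    recomputed = hashlib.new(fixity["algorithm"], data)
    return recomputed.hexdigest() == fixity["digest"]

print(verify_fixity(content, fixity))                  # True
print(verify_fixity(content + b" tampered", fixity))   # False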
The Packaging Information is that information which,
either actually or logically, binds, identifies and relates the
Content Information and POI. For example, if the Content
Information and PDI are identified as being the content of
specific files on a CD-ROM, then the Packaging Information
would include the ISO 9660 volume/file structure on the
CD-ROM, as well as the names and directory information
of the files on CD-ROM disk. The Descriptive Information
is that information which is used to discover which package
has the Content Information of interest. Depending on the
setting, this may be no more than a descriptive title of the
Information Package that appears in some message, or it
may be a full set of attributes that are searchable in a catalog
service.
INFORMATION PACKAGE AND OAIS

It is necessary to distinguish between an Information
Package that is preserved by an OAIS and the Information
Packages that are submitted to, and disseminated from, an
OAIS. These variant packages are needed to reflect the
reality that some submissions to an OAIS will have
insufficient Representation Information or PDI to meet final
OAIS preservation requirements. In addition, these may be
organised very differently from the way the OAIS organises
the information it is preserving. Finally, the OAIS may
provide information to Consumers that does not include all
the Representation Information or all the PDI with the
associated Content Information being disseminated. These
variants are referred to as the Submission Information
Package (SIP), the Archival Information Package (AIP), and
the Dissemination Information Package (DIP).
The Submission Information Package (SIP) is that
package that is sent to an OAIS by a Producer. Its form and
detailed content are typically negotiated between the
Producer and the OAIS. Most SIPs will have some Content
Information and some PDI, but it may require several SIPs
to provide a complete set of Content Information and
associated PDI to form an AIP. A single SIP may contain
information that is to be included in several AIPs. The
Packaging Information will always be present in some
form.
Within the OAIS one or more SIPs are transformed
into one or more Archival Information Packages (AIPs) for
preservation. The AIP has a complete set of PDI for the
associated Content Information. The Packaging
Information of the AIP will conform to OAIS internal
standards, and it may vary as it is managed by the OAIS.
In response to a request, the OAIS provides all or a part of
an AIP to a Consumer in the form of a Dissemination
Information Package (DIP). The DIP may also include
collections of AIPs, and it may or may not have complete
PDI.
The Packaging Information will necessarily be present
in some form so that the Consumer can clearly distinguish
the information that was requested. Depending on the
dissemination media and Consumer requirements, the
Packaging Information may take various forms.
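The following Python sketch models the three package variants
and the flow between them. The field names mirror the OAIS
concepts described above, but the ingest and dissemination
logic is an illustrative assumption, not a prescribed
implementation.

from dataclasses import dataclass, field

@dataclass
class InformationPackage:
    content_information: dict  # Content Data Object + Rep. Info
    pdi: dict = field(default_factory=dict)  # Provenance, Context...
    packaging_information: str = ""
    descriptive_information: str = ""

def ingest(sips):
    # Combine one or more SIPs into an AIP with complete PDI.
    aip = InformationPackage(content_information={},
                             packaging_information="archive-internal")
    for sip in sips:
        aip.content_information.update(sip.content_information)
        aip.pdi.update(sip.pdi)
    return aip

def disseminate(aip, requested_keys):
    # Build a DIP holding only the requested part of the AIP.
    return InformationPackage(
        content_information={k: aip.content_information[k]
                             for k in requested_keys},
        packaging_information="dissemination-media")

sip = InformationPackage({"observation": b"..."},
                         pdi={"provenance": "producer X"})
dip = disseminate(ingest([sip]), ["observation"])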
Management provides the OAIS with its charter and
scope. The charter may be developed by the archive, but it
is important that Management formally endorse archive
activities. The scope determines the breadth of both the
Producer and Consumer groups served by the archive.
The first contact between the OAIS and the Producer
is a request that the OAIS preserve the data products
created by the Producer. This contact may be initiated by
the OAIS, the Producer or Management. The Producer
establishes a Submission Agreement with the OAIS, which
identifies the SIPs to be submitted and may span any length
of time for this submission. Some Submission Agreements
will reflect a mandatory requirement to provide
information to the OAIS, while others will reflect a
voluntary offering of information. Even in the case where
no formal Submission Agreement exists, such as a World
Wide Web (WWW) site, a virtual Submission Agreement
may exist specifying the file formats and the general subject
matter the site will accept.
Within the Submission Agreement, one or more Data
Submission Sessions are specified. There may be significant
time gaps between the Data Submission Sessions. A Data
Submission Session will contain one or more SIPs and may
be a delivered set of media or a single telecommunications
session. The Data Submission Session content is based on
a data model negotiated between the OAIS and the
Producer in the Submission Agreement. This data model
identifies the logical components of the SIP (e.g., the
Content Information, PDI, Packaging Information, and
Descriptive Information) that are to be provided and how
(and whether) they are represented in each Data
Submission Session.
All data deliveries within a Submission Agreement are
recognised as belonging to that Submission Agreement and
will generally have a consistent data model, which is
specified in the Submission Agreement. A Data Submission
Session may consist of a set of Content Information
corresponding to a set of observations, which are carried by
a set of files on a CD-ROM. The Preservation Description
Information is split between two other files.
All of these files need Representation Information
which must be provided in some way. The CD-ROM and
its directory/file structure are the Packaging Information,
which provides encapsulation and identification of the
Content Information and PDI in the Data Submission
Session. The Submission Agreement indicates how the
Representation Information for each file is to be provided,
how the CD-ROM is to be recognised, how the Packaging
Information will be used to identify and encapsulate the SIP
Content Information and PDI, and how frequently Data
Submission Sessions will occur. It also gives other needed
information such as access restrictions to the data.
Each SIP in a Data Submission Session is expected to
meet minimum OAIS requirements for completeness.
However, in some cases multiple SIPs may need to be
received before an acceptable AIP can be formed and fully
ingested within the OAIS. In other cases, a single SIP may
contain data to be included in many AIPs. A Submission
Agreement also includes, or references, the procedures and
protocol by which an OAIS will either verify the arrival and
completeness of a Data Submission Session with the
Producer or question the Producer on the contents of the
Data Submission Session.
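A minimal Python sketch of such a verification step follows.
The list of required components and the report format are
illustrative assumptions standing in for whatever the
Submission Agreement actually specifies.

REQUIRED_COMPONENTS = ["content_information", "pdi",
                       "packaging_information"]

def verify_session(sips):
    # Report which SIPs in the session are missing agreed parts,
    # so the archive can question the Producer about them.
    questions = []
    for i, sip in enumerate(sips):
        missing = [c for c in REQUIRED_COMPONENTS if not sip.get(c)]
        if missing:
            questions.append(f"SIP {i}: missing {', '.join(missing)}")
    return questions or ["Session complete; arrival acknowledged."]

session = [{"content_information": b"obs1",
            "pdi": {"provenance": "producer X"},
            "packaging_information": "ISO 9660"},
           {"content_information": b"obs2"}]
for line in verify_session(session):
    print(line)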
ARCHIVAL INFORMATION LOGICAL MODEL

A basic concept of the OAIS Reference Model is the concept
of information being a combination of data and
Representation Information. The Information Object is
composed of a Data Object that is either physical or digital,
and the Representation Information that allows for the full
interpretation of the data into meaningful information. The
Data Object may be expressed as either a physical object
together with some Representation Information, or it may
be expressed as a digital object (i.e., a sequence of bits)
together with the Representation Information giving
meaning to those bits.
The Representation Information accompanying a
physical object like a moon rock may give additional
meaning, as a result of some analysis, to the physically
observable attributes of the rock. This information may
have been developed over time and the results, if provided,
would be part of the Information Object. The
Representation Information accompanying a digital object,
or sequence of bits, is used to provide additional meaning.
It typically maps the bits into commonly recognised data
types such as character, integer, and real and into groups
of these data types. It associates these with higher-level
meanings that can have complex interrelationships that are
also described.
The purpose of the Representation Information object
is to convert the bit sequences into more meaningful
information. It does this by describing the format, or data
structure concepts, which are to be applied to the bit
sequences and that in turn result in more meaningful values
such as characters, numbers, pixels, arrays, tables, etc.
These common computer data types, aggregation of these
data types, and mapping rules which map from the
underlying data types to the higher level concepts needed
to understand the Digital Object are referred to as the
Structure Information of the Representation Information
object. These structures are commonly identified by name
or by relative position within the associated bit sequences.
The Representation Information provided by the
Structure Information is seldom sufficient. Even in the case
where the Digital Object is interpreted as a sequence of text
characters, and described as such in the Structure
Information, the additional information as to which
language was being expressed should be provided. This
type of additional required information is referred to as the
Semantic Information. When dealing with scientific data,
for example, the information in the Semantic Information
can be quite varied and complex. It will include special
meanings associated with all the elements of the Structural
Information, operations that may be performed on each
data type, and their interrelationships.
International Standard Organisation (ISO) 9660
describes text as conforming to the ASCII standard, but it
does not actually describe how ASCII is to be implemented.
It simply references the ASCII standard which is additional
Representation Information that is needed for a full
understanding. Therefore the ASCII standard is a part of
the Representation Net associated with ISO 9660 and needs
to be obtained by the OAIS in some form, or the OAIS needs
to track the availability of this standard so that it may take
appropriate steps in the future to ensure its ISO 9660
Representation Information is fully understandable.
KINDS OF INFORMATION

There are many types of information involved in the long-
term preservation of information in an OAIS. Each of these
types can be viewed as a complete Information Object in
that it contains a data object and adequate Representation
Information to understand the data.
The Content Information is the set of information that
is the original target of preservation by the OAIS. Deciding
what is the Content Information may not be obvious and
may need to be negotiated with the Producer. The
Representation Information for a digital Content Data
Object (both semantic and syntactic) is needed to fully
transform the bits into the Content Information. In
principle, this even extends to the inclusion of definitions
(e.g., dictionary and grammar) of any natural language
(e.g., English) used in expressing the Content Information.
Over long time periods the meaning of natural
language expressions can evolve significantly in both
general and in specific discipline usage. As a practical
matter, the OAIS needs to have enough Representation
Information associated with the bits of the Content Data
Object in the Content Information that it feels confident that
the members of the Designated Community can enter the
Representation Network with enough knowledge to begin
accurately interpreting the Representation Information.
This is a significant risk area for an OAIS, particularly for
those with an expert Designated Community, because
jargon and apparently widely understood terms may be
short-lived. In such cases extra care needs to be exercised
to ensure that the natural evolution of the Designated
Community Knowledge Base does not effectively cause
information loss from the Content Information.
As described above for an Information Object in
general, the Representation Information can also be viewed
as being augmented by Access Software that supports the
presentation of the Content Information to the Consumer.
Examples of this type of software include word processors
supporting complex document format representations of
Content Information and scientific visualisation systems
supporting representations of Content Information as a
time series or a multidimensional array. The software uses
its knowledge of the underlying Representation
Information to provide these services. Often required
information will be embedded in the software packages
used by the Designated Community to present and analyse
the Content Information. A reason for preserving working
Access Software arises from a convenience factor. Even
with a complete set of Representation Information, practical
access to all or part of a digital Content Data Object requires
the use of Access Software. Thus a software module that
provides useful access to a digital Content Data Object may
be preserved in a working state as a matter of convenience.
This is not difficult to do as long as the environment, which
supports the software module, is readily available.
This environment consists of some underlying
hardware and an operating system, various utilities that
effectively augment the operating system and storage and
display devices and their drivers. A change to any of these
may cause the software module to no longer function, to
function incorrectly, or to be unable to present results to the
application or human user. The complexity of these
interactions is what traditionally makes the preservation of
working software such an arduous task. In short, the use
of Access Software to replace Representation Networks is
attractive from the point of view of minimising the
resources needed to ingest data and provide current users
with access to data.
However, the reliance on working software can
provide major problems for Long-Term Preservation when
that software ceases to function. Indefinite long-term
information preservation requires a full and
understandable description of the Representation
Information. An important function of the OAIS is deciding
what parts of the Content Information are the Content Data
Object and what parts are the Representation Information.
This aspect is critical to a clear understanding of what is
being preserved.
DIGITAL ARCHIVING

Libraries and universities have a stake in helping electronic
publishing to succeed, and therefore have an interest in
establishing secure, persistent and authoritative digital
research libraries. Users' needs will continue to be what
they long have been. Users will want information reliably
locatable, so that when they go there they can expect to find
what they're looking for.
Users will expect information to be available that was
placed in the library's care a long time ago; and they will
expect that the integrity of the information they get from
the library will be assured. The requirements of digital
technologies will change the way most librarians work
throughout research libraries. As it happens, professional
librarians are uniquely qualified to take up the
technological challenge. But if they do not, they will
contribute to the stagnation of their own profession as well
as fail in their professional responsibility to civilisation.
"Preservation" in libraries has until now been a matter of
preserving the artifact which embodies the work inherent in
it, thereby preserving the work itself.
Electronic documents, by contrast, force the
preservation considerations to divide into two: the
preservation of the objects, as before, but also the
preservation of the information contained in those objects
and which is now so easily separable from them. The
primary requirement for a digital research library is that
from the start it be committed to organising, storing and
providing electronic information for periods of time longer
than human lives. Implementation of a digital research
library will require that specific tasks be accomplished and
that several commitments be undertaken.

Electronic Information Preservation


Preservation of electronic information needs to be looked
at as comprising three distinct tasks:
medium preservation,
technology preservation, and
intellectual preservation.
What is new about preservation in the digital environment
is that digital information must now be dealt with
separately from its medium. A crude analogy might be to
place a book on a closet shelf and close the door for 500
years. At the end of that time, broadly speaking, one can
open the door and read that book. With an electronic
resource we don't have that confidence even after ten years:
the device on which it is recorded may deteriorate, the
technology for its use is liable to obsolescence, and the
contents may easily and invisibly have been changed.
Over the years the recording medium of tapes flakes
off its support, or the support itself gets brittle. CD-ROMs
are still not considered to have a dependable life of over 15
years, for if air enters through the plastic cover the metal
substrate quickly corrodes. Aside from proper
environmental and handling controls, the solution has been
to "refresh" the information, that is, to copy it from the
potentially deteriorating medium to another, fresher
medium of the same or a similar kind. Except for device-
dependent formatting, the information itself is not changed
in any way detectable by the user or its application
program.
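A minimal Python sketch of refreshing with verification
follows. The file paths in the usage comment are hypothetical;
a real archive would also compare against a digest recorded at
the previous refresh cycle.

import hashlib
import shutil

def digest(path):
    # Checksum of the file's full content.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def refresh(old_path, new_path):
    # Copy the bits to fresher media, then confirm that nothing
    # detectable by the user or application has changed.
    before = digest(old_path)
    shutil.copyfile(old_path, new_path)
    if digest(new_path) != before:
        raise IOError("refresh altered the content; copy again")
    return before  # keep the digest for the next refresh cycle

# usage (hypothetical paths):
# refresh("/media/old_tape/dataset.bin", "/media/new_disk/dataset.bin")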
The preservation of the medium on which the bits and
bytes of electronic information are recorded is an important
concern. But such solutions will inevitably be short-term,
and will not in themselves be the means of preserving
information over long periods of time. Information
migration, or technological preservation, is the most
problematic of the digital archiving challenges.
Some of the evident technical problems are how to
assure forward compatibility of information files within
subsequently-developed application programs, given the
short life-span of program versions and of their supporting
corporate creators. The logistical question is posed: should
information be migrated forward in time as new programs
supersede old, or should information only be migrated
forward to a new program when it is specifically needed?
Intellectual preservation addresses the integrity and
authenticity of the information as originally recorded.
Preservation of the media and of the software technologies
will serve only part of the need if the information content
has been corrupted from its original form, whether by
accident or design. The need for intellectual preservation
arises because the great asset of digital information is also
its great liability: the ease with which an identical copy can
be quickly and flawlessly made is paralleled by the ease
with which a change may undetectably be made. It is very
easy to replace an electronic dataset with an updated copy,
and the replacement can have wide-reaching effects.
The processes of authorship produce different
versions which in an electronic environment can easily go
into broad circulation; if each draft is not carefully labeled
and dated it is difficult to tell which draft one is looking
at, or whether one has the "final" version of a work. It is
the durability of those textual forms (books) that ultimately
secures the continuing future of our past; it is the
evanescence of the new ones that poses the most critical
problem for bibliography and any further history
dependent upon its scholarship.
Society, like the individual, becomes senile in
proportion as it loses its continuous memory, and electronic
texts are now part of that memory, significant products of
our civilisation. There is a new urgency with the arrival of
computer-generated texts. The demands made by the
evolution of texts in such forms, the speed with which
versions are displaced one by another, and the question of
their authority, are no less compelling than those we accept
for printed books.
There are three possibilities for change in electronic
texts that confront us with the need for intellectual
preservation techniques:
accidental change;
intended change that is well-meant; and
intended change that is not well-meant, that is, fraud.
A document can sometimes be damaged accidentally,
perhaps by data loss during transfer or through inadvertent
mistakes in manipulation; for example, data may be
corrupted in being sent over a network or between disks
and memory on a computer. This no longer happens often,
but it is possible. More frequent is the loss of sections of a
document, or a whole version of a document, due to
accidents in updating.
There are at least two possibilities for intended change
that is well-meant. New versions and drafts are familiar to
us from dealing with authorial texts, for example, or from
working with successive book editions, legislative bills, or
revisions of working papers. It is desirable to keep track
bibliographically of the distinction between one version
and another. Readers are accustomed to visual cues to
indicate when a version is different.
In addition to explicit numbering one may observe the
page format, the typography, the producer's name, the
binding, the paper itself. These cues are not available or
dependable for distinguishing electronic versions.
Structural updates, changes that are inherent in the
document, also cause changes in information content. A
dynamic data base by its nature is frequently updated:
Books in Print, for example, or architectural drawings, or
elements of the human genome project. How may one
identify a given snapshot and authenticate it as
representing a certain time? The third kind of change that
can occur is intentional change for fraudulent reasons. The
change might be of one's own work, to cover one's tracks
or change evidence for a variety of reasons, or it might be
damage to the work of another.
In an electronic future the opportunities for revision of
history will be multiplied. An unscrupulous researcher
could change experimental data without a trace. A financial
dealer might wish to cover tracks to hide improper
business, or a political figure might wish to hide or modify
inconvenient earlier views.
Consider the consequences if political opponents
could modify their own past correspondence without
detection. Then consider the case if each of them could
modify the other's correspondence without detection.
Society, as well as each opponent, needs a defense against
such cases.
The need is to fix, or authenticate, a document so that
a user can be sure of the unaltered text when it is needed.
Such a technique must be easy to use so that it does not
impede creation or access. It must also provide generality,
flexibility, openness where possible but document security
where desired, low cost and, most of all, functionality
over long periods of time on the human scale. Digital time-
stamping and various forms of digital signatures are among
solutions available for the electronically novel problem of
Trends in Information Archiving 271

intellectual preservation. There are likely to be others, each


with their own assets and liabilities. The preservation
community must keep aware of potential solutions and
urge implementation of ones most broadly suitable; most
of all, preservationists must be aware that the problem
exists and requires a solution beyond the important
preservation of the media and the technologies.
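As a toy illustration of the idea, the following Python
sketch fixes a document with a keyed digest (HMAC) and a
timestamp so that later alteration is detectable. Real
services use public-key signatures and trusted time-stamping
authorities; the secret key and the scheme here are
illustrative assumptions only.

import hashlib
import hmac
import time

SECRET = b"archive-held key"  # illustrative; real systems use PKI

def register(document):
    # Fix the document: bind its content to a point in time.
    stamp = str(int(time.time())).encode()
    tag = hmac.new(SECRET, document + stamp,
                   hashlib.sha256).hexdigest()
    return stamp, tag

def authenticate(document, stamp, tag):
    # A later reader verifies the text is the one registered.
    expected = hmac.new(SECRET, document + stamp,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

stamp, tag = register(b"final draft of the paper")
print(authenticate(b"final draft of the paper", stamp, tag))    # True
print(authenticate(b"altered draft of the paper", stamp, tag))  # False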
It is because a library is an organisation, rather than a
building or a collection, that institutional commitment is
required if electronic information is to have more than a
fleeting existence.
Organisational Commitments: The organisation of
libraries is already changing as electronic information
increasingly becomes part of their charge. Most
research libraries have had substantial systems
departments which maintain infrastructures while the
librarians take on more and more digital information
responsibilities. Some libraries locate the responsibility
for electronic information distinctly from that for print.
Most libraries are coming to see the forms as
inseparable and include digital responsibilities along
with artifactual responsibilities in assignments for
collection development, cataloguing and public
service.
As libraries move more into the electronic environment the
historic tripartite division of libraries into public services,
technical services and collection development continues
functionally but in more fluid arrangements. In addition,
the need for consortial activity has become evident both for
provision and preservation of digital information. People
who combine bibliographic understanding, problem-
solving abilities, negotiating skills and process orientation
will be needed throughout libraries; such staff will take on
the demanding new technical, collection and service
responsibilities for long-term support of digital collections.
Fiscal Commitment: The permanent existence of a digital
research library will require assured continuity in
operational funding. Almost any other library activity
can survive a funding hiatus of a year or more.
Acquisitions, building maintenance, and preservation
can be suspended, or an entire staff can be dispersed
and a library shut down for several years, and the
artifactual collections will more or less survive. But
digital collections, like the online catalog, require
continual maintenance if they are to survive more than
a very brief interruption of power, environmental
control, backup, migration and related technical care.
Online catalog maintenance costs have reached a rough
steady state, and the capital costs for new OPACs are
decreasing relative to the capabilities provided. The
catalog size will continue to increase, but catalog
records are small relative to the information to which
they refer. Digital collections, however, as a proportion
of the library's supply of information, will grow for the
foreseeable future, and the quantity of information
requiring care will become considerable. Unit costs of
storage are likely to continue falling for some time,
which may make the financial burden manageable.
Long term funding will be required to assure long term
care. Libraries and their parent institutions will need to
develop new fiscal tools and use familiar fiscal tools for
new purposes. Public institutions, usually constrained
to annual funding, will have particular difficulties, but
existing procedures for capital or plant funding may
provide precedents. One familiar technique is the
endowment. It has been difficult to obtain private
funding for endowments of concepts and services
rather than books and mortar, but it is possible.
Institutions might also build endowments out of
operating funds over periods of time.
Some revenue streams associated with digital research
libraries may be practical. Consortial arrangements
may allow for lease or purchase of shares in a digital
collection. Shorter-term access might be provided to
other institutions on a usage basis. Access could be sold
to certain classes of users, e.g. businesses, non-local
clienteles, or specific information projects. New
relations with publishers, presently difficult to
perceive through the miasmic fog rising from
intellectual property, might provide income for storage
of electronically published materials during the
copyright lifetime in which publishers collect usage
fees. With commitment and imagination long term
fiscal tools will be found.
Institutional Commitment: All these are instrumental
means of accomplishing the greatest requirement, that
of conscious, planned institutional commitment to
preserve that part of human culture which will flower
in electronic form. While museums preserve artifacts,
often beautiful, that embody information, libraries
preserve information that, until now, has been
embedded in artifacts. The advent of electronic
information will accentuate the difference between
these roles as libraries take the responsibility for the
preservation of information in non-artifactual forms.
For the past century most research libraries have been
associated with universities, and this connection seems
likely to continue in the immediate future. Whatever the
governance structure, an institution wishing to benefit
from electronic information will have to make a conscious
commitment to providing resources. Michael Buckland, of
the University of California at Berkeley, has distinguished
between a library's role and its mission. Where the role of
a library is to facilitate access to information, its mission
is to support the mission of its parent institution.
One can extend this to understand that if a university
wishes to continue gaining support for its mission from its
library, it will have to make commitments to the library's
role. In the electronic environment, this means new
longstanding financial commitments which the library and
university together must identify and accomplish. The
commitment will have to be clearly and publicly made if
scholars and other libraries are to have confidence that a
given digital collection is indeed likely to exist for the long
term. It is essential that guidelines or standards be
established defining what is meant by a long term
commitment, and defining what electronic repositories of
data can qualify to be called a digital research library. Just
as donors of books, manuscripts and archives look for
demonstration of long term care and commitment, so too
will scholars and publishers as they create digital
information which requires a home.
REFERENCES

Beagrie, N. and Greenstein, D., A strategic policy framework for creating and
preserving digital collections. JISC/NPO Studies on the Preservation
of Electronic Materials. eLib Supporting Study P3. London: South
Bank University, Library Information Technology Centre, 1998.
Bearman, D., "Optical media: their implications for archives and museums", Archives and Museum Informatics Technical Report, 1 (1). Pittsburgh, Pa.: Archives and Museum Informatics, 1987.
Conway, P., Digitizing preservation. Library Journal, 1 February 1994, 42-45.
McKemmish, S. and Duff, W., Metadata and ISO 9000 compliance.
Information Management Journal, 34 (1), January 2000.
Roberts, D., The disposal and appraisal of machine-readable records
from the literature. Archives and Manuscripts, 13, 1985, 30-38.
Waters, D.J., Electronic technologies and preservation: [...] based on a presentation to the Annual Meeting of the Research Libraries Group, June 25, 1992.
10
Information Retrieval in Modern Libraries

A core service provided by digital libraries is helping users find information. This chapter begins with a discussion of
catalogues, indexes, and other summary information used
to describe objects in a digital library; the general name
for this topic is descriptive metadata. This is followed by
a section on the methods used to search bodies of text for
specific information, the subject known as information
retrieval. Many methods of information discovery do not
search the actual objects in the collections, but work from
descriptive metadata about the objects. The metadata
typically consists of a catalog or indexing record, or an
abstract, one record for each object. Usually it is stored
separately from the objects that it describes, but
sometimes it is embedded in the objects.
Descriptive metadata is usually expressed as text, but
can be used to describe information that is in formats
other than text, such as images, sound recordings, maps,
computer programs, and other non-text materials, as well
as for textual documents. A single catalog can combine
records for every variety of genre, media, and format.
This enables users of digital libraries to discover materials
in all media by searching textual records about the
materials. Descriptive metadata is usually created by
professionals. Library catalogues and scientific indexes represent huge investments by skilled people, sustained
over decades or even centuries. This economic fact is
crucial to understanding current trends. On one hand, it
is vital to build on the investments and the expertise
behind them. On the other, there is great incentive to find
cheaper and faster ways to create metadata, either by
automatic indexing or with computer tools that enhance
human expertise.
Catalog records are short records that provide
summary information about a library object. The word
catalog is applied to records that have a consistent
structure, organised according to systematic rules. An
abstract is a free text record that summarises a longer
document. Other types of indexing records are less formal
than a catalog record, but have more structure than a
simple abstract. Library catalogues serve many functions,
not only information retrieval. Some catalogues provide
comprehensive bibliographic information that can not be
derived directly from the objects. This includes
information about authors or the provenance of museum
artifacts. For managing collections, catalogues contain
administrative information, such as where items are
stored, either online or on library shelves.
Catalogues are usually much smaller than the
collections that they represent; in conventional libraries,
materials that are stored on miles of shelving are described
by records that can be contained in a group of card drawers
at one location or an online database. Indexes to digital
libraries can be mirrored for performance and reliability.
Information in catalog records is divided into fields and
sub-fields with tags that identify them.
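In the MARC (Machine-Readable Cataloguing) format discussed below, for example, each field carries a three-digit tag and one-character sub-field codes. The following fragment is a simplified, illustrative record, not copied from any actual catalogue:

100 1  $a Gibbon, Edward, 1737-1794.
245 14 $a The history of the decline and fall of the Roman Empire / $c by Edward Gibbon.
260    $a London : $b Strahan and Cadell, $c 1776.

Here the tag 245 identifies the title field, while the sub-field codes $a and $c separate the title proper from the statement of responsibility.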
In digital libraries the role of MARC and the related
cataloguing rules is a source of debate. How far can
traditional methods of cataloguing migrate to support new formats, media types, and methods of publishing? Currently, Machine-Readable Cataloguing (MARC)
cataloguing retains its importance for conventional
materials; librarians have extended it to some of the newer
types of object found in digital libraries, but MARC has not
been adopted by organisations other than traditional
libraries.
The sciences and other technical fields rely on
abstracting and indexing services more than catalogues.
Each scientific discipline has a service to help users find
information in journal articles. The services include
Medline for medicine and biology, Chemical Abstracts for
chemistry, and Inspec for physics, computing, and related
fields. Each service indexes the articles from a large set of
journals. The record for an article includes basic
bibliographic information, supplemented by subject
information, organised for information retrieval. The
details differ, but the services have many similarities. Since
abstracting and indexing services emerged at a time when
computers were slower and more expensive than today, the
information is structured to support simple textual
searches, but the records have proved to be useful in more
flexible systems.
Scientific users frequently want information on a
specific subject. Because of the subtleties of language,
subject searching is unreliable unless there is indexing
information that describes the subject of each object. The
subject information can be an abstract, keywords, subject
terms, or other information. Some services ask authors to
provide keywords or an abstract, but this leads to gross
inconsistencies. More effective methods have a professional
indexer assign subject information to each item. An
effective but expensive approach is to use a controlled
vocabulary. Where several terms could be used to describe
a concept, one is used exclusively. Thus the indexer has a
list of approved subject terms and rules for applying them.
Cataloguing and indexing are expensive when carried out by skilled professionals. A rule of thumb is that each
record costs about fifty dollars to create and distribute. In
certain fields, such as medicine and chemistry, the demand
for information is great enough to justify the expense of
comprehensive indexing, but these disciplines are the
exceptions. Even monograph cataloguing is usually
restricted to an overall record of the monograph rather than
detailed cataloguing of individual topics within a book.
Most items in museums, archives, and library special
collections are not catalogued or indexed individually.
In digital libraries, many items are worth collecting but
the costs of cataloguing them individually can not be
justified. The numbers of items in the collections can be
very large, and the manner in which digital library objects
change continually inhibits long-term investments in
catalogues. Each item may go through several versions in
quick succession. A single object may be composed of many
other objects, each changing independently. New
categories of object are being continually devised, while
others are discarded.
Frequently, the user's perception of an object is the
result of executing a computer program and is different
with each interaction. These factors increase the complexity
and cost of cataloguing digital library materials. For all
these reasons, professional cataloguing and indexing is
likely to be less central to digital libraries than it is in
traditional libraries. The alternative is to use computer
programs to create index records automatically. Records
created by automatic indexing are normally of poor quality,
but they are inexpensive. A powerful search system will go
a long way towards compensating for the low quality of
individual records.
The Web search programs prove this point. They build
their indexes automatically. The records are not very good,
but the success of the search services shows that the indexes
are useful. At least, they are better than the alternative, which is to have nothing.
Much of the development that led to automatic
indexing came out of research in text skimming. A typical
problem in this field is how to organise electronic mail. A
user has a large volume of electronic mail messages and
wants to file them by subject. A computer program is
expected to read through them and assign them to subject
areas. This is a difficult problem for people to carry out
consistently and is a very difficult problem for a computer
program, but steady progress has been made.
The programs look for clues within the document.
These clues may be structural elements, such as the subject
field of an electronic mail message, they may be linguistic
clues, or the program may simply recognise key words. The
Altavista indexing program was able to identify the title
and author. For example, the page includes the tagged
_ element:
<title> Digital library concepts< / title>
These tags to guide Web browsers in displaying the article.
They are equally useful in providing guidance to automatic
indexing programs. One of the potential uses of mark-up
languages, such as SGML or XML, is that the structural tags
can be used by automatic indexing programs to build
records for information retrieval. Within the text of a
document, the string, "Marie Celeste" might be the name
of a person, a book, a song, a ship, a publisher, a play, or
might not even be a name. With structural mark-up, the
string can be identified and labeled for what it is. Thus,
information provided by the mark-up can be used to
distinguish specific categories of information, such as
author, title, or date.
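To illustrate (the element names here are hypothetical, not drawn from any particular document type definition), structural mark-up resolves the ambiguity directly:

<ship>Marie Celeste</ship>
<book><title>Marie Celeste</title></book>

An automatic indexing program encountering the first fragment can file the string under ship names and the second under book titles, without any linguistic analysis.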
The exact costs are commercial secrets, but they are a
tiny fraction of one cent per record. For the cost of a single
record created by a professional cataloguer or indexer,
computer programs can generate a hundred thousand or more records. It is economically feasible to index huge
numbers of items on the Internet and even to index them
again at frequent intervals. Creators of catalogues and
indexes can balance costs against perceived benefits.
The most expensive forms of descriptive metadata are
the traditional methods used for library catalogues, and by
indexing and abstracting services; structuralist Dublin Core
will be moderately expensive, keeping most of the benefits
while saving some costs; minimalist Dublin Core will be
cheaper, but not free; automatic indexing has the poorest
quality at a tiny cost.
Descriptive metadata needs to be associated with the
material that it describes. In the past, descriptive metadata
has usually been stored separately, as an external catalog
or index. This has many advantages, but requires links
between the metadata and the object it references. Some
digital libraries are moving in the other direction, storing
the metadata and the data together, either by embedding
the metadata in the object itself or by having two tightly
linked objects. This approach is convenient in distributed
systems and for long-term archiving, since it guarantees
that computer programs have access to both the data and
the metadata at the same time.
Mechanisms for associating metadata with Web pages
have been a subject of considerable debate. For an HTML
page, a simple approach is to embed the metadata in the page, using the special HTML meta tag.
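As an illustrative sketch of meta tags drawn from the Dublin Core Element Set (the values reuse the Hamlet example that appears later in this chapter), each element becomes one meta tag in the head of the page:

<meta name="DC.title" content="Hamlet">
<meta name="DC.creator" content="Shakespeare">
<meta name="DC.type" content="play">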
Note that the choice of tags is a system design decision. The
Dublin Core itself does not specify how the metadata is
associated with the material.
Since meta tags can not be used with file types other
than HTML and rapidly become cumbersome, a number of
organisations working through the World Wide Web
Consortium have developed a more general structure
known as the Resource Description Framework (RDF). RDF is a method for the exchange of metadata, developed by the World Wide Web Consortium,
drawing concepts together from several other efforts,
including the PICS format, which was developed to provide
rating labels, to identify violence, pornography, and similar
characteristics of Web pages.
The Dublin Core team is working closely with the RDF
designers. A metadata scheme, such as Dublin Core, can be
considered as having three aspects: semantics, syntax, and
structure. The semantics describes how to interpret
concepts such as date or creator. The syntax specifies how
the metadata is expressed. The structure defines the
relationships between the metadata elements, such as the
concepts of day, month and year as components of a date.
RDF provides a simple but general structural model to
express the syntax. It does not stipulate the semantics used
by a metadata scheme. XML is used to describe a metadata
scheme and for exchange of information between computer
systems and among schemes. The structural model consists
of resources, property-types, and values. In the Dublin Core
metadata scheme, this can be represented as:

Resource    Property-type    Value
Hamlet      creator          Shakespeare
Hamlet      type             play


A different metadata scheme might use the term author in place of creator, and might use the term type with a completely different meaning. Therefore, the RDF mark-up would make explicit that this metadata is expressed in the Dublin Core scheme:

<DC:creator> Shakespeare</DC:creator>
<DC:type> play</DC:type>

To complete this example, Hamlet needs to be identified more precisely. Then the full RDF record, with XML mark-
up, is:

<RDF:RDF>
<RDF:description RDF:about = "http://hamlet.org/">
<DC:creator> Shakespeare</DC:creator>
<DC:type> play</DC:type>
</RDF:description>
</RDF:RDF>

The mark-up in this record makes explicit that the terms description and about are defined in the RDF scheme, while creator and type are terms defined in the Dublin Core (DC). One more step is needed to complete this record: the schemes RDF and DC must be defined as XML namespaces.
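As a sketch, the declarations can be made as attributes on the enclosing RDF element (the DC namespace URI shown is the one commonly used for Dublin Core version 1.1; a given implementation may differ):

<RDF:RDF xmlns:RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:DC = "http://purl.org/dc/elements/1.1/">
...
</RDF:RDF>

The RDF structural model permits resources to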
have property-types that refer to other resources. For
example, a database might include a record about
Shakespeare with metadata about him, such as when and
where he lived, and the various ways that he spelled his
name. The DC:Creator property-type could reference this
record as follows:
<DC:creator RDF:about = "http://people.net/WS/">
In this manner, arbitrarily complex metadata descriptions
can be built up from simple components. By using the RDF
framework for the syntax and structure, combined with
XML representation, computer systems can associate
metadata with digital objects and exchange metadata from
different schemes.
INFORMATION RETRIEVAL TECHNIQUES

Information retrieval is a field in which computer scientists and information professionals have worked together for many years. It remains an active area of research and is one of the few areas of digital libraries to
have a systematic methodology for measuring the
performance of various methods. The various methods of
information retrieval build on some simple concepts to
search large bodies of information. A query is a string of
text, describing the information that the user is seeking.
Each word of the query is called a search term. A query
can be a single search term, a string of terms, a phrase in
natural language, or a stylised expression using special
symbols. Some methods of information retrieval compare
the query with every word in the entire text, without
distinguishing the function of the various words. This is
called full text searching.
Other methods identify bibliographic or structural
fields, such as author or heading, and allow searching on
specified field, such as "author = Gibbon". This is called
fielded searching. Full text and fielded searching are both
powerful tools, and modern methods of information
retrieval often use the techniques in combination. Fielded
searching requires some method of identifying the fields.
Full text searching does not require such support. By taking
advantage of the power of modern computers, full text
searching can be effective even on unprocessed text, but
heterogeneous texts of varying length, style, and content
are difficult to search effectively and the results can be
inconsistent.
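As a toy sketch of the distinction (the records and field names are invented for this example), fielded searching restricts matching to a named field, while full text searching examines every word available:

records = [
    {"author": "Gibbon", "title": "The Decline and Fall of the Roman Empire"},
    {"author": "Smith", "title": "A field guide to the gibbon"},
]

# Fielded search: only the author field is examined, so one record matches.
fielded = [r for r in records if "gibbon" in r["author"].lower()]

# Full text search: every field is scanned, so both records match.
full_text = [r for r in records if any("gibbon" in value.lower() for value in r.values())]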
The legal information systems are based on full text
searching; they are the exceptions. When descriptive metadata is available, most services prefer either fielded searching or free text searching of abstracts or other metadata. Some words occur so frequently that they are of
little value for retrieval. Examples include common
pronouns, conjunctions, and auxiliary verbs, such as "he",
"and", "be", and so on.
Most systems have a list of common words which are ignored both in building inverted files and in queries. This
is called a stop list. The selection of stop words is difficult.
The choice clearly depends upon the language of the text
and may also be related to the subject matter. For this
reason, instead of having a predetermined stop list, some
systems use statistical methods to identify the most
commonly used words and reject them. Even then, no
system is perfect. There is always the danger that some
perfectly sensible queries might be rejected because every
word is in the stop list, as with the quotation, "To be or not
to be?"
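A minimal sketch of the statistical approach (the tokenisation and the threshold of twenty words are arbitrary choices for illustration, not taken from any particular system):

from collections import Counter

def build_stop_list(documents, top_n=20):
    # Treat the most frequent words across the whole collection as stop words.
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return {word for word, count in counts.most_common(top_n)}

def remove_stop_words(query, stop_list):
    # A query such as "To be or not to be?" may lose every term.
    return [term for term in query.lower().split() if term not in stop_list]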
An inverted file is a list of the words in a set of
documents and their locations within those documents.
Here is a small part of an inverted file.

Word      Document    Location
abacus    3           94
          19          7
          19          212
actor     2           66
          19          200
          29          45
aspen     5           43
atoll     11          3
          34          40

This inverted file shows that the word "abacus" is word 94 in document 3, and words 7 and 212 in document 19;
the word "actor" is word 66 in document 2, word 200 in
document 19, and word 45 in document 29; and so on.
The list of locations for a given word is called an inverted
list. An inverted file can be used to search a set of
documents to find every occurrence of a single search
term. In the example above, a search for the word "actor"
would look in the inverted file and find that the word appears in documents 2, 19, and 29. A simple reference to
an inverted file is typically a fast operation for a
computer.
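The following is a minimal sketch of how such a file can be built and consulted (the miniature document collection is invented; a production system would add stop-word removal, compression, and persistent storage):

from collections import defaultdict

def build_inverted_file(documents):
    # Map each word to a list of (document id, word position) pairs.
    inverted = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.split()):
            inverted[word].append((doc_id, position))
    return inverted

documents = {2: "the actor bowed", 19: "abacus actor abacus"}
inverted = build_inverted_file(documents)
# inverted["actor"] -> [(2, 1), (19, 1)]: every occurrence found in one lookup.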
Most inverted lists contain the location of the word
within the document. This is important for displaying the
result of searches, particularly with long documents. The
section of the document can be displayed prominently
with the search terms highlighted. Since inverted files
contain every word in a set of documents, except stop
words, they are large.
For typical digital library materials, the inverted file
may approach half the total size of all the documents,
even after compression. Thus, at the cost of storage space,
an inverted file provides a fast way to find every
occurrence of a single word in a collection of documents.
Most methods of information retrieval use inverted files.
VIRTUAL LIBRARY AND INFORMATION RETRIEVAL

The document collection of a virtual library consists of digital documents and Internet resources. An Internet
resource is a link to other digital documents which are
stored elsewhere in the Internet. Thus, only the link is
under control of the virtual library and not the document
to which the link points. In addition, a virtual library
provides digital catalogues, containing metadata about the
document collection. A virtual library must accomplish as
far as possible all necessary services of conventional
libraries and must also exploit the advantages of the
technology used.
There are always three components involved in
communication: a source, one or several destinations, and
a medium. The source forwards the information which is
transmitted by a medium (e.g. text or sound) and finally
received by the destinations.
Depending on the conditions and situations, communication can take various forms, and several
different proposals exist to capture them.
- Communication is synchronous and non-distributed, if source and destination are communicating with each other at the same time in the same place, e.g. a face-to-face talk.
- Communication is synchronous and distributed, if source and destination are communicating with each other at the same time in different places, e.g. a phone call.
- Communication is asynchronous and non-distributed, if source and destination are communicating with each other at different times in the same place, e.g. a notice board.
- Communication is asynchronous and distributed, if source and destination are communicating with each other at different times in different places, e.g. e-mail.
A formal communication is always based on an arranged appointment, while an informal communication is spontaneous and takes place accidentally. For instance, a real time video conference is mostly a formal communication which is synchronous and distributed. If a communication is asynchronous it cannot be classified as formal or informal with respect to initiation. Nevertheless, in some publications the terms "asynchronous, formal" and "asynchronous, informal" communication are used to characterise the content of the information communicated. For instance, one publication refers to users sharing their home page by allowing other users access to it as informal communication.
Conventional libraries are meeting places where
students and researchers meet each other either
accidentally or by arrangement. Often conventional
libraries have areas for collaboration where people can
work jointly. Further, most of the services a library provides
are highly interactive and are based on communication between the people involved.
- There is formal, synchronous and non-distributed communication in conventional libraries when librarians introduce patrons to the use of the library.
- Formal, synchronous and distributed communication is required if the document collection of a conventional library is distributed to several buildings. For instance, a phone conference system can be used for this kind of communication.
- Informal, synchronous and non-distributed communication is found in conventional libraries because students and researchers often meet each other accidentally in the library. This is probably the most common form of communication in conventional libraries.
- Informal, synchronous and distributed communication is common in the conventional library as well. For instance, patrons call librarians and ask for documents they have reserved.
- An example of asynchronous and non-distributed communication is a notebook in which patrons can add suggestions to or complaints about the library. Once a suggestion or complaint is made, the library can comment on it in the same book.
- Finally, asynchronous and distributed communication is found in conventional libraries since researchers can communicate their research results to a broad community through publishing in journals or in books.
While a conventional library is defined by the building where its document collection is stored, there is no longer a library building for a virtual library. Instead, the document collection of a virtual library is distributed among several servers in different places. Thus, people cannot meet each other accidentally or by arrangement in
the virtual library. Due to this, synchronous communication in virtual libraries is only possible if it is
distributed. One might argue that people can work jointly
at a computer at the client side and communicate with
each other in a non-distributed way. Even though this is
a non-distributed and synchronous communication it is
outside the scope of the virtual library and cannot be
under its control. Normally, the document collection of a
virtual library is distributed. This fact allows librarians to
work in different places.
Nevertheless, to achieve a common aim, communication is still necessary for collaboration. Therefore, a
virtual library should provide corresponding services.
Shared workspaces similar to virtual office systems, or
electronic meeting rooms should exist in virtual libraries.
Apart from research prototypes, there are also
communication tools that use the capabilities of the Internet
and are normally available on the Internet. DigiPhone or
VocalTec are prominent tools for phone conferencing on the
Internet. Further, MBone and Cu-SeeMe are often used for
video conferences. For instance, the use of MBone in a
virtual library setting is known from the RPID project. Also,
a phone conferencing tool called CoolTalk can be connected
to the most recent release of the Netscape Browser called
Atlas. In addition, CoolTalk offers a white board, allowing
users to collaborate on the same data (text and graphics). Finally, Marc Andreessen has recently announced
that Netscape will be developing a tool called InSoft. With
InSoft phone calls will directly be possible from the
Netscape navigator.
Spontaneity is a typical characteristic of informal
communication. Hence, a virtual library should provide
functionalities allowing users to get an overview of who
else is on-line in the library. Users should then be able to
choose one or several users with whom to communicate.
For this, the tools introduced above can be employed.
In conventional libraries, a notice board represents this form of communication. Similar to this, a predefined space
in a virtual library can assume the function of a notice
board. This space might be an HTML page to which users
can add links to the information to be communicated.
Another example is the use of annotations to documents.
For instance, Hyper-G and its commercial version
HyperWave afford users the ability to annotate documents
or other annotations. Finally, virtual libraries can use text-
based discussion systems, like HyperNews, which are
designed for asynchronous and non-distributed
communication.
Basically, asynchronous and distributed communication is found in every virtual library because researchers
publish their research results for example in electronic
journals which are available in each virtual library that
subscribes to them. In addition, there are other possibilities
for this form of communication. For instance, Hyper-G
provides a service allowing users to see who else is on-line.
One can select a single user or all on-line users with whom
to start communicating using an integrated message
system. Unlike e-mail, the message system displays the
message on the screen of the receiver immediately after the
sender has sent it. A drawback of this message system is
that only one user or all users (but not an arbitrary number
of users) can be chosen for communication. Thus,
communication within a group is not possible.
Communication and collaboration will be one of the
main research fields in the context of virtual libraries. A big
challenge for us is to identify and to implement new
communication services for virtual libraries. First results
show that communication can be based on digital agents.
Basically, a digital agent is a process which carries out a
well-defined task. Two new scenarios for communication
in virtual libraries then arise. First, digital agents can
support the communication between users. Second, users
can not only communicate with other users but also with
digital agents.
REFERENCES

Feeney, M. (ed.), Digital culture: maximising the nation's investment: a synthesis of JISC/NPO studies on the preservation of electronic materials. London: National Preservation Office, 1999.
Fishbein, M.H., Appraising information in machine language form.
American Archivist, 35, 1972, 35-43.
Hills, S., Electronically published material and the archival library.
Electronic Publishing Review, 5 (1), 1985, 63-72.
11
Information Access in Digital Libraries

In the fifty years that preservation has been emerging as a professional speciality in libraries and archives, the
preservation and access responsibilities of an archive or
library have often been in tension. "While preservation is
a primary goal or responsibility, an equally compelling
mandate-access and use-sets up a classic conflict that must
be arbitrated by the custodians and caretakers of archival
records," states a fundamental textbook in the field. The
intimate relationship between preservation and access has
changed in ways that mirror the technological environment
of cultural institutions.
Preservation OR Access: In the early years of modern archival agencies-prior to World War II-preservation
simply meant collecting. The sheer act of pulling a
collection of manuscripts from a barn, a basement, or a
parking garage and placing it intact in a dry building with
locks on the door fulfilled the fundamental preservation
mandate of the institution. In this regard preservation and
access are mutually exclusive activities. Use exposes a
collection to risk of theft, damage, or misuse of either
content or object. The safest way to ensure that a book lasts
for a long time is to lock it up or make a copy for use.
Preservation AND Access: Modern preservation
management strategies posit that preservation and access
are mutually reinforcing ideas. Preservation action is taken
on an item so that it may be used. In this view, creating a
preservation copy on microfilm of a deteriorated book
without making it possible to find the film is a waste of
money. In the world of preservation AND access, however,
it is theoretically possible to fulfil a preservation need
without solving access problems. Conversely, access to
scholarly materials can be guaranteed for a very long
period, indeed, without taking any concrete preservation
action on them.
Preservation IS Access: Librarians and archivists
concerned about the preservation of electronic records
sometimes view the two concepts as cause and effect. The
act of preserving makes access possible. Equating
preservation with access, however, implies that
preservation is defined by availability, when indeed this
construct may be getting it backwards. Preservation is no
more access than access is preservation. Simply refocusing
the preservation issue on access over~implifies the
preservation issues by suggesting that access is the engine
of preservation without addressing the nature of the thing
being preserved.
Preservation of Access: In the digital world, preservation
is the action and access is the thing-the act of preserving
access. A more accurate construct simply states "preserve
accessibility." When transformed in this way, a whole new
series of complexities arises. The content, structure, and
integrity of the digital product assume center stage-and the
ability of a machine to transport and display this product
becomes an assumed end result of the preservation action
rather than its primary goal. Control over accessibility,
especially the capacity of the system to export digital image
files (and associated indexes) to future generations of the
technology, can be exercised in part through prudent
purchases of only nonproprietary hardware and software
components. In the present environment, true plug-and-
play components are more widely available.
The financial commitment by librarians and archivists
is one of the only incentives that vendors have to adopt
open system architectures or at least provide better
documentation on the inner workings of their systems.
Additionally, librarians and archivists can influence
vendors and manufacturers to provide new equipment that
is backward compatible with existing systems. This
capability assists image file system migration in the same
way that today's word processing software allows access to
documents created with earlier versions. Much as they
might wish otherwise, digital product developers have
little or no control over the life expectancy of a given digital
image system and the decision to abandon that system.
ROLE OF ELECTRONIC RECORDS

During the last four decades, archiving-the permanent preservation of information of enduring value for access by
future generations-has undergone a major change. Before
the advent of large bureaucracies supported by the now
ubiquitous computer, archivists dealt with a scarcity of
sources, with much of their efforts focused on tracking
down unique manuscripts or recovering incomplete files.
The archived records were relatively durable-clay tablets,
stone, parchment, vellum, or rag paper. Albeit scarce and
often incomplete, these records came down through the
centuries relatively intact and could be preserved with little
or no difficulty.
The growth of government, complex organisations, and the advent of the electronic age have
reversed the conditions facing today's archives: rather than
dealing with scarce sources, the archives are facing a flood
of potentially valuable information stored on fragile
materials, including pulp paper and computer tapes and
disks. While the preservation of information recorded on
traditional materials such as paper or film requires
significant resources, the current major archival challenge
is the preservation of electronic records. Like traditional archival materials-books, papers, or film-electronic
information is recorded on media that deteriorate with age.
However, unlike the traditional archival materials,
electronic records are stored in specific formats and cannot
be read without software and hardware-sometimes the
specific types of hardware and software on which they
were created. The rapid evolution of information
technology makes the task of managing and preserving
electronic records complex and costly.
Agencies are increasingly moving to an operational
environment in which electronic-rather than paper-records
provide comprehensive documentation of their activities
and business processes. Part of the challenge of managing
electronic records is that they are produced by a mix of
information systems, which vary not only by type but by
generation of technology: the mainframe, the personal
computer, and the Internet. Each generation of technology
brought in new systems and capabilities without displacing
the older systems. Thus, organisations have to manage and
preserve electronic records associated with a wide range of
systems, technologies, and formats.
The challenge of managing and preserving vast and
rapidly growing volumes of electronic records produced by
modern organisations is placing pressure on the archival
community and on the information industry to develop a
cost-effective long-term preservation strategy that would
free electronic records of the straitjacket of proprietary file
formats and software and hardware dependencies. This
challenge is affected by several factors: decentralisation of
the computing environment, the complexity of electronic
records, obsolescence and aging of storage media, massive
volumes of electronic records, and software and hardware
dependencies.
Decentralisation of computing environment: The
challenge of managing electronic records significantly
increases with the decentralisation of the computing environment. In response to the difficulty of manually
managing electronic records, agencies are slowly turning to
automated records management applications to help
automate electronic records management life-cycle
processes. The primary functions of these applications
include categorising and locating records and identifying
records that are due for disposition, as well as storing,
retrieving, and disposing of electronic records that are
maintained in repositories. Also, some applications are
beginning to be designed to automatically classify
electronic records and assign them to an appropriate
records retention and disposition category.
The Department of Defense (DOD), which is
pioneering the assessment and use of records management
applications, has published application standards and
established a certification program. There is a general
consensus in the archival community that a viable strategy
for the long-term preservation and archiving of electronic
records has yet to be developed.
In the US, the Library of Congress' National Digital Information Infrastructure and Preservation Program is a national cooperative effort led by the Library to develop the strategy and technical approaches needed to archive and
preserve digital information; NARA is also participating in
this effort. The program is in an early stage; completion is
not expected until 2004 or 2005, when the Library will
provide recommendations to the Congress. NARA is
collaborating in a joint effort on electronic record archiving
with the Defense Advanced Research Projects Agency
(DARPA), the U.S. Patent and Trademark Office, the
National Partnership for Advanced Computational
Infrastructure, and the San Diego Supercomputer Center.
Led by DARPA, the collaboration aims to develop and
demonstrate architectures and technologies for electronic
archiving and the development of persistent object
preservation, a proposed technique for electronic archiving. These initiatives are all in their early stages; none of them
has yet yielded proof-of-concept prototypes demonstrating
the viability of a long-term solution to preserving and
accessing electronic records. Progress has been made,
however, in the development of a standard model for
electronic archiving systems.
The Open Archival Information System (OAIS) model,
which is currently emerging as a standard in the archival
community, was initially developed by the National
Aeronautics and Space Administration (NASA) for
archiving the large volumes of data produced by space
missions. However, the model is applicable to any archive,
digital library, or repository. As a standard framework for
long-term preservation archives, the model defines the
environment necessary to support a digital repository and
the interactions within that environment.
In the archival storage area, systems pass the
information, now called archival information packages,
into a storage repository, where it is maintained until the
contents are requested and retrieved. The data
management area encompasses the services and functions
for populating, maintaining, and accessing both descriptive
information that identifies and documents archive holdings
and administrative data used to manage the archive. The
administration area provides the services and functions for
the overall operation of the archive system. In the
preservation planning area, systems monitor the
environment of the OAIS and provide recommendations to
ensure that the information stored in the OAIS remains
accessible, even if the original computing environment
becomes obsolete.
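As an illustrative sketch only (the attribute names are simplified from the OAIS vocabulary, and real archival information packages carry far richer structure), a package couples the content with the preservation description needed to keep it intelligible:

from dataclasses import dataclass, field

@dataclass
class ArchivalInformationPackage:
    # Simplified OAIS-style package: content plus its preservation description.
    content: bytes            # the bit stream being preserved
    content_format: str       # e.g. "PDF 1.4"; needed to interpret the bits
    reference: str            # a persistent identifier for the package
    fixity: str = ""          # e.g. a checksum, used to detect corruption
    context: str = ""         # relationships to other holdings
    provenance: list = field(default_factory=list)  # custody and migration history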
The access area includes systems that allow a user to
determine the existence, description, location, and
availability of information stored in the OAIS, allowing
information products to be requested and received. The
OAIS framework does not presume or apply any particular preservation strategy. This approach allows organisations
that adopt the framework to apply their own strategies or
combinations of strategies. The framework does assume
that the information managed is produced outside the
OAIS, and that the information will be disseminated to
users who are also outside the system. NARA is taking
action to respond to long-standing problems associated
with managing and preserving electronic records in
archives. In 2001, NARA completed an assessment of
governmentwide records management practices.
This assessment concluded that although agencies are
creating sufficient records and maintaining them
appropriately, most electronic records remain unscheduled,
and permanent records of historical value are not being
identified and provided to NARA for preservation and
archiving. The problems in electronic records management
appear to stem from (1) inadequate governmentwide
records management guidance and (2) the low priority
traditionally given to federal records management
functions and a lack of technology tools to manage
electronic records. To address these problems, NARA now
plans to (1) analyse key policy issues related to the
disposition of records and improve its guidance and (2)
examine and redesign, if necessary, the scheduling and
appraisal process and make this process more effective
through the use of technology.
NARA's plans, however, do not address the low
priority given to records functions. Further, these plans do
not address the need to monitor performance of records
management programs and practices on an ongoing basis.
Records must be effectively managed throughout their life
cycle, which includes records creation, maintenance and
use, and scheduling and disposition. Agencies must create
reliable records that meet the business needs and legal
responsibilities of federal programs and (to the extent
known) the needs of internal and external stakeholders who may make secondary use of the records.
To maintain and use the records created, agencies are
to create internal recordkeeping requirements for
maintaining records, consistently apply these
requirements, and establish systems that allow them to find
records that they need. Scheduling is the means by which
NARA and agencies identify federal records, determine
time frames for disposition, and identify permanent records
of historical value that are to be transferred to NARA for
preservation and archiving.
With regard particularly to electronic records, agencies
are also to compile inventories of their information systems,
after which the agency is required to develop a schedule for
the electronic records maintained in those systems. NARA's assessment of governmentwide practices drew on a study by a contractor-SRA International-and a series of records system analyses performed by NARA staff. The SRA study was based on a
survey of federal employees representing over 150 federal
government organisations and on 54 focus groups and
interviews involving individuals from 18 agencies; the
NARA staff's records system analyses focused on records
management practices for key business processes in 11
federal agencies. The resulting NARA/SRA study
identified problems in agency records management.
Specifically, NARA's assessment of records management
for key processes in 11 agencies concluded the following.
- Records creation: In general, the NARA study showed that the processes that were studied appeared to generate adequate records documentation.
- Records maintenance and use: For the most part, recordkeeping requirements were adequate, documented, and consistently applied. In addition, employees were generally able to find the records that they needed.
- Records scheduling and disposition: The study identified significant problems in both records scheduling and disposition. According to the study, many significant records-as well as most federal electronic records-are unscheduled. In addition to the unscheduled records, NARA identified several significant records that had been improperly scheduled.
Scheduling the electronic records in a large number of
major information systems presents an enormous
challenge, particularly since it generally takes NARA, in
conjunction with agencies, well over 6 months to approve
a new schedule. Failure to inventory systems and schedule
records places these records at risk.
The absence of inventories and schedules means that
NARA and agencies have not examined the contents of
these information systems to identify official government
records, appraised the value of these records, determined
appropriate disposition, and directed and trained
employees in how to maintain and when and how to
dispose of these records. As a result, temporary records
may remain on hard drives and other media long after they
are needed or could be moved to less costly forms of
storage.
The NARA/SRA study identified the lack of sufficient
governmentwide guidance as one cause of records
management problems. As NARA has acknowledged, its
policies and processes on electronic records have not yet
evolved to reflect the modern recordkeeping environment:
records created electronically in decentralised processes.
Despite repeated attempts to clarify its electronic
records guidance through a succession of NARA bulletins,
the current guidance remains incomplete and confusing.
According to the study, for example, employees lack
knowledge concerning how to identify electronic records
and what to do with them once identified. The guidance
does not provide disposition instructions for electronic
records maintained in many of the common types of
formats produced by federal agencies, including PDF files, Web pages, and spreadsheets. The NARA/SRA study
concluded that while agencies appreciate the specific
assistance from NARA personnel, they are frustrated
because they perceive that NARA is not meeting agencies'
broader needs for guidance and records management
leadership.
This study reported that agencies believe that NARA
has a responsibility to lead the way in transitioning to an
electronic records environment and to provide guidance
and standards, as well as tools to enable agencies to follow
the guidance. According to the study, some viewed NARA
as leaving agencies to fend for themselves, sometimes
levying impossible requirements that pressure agencies to
come up with their own individual solutions. The NARA/
SRA study identified another cause of records management
difficulties: the low priority generally afforded to records
management programs.
The study states that records management is not even
"on the radar scope" of agency leaders. Further, records
officers have little clout and do not appear to have much
involvement in or influence on programmatic business
processes or the development of information systems
designed to support them. New government employees
seldom receive any formal, initial records management
training. Further, records management is generally
considered a "support" activity. Since support functions are
typically the most dispensable in agencies, resources for
and focus on these functions are often limited. This finding
was echoed by a recent review of archival practices of
research universities, corporate research and development
programs, and federal science agencies, which noted that
"agency records management programs lack the resources
to meet even the legally required standards of securing
adequate documentation of their programs and activities."
As indicated by the NARA/SRA study, a related issue is the technical challenge of electronic records management:
effective electronic records management may require more
sophisticated and expensive information technology than
was previously necessary for paper-based records
management programs.
Because management tends not to focus on records
management, priority has not been given to acquiring or
upgrading the technology required to manage records in an
electronic environment. The study noted that technology
tools for managing electronic records do not exist in most
agencies, and further, that agency information technology
environments have not been designed to facilitate the
retention and retrieval of electronic records. As a result,
despite the growth of electronic media, agency records
systems are predominantly in paper format rather than
electronic. The study further noted that agencies planning
or piloting automated electronic records management
systems perform better recordkeeping than those without
such tools.
Typically, such agencies are already performing better
recordkeeping, and they tend to invest in electronic records
management systems because of the value they place on
good records management. According to the study, many
agencies are either planning or piloting information
technology initiatives to support electronic records
management, but their movement to electronic systems is
constrained by the level of financial support provided for
records management. NARA is responsible, under the
Federal Records Act, for conducting inspections or surveys
of agency records and records management programs and
practices. Its implementing regulations require NARA to
select agencies to be inspected (1) on the basis of perceived
need by NARA, (2) by specific request by the agency, or (3)
on the basis of a compliance monitoring cycle developed by
NARA. In all instances, NARA is to determine the scope of the inspection. Such inspections provide not only the means to assess and improve individual agency records
management programs but also the opportunity for NARA
to determine overall progress in improving agency records
management and identify problem areas that need to be
addressed in its guidance.
Between 1996 and 2000, NARA performed 16
inspections of agency records management programs, or
about 3 per year. These reviews were systematic and
comprehensive, covering all aspects of an agency's records
program. However, only 2 of the 24 major executive
departments or agencies were evaluated, with most of
NARA's evaluations focused on component organisations
or independent agencies. Moreover, these evaluations
frequently bypassed the issue of electronic records. In 2000,
NARA replaced agency evaluations with a new inspection
approach-targeted assistance.
NARA decided that its previous approach to
inspections was basically flawed: besides reaching only a
few agencies, it was often perceived negatively by agencies
and resulted in a list of records management problems that
agencies then had to resolve on their own. Under the
targeted assistance approach, NARA enters into
partnerships with federal agencies to provide them with
guidance, assistance, or training in any area of records
management. Services offered include expedited review of
critical schedules, tailored training, and help in records
disposition and transfer. However, although this approach
may improve records management in the targeted agencies,
it is not a substitute for systematic inspections and
evaluations of federal records programs.
Because the targeted assistance program is voluntary
and, according to NARA, initiated by a written request
from the agency, relying on it exclusively could
significantly limit NARA's evaluations of federal
recordkeeping. First, only agencies requesting targeted
assistance-presumably those already having greater appreciation of the importance of records management-are
evaluated. Second, the scope and the focus of the targeted
assistance are not determined by NARA but by the
requesting agency. NARA has recognised that its policy
and regulations for the management and disposition of
electronic records must be revised to provide agencies with
clear and comprehensive guidance encompassing all types
and formats of electronic records.
Having completed its assessment of federal records
management practices, NARA now plans a two-phase
project to (1) analyse key policy issues related to the
disposition of records and improve governmentwide
guidance, and (2) examine and redesign, if necessary, the
scheduling and appraisal process and make this process
more effective through the use of technology. According to
NARA, the purpose of the first phase of the project is to
analyse and make decisions, as necessary, on key policy
issues related to determining the disposition of records.
NARA plans to evaluate current legislation, regulations,
and guidance to determine if these are adequate in the
current recordkeeping environment. NARA expects the
outcome of the first phase, scheduled for completion by the
end of fiscal year 2002, to be policy decisions that support
the appropriate disposition of all government
documentation in today's multimedia environment.
In the second phase, NARA plans to examine and
redesign, if necessary, the process used by the federal
government to determine the disposition of records. This is
planned as a multiyear process during which NARA
intends to address the scheduling and appraisal of federal
records in all formats. Currently, it takes NARA well over
6 months to approve a new schedule. According to NARA,
the extensive appraisal time delays action on the
disposition of records and discourages agencies from
submitting schedules, potentially putting essential evidence at risk. NARA has two goals for this project:
- making the process for determining the disposition of records, regardless of medium, more effective and efficient, and dramatically decreasing the amount of time it takes to get approval for the disposition of records from the Archivist of the United States, and
- deciding how to appropriately apply technology to support the revised process for determining the disposition of records as part of managing records throughout their life cycle.
Although NARA's plans address the need to improve
guidance and determine how to use technology to support
records management, these plans do not address another
issue raised in its study: the low priority generally given to
records management and the related lack of management
commitment and attention to these functions. Without a
strategy to establish senior-level agency commitment to
records management and raise awareness of its importance
to the federal government, these programs are likely to
continue to be regarded by agency management and
employees as low-priority "support" functions.
In addition, NARA's plans do not address the issue of
systematic inspections. While the results of its recent study
provide a baseline of governmentwide records management practices, NARA's targeted assistance approach
does not provide systematic and comprehensive
information to assess progress over time. Without this type
of data, NARA will be impaired in its ability to determine
if it is achieving results in improving agency records
management.
Further, NARA may not have the means to identify
agency implementation issues and areas where its guidance
needs to be clarified, augmented, and strengthened. The
feedback provided by inspection is especially critical now
as NARA plans to redesign the scheduling and appraisal process, and improve its guidance. Archiving-the final
phase of records management for permanent records-
presents a significant challenge when records are electronic.
In light of the growth in the volume, complexity, and
diversity of electronic records, NARA has recognised that
its technical strategies to support preservation,
management, and sustained access to electronic records are
inadequate and inefficient.
To address this challenge, the agency is pursuing two
strategies. Its short-term strategy is to extend the useful life
of its current systems and to create some new systems for
archiving electronic records and for cataloguing and
displaying electronic records on-line. NARA's long-term
strategy, on which it is placing its primary focus, is to
contract with a private sector firm to acquire (that is, obtain)
an advanced electronic records archive (ERA). However,
NARA faces substantial risks in implementing its long-term
strategy. NARA is not meeting its schedule for the ERA
system, largely because of flaws in how the schedule was
developed. As a result, the schedule will be compressed,
increasing risks.
NARA's long-term strategic initiative is to develop an
advanced electronic records archive. The agency's goals for
this system are to preserve and provide access to any kind
of electronic record, free from dependency on any specific
hardware or software, so that the agency can carry out its
mission into the future. Although the new archival system
is not yet formally defined, agency documents, public
presentations, and interviews with agency officials and
staff indicate, in broad outline, how they envision this
system. It will probably be a distributed system, allowing
the storage and management of massive record collections at
a variety of installations, with accessibility provided via the
Internet. It may be based on persistent object preservation,
an advanced form of file format conversion and
encapsulation that is the subject of research sponsored by
NARA and other organisations. It may also draw on the
Extensible Markup Language (XML), which provides a
means for "tagging" (annotating) information in a
meaningful fashion that can be readily interpreted by
disparate computer systems.
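Although the design of ERA is not yet formally defined, the
general idea behind encapsulation can be sketched briefly.
The following Python fragment is purely illustrative; the
element names, record fields, and helper function are
assumptions made for the example and form no part of any
NARA specification. It simply binds a record's content to
self-describing metadata in a single XML object that any
XML-aware system can read back.

```python
import base64
import xml.etree.ElementTree as ET

def encapsulate(content: bytes, source: str, created: str,
                purpose: str) -> str:
    """Bind record content and minimal provenance metadata into one
    self-describing XML package (element names are illustrative only)."""
    pkg = ET.Element("recordPackage")
    meta = ET.SubElement(pkg, "metadata")
    ET.SubElement(meta, "source").text = source
    ET.SubElement(meta, "created").text = created
    ET.SubElement(meta, "purpose").text = purpose
    # Base64 keeps arbitrary binary content valid inside a text format.
    body = ET.SubElement(pkg, "content", encoding="base64")
    body.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(pkg, encoding="unicode")

print(encapsulate(b"Minutes of the records committee...",
                  "Records Office", "1999-06-03", "meeting minutes"))
```

Because the resulting package is plain text, it carries its own
description with it and does not depend on the software that
originally produced the record.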
NARA has indicated that ERA will be a major system,
and that it is likely that it will be developed and
implemented in several phases, with each phase adding
more functions to the system. According to NARA, its
development will take several years, and it will involve a
significant expenditure of resources on program
management, research, and systems development
activities. Information security is an important
consideration for any organisation that depends on
information systems to carry out its mission. Leading
organisations manage their information security risks
through an ongoing cycle of risk management. This
management process involves:
- establishing a centralised management function to
  coordinate the continuous cycle of activities while
  providing guidance and oversight for the security of
  the organisation as a whole;
- identifying and assessing risks to determine what
  security measures are needed;
- establishing and implementing policies and
  procedures that meet those needs;
- promoting security awareness so that users understand
  the risks and the related policies and procedures in
  place to mitigate those risks; and
- instituting an ongoing monitoring program of tests and
  evaluations to ensure that policies and procedures are
  appropriate and effective.
The Clinger-Cohen Act of 1996 requires agencies to
establish an IT investment process that provides the means
for senior management to obtain timely information
regarding the progress of investments in an information
system, including a system of milestones for measuring
progress in terms of cost, timeliness, quality, and the
capability of the system to meet specified requirements.
Weak IT investment management processes significantly
increase the risk that agency funds and resources will not
be efficiently expended.
The first step toward establishing effective investment
management is putting in place foundational, project-level
control and selection processes. These foundational
processes allow the agency to identify variances in project
cost, schedule, and performance expectations; to take
corrective action, if appropriate; and to make informed,
project-specific selection decisions. The second major step
toward effective investment management is to continually
assess proposed and ongoing projects as an integrated and
competing set of investment options. This portfolio
management approach enables the organisation to consider
the relative costs, benefits, and risks of new and previously
funded investments and thereby identify the mix that best
meets its mission, strategies, and goals.
Over the past several years, NARA has taken action to
develop an enterprise architecture. NARA has drafted a
current architecture and is working on a target architecture,
but this work is incomplete. However, the process to
develop the electronic archival system is well under way.
Without an enterprise architecture to guide its
development, NARA increases the risk that the planned
electronic archival system will be incompatible with
existing and future operations and systems, thus wasting
resources and requiring that unnecessary interfaces be built
to achieve integration. NARA is currently strengthening its
information security, having recognised that it has
numerous weaknesses. Significant security weaknesses
were identified by two IG assessments and a NARA-
initiated vulnerability assessment of its network. As a result
of these assessments, the Archivist of the United States
declared information security a material weakness in fiscal
year 2000.
Actions taken by the Archivist to address these
shortcomings and respond to recommendations identified
in the reports include establishing an information security
program, updating and developing new security policy
documents, developing contingency plans and business
recovery plans, and strengthening firewalls across the
network to control inbound and outbound traffic. Risk
assessments provide a basis for establishing appropriate
policies and selecting cost-effective techniques to
implement these policies.
NARA intends to develop an agencywide risk
assessment capability in fiscal year 2003, but it is not clear
that this will allow vulnerability assessments to be
completed before ERA is developed. Without a method to
identify and evaluate risks, NARA cannot be assured that
it has effective mechanisms for protecting its information
assets: networks, systems, and information associated with
ERA. Because a compromise of security in a single poorly
secured system can undermine the security of multiple
systems, NARA needs to complete vulnerability
assessments of all systems that will interface with ERA.
Moreover, because NARA lacks an enterprise architecture, it
may have difficulty addressing agencywide security.
Without an enterprise architecture that addresses security
issues agencywide, NARA cannot be sure that its current
or future archiving systems are adequately protected.
These weaknesses may be particularly significant for
ERA, because this system presents security issues that
NARA has never before addressed, according to an initial
assessment report on ERA prepared by NARA's systems
development and acquisition contractor.
The proposed distributed structure of ERA introduces
the security risks associated with the Internet: threats to the
integrity of data and to data accessibility. According to the
Federal Bureau of Investigation, Internet systems are
threatened by hackers (who may be terrorists, transnational
criminals, and intelligence services) using information
exploitation tools such as computer viruses, worms, Trojan
horses, logic bombs, and eavesdropping sniffers. As
Internet usage increases, the Internet has become an
increasingly tempting target, and the number of reported
Internet-related security incidents is growing. The effect on
ERA of the vulnerabilities of the Internet would have to be
assessed and addressed.
In response to the challenges associated with
managing and preserving electronic records, NARA has
performed an assessment of governmentwide records
management, an important first step that identified
several problems, including the inadequacy of guidance on
electronic records, the low priority generally given to
records management, and the lack of technology tools to
manage electronic records. While NARA has plans to
improve its guidance and address the need for technology,
it has not yet formulated a strategy to deal with the stature
of records management programs across government.
Further, it has no strategy for acquiring the kind of
comprehensive information on records management that
would be provided by systematic inspections and
evaluations of federal records programs. Without such a
strategy, records management will likely continue to be
considered a low-priority "support" activity lacking
appropriate management attention, and NARA will not
acquire information needed to address problems in agency
records management and guidance. Inadequacies in
records management put at risk records that may be
valuable: records providing information on essential
government functions, information that is necessary to
protect government and citizen interests, and information
that is significant for the historical record. NARA's effort
to acquire an advanced electronic records archive is at risk.
NARA is not meeting its schedule for the ERA system,
largely because of flaws in how the schedule was developed.
As a result, the schedule will be compressed, leaving less
time for completing essential planning tasks. In addition,
NARA has not yet improved IT management capabilities
that would reduce the risks inherent in its effort to acquire
ERA. Without these capabilities, NARA risks spending
funds to acquire a system that does not meet mission needs
and requirements, work effectively with existing systems,
or provide adequate security over the information it
contains. The Archivist should develop a documented strategy for
conducting systematic inspections of agency records
management programs to (1) periodically assess agency
progress in improving records management programs and
(2) evaluate the efficacy of NARA's governmentwide
guidance.
To mitigate the risks associated with the acquisition of
an advanced electronic archival system, the Archivist
should reassess the ERA project schedule. A revised schedule
should be developed, based on estimates of the amount of
work and resources required to complete each task, that
allows sufficient time for NARA to complete essential
planning tasks and strengthen its IT management
capabilities by (1) implementing an IT investment
management process, (2) developing an enterprise
architecture, and (3) improving information security.
The Archivist also agreed with the recommendation that
NARA develop a strategy for conducting systematic
inspections of agency records management programs, but
noted that continuing its past inspection program, as cited
in the report, would not succeed. The Archivist said that
this approach would include an assessment of broad
categories of important records across agencies, agency-
specific interventions, and the use of NARA's authority to
report the results of evaluations of at-risk records to OMB
and the Congress. Regarding the schedule for the ERA
system, the Archivist noted that while some program
documentation was not completed on schedule, all items on
the ERA project's "critical path" have been completed on
time, and NARA expects to meet all milestones on the
critical path this year.
The Archivist noted that NARA should receive the
first National Academy of Sciences report at a time when
it expects to receive the industry's response to NARA's
request for information, and that the report will provide an
unbiased, expert view of the feasibility of building a system
that is inherently evolutionary, addressing the core
problem of digital preservation. According to the Archivist,
NARA will factor both the scientific and the industry views
into its articulation of a draft request for proposals.
In regard to the second National Academy of Sciences
report, the Archivist noted that its primary purpose is to
provide input to NARA's long-range plans for addressing
the continuing evolution of information technology and
electronic records, and that the report will be useful in
revising the ERA research plan to address new problems
and opportunities identified by the experts, and in plans for
successive builds of the ERA system. The challenge of
managing and preserving the vast and rapidly growing
volumes of electronic records produced by modern
organisations is placing pressure on archives and on the
information industry to develop a cost-effective long-term
preservation strategy that will free electronic records from
the constraints of proprietary file formats and software and
hardware dependencies. Part of this strategy will involve
ways to capture and use information about the records to
make them accessible, as information in card catalogues
does in traditional libraries. However, there is no current
solution to the electronic records archiving challenge, and
so archival organisations now rely on a mixture of evolving
approaches that generally fall short of solving the long-term
preservation problem.
The four most common approaches (migration,
emulation, encapsulation, and conversion) are in use or
under consideration by the major archives. Recognising
that archival solutions may be some time off, companies in
the information industry are relying on off-the-shelf
technology for providing access to billions of electronic
records. These commercial archives, however, concentrate
on electronic records of types that are relatively uniform in
comparison to those that a government archive must
address. Archives use catalogues of various types to
capture information about records, information that is
critical for sharing, storing, managing, and accessing
records effectively, particularly in the context of millions of
records.
Because such information is data containing
descriptive information about other data, it is referred to as
metadata. Metadata are a central element of any approach
to ensure that preserved records are functional. For
electronic records, the metadata needed are often more
extensive than information in traditional catalogues,
including information that is important for preservation.
The creation of accessible software- and hardware-
independent electronic records requires that all materials
that are placed in archives be linked to information about
their structure, context, and use history. Metadata to be
associated with electronic records may include:
- information about the source of the record;
- how, why, and when it was created, updated, or changed;
- its intended function or purpose;
- how to open and read it;
- terms of access; and
- how it is related to other software and records used by
  the originating organisation.
These metadata must be sufficient to support any changes
made to records through various generations of hardware
and software, to support the reconstruction of the
decisionmaking process, to provide audit trails throughout
a record's life cycle, and to capture internal documentation.
Without an adequately defined metadata structure, an
effective electronic archive cannot be constructed.
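As a concrete illustration of the items listed above, the
following sketch shows one possible metadata wrapper for a
single record and how an archive might reconstruct the
record's history from it. Every element and attribute name
is invented for the example and does not correspond to any
published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for one electronic record; every element and
# attribute name here is an assumption made for illustration.
record_metadata = """
<record id="1997-OPA-00042">
  <source>Office of Public Affairs</source>
  <history>
    <event date="1997-03-12" action="created" agent="word processor v6"/>
    <event date="1998-01-05" action="revised" agent="word processor v7"/>
  </history>
  <function>press release</function>
  <rendering format="text/plain" note="open with any text editor"/>
  <access terms="public"/>
  <related>press-kit-1997-03</related>
</record>
"""

root = ET.fromstring(record_metadata)
# The record's life cycle can be reconstructed from the event history
# without any knowledge of the software that originally produced it.
for event in root.iter("event"):
    print(event.get("date"), event.get("action"),
          "using", event.get("agent"))
```

An audit trail of this kind is exactly what the preceding
paragraph requires: each generation of hardware and
software adds its own event entries, leaving the earlier
history intact.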
Numerous research projects have examined the question of
defining metadata that would be sufficient to ensure digital
preservation. Although archives experts note that
unresolved issues remain, the work on preservation
metadata is beginning to move from the research area to
practice.
The Public Record Office Victoria (Australia), a state
archive, has published standards for the management of
electronic records that include a metadata model originally
developed by the National Archives of Australia. For
incorporating metadata, the Victoria archive mandates the
use of XML. XML is being actively considered by archives
and researchers as a promising approach to generating
metadata. XML is a flexible, nonproprietary set of standards
for annotating ("tagging") data with semantically rich
labels that permit computers to process files on the basis of
their meaning. Like the more familiar HTML (Hypertext
Markup Language) files used on the World Wide Web,
XML files can be easily transmitted via the Internet, and
with appropriate software, they can be displayed by Web
browsers. The difference is that HTML is used only for
telling computers how to display information for a human
being to view, whereas the semantically based XML tags
allow computers to automatically interpret and process
XML files.
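A minimal comparison makes the distinction concrete. In
the HTML fragment below, the tags govern only
presentation; in the XML fragment, the tags carry meaning
that a program can act upon. Both fragments, and the field
names they use, are invented for illustration.

```python
import xml.etree.ElementTree as ET

# HTML: the tags only say how the text should look on screen.
html_fragment = "<p><b>1952</b> Annual report, <i>Department of Trade</i></p>"

# XML: the tags say what the pieces of text mean, so software can
# select and process records automatically.
xml_fragment = """
<record>
  <year>1952</year>
  <title>Annual report</title>
  <creator>Department of Trade</creator>
</record>
"""

record = ET.fromstring(xml_fragment)
if record.findtext("year") == "1952":
    print("Matched:", record.findtext("title"),
          "by", record.findtext("creator"))
```

With HTML alone, selecting records by year would require
fragile text parsing; the semantic tags make the same query
trivial.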
XML is called extensible because it is not a fixed
format. Instead, XML is actually a "metalanguage", a
language for describing other languages, which allows the
design of customised markup languages for limitless
different types of documents. Thus, although still in the
early stages of adoption, XML is viewed as a promising
format for a wide range of applications. Several
characteristics of XML make it attractive for archive applications. The
semantic nature of XML tags makes XML suitable for
recording metadata. Its extensibility would allow archives
to expand their systems to accommodate evolving needs.
As an open standard, it reduces the problems of
proprietary software. Further, because they are basically
text files, XML files can be readily interpreted by disparate
computer systems. Even without the mediation of software,
human beings can interpret an XML-tagged file, because
XML tags are human readable. This quality allows XML
files to be preserved both on computer media and on paper
and still remain readable. XML is also used by the National
Archives of Australia, which converts files from their native
formats to XML versions, while retaining a copy of the
original source file.
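In the spirit of the Australian approach, such a conversion
step might look like the following sketch, in which the
native file is left untouched while an XML rendition
carrying basic metadata is written alongside it. The file
layout and element names are assumptions made for the
example.

```python
import pathlib
import xml.etree.ElementTree as ET

def convert_to_xml(native_path: str, source_format: str) -> None:
    """Write an XML rendition of a text record next to the original,
    which is retained untouched (layout and element names assumed)."""
    native = pathlib.Path(native_path)
    text = native.read_text(encoding="utf-8", errors="replace")
    record = ET.Element("archivedRecord")
    ET.SubElement(record, "originalFile").text = native.name
    ET.SubElement(record, "originalFormat").text = source_format
    ET.SubElement(record, "body").text = text
    # The XML version is written alongside; the native file is kept.
    ET.ElementTree(record).write(native.with_suffix(".xml"),
                                 encoding="utf-8", xml_declaration=True)

# Assuming a file named memo-1999-014.txt exists in the working directory:
convert_to_xml("memo-1999-014.txt", "plain text")
```

Retaining the original alongside the XML version hedges
against errors in the conversion itself, since the source can
always be reconverted as tools improve.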
REFERENCES

Bearman, D., "Archival methods". Archives and Museum Informatics
Technical Report, 3 (1). Pittsburgh, Pa.: Archives and Museum
Informatics, 1989.
Blake, R., Electronic Records from Office Systems (EROS). In: Electronic
access: archives in the new millennium: proceedings, 3-4 June 1999.
London: Public Record Office, 1999, pp. 52-58.
Day, M.W., Preservation problems of electronic text and data. East
Midlands Branch of the Library Association, Occasional papers,
no. 3. Loughborough: EMBLA Publications, 1990.
Bibliography

Abid, A., Memory of the World: preserving our documentary heritage. Paris:
UNESCO, Information and Informatics Division, July 1997.
Beagrie, N. and Greenstein, D., A strategic policy framework for creating and
preserving digital collections. JISC/NPO Studies on the Preservation
of Electronic Materials. eLib Supporting Study P3. London: South
Bank University, Library Information Technology Centre, 1998.
Bearman, D., "Optical media: their implications for archives and
museums", Archives,and Museum Informatics Technical Report, 1 (1).
Pittsburgh, Pa.: Archives and Museum Informatics, 1987.
Blake, R., Electronic Records from Office Systems (EROS). In: Electronic
access: archives in the new millennium: proceedings, 3-4 June 1999.
London: Public Record Office, 1999, pp. 52-58.
Conway, P., Digitizing preservation. Library Journal, 1 February 1994, 42-
45.
Cook, T., The impact of David Bearman on modern archival thinking:
an essay of personal reflection and critique. Archives and
Museum Informatics, 11, 1997.
Cox, R.J., Electronic information technology and the archivist: bright
lights, lingering concerns. American Archivist, 55, 1992.
Day, M.W., Preservation problems of electronic text and data. East
Midlands Branch of the Library Association, Occasional papers,
no. 3. Loughborough: EMBLA Publications, 1990.
Feeney, M. (ed.), Digital culture: maximising the nation's investment: a
synthesis of JISC/NPO studies on the preservation of electronic
materials. London: National Preservation Office, 1999.
Fishbein, M.H., Appraising information in machine language form.
American Archivist, 35, 1972, 35-43.
Hills, S., Electronically published material and the archival library.
Electronic Publishing Review, 5 (1), 1985, 63-72.
Neavill, G.B., Electronic publishing, libraries and the survival of
information. Library Resources & Technical Services, 28, 1984, 76-89.
Waters, D.J. and Kenney, A., The Digital Preservation Consortium: mission
and goals. Washington, D.C.: Commission on Preservation and
Access, 1994.
Weber, H. and Dorr, M., Digitisation as a method of preservation?
Amsterdam: European Commission on Preservation and Access;
Washington, D.C.: Commission on Preservation and Access, 1997.
Lynch, C.A., Rethinking the integrity of the scholarly record in the
networked information age. Educom Review, 29 (2), March/April
1994.
Index

Administrative mechanisms 71
American National Standards Institute (ANSI) 132
Archival Information Package (AIP) 259
Audio-Video Inter-leaved (AVI) 142
Automated digital library 151
Automatic algorithms 20
Automatic experimental systems 32
Central Index Server (CIS) 148
Clinger-Cohen Act 306
Commonwealth laws 79
Copyright Act 79
Defence Advanced Research Projects Agency (DARPA) 295
Department of Defence (DOD) 295
Dissemination Information Package (DIP) 259
Drainage Information System (DRAIN) 131
Electronic Records Archive (ERA) 305
Geographic Information Systems (GIS) 132
Graphical User Interface (GUI) 134
Impact on People of Electronic Libraries (IMPEL) 136
Indexing Rules (IR) 30
International Standard Organisation (ISO) 263
Joint Information Systems Committee (JISC) 133
Joint Photographic Experts Group (JPEG) 141
Legal information systems 153
Library management 5
Library management systems 88
Machine-readable data 186
Merged Index Server (MIS) 148
Moving Pictures Experts Group (MPEG) 142
National Aeronautics and Space Administration (NASA) 296
National classifications system 190
National Science Foundation (NSF) 132
Office of Outer Space Affairs (OOSA) 193
Online secure payment system 99
Open Archival Information System (OAIS) 253
Optical-electronic storage media 186
Preservation Description Information (PDI) 257
Programmable transport system 181
Programme on Space Applications (PSA) 195
Public access internet service 79
Public library system 85
Submission Information Package (SIP) 259
Tag Image File Formats (TIFF) 142
Telecommunication networks 74
Text Retrieval Conference (TREC) 135
Visual Display Unit (VDU) 136
Visual Simulation Environment (VSE) 144
Vocabulary management 32
Workers' Educational Association (WEA) 94
