
Toward Universal Serial Item Names

Robert D. Cameron
School of Computing Science
Simon Fraser University

Accepted for publication in JoDI - Revisions Completed - June 25, 1998

Copyright 1997, 1998, Robert D. Cameron.

Contents
Abstract
I. Introduction
II. Requirements
Requirement #1: Unambiguous Article Identification
Requirement #2: Canonical USINs
Requirement #3: Identification of Secondary Serial Components
Requirement #4: Scholar-Friendliness
Requirement #4.1: No Required Redundancy
Requirement #4.2: Standard Mnemonics
Requirement #4.3: Publication Numbering
Requirement #4.4: Standard Numbering Syntax
Requirement #4.5: Brevity of Article Identification
Requirement #4.6: Ease of Construction and Analysis
Requirement #4.7: Media Independent Specification
Requirement #4.8: Embedding USINs in Context
Requirement #5: Permanence of USIN Designation
Requirement #6: Accommodating Serial Evolution
III. Global Naming of Serial Publications
Hierarchical Naming Using the DNS Model
Three Initial Domains
Evolution of the USIN System: Towards Scholar-Friendly Names
IV. Hierarchical Identification of Serial Items
Example: Journal Article Citation
Multiple Articles Per Page.
Unpaginated E-Journals
A General Model for Identification by Hierarchical Numbering
Scope
Scope-Dependent Numbering
Syntactic Representation
Parallel Numbering Hierarchies
Chronology
Further Work: Hierarchical Numbering Theory
Additional Design Ideas for Hierarchical Numbering
Syntax for Holdings Description
Secondary Component Notation
The Reference Notation
Hyphenation Notation
V. USIN Support Technology
USIN Global Registry
SDL - Serials Definition Language
UPP: USIN Publication Protocol
SRP: Serial Registration Protocol
PDP: Publication Domain Protocol
USIN Global Database System
UIP - USIN Inquiry Protocol
Bibliographic Retrieval and Formatting
USINs, the USIN Global Database and Literature Research
VI. Conclusion
References

Abstract
The Universal Serial Item Name (USIN) scheme is proposed as a framework for a
single global namespace of articles and other contributions published in organized
serial collections. Requirements for USINs are analyzed with an emphasis on the
use of USINs in scholarly communication. A uniform naming model is described
based on the hierarchical naming of serial publications and the hierarchical
numbering of serial items. A number of concrete design ideas for USIN syntax are
presented. A USIN Global Registry and a USIN Global Database are proposed and
analyzed in terms of specific architectural features that interact to meet the
requirements of publishers, librarians and scholars. Applications of the USIN
concept to literature research, document retrieval, bibliography preparation and
addressing the "broken links" problem of the World-Wide Web are considered.

I. Introduction
The Universal Serial Item Name (USIN) scheme is proposed as a framework for a
single global namespace of articles and other contributions published in organized
serial collections. Although the initial focus is scholarly literature published in
journals, conference proceedings, technical reports and books, the scheme is
intended to accommodate extensions to include other types of serialized contributions
such as magazine articles, bills of a legislature, decisions of a court or minutes of
university committee meetings. The USIN is intended as a vehicle for
interoperability between various bibliographic citation applications, including
finding citations (literature research), retrieving citations (from on-line sources,
libraries or document delivery services), citation indexing, and citation formatting
(bibliography preparation). The USIN is also intended as one possible mechanism
for migrating the World-Wide Web away from dependence on Uniform Resource
Locators (URLs) [4] to a system meeting the requirements for Uniform Resource
Names (URNs) [23].

The USIN concept is related to the Serial Item and Contribution Identifier (SICI)
[19], the Publisher Item Identifier (PII) [1], and the Digital Object Identifier (DOI)
[11] schemes. However, the USIN approach is primarily concerned with the task of
document identification in human communication, particularly scholarly, technical
and legal communication, whereas the other schemes are more concerned with
document delivery, library processing and publisher perspectives. In particular, the
USIN should use mnemonic coding and be reproducible by ordinarily literate people
(authors, students, librarians, law clerks, and so on) without the need for specialized
coding knowledge and check-sum algorithms. The USIN system is also intended for
serialized material that is not or cannot be registered with an International Standard
Serial Number (ISSN); both SICI and PII rely on ISSNs for serial item
identification. Philosophically, the USIN concept is most closely related to the SICI
scheme in that they each identify documents with their publication in a particular
organized series. The PII and DOI schemes identify documents as items owned by
publishers, with numbers possibly assignable in advance of publication and
independent of publication numbering. Green and Bide [14] and Paskin [21]
provide good overviews of the various current approaches to identification of
published articles or other items.

Central to the USIN concept is the notion of publication in an organized serial
collection. This is a generalization of the traditional notion of a serial publication.
An organized serial collection is defined to be any series of items published with a
specific publication numbering framework. A (volume, issue, page) numbering
framework might be used for a particular journal. The framework may change over
time (e.g., changes in the number of issues per volume), but the numbering for any
particular item is set when it is published. Both explicit and implicit elements may
be used in the numbering framework, so long as they are fixed at the time of
publication. For example, numbering of articles may be by explicit (volume, issue,
page) numbering, with a counting rule based on page layout to distinguish multiple
articles on a page. The authority for number assignment is usually, but not always,
the publisher. For example, ISBN numbering of books satisfies the USIN definition
of publication numbering framework and so allows the USIN scheme to be applied
to books as well as to conventional serials.

In application to scholarly writing and bibliography preparation, the USIN concept
is envisioned to be used with bibliographic processing "plug-ins" to standard
word-processing software. These plug-ins should be capable of resolving USIN
references into appropriately formatted citations consistent with chosen style
guidelines. USIN resolution may be achieved through locally-mounted databases
coupled with World-Wide Web access as a backup. Authors could thus use USINs as citation tags for
papers of interest, much as they use similar tags with BibTeX, ProCite, EndNote or
other bibliographic formatting tools. However, with the USIN approach, authors
will be spared the drudgery of creating their own bibliographic databases for use
with these products, editors will be spared the task of correcting author errors in
citation, and readers will be spared the difficulty of resolving errors in citations that
authors and editors miss.

In application to literature databases, the USIN can serve as a standard notation to
report the results of a search process. This could open up new opportunities for
combining search results from distinct databases. For example, duplications could
be filtered by USIN matching, or relevant items from one search might be fed back
into a search on a different database. In fact, the USIN idea is intended to serve as
the core data element in a scheme for universal citation databases: databases that
link every document to the documents it cites and vice versa [6].

In application to the World-Wide Web, the USIN concept has considerable promise
as a potential partial solution to the problem of "broken links" [5, 13]. In short, the
URLs that are presently used for hypertext links on the World-Wide Web are based
on "locations" that specify documents in terms of access protocols, port numbers,
directory paths, and filenames. For various reasons, all of these attributes of
document location are subject to change and web links frequently become broken as
a result. Many proposals to resolve this problem through the creation of some form
of Uniform Resource Name have been put forward, but none seem to have
progressed beyond the experimental stage [8, 9].

In comparison to the URN approach, the USIN scheme concentrates on the
somewhat smaller problem of establishing a universal naming scheme for
publications in serialized collections only. One could imagine that USINs could be
developed within the overall URN structure as one particular "namespace" [17]. On
the other hand, there are several reasons why it may be best to focus on a specific
solution for USINs instead of the general URN problem. First of all, it could be
argued that the best focus for perpetual naming schemes is to concentrate on those
items actually intended to be long-term contributions to the global knowledge
archive. From this perspective, publication in an organized serial collection may be
the best single indication of such an intent. Second, the act of assigning a document
a number within a serial collection represents an important technical opportunity
unavailable for general web resources; a specific event in the publication process to
which naming scheme protocols can be tied. Third, focussing on the evolving global
knowledge archive as a development from the present international network of
libraries may suggest different approaches to identifying the "resolution service" for
a USIN. For example, users could be allowed to choose their own resolution service
from those offered by different local libraries, instead of being forced to accept a
network-specified service. In the terminology of the Dexter Hypertext Reference
Model [15], we can take advantage of the flexibilities afforded by resolution within
the run-time layer to overcome difficulties in storage-layer resolution. For all these
reasons, focussing on publications in organized serial collections may be both the
right problem to solve and the one for which URN solutions are most feasible.

Applications of the USIN scheme to other areas such as legal citation and legal
research are also envisaged. However, these are at present beyond the scope of this
paper and are left as an area for future consideration.

This paper is intended as a discussion document to set the framework for
development of the USIN concept. Overall, the goal is to propose the requirements
that must be met by any USIN system, and to suggest some reasonably concrete
design ideas that meet those requirements. Section II focuses on the requirements
analysis with a particular emphasis on the concept of scholar-friendly naming.
Sections III and IV focus on design concepts that satisfy the USIN requirements,
broken down into two main tasks: globally unique naming of serial publications and
hierarchical identification of serial items within a particular publication series.
Section V then discusses requirements for important USIN support technologies.
Section VI concludes the paper.

II. Requirements Analysis


The goal of this section is to discuss the general requirements that any USIN system
must meet, without making premature commitments to particular USIN design
ideas. At the same time, the requirements are used to analyze some of the
inadequacies of the existing identification standards, primarily SICI and ISSN. This
serves both to help establish the need for a new identification scheme and to bring
some concreteness to the discussion. The reader who prefers additional concreteness
may wish to briefly look ahead to some example design ideas for journal article
citation in Section IV.

Requirement #1: Unambiguous Article Identification


It may seem obvious that a USIN scheme must meet the basic goal of unambiguous
article identification: every article must be denotable and every USIN denoting an
article must denote no other article. However, there are difficulties in achieving this
goal and the goal is in fact not achieved by the existing SICI coding scheme. In
essence, the SICI scheme is prone to failure in some rare cases involving articles
appearing on the same page and having similarly abbreviated titles. To deal with the
multiple article per page problem, SICI uses a "title code" of up to six characters,
usually formed from the initial letters of title words. Different articles on a page can
usually be distinguished by this title abbreviation. In principle, however, it is
possible to have two or more articles with the same SICI title abbreviation and
hence the same overall SICI code. Presumably this is one of the reasons for the 12
ambiguities reported within 4 million SICI strings stored in the Uncover database
[22]. Another problem with SICI serial title abbreviation is that it requires human
judgment when the title contains symbology; this is a further possible source of
ambiguity.
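
The collision can be illustrated with a minimal sketch. The function below is a deliberately simplified stand-in for the SICI title code (initial letters of the title words, truncated to six characters); the real SICI rules are more detailed, and both article titles are invented for illustration.

```python
def title_code(title: str, max_len: int = 6) -> str:
    """Simplified title code: initial letter of each word, at most max_len letters.

    This is an assumption-laden approximation of the SICI rule, not the
    standard itself.
    """
    initials = [word[0].upper() for word in title.split() if word[0].isalnum()]
    return "".join(initials)[:max_len]


# Two invented articles that begin on the same page of the same issue.
first = title_code("Notes on Programs")
second = title_code("New Operators Proposed")

# Both reduce to the same code, so a page-plus-title-code identifier
# cannot tell the two articles apart.
collision = first == second
```

Under this simplified rule both titles yield NOP, so any identifier built from the same serial, issue, page and title code would be shared by the two articles.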

In order to ensure that every article is denotable, a logical first step is to ensure that
every serial is denotable. Unfortunately, the existing international standard in serial
identification, the ISSN, has an insufficiently large denotation space. The ISSN
system is based on an eight-digit identifier with seven working digits and a check
digit. The upper limit on the number of serials that can be accommodated is
therefore 10 million. When contemplating a universal designation scheme for serial
items as fine-grained as the minutes of curriculum committee meetings of a
particular university department, it should become clear that the ISSN system as
presently constituted will not suffice.
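
The size of the ISSN denotation space follows directly from its structure, as the sketch below shows. The modulus-11 check-digit rule with weights 8 down to 2 is taken from the ISSN standard rather than from this paper, and the names `issn_check_digit` and `ISSN_CAPACITY` are illustrative only.

```python
def issn_check_digit(first_seven: str) -> str:
    """Return the check character for the seven working digits of an ISSN."""
    if len(first_seven) != 7 or not (first_seven.isascii() and first_seven.isdigit()):
        raise ValueError("expected exactly seven decimal digits")
    # Weights 8, 7, ..., 2 are applied to the seven working digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


# The eighth character is fully determined by the first seven, so it adds
# no capacity: only the seven working digits distinguish serials.
ISSN_CAPACITY = 10 ** 7
```

Because the check character is a function of the working digits, the eight-character form still distinguishes at most ten million serials.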

Requirement #2: Canonical USINs


Although every USIN must denote at most one article, it is reasonable to allow
different USINs to denote the same article. For example, issue numbers may be an
optional part of the USIN syntax, required only when journals are paginated by
issue. In the case that journals are paginated by volume, it could be desirable to
allow either form (with or without issue numbers) as an acceptable USIN form.
There are many other reasons that alternative forms of a USIN might be desirable
and there is no particular reason to rule this option out in the initial requirement
specification for USINs.

Nevertheless, of the set of USINs that may legally denote an article, exactly one of
them should be specified as the canonical or preferred form. One use for canonical
forms is to make it easy to determine whether two different USINs denote the same
article: convert them both to canonical form and see if they are the same. For
example, if a user searches two distinct databases for articles of interest on a
particular topic and both databases return USINs in canonical form, then it is an
easy matter to filter out duplicate references to the same article because they are
represented by exactly the same string. A second important role for canonical forms
is to support indexing of information by USIN. By always associating information
with the canonical form of a USIN, it will be possible to retrieve that information
given any legal USIN form by first converting to the canonical form.

A further requirement for USINs is that conversion to canonical form be an
algorithmic process based on globally available information. In this way, separate
software systems will be able to interoperate by conversion to the common
canonical form. The requirement for globally available information is not
particularly a restriction on the syntax of USINs, but is a constraint on the
implementation of the overall USIN system and how the basic information on
USINs and their formation on a serial-by-serial basis must be shared.
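
A minimal sketch of canonicalization and duplicate filtering is given here, under stated assumptions. The colon-separated syntax `serial:volume[:issue]:page`, the `PAGINATION` table (standing in for globally available per-serial information) and the serial `X.EXAMPLE/MAG` are all hypothetical; the pagination modes recorded in the table are illustrative, not facts about any real serial.

```python
# Hypothetical per-serial information that a global registry might publish.
PAGINATION = {
    "S.ACM/TOPLAS": "volume",   # assumed: pages run continuously through a volume
    "X.EXAMPLE/MAG": "issue",   # assumed: pages restart in every issue
}


def canonicalize(usin: str) -> str:
    """Convert an accepted USIN form to the canonical form for its serial."""
    serial, *numbers = usin.split(":")
    mode = PAGINATION[serial]  # KeyError for a serial missing from the table
    if len(numbers) == 3:
        volume, issue, page = numbers
    elif len(numbers) == 2 and mode == "volume":
        volume, page = numbers
        issue = None
    else:
        raise ValueError(f"cannot interpret {usin!r}")
    if mode == "volume":
        # The issue number is redundant, so the canonical form omits it.
        return f"{serial}:{int(volume)}:{int(page)}"
    return f"{serial}:{int(volume)}:{int(issue)}:{int(page)}"


def deduplicate(usins):
    """Return canonical forms in first-seen order, with duplicates removed."""
    return list(dict.fromkeys(canonicalize(u) for u in usins))


merged = deduplicate([
    "S.ACM/TOPLAS:15:5:795",   # result from one database, with issue number
    "S.ACM/TOPLAS:15:795",     # result from another database, without it
])
```

Both inputs reduce to the same string, so results from two databases can be merged by plain string comparison once each is in canonical form.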

Requirement #3: Identification of Secondary Serial Components


Although the primary focus of the USIN concept is on the identification of
published articles, there are a number of other related elements worthy of
identification at both coarser and finer granularities. On the coarser side, this
includes identification of the serial itself, volumes or volume ranges of serials, an
index to the volume, individual issues and issue ranges, contents of an issue or
special sections of an issue. At a finer level of granularity, it may include named or
numbered components of articles, such as article abstracts or individual sections,
figures, tables, or equations. Scholars may sometimes want to make reference to
these components; other applications include identifying library holdings on a
volume/issue basis, checking in serial issues when they arrive at the library or
submitting claims for them when they are late, and ordering table of contents pages
for awareness services. The SICI scheme includes capabilities for designating some
of these components through its code structure identifier (CSI) and derivative part
identifier (DPI); the PII and DOI schemes do not appear to account for such
components. ANSI Serials Holdings Statements, used to identify holdings in library
catalogs, includes a variety of conventions for specifying volumes, issues, ranges of
volumes and issues and similar units of collection [2].

It is neither possible nor desirable to define a priori the specific set of secondary serial
components that are identifiable in the USIN syntax. Instead the requirement
presented here is that the USIN scheme should accommodate specification of these
elements through an extensible syntax that can be coupled with a specification of
what elements exist on a serial-by-serial basis.

Requirement #4: Scholar-Friendliness


A key requirement central to the entire focus of the USIN concept is that it
emphasizes the needs of the people who use USINs over the needs of computers
that process them. This encompasses many aspects that can be generally grouped
under the term scholar-friendliness. However, this term is not intended to restrict
the set of people whose requirements are considered. Instead, it reflects the notion
that anyone who uses a USIN to cite prior works may be said to be taking on the
role of a scholar in that act.

One might consider that there is a middle ground between accommodating the needs
of scholars and the needs of computer systems. However, the goal of establishing
USINs as names that will serve to denote published items over the long term should
be considered. From this viewpoint, apparent requirements that might derive from
the limitations of present-day computer systems (e.g., fixed-length fields, limited
storage capacity, etc.) should be avoided. There is little doubt that the processing
and storage capabilities of the computer systems that will be available in coming
decades will be vastly superior to those of their present-day counterparts.

Nevertheless, scholar-friendliness cannot be considered an absolute requirement at
the possible expense of unambiguous article identification, canonical forms or other
requirements. Instead, scholar-friendliness should be considered as a desirable trait
to be maximized subject to the constraints imposed by other requirements.

Requirement #4.1: No Required Redundancy

Scholars will often need to write down USINs of interest or type them into their
computers. To minimize the tedium and the chance of error in these manual
processes, USINs should be designed to include only that information necessary to
clearly identify the cited work. Redundant forms that include additional information
may be allowed but must not be required. For example, for a journal that is
paginated by volume and that follows the convention of beginning each article on a
new page, it is sufficient to specify the journal, volume number and initial page
number to uniquely identify an article. In this case, a USIN specification must not
require the inclusion of additional information such as issue number, date or
complete page range.

One counterargument is that redundant information helps prevent errors, but one can
in turn counter that this approach to error control is obsolescent and inferior.
Historically, the requirement for redundant information at data-entry time is
designed to allow error detection at some future processing step. This is the basis for
three forms of redundancy in the existing SICI scheme for article identification:
chronology (date of publication), title codes and check digits. However, these
devices provide error detection without error correction. When an error is
encountered, there may be a considerable delay (e.g., days in interlibrary loan
applications) before the error can be corrected and processing resumed. Consider
instead an interactive process supported by a global network. When a scholar enters
a USIN, interactive software could immediately consult the global USIN database to
verify its correctness and to allow any necessary corrections or resolutions of
ambiguity. One existing model for this is the immediate feedback one receives when
entering an incorrect URL on the World-Wide Web (Web). In this way, an
interactive data entry process can both avoid the tedium of redundant data entry and
support a process of immediate error correction as well as detection. Construction of
such a global USIN system is probably feasible using the present-day technology of
Internet-connected computers; if not, it will certainly become feasible within a small
number of years.

Requirement #4.2: Standard Mnemonics

A second requirement deriving from scholar-friendliness is to emphasize the use of
mnemonic forms for identifying serial publications, and whenever possible, the
standard mnemonic forms that are actually used by the community of scholars that
use a particular serial. For example, the journal ACM Transactions on Programming
Languages and Systems published by the Association for Computing Machinery is
widely known by the acronym TOPLAS. An acceptable mnemonic form for
identifying this serial might thus be S.ACM/TOPLAS where S might denote a global
domain of scholarly societies and ACM is a unique code for the Association for
Computing Machinery within that domain. As a second example, a designation such
as CA.SFU.CMPT/TR might be acceptable as a globally unique mnemonic code for
the Technical Report series of the School of Computing Science at Simon Fraser
University. This code is mnemonic and builds on several accepted and standard
abbreviations: CA as the ISO country code for Canada, SFU as a unique institutional
code for Simon Fraser University in the CA domain (cf. the Internet DNS name
sfu.ca), CMPT as the standard 4-letter department ID used by Simon Fraser
University for the School of Computing Science and TR as the abbreviation for
"Technical Report" as used by the School. The syntax shown in these examples is
intended to be illustrative of a possible realization of this USIN requirement, but
not prescriptive.
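
As a sketch of how software might read such a designation, the function below splits the illustrative syntax into its domain hierarchy and series code. It assumes, purely for illustration, that "." separates domain levels and that a single "/" separates the domains from the series; the names `SerialName` and `parse_serial_name` are hypothetical.

```python
from typing import NamedTuple, Tuple


class SerialName(NamedTuple):
    domains: Tuple[str, ...]  # from the global domain down to the issuing body
    series: str               # series code within the issuing body


def parse_serial_name(name: str) -> SerialName:
    """Split a designation such as CA.SFU.CMPT/TR into its components."""
    domain_part, separator, series = name.partition("/")
    if not separator or not domain_part or not series:
        raise ValueError(f"expected DOMAINS/SERIES, got {name!r}")
    return SerialName(tuple(domain_part.split(".")), series)


report_series = parse_serial_name("CA.SFU.CMPT/TR")
journal = parse_serial_name("S.ACM/TOPLAS")
```

Each component remains a recognizable mnemonic after parsing, which is the property the requirement is meant to preserve.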

Incorporation of existing standard mnemonic codes within USIN designations will
assist scholars in a number of ways. USINs will be reported to scholars as the
results of bibliographic search processes, scholars will enter USINs when doing
citation searches, scholars will use USINs when including references in papers and
scholars will make note of USINs when they find papers of interest. In all these
applications, scholars will find mnemonic forms easier to read, easier to reproduce
and generally more useful. However, note that these requirements are met if the
mnemonic forms are acceptable as one of the alternative forms on USIN input and
are produced during output by any USIN-generating software. More precisely, the
requirement for using mnemonic forms applies to the definition of the canonical
form of USINs, but does not preclude alternative non-mnemonic forms.

Adopting scholar-friendly mnemonic identification necessarily imposes a further
limit on the role of ISSNs within the USIN scheme. Where a serial is
unambiguously known by a mnemonic form, that form must be used as canonical in
place of the ISSN. Nevertheless, ISSNs are likely to have an important role both in
identifying serials for which no mnemonic abbreviation has been defined and for
initially identifying serials before their mnemonic identifications have been
registered and accepted as globally unique.

Requirement #4.3: Publication Numbering

A further requirement deriving from the general principle of scholar-friendliness is
that existing publication numbering conventions should be employed or adapted
wherever possible to identify published articles within a particular serial. For
example, articles in traditional print journals will typically be identified by volume
number, issue number (if required) and page number, with the possible addition of a
code to discriminate multiple articles on a single page. This will be of the greatest
assistance to scholars when forming USINs from either copies of the article in
question or from a citation of the article in a reference list. It will also be helpful to
scholars in decoding USINs and retrieving the items from (physical or virtual)
library shelves.

The requirement for the use of publication numbering rules out the article
identification mechanisms contemplated by the PII and DOI schemes as a basis for
canonical USINs. Both of those schemes emphasize publisher-generated numbers
that may be different from the actual numbering on the published serial. This
requirement also rules out other reasonable schemes for unambiguous article
identification. For example, a scheme based on volume number and sequential
article number would be widely applicable as an unambiguous numbering scheme
for many journals. But scholars may be unable to easily determine the sequential
article number from either a printed copy of the article or a conventional
bibliographic citation. If publication numbering exists, it should be used.

One might argue that identification by publication numbering is less scholar-friendly
than identification using more mnemonic article attributes, such as author name and
key title words. However, this is an instance in which scholar-friendliness should
not be considered an absolute at the expense of a system of unambiguous article
identification.

One might also prefer to use publication chronology (e.g., dates, month-year
combinations) instead of publication numbering. In fact, chronology is a form of
numbering that happens also to be correlated with the passage of time. For some
types of publication, chronology may be the only numbering that exists and hence
must be used. In other cases, acceptable alternative USIN forms may be defined
based on chronology. However, chronology is generally more complex and involves
more identification pitfalls. For example, if (volume, page) identification generally
suffices for article identification in a particular journal, it may be the case that (year,
page) identification is inadequate for at least two reasons. First, the journal may
publish multiple volumes per year. Second, even if volumes are annual, they may
not correspond to calendar years; articles with the same starting page number in two
consecutive volumes could still end up being published in the same year. In other
cases, serial items may have duplicated and hence ambiguous chronology, for
example, when two technical reports are issued on the same date. There are also a
number of annoying coding problems for chronology. If numeric codes are used for
months, how do you code for month combinations or seasons? If nonnumeric
coding is used, should it be in English or the original language, and should
abbreviations be used? For all these reasons of potential ambiguity and complexity,
identification by simple publication numbering should be used in preference to
chronology.
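
The (year, page) pitfall can be made concrete with a small worked example. The records below are invented: they describe a hypothetical journal whose volumes do not align with calendar years, so that two consecutive volumes both have pages published in 1997.

```python
from collections import Counter

# Invented records for one journal: (volume, year, first page).
articles = [
    (7, 1996, 1),     # volume 7 begins in 1996
    (7, 1997, 211),   # and continues into 1997
    (8, 1997, 1),     # volume 8 begins later in 1997; pages restart at 1
    (8, 1997, 211),
]


def ambiguous_keys(keys):
    """Return the identification keys shared by more than one article."""
    return sorted(key for key, count in Counter(keys).items() if count > 1)


by_volume = ambiguous_keys((vol, page) for vol, _year, page in articles)
by_year = ambiguous_keys((year, page) for _vol, year, page in articles)
```

Here `by_volume` is empty, while `by_year` contains (1997, 211): two distinct articles share that year and starting page, so chronology alone fails to identify them.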

Requirement #4.4: Standard Numbering Syntax

Because technical reports, government publications, court documents and journal
papers have various different numbering schemes, alternative syntactic conventions
for each type of publication will likely be necessary. In principle, each serial should
be accompanied by a definition of its numbering scheme, including syntax and
semantics of the USIN designations. However, in order to ease the burden on
scholars, efforts should be made to limit the syntactic variations wherever possible.
Thus, there should also be methods for defining standardized numbering schemes,
with the goal that the vast majority of serials will use one of the standard schemes
rather than one of their own design.

Requirement #4.5: Brevity of Article Identification

From the scholar's point of view, the primary role and need for USINs is in
identification of articles. Identifying secondary serial components (volumes, issues,
special sections, abstracts, etc.) is a secondary issue of considerably less importance.
The requirement for scholar-friendliness then is that the syntax for article
identification not be complicated by codes to distinguish articles from other types of
component. Instead, where necessary, the syntax for secondary components should
include additional coding to indicate that a secondary component is being identified;
the absence of such coding should be taken to indicate an article identification.

Requirement #4.6: Ease of Construction and Analysis

It should be easy for scholars to construct and analyze USINs manually. Checksums
and other calculations should be avoided. Appropriate punctuation should be used to
avoid running numeric items together. For example, the code 20000229 used as the
SICI specification for February 29, 2000 violates this requirement. Arcane numeric
codes should also be avoided. Although numeric month codes 1 through 12 are
arguably acceptable, the SICI code 23 meaning "Fall" is not.

Requirement #4.7: Media Independent Specification

It is not uncommon to find a particular serial published in two or more formats, for
example, in HTML format on the Web and on paper. From the scholar's viewpoint,
it is usually the case that it is the content of the article, not the form of its
presentation, that matters. When there is no difference in content, the USIN
specification for articles should be fundamentally independent of publication
medium. This requirement does not preclude media specification from inclusion as
an optional element in a USIN syntax. However, the SICI convention of including
the medium format identifier (MFI) as standard practice would not satisfy the
requirement for USINs.

It may be the case that a publisher creates separate designations for different formats
of a serial, particularly when there may be significant differences in content. In this
case, the publication medium or format may be implicitly identified by the choice of
publication series designation. However, this does not represent a violation of
format independence of the USIN syntax itself.

Requirement #4.8: Embedding USINs in Context

Scholars will need to make use of USINs as notational elements in a variety of
contexts, both formal and informal. Formal contexts include use of USINs as
citation tags for bibliographic formatting software and data elements for
bibliographic database queries. Informal contexts are generally oriented to the
human reader, such as presentation of USINs in reference lists or direct use of
USINs as nouns in sentences. In any of these contexts, there is a potential for
confusion to be created by interaction of the syntax of the USIN with the notational
conventions of its embedding.

The syntax of USINs should be designed to avoid confusions that can be created by
common notational features that may be expected in typical embeddings. In
particular, both formal and informal settings may embed USINs as notational
elements within structures delimited by parentheses, braces or similar bracketing
structures. To avoid confusion, USIN syntax should be constrained to allow
bracketing symbols only if they occur in matched pairs. For example, if a USIN X
is to be acceptable as a parameter in a BibTeX citation tag of the form \cite{X},
then any unmatched braces within X would surely cause confusion. It may be
worthwhile to avoid braces altogether because of their use in the TeX family of
document languages and similarly to avoid angle brackets ("<" and ">") because of
their use in HTML and SGML.
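
The matched-pairs constraint can be checked mechanically. The following Python sketch is illustrative only: the function name brackets_balanced and the particular set of bracketing symbols are assumptions, and the paper does not say whether proper nesting is required in addition to pairing (the sketch assumes it is).

```python
# Hypothetical helper, not part of any USIN specification.
# Assumption: the bracketing symbols of interest are (), [], {} and <>,
# and pairs must be properly nested as well as matched.
CLOSERS = {")": "(", "]": "[", "}": "{", ">": "<"}
OPENERS = set(CLOSERS.values())

def brackets_balanced(usin):
    """Return True if every bracketing symbol in usin has a matching partner."""
    stack = []
    for ch in usin:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in CLOSERS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    # Anything left on the stack is an unmatched opener.
    return not stack
```

A string that passes this check can be placed inside \cite{...} or a parenthesized reference without unbalancing the enclosing structure.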

When USINs are used as elements in ordinary discourse, they may often occur at
the end of a sentence or phrase. Punctuation (periods, commas, semicolons and so
on) added at this point should not be a source of confusion. The presence or absence
of whitespace (blanks, tabs or line breaks) after such a punctuation symbol may be
used to discriminate. That is, a period, comma or other punctuation may be used
within the USIN syntax only if it is immediately followed by a nonblank character.
Any of these punctuation marks followed by whitespace should always denote the
end of a sentence or phrase.
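
As a rough illustration of this whitespace rule, the sketch below splits running text on whitespace and strips punctuation that sits immediately before the whitespace. The function name and the exact set of punctuation marks are assumptions made for the example.

```python
# Assumption: this set of sentence punctuation marks is illustrative only.
SENTENCE_PUNCTUATION = ".,;:!?"

def extract_tokens(text):
    """Split text on whitespace, dropping punctuation that precedes whitespace.

    Punctuation inside a token (followed by a nonblank character) is kept,
    so the periods and colon within a USIN survive intact.
    """
    tokens = []
    for word in text.split():
        core = word.rstrip(SENTENCE_PUNCTUATION)
        if core:
            tokens.append(core)
    return tokens
```

Applied to a sentence ending in "S.ACM/TOPLAS:16(6)@1811.", the final period is removed while the period in "S.ACM" and the colon before the volume number are retained.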

Requirement #5: Permanence of USIN Designation


A necessary requirement for the USIN system is that USINs, once assigned and
validated, remain permanently unambiguous identifiers of their documents. This
applies to both canonical and noncanonical USINs. Three hundred years from now,
a scholar may come across a USIN designation in an obsolete form of print media.
She may highlight it with her data capture pen and expect to see instantly the
resolution of it to a full bibliographic reference on her electronic work area. This
requirement implies the need for a global registry system and a set of protocols for
ensuring that USINs, once assigned, are never reused.

However, it need not be required that canonical USINs always remain canonical, at
least in the initial development of the USIN system. Initially, the canonical USIN
forms for many serials will include serial designation by ISSN. As globally unique
mnemonic designations for these serials are gradually registered and accepted, those
forms may become canonical. It may also be the case that changes in the canonical
form of serial numbering become desirable, particularly for those aspects of
numbering that are not directly reflected in publication numbering (for example,
position of an article on a page).

It may be useful to impose constraints on how frequently canonical forms may be
varied and/or on how results of USIN processing may be combined. For example,
new canonical forms might be allowed to be registered at any time, but taking effect
only at certain designated times. When such a time is reached, an updating process
might (a) temporarily disallow new USIN processing requests, (b) allow current
requests to complete or time out, (c) perform global updating of canonical form
information, and (d) allow USIN processing requests to resume. Any application
that needs to ensure the completeness of USIN matching could use the simple
device of requiring that all USIN processing requests are initiated and completed in
the same time frame.

Requirement #6: Accommodating Serial Evolution


Serials evolve. Changes in title, publisher or publication frequency are
commonplace. Serials may merge together or split apart. Serials may suspend
publication and then resume publication at a later date. Serial publishers are also
subject to many kinds of change: renaming, relocation, reorganization and so on.
There is no doubt that accommodation of change must be an important design goal
for USIN development.

Two issues involving particular forms of change deserve special attention in the
development of USIN syntax. The first is that title changes should not necessarily
require changes in the USIN code for a serial. This is at odds with the ISSN
convention, which requires new ISSNs to be issued when there is any significant
change in title. However, in considering mnemonic abbreviations of serial titles,
various changes in title may be accommodated with the same mnemonic. If the
publisher and readers of a journal wish to retain a particular mnemonic by which
the journal is known, the USIN system should respect this. The second issue is that
the syntax for identifying components of a particular serial should be flexible and
changeable. For example, if a serial starts out with sequentially numbered issues, its
USIN syntax should nevertheless accommodate a later reorganization to number the
publication by volume. Similarly, if a traditional print journal identifies articles by
volume and page number, the USIN syntax should accommodate a later change to
an electronic format in which articles are identified by volume and article number.

Requirement #7: Version Discrimination


Articles evolve. Draft versions may be initially circulated in a working paper series,
followed by revised versions in conference papers and further revised versions in
journals. At various stages an author may circulate intermediate versions to limited
groups for review and comment. Post-publication revision of journal articles is also
becoming a possibility with novel e-journal policies such as those of Living Reviews
in Relativity [24].

The USIN scheme generates distinct identifiers for each separately published
version of an article. One possible view of this is that each of these identifiers is in
fact an alternative identifier of the same article, with one of them (presumably the
most recent) being the canonical form. However, this approach has several serious
problems. The first is that there is no good basis for saying when two versions of an
article should be treated as the same. How many insertions and/or deletions of text
may be accommodated? What about changes in title or authorship? It is difficult to
imagine any set of rules that could provide a satisfactory and implementable
decision procedure. It is also difficult to imagine any mechanism that could ensure
that publishers actually identify these equivalent versions so that the correct
mappings to canonical form can be made automatically. Beyond these concerns,
there is also a problem with such equivalences automatically being applied to
citations: changes in the content of an article between versions may render a citation
apparently irrelevant or incorrect. This should not be considered a failure on the part
of the citing author. In essence, it is a misrepresentation to map the author's citation
of a particular version to any other version than the author intended.

Philosophically, then, USINs are names for particular versions of articles, not names
for the more abstract notion of an article that maintains its identity through various
versions over time. Systems to support this more abstract notion, at least at the
coarse-grain level of publication versioning, might well be built on top of a USIN
system, using USINs to identify particular published versions of articles. Finer-
grained versioning concepts, such as those of Augment/NLS [12] or Xanadu [20],
might also make use of USINs to interoperate with conventional bibliographic
databases.

The sharp reader may notice an apparent contradiction between the USIN
requirements with respect to changes to serials and changes to articles. The USIN
requirement for serial codes does represent the more abstract notion of a serial
publication as it goes through various changes rather than the serial as it exists at a
single point in time. However, this distinction between the treatment of serial and
article identifications reflects a fundamental philosophical view. In this view, serials
are like timelines and articles are like points on those lines. The timeline may go
through the twists and turns of changes in publisher, title or numbering scheme and
still retain its identity. Each point on each line is a separate entity with a separate
identity. There may be relationships between points such as "version-of" and
"cites", but the separate identities of the points should be maintained in the USIN
approach.

III. Global Naming of Serial Publications


Hierarchical Naming Using the DNS Model
The Domain Name System (DNS) of the Internet is a successful model of a
hierarchical, globally-unique naming system using distributed authority [18]. Under
DNS, a number of global domains such as "edu" (educational institutions, primarily
U.S.), "org" (organizations, primarily non-profit), "ca" (Canadian sites), have been
established by common agreement. Each domain is managed by an independent
domain authority. Each domain authority assigns unique identifiers within its
domain to create subdomains and/or to specify particular computer systems. When a
subdomain is created, authority for assigning further identifiers within the
subdomain is often passed to a responsible organization. Subdomains may be further
divided into subsubdomains and so on.

Consider a USIN scheme that adopts the hierarchical naming idea of DNS, but with
a focus on naming serial publications and publishing organizations, not computer
resources. The distinction between naming publications and naming computer
resources is critical; the failure to make it may be one of the underlying problems of
the URN concept. Notations such as the following may be contemplated:

S.ACM/TOPLAS as a designation for ACM Transactions on Programming
Languages and Systems published by the Association for Computing
Machinery within a global domain for scholarly societies,
S.ACM.SIGPLAN/Notices for SIGPLAN Notices of the ACM's Special
Interest Group on Programming Languages,
CA.SFU.CMPT/TR for the Technical Report series of the School of Computing
Science of Simon Fraser University,
and AU.NLA.ABN.SC/Papers for papers of the Standards Committee of the
Australian Bibliographic Network of the National Library of Australia within
a global domain for Australia.

These examples are for illustrative purposes only; the actual development of a
domain structure and names for serials and their publishers requires a process of
international consultation and consensus.

In the USIN scheme, then, serial publications are given identifiers which must be
unique in the context of a particular publication domain. Thus d1.d2.d3 is
interpreted to specify a subdomain d3 within domain d1.d2, which is itself
hierarchically specified as a subdomain d2 within the global domain d1. In general,
domains will denote publishing organizations, administrative divisions of such
organizations or collectives for identifying organizations or publications.
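
A minimal Python sketch of this interpretation, using the illustrative period and slash separators of this paper, follows. The function name parse_serial_name is hypothetical, and quoted components (as used in the RDNS domain) are deliberately not handled here.

```python
def parse_serial_name(name):
    """Split an illustrative USIN serial designation into (domains, series).

    'S.ACM.SIGPLAN/Notices' -> (['S', 'ACM', 'SIGPLAN'], 'Notices')
    A designation without a slash names a domain only, so series is None.
    Assumption: components contain no quoted periods or slashes.
    """
    domain_part, slash, series = name.partition("/")
    domains = domain_part.split(".")
    if any(not d for d in domains):
        raise ValueError("empty domain component in %r" % name)
    if slash and not series:
        raise ValueError("missing series identifier in %r" % name)
    return domains, (series if slash else None)
```

The returned list reads left to right from the global domain d1 down to the innermost subdomain, mirroring the d1.d2.d3 interpretation above.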

The USIN syntax shown in this paper is intended to be illustrative rather than
prescriptive of the final form of USINs. Thus the choice of periods and slash marks
as separators is somewhat arbitrary. One could also argue that the distinction
between slash marks and periods is artificial, i.e., that S.ACM.TOPLAS would do as
well as S.ACM/TOPLAS. However, distinguished punctuation allows us to infer
directly from the form of a specification that S.ACM/TOPLAS is a serial publication
of the ACM, while S.ACM.SIGPLAN is an administrative division thereof. One could
also question the decision to reverse the right-to-left structuring of domains under
DNS; the reason for this is to use a consistent left-to-right hierarchical structuring
within all levels of the USIN notation. Lastly, the final syntax of domain,
subdomain and series identifiers is left as an area for further work. However,
allowance for case-sensitivity in such identifiers seems reasonable, e.g., CaS and
CAS could denote separate items.

Three Initial Domains


Prior to international agreements to develop a full domain structure for USINs, it is
nevertheless possible to initialize the scheme by building on existing global
identification standards. Given this paper's focus on scholarly literature,
three initial USIN domains can be identified: ISSN,
ISBN and RDNS. The ISSN and ISBN domains directly use the international standard
numbering systems for serials and books. For example, ISSN/0164-0925 is an
initial USIN designation for ACM TOPLAS. Over time the notation S.ACM/TOPLAS
might be adopted as the canonical designation of this journal, but ISSN/0164-0925
will always be acceptable. Similarly ISBN is identified as a global domain based on
International Standard Book Numbers.

Names assigned under the Internet's Domain Name System are the basis for the
third leg of the initial tripod supporting the USIN scheme. Whenever a DNS domain
name or host name is clearly associated with a particular publishing organization, it
may be used as a component of the RDNS (restricted DNS) domain of the USIN
scheme. For example, acm.org is a DNS domain identified with the Association for
Computing Machinery, so RDNS."acm.org"/TOPLAS denotes ACM TOPLAS.
Similarly, sfu.ca is a DNS domain for Simon Fraser University, so
RDNS."sfu.ca".CMPT/TR denotes the Technical Report series of the School of
Computing Science at SFU. In this last example, one might consider instead basing
the USIN specification on the cs.sfu.ca domain, that is, RDNS."cs.sfu.ca"/TR.
This form might be allowed, but the form based on the CMPT designation may be
preferred (canonical), because that designation has been specifically chosen by SFU
in a system of unambiguous codes for its departments.

The syntactic convention of enclosing a DNS name in double quotes when used as
an RDNS domain serves two purposes. First, it emphasizes that the hierarchical
structure of the DNS name plays no role in the interpretation of that name as an
RDNS subdomain. In essence, DNS names are being cited as atomic identifiers for
publishing organizations. Second, the quote marks delimit the scope of a DNS
name, within which the "." separator is understood not as a part of the USIN
syntax, but simply as a character in a quoted DNS name.
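
The effect of the quoting convention on parsing can be sketched as follows. This is only an illustration under the syntax shown in this paper; the function name split_domain_path is an assumption, and the sketch does not record which components were quoted.

```python
import re

# A component is either a double-quoted DNS name (kept atomic) or a run of
# characters containing no period, quote mark or slash.
_COMPONENT = re.compile(r'"([^"]+)"|([^."/]+)')

def split_domain_path(path):
    """Split a domain path on periods, treating quoted DNS names as atomic."""
    parts = []
    pos = 0
    while True:
        m = _COMPONENT.match(path, pos)
        if m is None:
            raise ValueError("bad component at offset %d in %r" % (pos, path))
        parts.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
        if pos == len(path):
            return parts
        if path[pos] != ".":
            raise ValueError("expected '.' at offset %d in %r" % (pos, path))
        pos += 1  # skip the separator between components
```

Here the period inside "sfu.ca" is never seen as a separator, so the quoted name comes back as a single component.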

Unfortunately, there is no constraint within the DNS system that DNS domains are
permanently unique designations of organizations or their successors. Under DNS,
the essential requirement is that domains are unique at any particular point in time,
but it is quite conceivable that a naming authority at some level may reuse or
reassign a name. Furthermore, the association between DNS names and
organizations breaks down as one descends into the hierarchy of subdomains,
subsubdomains and so on. To avoid these problems, the USIN standardization
process could include the publication of a list of acceptable DNS names and their
associated organizations for use within the RDNS domain of the USIN scheme.
These designations should be permanent; the interpretation of a designation within
the RDNS domain should be derived from this list, even if that designation is later
reassigned to some other purpose within DNS itself. The intention of the list should
be to identify all and only those DNS domains that may be clearly identified with
publishing organizations.

The astute reader will note that designations such as RDNS."acm.org"/TOPLAS and
RDNS."sfu.ca".CMPT/TR seem unnecessarily awkward compared to the earlier
examples S.ACM/TOPLAS and CA.SFU.CMPT/TR. We should hope that forms such as
the latter ultimately become canonical under the USIN system. One might ask, then,
why not just skip the RDNS prefix, reverse the order of DNS domain names and
use those reversed names directly at the top-level of the USIN hierarchy in the
initial instance? The answer is that the top-level domain structure of the USIN
system should not be prematurely constrained. Once established for a particular use,
USIN designations are intended to be reserved permanently for that use. The RDNS
prefix allows existing DNS names to be used as a way of initializing the USIN
system, giving time for an orderly process of developing an internationally-
acceptable top-level domain structure.

Within the RDNS domain for a particular publishing organization, the identification
of administrative divisions and publication series should use codes specified by that
organization. In many cases, clear coding schemes are already in place now. In the
important case of universities, a system of unambiguous mnemonic codes for the
academic departments is typically available in the university calendar. Codes to
denote a publication series of a university department (e.g., TR for Technical Report,
TN for Technical Note and so on) are often included on publication lists produced by
the department or may be found on the documents themselves. Wherever possible,
the use of existing naming schemes should be accommodated in this way, in order
to maximize the scholar-friendliness of USIN designations.

Occasionally, one finds a DNS domain that directly corresponds to a particular
serial publication. For example, the electronic journal First Monday has an
associated DNS domain firstmonday.dk. In this case, the DNS name can be used
as a serial publication name directly within RDNS. Assuming then that the internet
domain for First Monday is registered on the list of acceptable RDNS domains, it
has the USIN RDNS/"firstmonday.dk".

In order to ensure the robustness and permanence of USIN designations, one should
expect that certain adaptations and accommodations of historical naming schemes
will be required. Thus, the USIN system must include a method for describing
naming schemes and rules for maintaining consistency. In order to make the greatest
use of historical naming schemes, the rules should be designed to accommodate a
great deal of variability. Nevertheless, some modifications of historical naming
schemes should be expected in order to comply with USIN requirements.

The three initial domains ISSN, ISBN and RDNS provide a plausible initial basis
for unified, permanent and globally-unique designations of archivable serial, book
and institutional publications. There are undoubtedly many cases in which the
coding of USIN specifications will initially be unclear, especially in the case of
institutional publications. However, it is certainly a common practice for the serial
publications of an institution to be identified using a numbering scheme that serves
to unambiguously denote those publications in the local context of an institution. It
is certainly also the case that the vast majority of publishing institutions in the
industrialized world can now be identified by an appropriate DNS domain. These
conditions suggest that it is presently feasible to initiate a USIN system.

Evolution of the USIN System: Towards Scholar-Friendly Names


Although the ISSN, ISBN and RDNS domains may serve to initialize a USIN
system, they will not generally provide a satisfactory basis for the scholar-friendly
canonical designations that meet USIN Requirement #4.2. The development of an
internationally acceptable domain structure is beyond the scope of this paper.
However, to stimulate discussion along these lines, the References section of this
paper includes, for each of the cited references, a discussion of possible initial
USIN designations and forms that may evolve over time.

IV. Hierarchical Identification of Serial Items


This section focusses on the problem of identifying articles and other components
within the context of a particular serial. For concreteness, the first subsection starts
with a proposed USIN syntax for citing journal articles. Following this, a general
model for serial item identification by hierarchical numbering of items within a
series is presented. The final subsection returns to the exploration of some additional
design ideas for USIN syntax.

Example: Journal Article Citation


The following examples illustrate a proposed syntax for citation of traditional (print)
journal articles.

S.ACM/TOPLAS:16@1811
Assuming that S.ACM does become the code for the Association for
Computing Machinery in the global domain for scholarly societies, this is the
canonical USIN in the proposed syntax for the article "A Behavioral Notion
of Subtyping" by Barbara H. Liskov and Jeannette M. Wing appearing in
ACM Transactions on Programming Languages and Systems, volume 16,
number 6, (November 1994), pages 1811-1841.
S.ACM/TOPLAS:16(6)@1811
This is an acceptable alternative USIN for the same journal article, specifying
the issue number.
S.ACM.SIGPLAN/Notices:32(1)@66
This denotes the position paper entitled "Global Computation" by Luca
Cardelli, published in ACM SIGPLAN Notices, Volume 32, Number 1,
January 1997, pp. 66-68. In this case the issue number is required, because
pages are renumbered from 1 with each issue of SIGPLAN Notices.

The syntax is intended to be scholar-friendly: each symbol is mnemonic of the role of
its component in the numbering. Volumes are emphasized as the first numbering
component, issues are enclosed in parentheses consistent with many standard
citation formats and the "at sign" indicates the page number at which the article
starts.
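
To make the roles of the symbols concrete, here is a small Python sketch that decomposes citations in the proposed syntax. The regular expression and the function name parse_article_usin are assumptions for illustration; the sketch covers only plain numeric volumes, issues and pages, not combined issues or item codes.

```python
import re

# serial ":" volume [ "(" issue ")" ] "@" page
_ARTICLE = re.compile(
    r"^(?P<serial>[^:]+):(?P<volume>\d+)(?:\((?P<issue>\d+)\))?@(?P<page>\d+)$"
)

def parse_article_usin(usin):
    """Decompose an illustrative journal-article USIN into its components."""
    m = _ARTICLE.match(usin)
    if m is None:
        raise ValueError("not in the illustrated article syntax: %r" % usin)
    issue = m.group("issue")
    return {
        "serial": m.group("serial"),
        "volume": int(m.group("volume")),
        "issue": int(issue) if issue is not None else None,  # optional
        "page": int(m.group("page")),
    }
```

Both S.ACM/TOPLAS:16@1811 and S.ACM/TOPLAS:16(6)@1811 parse to the same serial, volume and page; they differ only in whether the optional issue is present.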

It is possible to contemplate a generic syntax for the numbering of serial items,
avoiding specialized syntax for each type of item. For example, the conventions of
the Web's Universal Resource Identifiers [3] might be adopted to use the "/"
punctuation for separation of all elements within the hierarchical numbering of a
serial item. The designation of the TOPLAS example might become
S.ACM/TOPLAS/16/6/1811. Unfortunately, there are a number of disadvantages to a
generic syntax for hierarchical numbering. First, with respect to journal numbering,
optional issue numbers are not easily accommodated. For example, how does one
reconcile S.ACM/TOPLAS/16/1811 as an article denotation with
S.ACM/TOPLAS/16/6 as an issue denotation? Second, the mnemonic value of
associating specific symbols (e.g., "@") with specific concepts (e.g., "at page
number") is lost. Finally, there may be syntactic conflicts between the universal
syntax and existing syntaxes for publisher's numbering schemes. For example, the
"/" separator for URI syntax conflicts with the combined-issue designations such as
3/4 that are frequently used by journals such as The Serials Librarian. For these
reasons, it seems preferable to avoid specifying a generic universal syntax for serial
numbering and instead allow series-dependent syntax. Nevertheless, the number of
alternative syntactic schemes should be kept fairly limited to avoid cognitive
burdens for the scholar.

Multiple Articles Per Page.

Occasionally, one may find journals with more than one article starting on a
particular page. For example, these might be items of technical correspondence.
One solution to this problem of starting page ambiguity is to use sequential
denotations with lower case letters. For example, S.ACM/CACM:38(1)@43a and
S.ACM/CACM:38(1)@43b could respectively denote the two short articles "Women
and Computing in the UK" by Alison Adam and "Announcing a New Resource:
The WCAR List" by Laura L. Downey, both appearing on page 43 of
Communications of the ACM, volume 38, number 1 (January 1995).

There are three small problems with this scheme that may be quite rare but are
theoretically possible and should be addressed. The first is that there may potentially
be more than 26 articles on a page. However, the scheme easily extends so that
designations such as aa for the 27th article and aaa for the 703rd article may be
used. Second, there may be an ambiguity in determining the ordering of articles;
pages are two-dimensional while orderings are one-dimensional. The most scholar-
friendly way to resolve this is to follow the natural text ordering. For publications in
English and similar languages, this is column-major numbering: articles in column 1
always precede articles in column 2 and so on, while articles within columns are
numbered top to bottom. Finally, note that page numbers themselves might in some
cases include lower case letters. An example is preface material in a journal volume
numbered using lower case roman numerals. To handle this case, the USIN scheme
might specify that the underscore ("_") character can be used as a separator.
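
The letter codes and the separator can be sketched in Python as follows. The sketch assumes the usual reading in which aa follows z and aaa follows zz, and it assumes that the underscore is inserted only when the page label itself ends in a lower case letter; both function names are hypothetical.

```python
def to_alpha(n):
    """Map an item count to a lower case code: 1 -> a, 26 -> z, 27 -> aa."""
    if n < 1:
        raise ValueError("item counts start at 1")
    letters = ""
    while n > 0:
        # Subtracting 1 first makes the digits run a..z with no zero digit.
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters

def page_with_item(page_label, item):
    """Append an item code to a page label, separating with '_' if needed."""
    code = to_alpha(item)
    # Assumption: a separator is needed only after a lower case page label,
    # such as a roman-numeral preface page.
    needs_separator = page_label[-1:].islower()
    return page_label + ("_" if needs_separator else "") + code
```

For example, the second article on page 43 becomes 43b, while the second item on preface page xii becomes xii_b rather than the ambiguous xiib.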

In practice, scholars will not want to learn the details of how to distinguish multiple
articles on a page until it becomes a problem. They may not even be aware of the
problem if they are entering a citation from its written form in a reference list. In
such a case, the user will likely omit the required lower case code when entering the
citation. Interactive USIN processing software should notify the user of the
ambiguity and query him or her for its resolution. Batch-oriented software could
return the set of all articles on the page and issue a warning report through an
appropriate message or log file.
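
The batch-oriented behaviour might look like the following Python sketch. The in-memory index stands in for a USIN database and holds only the Communications of the ACM example from the text; the function name resolve and the data layout are assumptions.

```python
import re
import warnings

# Toy index: page-level USIN -> {item code: title}. It holds only the
# multi-article page used as an example in the text.
INDEX = {
    "S.ACM/CACM:38(1)@43": {
        "a": "Women and Computing in the UK",
        "b": "Announcing a New Resource: The WCAR List",
    },
}

_SPLIT = re.compile(r"^(?P<page>.*@\d+)(?P<item>[a-z]*)$")

def resolve(usin):
    """Return (usin, title) pairs matching a possibly incomplete USIN."""
    m = _SPLIT.match(usin)
    if m is None:
        raise ValueError("not a page-based USIN: %r" % usin)
    items = INDEX.get(m.group("page"), {})
    code = m.group("item")
    if code:
        # Fully specified: at most one article can match.
        return [(usin, items[code])] if code in items else []
    if len(items) > 1:
        warnings.warn("%s is ambiguous: %d articles start on this page"
                      % (usin, len(items)))
    return [(m.group("page") + c, t) for c, t in sorted(items.items())]
```

An interactive tool would present the returned list to the user for selection instead of issuing a warning.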

Unpaginated E-Journals

When a journal is not printed on pages, one might expect that article identification
by page number is no longer appropriate. Although many electronic journals have in
fact retained page-oriented formatting and numbering, many others have chosen not
to do so. In particular, there is a growing trend to use the logical document markup
capabilities of SGML [7] and HTML in electronic journals. One advantage is that
formatting may be left to the reader's software; articles can be viewed and printed
in a variety of different formats (with a variety of different paginations) depending
on hardware capability and reader preference. In view of this, it seems reasonable to
expect that the trend towards unpaginated e-journals will continue.

Consider a variation on the standard USIN journal syntax that accommodates
unpaginated e-journals by replacing the @page syntax with $article-number. (An
earlier version of this paper used the more mnemonic # to denote article numbers,
but the $ is easier to use when USINs may be encoded as URLs.) Some e-journals
have explicit article numbering by volume, for example, the Chicago Journal of
Theoretical Computer Science. Supposing that S.MITP/CJTCS identifies this journal,
S.MITP/CJTCS:1995$3 then denotes article 3 in volume 1995, entitled "Rabin
Measures" by Nils Klarlund and Dexter Kozen. In other cases, articles may be
numbered within issues. Thus ISSN/1201-2459:2(3)$4 would denote the article
"Reflections on Milton and Ariosto" by Roy Flannagan, published as article 4 in
Early Modern Literary Studies (ISSN 1201-2459), volume 2, number 3.

When no explicit numbering is provided, article numbers should be determined by
issue, if possible, or by volume, otherwise. In general, scholars will determine article
numbers by counting through the table of contents. In some cases, this may be a
source of ambiguity; if the table of contents includes regular articles, short notes,
corrigenda, submission instructions and/or other items, scholars may have difficulty
determining what to count and what to omit. With the expected availability of on-
line USIN databases, however, a scholar may simply query the database to verify or
determine the correct USINs for articles published in a particular issue or volume.

A General Model for Identification by Hierarchical Numbering


The scheme just illustrated for journal citation is an example of a general concept
for serial item identification: the use of a hierarchical numbering system. Abstractly,
serial items are identified in the context of their serials by specifying hierarchical
numbering tuples. For example, (volume, page) 2-tuples serve to identify articles in
some print journals, while (volume, issue, page, item-count) 4-tuples may be
required for magazines. In some cases, the hierarchy may be quite deep; items in a
particular newspaper may be identified by a 7-level numbering (volume, issue,
edition, section, page, column, item-count). In general, this is the essence of serial
identification: although the particular scheme employed may vary from serial to
serial, every item within every serial may be abstractly identified by some form of
hierarchical numbering tuple.

It is interesting to note that a hierarchical enumeration system ("tumbler
addressing") was also used as the basis of universal document identification in the
proposals for the Xanadu Docuverse [20]. However, those identifications were
based on a server/user/document/version/content hierarchy rather than the pure
publication numbering hierarchy considered here. In essence, the Xanadu address
system attempted to develop a new numbering system to apply to all documents,
whereas the USIN approach is to characterize and use existing publication
numbering hierarchies within a common framework.

Scope

One defining characteristic of the USIN hierarchical numbering model is that every
counter within every numbering tuple has a scope that defines the context of its
numbering. Issues of a journal are typically numbered from 1 within each volume;
they are said to have volume scope. Page numbers may have volume scope or issue
scope, depending on the particular serial. An "item-count" for distinguishing
multiple articles per page has page scope. The first, or principal, numbering
component of a serial is said to have global scope; it is numbered consecutively in
perpetuity.

Numbering scope is correlated with, but not synonymous with, hierarchical level.
For example, volume scope for page numbers is often used even when volumes are
divided into issues. Similarly, although issues are usually given volume scope when
volumes exist, they may sometimes be given global scope.
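
One way to picture scope is as a link from each counter to the counter within which it is numbered. The Python sketch below is an assumption-laden illustration: the dictionaries describe the page scoping implied by the earlier TOPLAS and SIGPLAN Notices examples, and required_counters is a hypothetical helper.

```python
# Each counter maps to the counter that scopes it; None marks global scope.
TOPLAS_SCOPES = {"volume": None, "issue": "volume", "page": "volume"}
NOTICES_SCOPES = {"volume": None, "issue": "volume", "page": "issue"}

def required_counters(scopes, counter):
    """List the counters needed to pin down a value of the given counter."""
    chain = []
    while counter is not None:
        chain.append(counter)
        counter = scopes[counter]  # step outward to the enclosing scope
    return tuple(reversed(chain))
```

With volume-scoped pages the issue drops out of the chain, which is why S.ACM/TOPLAS:16@1811 suffices, whereas issue-scoped pages force the issue to appear, as in S.ACM.SIGPLAN/Notices:32(1)@66.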

Scope-Dependent Numbering

Another important aspect of the model is the use of scope-dependent numbering. In
general, this reflects the fact that some properties of a counter at a particular level
may depend on the actual values of counters at superior scope levels. Some of the
scope dependencies may be relatively minor. For example, a quarterly journal that
changes to a bimonthly journal starting with volume 23 exhibits a scope-
dependency: issues are numbered 1 through 4 for volumes 1 through 22, and are
numbered 1 through 6 thereafter. Scope-dependency may even affect the need for a
particular counter in serial item identification. For example, the item-counter for
multiple articles per page is not needed for those pages that have only one article
starting on a page. Scope-dependencies may even affect the entire numbering
system. For example, a print journal may switch to electronic publication at some
point with a corresponding switch from a (volume, issue, page) numbering scheme
to a (volume, article-number) scheme.
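
The quarterly-to-bimonthly example can be written down directly as a rule whose result depends on the enclosing volume. This is a sketch for that one hypothetical journal; the function names are assumptions.

```python
def issues_in_volume(volume):
    """Issue count for the example journal: quarterly through volume 22,
    bimonthly from volume 23 onward."""
    if volume < 1:
        raise ValueError("volumes are numbered from 1")
    return 4 if volume <= 22 else 6

def valid_issue(volume, issue):
    """Check an issue number against the range allowed in its volume."""
    return 1 <= issue <= issues_in_volume(volume)
```

The same issue number can therefore be valid in one volume and invalid in another: issue 5 exists only from volume 23 onward.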

Syntactic Representation

In general, the numbering scheme for every serial has a syntactic representation that
may be generated by mapping rules from the abstract representation as a
hierarchical numbering tuple. In the suggested standard journal article syntax, the
(volume, page, item-number) tuple of (12, 135, 2) maps to the syntactic
representation 12@135b. In general, each number in a hierarchical numbering tuple
is first mapped to a numeral in some encoding system, such as arabic numerals,
roman numerals or "alphabetic numerals" (a, b, c, ..., aa, ab, ...). Then a syntactic
string for the entire structure may be constructed by concatenation with appropriate
mnemonic operator symbols as punctuation. An essential goal of this process is that
the syntactic encoding be uniquely decodable. Operator symbols must be carefully
chosen both to have mnemonic value and to ensure unambiguous interpretation of
the syntactic forms. In principle, the order of appearance of numbering elements
may also be considered a design choice, but for simplicity and to avoid confusion it
may be desirable to enforce a strict left-to-right ordering of elements according to
the numbering hierarchy.
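
The mapping and its inverse can be sketched for the (volume, page, item-number) case. The code below is a minimal illustration assuming arabic numerals for volume and page and alphabetic numerals for the item number; the function names are hypothetical.

```python
import re

def to_alpha(n):
    """Alphabetic numeral for n >= 1: 1 -> a, 26 -> z, 27 -> aa."""
    s = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        s = chr(ord("a") + rem) + s
    return s

def from_alpha(s):
    """Inverse of to_alpha for a non-empty string of lower case letters."""
    n = 0
    for ch in s:
        n = n * 26 + ord(ch) - ord("a") + 1
    return n

def encode(volume, page, item=None):
    """Map a numbering tuple to its syntactic form, e.g. (12, 135, 2) -> 12@135b."""
    base = "%d@%d" % (volume, page)
    return base + to_alpha(item) if item else base

_FORM = re.compile(r"^(\d+)@(\d+)([a-z]*)$")

def decode(text):
    """Recover the numbering tuple; the item is None when no code is present."""
    m = _FORM.match(text)
    if m is None:
        raise ValueError("cannot decode %r" % text)
    volume, page, code = m.groups()
    return int(volume), int(page), (from_alpha(code) if code else None)
```

Unique decodability here rests on the operator symbol and the change of numeral system: the at sign separates two arabic numerals, and the switch from digits to letters marks where the page ends and the item code begins.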

Parallel Numbering Hierarchies

A fourth aspect of the hierarchical numbering model is that a serial may have
parallel numbering hierarchies for different purposes. In general, these hierarchies
have a common numbering prefix consisting of one or more of their uppermost
numbering levels, with divergence of numbering below these level(s). The simplest
example is that of the article-identification and issue-identification hierarchies of
journals that are paginated with volume scope. In this case, the (volume, page) and
(volume, issue) hierarchies may be considered parallel. In general, syntactic devices
are necessary to distinguish which hierarchy is intended in any particular coding; the
(volume, page) and (volume, issue) hierarchies are distinguished by the @ and ()
syntax notations given previously. Other examples of parallel numbering are given
in the later subsection on secondary component notation.

Chronology

Finally, chronology is the fifth general property associated with the hierarchical
numbering model for serials. Chronology is the association of a date and/or time of
publication with a particular serial numbering component. In general, chronology is
a fundamental aspect of serial publication and should be defined for all hierarchical
numbering components down to some level at which all further structure is
considered simultaneously published. For example, traditional print journals have
chronology specified to the issue level, while electronic journals may have
chronology specified to the article level. In general, chronology is scope-dependent;
for example, when a quarterly journal changes to a monthly one, the chronology
associated with issue 3 in each volume may change from "Fall" to "March".
Chronology may also be irregular and possibly out-of-sequence, that is, with
publication numbers assigned out of order of actual publication dates. Chronology
itself is also an instance of hierarchical numbering, for example, using (year, month,
day) 3-tuples or (year, season) 2-tuples.

Further Work: Hierarchical Numbering Theory

One direction for further development is to consider formalization of the model to
become a theory of hierarchical numbering. Such a theory would have as its purpose
the establishment of certain important properties, such as ensuring that every
published item is denotable by a hierarchical numbering tuple, every tuple has a
syntactic representation and every syntactic representation is unambiguously
decodable. In particular, careful attention should be given to the formulation of
arithmetic operations to avoid problems such as the "paradoxes of tumbler
arithmetic" in the Xanadu scheme [20]. The theory should also account for the
particular properties of hierarchical chronological numbering. In this regard, the
theory should be informed by the extensive work of Dershowitz and Reingold in
developing the mathematics of many of the world's important calendar systems
[10].

Additional Design Ideas for Hierarchical Numbering


The following subsections present a number of additional design ideas for the
identification of serial items by hierarchical numbering. Although many of the ideas
are illustrated using examples related to journals, they are intended to apply to other
types of serial as well.

Syntax for Holdings Description

Beyond article identification, the next most important application area for USINs
may be in the description of library holdings or document delivery service
coverage. A single volume or issue of a journal is simple to identify by including
numbering only to the desired level. For example, S.ACM/TOPLAS:16 denotes
volume 16 of TOPLAS, while S.ACM/TOPLAS:16(6) denotes issue 6 thereof. But
holdings are more often described as volume ranges. In cases where issues are
missing, subscriptions are cancelled and then reinstated, or miscellaneous holdings
have been received by donation, the holdings may be broken up into a list of
individually held items or ranges. To accommodate these requirements, it seems
reasonable to reserve the comma (",") to separate elements of a holdings list and the
double hyphen "--" to serve as a range operator.

Consider a holdings pattern for ACM TOPLAS consisting of volumes 2 through 12
and 16 forward, except for the missing issues 2 and 4 of volume 10. The following
USIN holdings specification would describe these holdings.
S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff

Here, the serial code is specified only once. Commas separate individually held
items or ranges. The start and end of a range are indicated by enumeration to the
required level of specificity. An end range of "ff" indicates a continuing
subscription. As a syntactic constraint to aid in error detection, holdings should be
listed in strictly ascending order.
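
As a rough sketch of how such a specification might be processed, the following
code splits a holdings string into inclusive (volume, issue) spans and rejects
lists that are not strictly ascending. It assumes journal-style volume(issue)
numbering with issues numbered from 1, and that the first ":" ends the serial
code; the function names are hypothetical.

```python
import math
import re

# One holdings element endpoint: a volume, optionally followed by (issue).
_ITEM = re.compile(r"(\d+)(?:\((\d+)\))?")

def _parse_item(text):
    match = _ITEM.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized holdings item: {text!r}")
    volume, issue = match.groups()
    return int(volume), (int(issue) if issue else None)

def _low(item):
    # A bare volume starts before its first issue (issues assumed to start at 1).
    volume, issue = item
    return (volume, 0 if issue is None else issue)

def _high(item):
    # A bare volume ends after its last issue, whatever that number is.
    volume, issue = item
    return (volume, math.inf if issue is None else issue)

def parse_holdings(spec):
    """Return (serial, spans); each span is an inclusive (low, high) pair."""
    serial, colon, body = spec.partition(":")
    if not colon or not body:
        raise ValueError("expected serial code, ':' and a holdings list")
    spans = []
    for element in body.split(","):
        start_text, dashes, end_text = element.partition("--")
        start = _parse_item(start_text)
        if not dashes:
            low, high = _low(start), _high(start)
        elif end_text == "ff":
            low, high = _low(start), (math.inf, math.inf)
        else:
            low, high = _low(start), _high(_parse_item(end_text))
        if high < low:
            raise ValueError(f"range runs backwards: {element!r}")
        if spans and low <= spans[-1][1]:
            raise ValueError("holdings must be listed in strictly ascending order")
        spans.append((low, high))
    return serial, spans
```

Applied to S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff, this yields four spans, the
last of which is open-ended. Reversing any two elements raises an error, which is
the error-detection benefit of the ascending-order constraint.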

Only positive holdings data is shown, following the principle adopted by ANSI
Serials Holding Statements [2]. Determination of missing items can be made by
reference to either the USIN global database or an appropriate serial "definition"
(see the subsection on Serials Definition Language in the following section). For
example, using the knowledge that TOPLAS was quarterly during volume 10 tells us
that 10(2) and 10(4) are missing for these holdings while 10(5) is not (because it
does not exist).
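
The determination of missing items can be sketched as follows. The holdings from
the example are written out by hand as inclusive (volume, issue) spans, and a small
table stands in for the serial definition; only the volume 10 figure (quarterly,
hence four issues) comes from the discussion above, and all names are
hypothetical.

```python
import math

# S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff as inclusive (volume, issue) spans.
HELD = [
    ((2, 1), (10, 1)),
    ((10, 3), (10, 3)),
    ((11, 1), (12, math.inf)),
    ((16, 1), (math.inf, math.inf)),
]

# Stand-in for a serial definition: number of issues published in each volume.
ISSUES_PER_VOLUME = {10: 4}

def is_held(volume, issue, held=HELD):
    """True if the issue falls inside any held span."""
    return any(low <= (volume, issue) <= high for low, high in held)

def missing_issues(volume, held=HELD, definition=ISSUES_PER_VOLUME):
    """List the issues of a volume that exist but are not held."""
    return [
        (volume, issue)
        for issue in range(1, definition[volume] + 1)
        if not is_held(volume, issue, held)
    ]
```

For volume 10 this reports issues 2 and 4 as missing. Issue 5 is never considered,
because the definition says that only four issues exist.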

The conventions for serials holdings are intended to apply to serials with any form
of hierarchical numbering and to any level of specificity. One implication is that the
syntax of USINs generally must be structured to avoid conflicts with the "," and "--"
symbols of the holdings notation. Another implication is that coverage can be
specified to a finer level of detail. For example, a document delivery service may
wish to identify "scanned holdings" to the article level, that is, the articles that have
already been scanned or digitized and are hence available for short-turnaround
delivery.

Secondary Component Notation

Secondary component notation is a proposed means of specifying abstracts of
articles, tables of contents of issues, indexes of volumes and other secondary
components of serials or their articles. In general, secondary component notation is
introduced by a USIN for the relevant article, issue, volume or other component,
followed by a vertical bar and a component specification. The component
specification is typically a standardized mnemonic for the component, possibly
followed by a parenthesized enumeration. The following examples are illustrative.

S.ACM/TOPLAS:16|index
    The index of volume 16 of TOPLAS (found at the end of
    S.ACM/TOPLAS:16(6)).
S.ACM/TOPLAS:16(6)|contents
    The table of contents of volume 16, issue 6 of TOPLAS.
S.ACM/TOPLAS:16@1811|abstract
    The abstract of an example TOPLAS article.
S.ACM/TOPLAS:16@1811|sec(4.1)
    Subsection 4.1 in the example article, entitled "Type Specifications".
S.ACM/TOPLAS:16@1811|fig(3)
    Figure 3 in the example article, captioned "Stack Type".

The last two examples illustrate parallel (volume, page, section, subsection) and
(volume, page, figure) numbering hierarchies respectively for sections and figures
within articles.
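
A minimal sketch of how software might take such a coding apart is given below. It
assumes that the vertical bar occurs only as the component separator and that a
component specification is an alphabetic mnemonic with an optional parenthesized
enumeration; the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class SecondaryComponent(NamedTuple):
    usin: str                    # USIN of the article, issue or volume
    mnemonic: str                # e.g. index, contents, abstract, sec, fig
    enumeration: Optional[str]   # e.g. "4.1", or None when absent

# A mnemonic, optionally followed by a parenthesized enumeration.
_COMPONENT = re.compile(r"([A-Za-z]+)(?:\(([^()]+)\))?")

def parse_secondary(text: str) -> SecondaryComponent:
    """Split 'USIN|mnemonic(enumeration)' into its three parts."""
    usin, bar, spec = text.partition("|")
    if not bar or not usin:
        raise ValueError("expected a USIN followed by '|' and a component")
    match = _COMPONENT.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized component specification: {spec!r}")
    return SecondaryComponent(usin, match.group(1), match.group(2))
```

For example, parse_secondary("S.ACM/TOPLAS:16@1811|sec(4.1)") separates the article
USIN from the mnemonic sec and the enumeration 4.1. Interpreting the enumeration
itself is left to the serial-by-serial conventions.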

It is anticipated that a standard set of mnemonics for standard kinds of components
would be globally defined (index, abstract, section, figure, table, equation and so
on) while others may be defined for individual publications. However, scope
dependencies and numbering syntax for enumerated components will typically be
defined on a serial-by-serial basis.

One may question the need for fine-grained identification of article components.
Indeed it is reasonable to consider deployment of an initial USIN system that
focusses on article identification. Nevertheless, for a scheme that is designed to
serve for article identification and related purposes in perpetuity, it would seem
foolhardy not to allow the extension of the scheme using a notation such as the
secondary component notation presented here.

The Reference Notation

The reference notation is a particular application of the secondary component
notation that would allow designation of an article or other contribution by indirect
reference. For example, S.ACM/TOPLAS:16@1811|ref(17) denotes reference 17 of
the article starting on page 1811 of volume 16 of TOPLAS. As it happens, this
reference is to an article entitled "A semantic database model," by Hammer and
McLeod appearing in ACM Transactions on Database Systems, 6(3), pp. 351-386.
Assuming that the appropriate citation database exists, the indirect reference in this
case could map to the canonical form S.ACM/TODS:6@351.

One use of the reference notation is to guarantee that you can quickly generate an
acceptable USIN for every reference in an article, provided that you can generate a
USIN for the article itself. During creation of citation databases, it may be desirable
to produce a full set of USINs for the reference lists of articles in a fairly
expeditious fashion. If the resolution of some references to their direct USIN form is
proving problematic, they may be left in indirect form during initial data entry. At a
later time, the resolutions of indirect references may be entered either manually or
by acquisition of an independently developed citation set for the same article.

Another use of the reference notation is to serve as a unique canonical form for
personal communications, unpublished works and other items that are otherwise
undenotable. In this way, there would be no need to create a classification or coding
scheme for such references. Furthermore, each such item would be automatically
given a permanent and unique code. For example, if two authors each write articles
citing "Famous Person, personal communication", those citations would be given
distinct canonical identifiers. This would prevent false positives when doing
coreference searches (finding papers that have 2 or more references in common).
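
The effect on coreference searching can be sketched as follows. Each reference
list is assumed to be recorded as a list whose entries are either a direct USIN or
None for an item with no direct form, such as a personal communication. The
function names and the S.EXAMPLE serial used in the usage note are hypothetical.

```python
from itertools import combinations

def canonical_references(article, references):
    """Return the set of canonical identifiers for an article's reference list.

    Entries are numbered from 1. An entry with no direct USIN keeps its
    indirect form, article|ref(n), which is unique to the citing article.
    """
    return {
        direct if direct is not None else f"{article}|ref({n})"
        for n, direct in enumerate(references, start=1)
    }

def coreferent_pairs(citations, minimum=2):
    """Find pairs of articles with at least `minimum` references in common."""
    canonical = {
        article: canonical_references(article, references)
        for article, references in citations.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(canonical), 2)
        if len(canonical[a] & canonical[b]) >= minimum
    ]
```

Suppose two hypothetical articles, S.EXAMPLE/J:1@1 and S.EXAMPLE/J:1@20, each cite
S.ACM/TODS:6@351 and one personal communication. Their personal communications
receive different indirect identifiers, so the two articles share only one
reference and are not reported as coreferent at the default threshold of 2.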

The reference notation is best supported by article styles with an explicitly
numbered reference list at the back. If a reference list exists, but is not numbered,
reference numbers may be determined by counting. Alternatively, if references are
cited by symbolic tags, as in this paper, a possible design choice is to use the
symbolic code itself in the reference notation. For example, the citation of the SICI
standard referenced in an earlier version of this paper might be given the indirect
reference RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI). Another style may use
numbered endnotes, with the possibility of more than one reference per note. In this
case, enumeration within an endnote may use lower-case letters; |ref(3c) would
denote the third item cited in endnote 3 of a particular article. In general, each serial
may define its own reference numbering conventions, but it is highly desirable that
one of the standard forms be chosen.

Hyphenation Notation

In some cases it may be desirable to break a long USIN over multiple lines. This
can be accommodated by the following hyphenation convention. A line break may
be inserted after any hyphen appearing in a USIN, without changing its meaning.
Furthermore, any nonhyphenated USIN operator can be converted into a hyphenated
equivalent of that operator by adding a hyphen to the end. Thus, the hyphenated
equivalents of "." and "/" and "--" are respectively ".-" and "/-" and "--" (no
change). The following examples illustrate this convention in use.

RDNS."sfu.ca".CMPT/-
TR:97-16|ref(SICI)

S.ACM/TOPLAS:2--15(1),-
15(3),15(5)--17,20--ff

S.ACM/TOPLAS:2--15(1),15(3),15(5)--
17,20--ff

RDNS."sfu.ca".CMPT/-TR:97-16|ref(SICI)

The last example illustrates that a newline character is not strictly required after a
hyphenated operator. This accommodates reformatting operations that might
eliminate an inserted newline character but leave a vestigial hyphen in place.
Conversion to canonical form eliminates any hyphenated operators and embedded
newlines. USIN processing software should fully recognize the hyphenation
convention in the event that a multi-line USIN is entered using a cut-and-paste
operation.
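
The conversion to canonical form might be sketched as below. The sketch assumes
that the only operators that acquire a vestigial hyphen are ".", "/" and ","
(those named or shown in the examples above); a fuller implementation would take
the operator set from the USIN syntax definition. The function name is
hypothetical.

```python
import re

def canonicalize_hyphenation(usin: str) -> str:
    """Remove line breaks and hyphenated operators from a possibly multi-line USIN."""
    # Step 1: rejoin lines that were broken after a hyphen, keeping the hyphen.
    joined = re.sub(r"-[ \t]*\r?\n[ \t]*", "-", usin)
    # Step 2: drop a single hyphen that directly follows ".", "/" or ",".
    # A hyphen that begins the "--" range operator is left alone, as are
    # hyphens inside numbering such as 97-16 (they follow a digit).
    return re.sub(r"(?<=[./,])-(?!-)", "", joined)
```

Under these assumptions, all four examples above reduce to one of the two
canonical strings RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI) and
S.ACM/TOPLAS:2--15(1),15(3),15(5)--17,20--ff.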

V. USIN Support Technology


This section considers two important models of support technology for a USIN
scheme: a USIN Global Registry and a USIN Global Database System. The USIN
Global Registry is proposed as a system of institutions and technologies designed to
preserve the knowledge of assigned USINs and their denotations for posterity and to
support publishers and librarians in the assignment of new USINs for new and/or
unassigned works. As differentiated from the Registry, a USIN Global Database
System is not intended for USIN updating, but is instead intended to support the
day-to-day needs of scholars for access to USIN information. This distinction is
conceptually valuable in organizing requirements for the separate purposes of USIN
registration and USIN-based information retrieval. It might ultimately be the case
that the registry and database components are implemented in a single system,
however.

In discussing these technologies, the goal is to present a vision of how USINs may
be generated, verified and used in the day-to-day work of publishers, librarians and
scholars. At this point in the development of the USIN concept, the focus should be
more on the analysis of overall system requirements than on the implementation
details of underlying mechanisms. Nevertheless, a number of design ideas are
included to help give a more concrete picture of the possible operation of an
integrated global USIN system.

USIN Global Registry


Consider a design for the USIN Global Registry based on four principal
components. These are:

SDL: Serials Definition Language:
    a language for specifying serial publications and their publication schemes.
UPP: USIN Publication Protocol:
    a protocol for assigning USINs as part of the publication process and
    verifying that they meet global uniqueness and permanence of identification
    requirements.
SRP: Serial Registration Protocol:
    a protocol for registering and revising serial codes and their SDL definitions.
PDP: Publication Domain Protocol:
    a protocol for creating, modifying and deactivating publication domains.

These are the technologies that publishers and librarians could use on a daily basis
in the assignment of USINs to serially published items.

SDL - Serials Definition Language

Fundamental to the USIN concept is the use of serial designations and numbering
schemes for identification of articles and other serial components. In order to
formally specify these schemes, consider the creation of a Serials Definition
Language (SDL). Each SDL specification would define one serial, establishing its
basic identity and publication scheme. In particular, this would include formal
specification of the hierarchical numbering scheme of the serial including its
abstract structure, scope-dependencies, chronology, and syntactic identification
schemes for articles and other serial components. It would also include the
specification of the canonical and allowable alternative forms for USIN
designations.

In addition to its formal role in the USIN scheme, SDL should also be designed to
serve a variety of related purposes. From a serials check-in and claiming
perspective, the enumeration and chronology specifications of an SDL definition
should also have predictive value as contemplated, for example, by the serial pattern
scheme of McNellis [16]. The SDL definition of a serial should also provide a basis
for evaluating and interpreting USIN holdings specifications and possibly
converting them to MARC Holdings Format. Similarly, from a bibliographic
database perspective, it should be possible to verify the enumeration and chronology
recorded in a database entry against that specified in an SDL definition. It should
also be possible to determine the comprehensiveness of database coverage: are there
any issues or articles published that are not in the database, or is the database
complete?

The requirements above relate to a fairly narrow definition of serials, namely, in
terms of the logical schemes for enumeration, chronology and serial item
identification. It is possible to define a language (say, SECIL) that would be limited
to these requirements. Such a narrow approach would serve to support a USIN
system, but it seems reasonable to consider serial definition from a broader
perspective while the opportunity exists. In particular, the definition of a serial
logically includes not only its numbering scheme, but also the title, publisher and
publication format. Incorporation of such elements into the language would seem
necessary to merit the term "serials definition language." Beyond this, one might
wish to include additional information, notably classification and indexing
information. This reflects a cataloguing perspective and suggests that a
nomenclature of SCL (serials cataloguing language) might be appropriate. However,
from the viewpoint of designing good modular systems, the SDL approach is
arguably preferable, because it focusses on information deriving directly from its
publication and relevant to the essence of what the serial is. Cataloguing
information is essentially third-party information that may derive from a variety of
sources and should be kept separate; it is information about the serial, not
information defining it. Detailed exploration of these issues is an area for further
work.

UPP: USIN Publication Protocol

When USIN-based bibliographic databases are in widespread use, publishers will
find that the sooner an article is assigned a USIN, the sooner it is advertised to large
communities of scholars. The USIN Publication Protocol (UPP) is therefore
proposed to allow publishers to assign each article a USIN during the publication
process, thereby updating the USIN databases automatically.

A major requirement for UPP is to ensure the integrity of assigned USINs from the
standpoint of global uniqueness and consistency with the current SDL definitions of
the serials in question. One approach to this is to maintain within the USIN Global
Registry a current publication state for each serial and to define acceptable UPP
actions in terms of this state. In essence, the publication state identifies the last
issued USIN for the serial, plus a specification of which numbering levels in the
hierarchical numbering scheme are currently open. This gives a basis for predicting
the counter and date values for upcoming UPP requests.

For example, consider the publication state that might exist after registering the
article "Collecting Interpretations of Expressions" by Paul Hudak and Jonathan
Young appearing in ACM TOPLAS, Volume 13, Number 2, April 1991, pages
269-290, with the USIN S.ACM/TOPLAS:13@269. The state may include volume and issue
counters that are currently open with values 13 and 2, respectively. A page counter
may be closed at page 290 (nothing more will appear on page 290). At this point,
there may be two legal UPP actions: add another article in this issue or close it. As it
happens, there is one more article in the issue. Based on the current publication
state, an expectation may be generated that the next article will have USIN
S.ACM/TOPLAS:13@291. If the publisher indeed submits that USIN with the next
UPP request, it can be accepted, otherwise an error can be reported.

After a "close issue" request has been made, the SDL definition and publication
state can be used to predict the next publication action and expected date. In the
example, this is an "open new issue" request for issue 3 of volume 13, July 1991.
These may be verified when the actual request is made. When issue 4 of this
volume is closed, the SDL definition should tell us that there are no more expected
issues in this volume. The expected sequence of following UPP requests is then a
"close volume" request, followed by an "open volume" request for volume 14, 1992,
an "open issue" request for issue 1 in January 1992 and an article publication
request with USIN S.ACM/TOPLAS:14@1. Each of these expectations may be in turn
verified against the actual UPP requests made.
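
The following sketch illustrates the idea of a publication state that predicts and
verifies UPP requests. It makes several simplifying assumptions that go beyond the
text: every article starts on a new page, pagination has volume scope, the number
of issues per volume is fixed, and "close volume" and "open volume" are folded into
the request that opens the next issue. Dates are omitted. All names are
hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PublicationState:
    serial: str
    volume: int
    issue: int
    last_page: int               # page counter is closed at this page
    issues_per_volume: int = 4   # stand-in for the SDL definition
    issue_open: bool = True

    def expected_article(self) -> str:
        """USIN predicted for the next article in the open issue."""
        return f"{self.serial}:{self.volume}@{self.last_page + 1}"

    def publish_article(self, usin: str, end_page: int) -> None:
        """Accept an article only if its USIN matches the prediction."""
        if not self.issue_open:
            raise ValueError("no issue is currently open")
        expected = self.expected_article()
        if usin != expected:
            raise ValueError(f"expected {expected}, got {usin}")
        if end_page <= self.last_page:
            raise ValueError("article must end after its starting page counter")
        self.last_page = end_page

    def close_issue(self) -> None:
        if not self.issue_open:
            raise ValueError("issue is already closed")
        self.issue_open = False

    def open_issue(self, volume: int, issue: int) -> None:
        """Accept an 'open new issue' request only if it is the predicted one."""
        if self.issue_open:
            raise ValueError("close the current issue first")
        if self.issue < self.issues_per_volume:
            expected = (self.volume, self.issue + 1)
        else:
            expected = (self.volume + 1, 1)   # implicit close/open volume
        if (volume, issue) != expected:
            raise ValueError(f"expected issue {expected}, got {(volume, issue)}")
        if volume != self.volume:
            self.last_page = 0                # pagination restarts per volume
        self.volume, self.issue, self.issue_open = volume, issue, True
```

Starting from PublicationState("S.ACM/TOPLAS", 13, 2, 290), the predicted next USIN
is S.ACM/TOPLAS:13@291. After issues 2, 3 and 4 are closed in turn and issue 1 of
volume 14 is opened, the prediction becomes S.ACM/TOPLAS:14@1. The exceptions
discussed next (combined, skipped or special issues) are deliberately not handled.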

Of course, mechanisms will be required to deal with various kinds of exceptions to
the predicted publication pattern. For example, when a particular issue is expected,
one may instead see a combined issue (with combined enumeration).
Alternatively, an issue may be skipped altogether, or a special issue may be inserted
into the publication stream between two regular issues. Publication numbering may
also be out of order with respect to date of publication. For example, in a technical
report series, it is not uncommon for numbers to be assigned in advance of
publication, with variable delays between the assignment of a number and actual
publication. An apparent publication exception may also be the first indication of an
actual change in publication pattern. In this case, the SDL definition should be
corrected to reflect the updated publication pattern and reregistered with SRP,
described below.

SRP: Serial Registration Protocol

Serial Registration Protocol is the proposed service for registering a serial code and
its accompanying SDL definition and tracking changes thereto over time. This
includes registering changes in publication numbering or chronology, changes in
publisher or publication domain, addition of alternative USIN codings, changes to
the canonical USIN form and/or deactivations and reactivations. In general, SRP
requests would be made with respect to a particular publication-domain/serial-code
combination.

Perhaps the most critical function under SRP is the creation of a new serial code
within an existing publication domain. The code may be the initial code for a new
or previously unregistered serial publication or it may be an alternative code for an
existing publication. In either event, creation of a serial code should always be
considered with care, because it creates, in the context of the given publication
domain, a permanent USIN binding between that code and the serial in question.
From this perspective, it is worth considering appropriate verification actions for
creation of a new serial code. Of course, verification that the code is previously
unassigned is an automatic function that should be implemented by the appropriate
query to the USIN Global Registry. Beyond this, there should also be some manual
verification to ensure that the code assignment is reasonably consistent with the
USIN concept. One option is to use national serial registration centres analogous to
those of the current international ISSN network. However, such a system is likely to
be too cumbersome for the management of publications at the fine-grained level of,
say, minutes of committee meetings of particular university departments. It also
does not account for an institutional role in approving the serial codes chosen by
administrative divisions within the institution.

An alternative for verifying serial code assignments that overcomes these problems
is the following. SRP requests for new serial code creation must be approved by a
USIN-certified cataloguing librarian. Certifications are awarded by an appropriate
international standards body. Each authority for a publication domain may designate
a certified librarian for that domain. When an SRP request to create a new serial
code is issued, it is handled by the librarian registered for that domain, if such a
librarian exists. Otherwise, verification of the creation request is attempted in the
immediately superior publication domain, and so on. For example, a university may
designate a single USIN-certified librarian to handle all institutional requests for
new serial codes. Regardless of how deeply structured the administrative hierarchy
within the university is, all serial code creation requests within the university are
passed up the domain hierarchy to be handled by this individual.
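
The delegation rule can be sketched as a walk up the domain hierarchy. Domains are
represented here as tuples of components, so that RDNS."sfu.ca".CMPT becomes
("RDNS", "sfu.ca", "CMPT"), and the empty tuple stands for a global root domain.
This representation, the function name and the librarian label in the usage note
are assumptions made for illustration.

```python
def responsible_librarian(domain, librarians):
    """Find who verifies a new serial code requested in `domain`.

    `domain` is a tuple of domain components; `librarians` maps domain
    tuples to the certified librarian registered for that domain. The
    search starts at the requesting domain and moves to each immediately
    superior domain in turn. None means no librarian was found.
    """
    for depth in range(len(domain), -1, -1):
        candidate = domain[:depth]
        if candidate in librarians:
            return librarians[candidate]
    return None
```

With a single librarian registered for ("RDNS", "sfu.ca"), a request originating in
("RDNS", "sfu.ca", "CMPT") or any deeper subdomain is routed to that individual.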

The second major function of SRP is to register the publication pattern
of a serial and changes to that pattern as required from time to time. As described
above, these publication patterns are specified as part of the serial's SDL definition.
UPP can be used to check the consistency of the publication patterns against future
publication attempts. That is, each time a USIN is specified in a future UPP request,
it serves to check that the SDL definition is correctly predicting the actual
publication numbering and chronology.

Whenever the publication pattern of a serial is changed, the SDL definition must be
modified to account for both future and past publications. The checking of future
publications is done by UPP. SRP is responsible for checking that the revised SDL
definition correctly accounts for the USINs assigned to past publications. This
checking may be done by formally re-evaluating the revised definition against the
entire history of actual publication as recorded in the global registry. The checking
should satisfy two conditions: (1) every USIN previously registered should be
accounted for by the new SDL definition, and (2) the new SDL definition should
not "predict" any past publication that does not, in fact, exist. Exhaustive checking
or a provably equivalent alternative method should be used. That is, a reduced form
of checking that puts at risk the consistency of the USIN system should not be
justified on the basis of minor concerns of computer processing efficiency.
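
The two conditions can be expressed as set differences, as in the following
sketch. A revised definition is reduced here to a hypothetical "schedule": a list
of (first volume, issues per volume) pairs in ascending order. A real SDL
definition would be far richer, and only issue-level identifiers are checked.

```python
def predicted_issues(schedule, through_volume):
    """Enumerate the issue codings a schedule predicts up to `through_volume`."""
    predicted = set()
    for index, (first_volume, per_volume) in enumerate(schedule):
        if index + 1 < len(schedule):
            # This pattern applies until the next pattern takes over.
            last_volume = min(schedule[index + 1][0] - 1, through_volume)
        else:
            last_volume = through_volume
        for volume in range(first_volume, last_volume + 1):
            for issue in range(1, per_volume + 1):
                predicted.add(f"{volume}({issue})")
    return predicted

def check_revision(registered, schedule, through_volume):
    """Exhaustively compare a revised schedule against the recorded history.

    Returns (unaccounted, phantom): registered issues the revision fails to
    account for, and past issues it predicts that were never published.
    Both lists must be empty for the revision to be acceptable.
    """
    predicted = predicted_issues(schedule, through_volume)
    return sorted(registered - predicted), sorted(predicted - registered)
```

For a history of two quarterly volumes followed by a six-issue volume 3, the
schedule [(1, 4), (3, 6)] passes both checks. A revision that dates the change from
volume 2 would wrongly predict 2(5) and 2(6), violating condition (2).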

The third major function of SRP is to register canonical and alternative forms of
USIN for a serial. When a serial is registered for the first time, the publication-
domain/serial-code combination under which it is first registered is the canonical
form of USIN. Subsequently, SRP may be used to create alternative USIN forms.
When such an attempt is made, the SRP request must specify both the publication-
domain/serial-code combination for the current canonical USIN and the new
alternative publication-domain/serial-code combination. It may be reasonable to
require that permission from the domain authority of both domains be obtained.
Any number of alternative forms for a serial may be created in this way.

The SRP request to change the canonical form of a serial must specify the
publication-domain/serial-code combination of both the current and proposed new
canonical forms. The request is made by the authority for the new publication
domain and must be verified by the authority for the currently canonical publication
domain. If approved, the change will be scheduled to occur at the next scheduled
global synchronization time for changes to USIN canonical forms, or to a later
synchronization time specified in the change request. Once the change becomes
effective, the canonical form is switched, but both forms remain acceptable.

SRP also can be used to deactivate or reactivate a serial. In essence, deactivation of
a serial registers a new publication pattern in which no further publications are
predicted. Reactivation requires a new SDL definition that may change the title and
future publication pattern of a serial, but still requires consistency with the entire
history of previously assigned USINs.

PDP: Publication Domain Protocol

Publication Domain Protocol is the final proposed service of the USIN Global
Registry. This protocol is used to create and register new publication domains,
transfer authority for domains, register the USIN-certified librarians for a domain
and other related functions. In general, these actions will refer to subdomains of
some existing publication domain; even top-level USIN domains such as ISSN and
RDNS may be considered as subdomains of a global USIN publication domain.

Creation of a code for a new publication domain under PDP parallels the creation of
a new serial code under SRP. In both cases, the proposed code must be checked to
verify that it is previously unused in the context of the parent publication domain.
Furthermore, the manual review by a USIN-certified librarian that applies to new
serial codes should also apply to new publication domains. Ideally, this manual review should verify
that the publication domain corresponds to an actual publishing institution,
organization or administrative division thereof and is a scholar-friendly mnemonic
designation of that unit consistent with historical practice wherever possible.
Alternatively, the publication domain may represent a newly-formed collective or
coalition expressly formed for the purpose of organizing the upper levels of the
USIN domain structure.

A further parallel with SRP is to suggest that formal domain definitions be
registered and revised as required from time to time. These definitions would
specify the identity and organizational history of a publishing entity. From a domain
definition, then, one should be able to determine the name of a particular publishing
entity, its parent organization, its successors and predecessors and so on. However,
domain definitions would not have the complexity of serial definitions under SDL,
because there are no corresponding requirements in publication domains for
enumeration, chronology and other aspects of serial definitions.

PDP should also support the registration of alternative USINs and changes in
canonical USIN for the publishing entities denoted by publication domains. The
registration of alternative USINs under PDP could parallel SRP in a straightforward
fashion. However, registration of a new canonical USIN for a publication domain is
complicated by the implications for serials and subdomains within that domain.
Consider a proposed change from RDNS."acm.org" to S.ACM as the canonical USIN
for the Association for Computing Machinery. Normally, this should imply
corresponding changes for all subordinate serials and subdomains recursively. Thus,
changes in canonical USIN from RDNS."acm.org"/CACM to S.ACM/CACM, from
RDNS."acm.org".SIGPLAN to S.ACM.SIGPLAN and from
RDNS."acm.org".SIGPLAN/Notices to S.ACM.SIGPLAN/Notices should all be
expected in the example. However, it may be unwise to automatically make such
changes without review in every instance. Thus, under PDP, a change in canonical
form for a publication domain should be carried out by first registering all the
appropriate changes for subordinate serials and subdomains. This may be enforced
under PDP by permitting a registration of a new canonical form for a publication
domain only when alternative canonical forms for all active subdomains and serials
therein have been registered.
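
This precondition might be checked along the following lines. The sketch treats
USINs as plain strings, derives the expected new form of each subordinate by
replacing the domain prefix, and consults a hypothetical table of alternatives
already registered. The function names and the table layout are assumptions.

```python
def rebase(usin, old_domain, new_domain):
    """Rewrite a subordinate USIN from the old domain prefix to the new one."""
    if not usin.startswith(old_domain):
        raise ValueError(f"{usin!r} is not within {old_domain!r}")
    rest = usin[len(old_domain):]
    # A subordinate continues with "." (subdomain) or "/" (serial code).
    if rest[:1] not in (".", "/"):
        raise ValueError(f"{usin!r} is not subordinate to {old_domain!r}")
    return new_domain + rest

def may_change_canonical(old_domain, new_domain, active_subordinates, alternatives):
    """Permit the change only if every active subordinate is already covered.

    `alternatives` maps each current canonical USIN to the set of
    alternative forms registered for it.
    """
    return all(
        rebase(usin, old_domain, new_domain) in alternatives.get(usin, set())
        for usin in active_subordinates
    )
```

In the example above, the change from RDNS."acm.org" to S.ACM would be refused
until S.ACM/CACM, S.ACM.SIGPLAN and S.ACM.SIGPLAN/Notices have each been registered
as alternatives for their RDNS counterparts.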

Finally, PDP should also provide for the deactivation and possible reactivation of
domains. Deactivation of a publication domain implies that no further publication
activity is contemplated within that domain or its subdomains. Hence deactivation of
a domain should only be permitted when all subordinate serials and subdomains
have themselves been deactivated. Reactivation of a publication domain may
occasionally be contemplated. However, to ensure the permanence of identification
of USINs issued in the subdomain prior to its earlier deactivation, a reactivation
request should not be automatically granted. Instead, a "contract" may be first
returned identifying previous use of the domain, assigned subdomains and serials
and the requirement that new use will respect these. The proposed new domain
authority should agree to these terms before the domain can be reactivated.

USIN Global Database System

Now consider how the day-to-day needs of scholars can be directly supported by a
USIN Global Database System. Three basic needs can be identified: (a) the need to
inquire about the article or other item denoted by a given USIN, (b) the need of
authors to cite articles by USIN, and (c) the need to use USINs in literature
research, both to denote search keys (citation indexing) and search results. USIN
Inquiry Protocol is the first proposed technology to assist users in this regard; it
provides for both the interactive inquiry about USINs and for hypertext citation of
USINs in World-Wide Web documents. To support citation by USIN in other types
of document formatting software, a Bibliographic Retrieval Protocol is proposed
coupled with bibliographic formatting "plug-ins" for standard word processing
packages. The final subsection discusses the role of the USIN Global Database and
USINs generally in literature research.

UIP - USIN Inquiry Protocol

One of the primary motivations underlying the USIN concept is to address the
"broken links" problem on the World-Wide Web: citation of works by Uniform
Resource Locator (URL) is prone to failure when the cited item is moved or
removed. To solve this problem, it has long been suggested that names of resources
rather than their locations should be the basis of citation, but none of the proposals
for Uniform Resource Names (URNs) has yet succeeded. A more successful
approach may be to concentrate on an important subset of the general problem:
links to serially-published documents. For this subset, consider the direct use of
USINs as permanent, "unbreakable" links and the development of USIN Inquiry
Protocol (UIP) to enable this use. For example, a hypertext reference to a sample
TOPLAS article could be coded using the following HTML markup.

<A HREF="uip:S.ACM/TOPLAS:16@1811">A Behavioral Notion of Subtyping</A>

Note that a hyperlink formed in this way makes no reference to any particular
computer system. Thus, the requirements of URNs are satisfied; the target of a link
is designated by naming what it is instead of where it is located.

Apart from this use in Web-based documents, UIP also supports direct inquiries
about a particular USIN. All that the scholar need do is type
uip:S.ACM/TOPLAS:16@1811 directly into the "location" field of a Web browser
(assuming that the browser has been updated to include the UIP client-side
software).
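
As an illustration of what UIP client-side software might first do with such a
reference, the following sketch splits it into a domain path, a serial code and an
item designation. The splitting rules used here (first unquoted colon, last unquoted
slash) are an assumption inferred from the examples in this paper, not a normative
grammar, and the names UipReference and parse_uip are hypothetical.

```python
from typing import NamedTuple, Optional


class UipReference(NamedTuple):
    domain: str            # e.g. "S.ACM"
    serial: str            # e.g. "TOPLAS"
    item: Optional[str]    # e.g. "16@1811", or None if absent


def _split_outside_quotes(text: str, sep: str, last: bool = False):
    """Split at the first (or last) sep that is not inside double quotes."""
    in_quotes = False
    positions = []
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            positions.append(i)
    if not positions:
        return text, None
    cut = positions[-1] if last else positions[0]
    return text[:cut], text[cut + 1:]


def parse_uip(reference: str) -> UipReference:
    scheme, _, usin = reference.partition(":")
    if scheme.lower() != "uip" or not usin:
        raise ValueError(f"not a UIP reference: {reference!r}")
    # Assumed layout: <domain path>/<serial code>[:<item numbering>]
    name, item = _split_outside_quotes(usin, ":")
    domain, serial = _split_outside_quotes(name, "/", last=True)
    if serial is None:
        raise ValueError(f"no serial code in {usin!r}")
    return UipReference(domain, serial, item)
```

For example, parse_uip("uip:S.ACM/TOPLAS:16@1811") yields the domain S.ACM, the
serial TOPLAS and the item 16@1811. Further analysis of the item numbering is left
out of this sketch.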

Ignoring for the moment how it works, the critical issue from a user perspective is
what you get when you make a UIP/USIN inquiry, either directly or by activating a
hyperlink. One answer is that you retrieve a metadata page, that is, an information
page about a document, but not the document itself. In general, the direct retrieval
of documents cannot be guaranteed because many of them may not be electronically
available. On the other hand, if a document is available on-line, it may be available
from a variety of different sources with a variety of different formats and/or pricing
structures. The purpose of a metadata page, then, is to provide a full bibliographic
description of the article or other item denoted by the target USIN, and a set of links
for making further inquiries about the article and/or retrieving a copy of it.

In general, one may consider an ambitious design goal for metadata pages: to
provide a comprehensive information resource with respect to the cited items. In
addition to basic bibliographic information and links for acquiring copies of articles,
a number of other items could be provided. Each article metadata page could
include direct links to information about the serial and its publisher. Using the
USIN notation it should also be easy to include links for retrieval of contents pages
for sibling articles in the same journal issue or volume. Links for exploring other
publications by the authors of the article might be included. In particular, links for
locating subsequently published corrigenda would be worth highlighting.
Information on review articles that discuss the document of interest may be
included. In conjunction with a citation database, links for retrieving the sets of
articles that are respectively cited by and cite this article could also be considered.
Finally, it may be reasonable to consider including links to search services that can
locate similar articles by full-text searching using a document surrogate (keywords
and other metadata that describe the current document).

It may be the case that the coded USIN in a UIP hyperreference does not refer to a
single article, but instead denotes some other serial component or is ambiguous or
erroneous. In each of these cases, the page returned through UIP should also strive
to provide comprehensive information to the user. For example, in the case of a
USIN reference by page number where more than one article starts on the specified
page, a menu showing each possible article could be returned together with their
correct canonical USINs.
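
A minimal sketch of this disambiguation behaviour follows. The index contents, the
serial X.EXAMPLE/NOTES and the function name resolve_page_reference are all
hypothetical, and the canonical forms shown are illustrative only; a real UIP server
would consult the USIN Global Database instead of an in-memory table.

```python
# Hypothetical index: page-based USIN -> list of (canonical USIN, title).
PAGE_INDEX = {
    "X.EXAMPLE/NOTES:3@17": [
        ("X.EXAMPLE/NOTES:3(1)$4", "First short note"),
        ("X.EXAMPLE/NOTES:3(1)$5", "Second short note"),
    ],
    "X.EXAMPLE/NOTES:3@21": [
        ("X.EXAMPLE/NOTES:3@21", "A longer article"),
    ],
}


def resolve_page_reference(usin, index=PAGE_INDEX):
    """Return ("metadata", [entry]), ("menu", entries) or ("error", [])."""
    candidates = index.get(usin, [])
    if not candidates:
        return "error", []            # erroneous or unknown reference
    if len(candidates) == 1:
        return "metadata", candidates  # unambiguous: show the metadata page
    return "menu", sorted(candidates)  # ambiguous: list every candidate
```

In this sketch an ambiguous reference never fails outright; the user is always
shown the candidate articles together with the USINs that identify each one
uniquely.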

These ambitious goals for the metadata pages returned by UIP servers need not
represent an obstacle to server development. The initial implementations of UIP
servers may focus on basic capabilities, allowing additional functionality to be
added over time. In addition, many of the capabilities could be implemented in a
fairly modular fashion. For example, if a particular document delivery service
supports web-based document ordering by USIN, then generating the appropriate
document ordering link is a simple matter.

Returning to the issue of how UIP may be implemented, note that the syntax for
UIP/USIN citations does not specify the actual server to be consulted in resolving
the UIP request. Rather it is reasonable to expect that the server would be specified
by an appropriate client-side mechanism, such as a UIPSERVER browser parameter or
environment variable. Typically, users might choose to set their UIPSERVER to
specify a server operated by a major local research library or library consortium. In
this way, the metadata pages returned can be formatted to emphasize local holdings
of cited documents, even when the citing document is remotely located.
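
The client-side server selection described above might look like the following
sketch. Only the UIPSERVER parameter name comes from this proposal; the fallback
host uip.example.org, the use of HTTP and the /uip?usin= query layout are
assumptions made purely for illustration.

```python
import os
from urllib.parse import quote

DEFAULT_UIP_SERVER = "uip.example.org"  # hypothetical fallback server


def resolution_url(reference, environ=None):
    """Map a uip: reference to a request URL on the user's chosen server."""
    environ = os.environ if environ is None else environ
    server = environ.get("UIPSERVER", DEFAULT_UIP_SERVER)
    scheme, _, usin = reference.partition(":")
    if scheme.lower() != "uip" or not usin:
        raise ValueError(f"not a UIP reference: {reference!r}")
    # Percent-encode the whole USIN so "/", ":" and "@" survive as query data.
    return f"http://{server}/uip?usin={quote(usin, safe='')}"
```

Because the server is chosen by the reader and not by the citing document, the same
hyperlink can resolve to different metadata pages for readers at different
institutions.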
Bibliographic Retrieval and Formatting

A key goal of the USIN scheme is to support authors of scholarly works in the
preparation of bibliographic references. This may be achieved by bibliographic
processing "plug-ins" or "add-ons" to standard word processing software that will
allow authors to cite works by merely entering USINs at the appropriate citation
points. The bibliographic processing modules could then take care of all the
remaining details for resolving and formatting the citations: retrieving the actual full
bibliographic citations, assigning appropriate in-text reference numbers or labels,
formatting the citations according to a chosen style guideline, sorting them
according to a user- or style-specified ordering, and incorporating the citations into
the document as a reference list at the back or sequentially in footnotes. As well as
removing a considerable source of tedium in the preparation of scholarly works, the
use of USINs in this way should also improve the accuracy and quality of citations
by eliminating manual errors and inconsistencies. Finally, a serendipitous benefit of
having the citations in a paper represented as USINs is that the citation set can then
be made available as data; citation databases can thus be supported by citation data
provision at the source [6].

A modular design for a USIN-based bibliographic processing system is to allow
many different bibliographic formatting tools to retrieve data from the USIN Global
Database using a common retrieval protocol (say BRP: Bibliographic Retrieval
Protocol) and citation representation format (say BDF: Bibliographic Data Format).
This would allow the development of competing bibliographic formatting tools that
might cater to different user preferences and to different types of document
processing system. BRP could be designed to work with locally-mounted copies of
the USIN database for access to the bulk of historic bibliographic data, coupled with
direct Internet access to the USIN Global Database for access to the latest
references. BDF should provide a highly-structured logical format for citation data,
in order to allow various transformations on that data to be easily implemented.
Ideally, UPP (USIN Publication Protocol) and BDF should be designed together so
that the bibliographic data in the correct format is gathered directly during the USIN
registration process.
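
The division of labour between retrieval and formatting can be sketched as follows.
Since neither BRP nor BDF has been specified, the record fields, the
<usin:...> citation marker and the function names are hypothetical; the two sample
records are taken from references [6] and [7] of this paper and stand in for data
that a BRP server would supply.

```python
import re

# Stand-in for structured records that a BRP server might return.
RECORDS = {
    'RDNS/"firstmonday.dk":2(4)$4': {
        "authors": "Robert D. Cameron",
        "title": "A Universal Citation Database as a Catalyst for Reform "
                 "in Scholarly Communication",
        "source": "First Monday 2(4)",
        "date": "April 1997",
    },
    "S.ACM/CACM:30@933": {
        "authors": "James H. Coombs, Allen H. Renear, and Steven J. DeRose",
        "title": "Markup Systems and the Future of Scholarly Text Processing",
        "source": "Communications of the ACM 30(11)",
        "date": "Nov. 1987",
    },
}

CITE = re.compile(r"<usin:([^>]+)>")  # hypothetical in-text citation marker


def fetch_record(usin):
    """Placeholder for a BRP lookup."""
    return RECORDS[usin]


def format_document(text, fetch=fetch_record):
    """Replace USIN markers with numbers and build a numbered reference list."""
    order = []

    def number(match):
        usin = match.group(1)
        if usin not in order:
            order.append(usin)          # number by order of first citation
        return f"[{order.index(usin) + 1}]"

    body = CITE.sub(number, text)
    refs = []
    for n, usin in enumerate(order, start=1):
        r = fetch(usin)
        refs.append(f'[{n}] {r["authors"]}. "{r["title"]}", '
                    f'{r["source"]}, {r["date"]}.')
    return body, refs
```

A different formatting tool could reuse fetch_record unchanged while applying
another citation style or ordering, which is the modularity argued for above.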

USINs, the USIN Global Database and Literature Research

In support of bibliographic inquiry, retrieval and formatting, the USIN Global
Database is designed to provide a comprehensive solution when starting with a set of
citations represented as USINs. But consider also the literature research task, that is,
the need to find citations of potential interest using various search methods. In this
case, the USINs are not known ahead of time, but may represent the results of the
search process. In support of literature research, then, what role should USINs, in
general, and the USIN Global Database, in particular, play?

One possible approach is to expand the requirements for the USIN Global Database
to also provide comprehensive support for literature research activities. After all, the
USIN Global Database is intended to be comprehensive in its coverage of the
citable works and must provide the basic bibliographic data (author, title, serial
name, serial enumeration, publication date) for each archived item. With the
extension of the database to include abstracts, keywords and classification data for
each item, it is possible to contemplate comprehensive support for literature
research.

An alternative approach, however, is to support multiple alternative literature
databases, each of which provides its own methods of augmenting the basic
bibliographic data available from the USIN Global Database. USINs themselves
could form the basis of interoperability between the databases, i.e., distinct results
from different databases could be easily combined by USIN sorting and matching
operations. Such an approach would support different classification schemes that
might be appropriate in different subject areas, competition between different full-
text searching techniques based on article abstracts and/or article full text, selective
databases that target sources relevant to a particular topic or type of material,
experimentation with filtering schemes that grade the level or nature of materials,
alternative language databases that support searching in languages other than
English, and so on.
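
Interoperation by USIN matching can be sketched very simply. The two result sets
below are hypothetical outputs of two independent literature databases, and plain
string ordering is used only as a stand-in for a proper canonical USIN ordering.

```python
def merge_by_usin(*result_sets):
    """Combine per-database results, keyed by USIN, into one sorted mapping."""
    merged = {}
    for results in result_sets:
        for usin, metadata in results.items():
            # Records for the same USIN are matched and their fields pooled.
            merged.setdefault(usin, {}).update(metadata)
    return dict(sorted(merged.items()))


# Hypothetical results from a classification database and a keyword database.
by_subject = {
    "S.ACM/CACM:30@933": {"class": "text processing"},
    "I.ISOC/RFC:2141": {"class": "naming"},
}
by_keyword = {
    "S.ACM/CACM:30@933": {"keywords": ["markup", "SGML"]},
}

combined = merge_by_usin(by_subject, by_keyword)
```

Neither database needs to know about the other; the shared USIN is the only point
of agreement required.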

From the standpoint of good modular system design, one can also argue that the
USIN Global Database should deal only with the basic bibliographic data that
derives from the publication process. Classification, evaluation and review materials
should be considered third-party metadata that may come from a variety of sources.
Without any agreed-upon method for standardizing what types of metadata should
be provided and who should provide them, it would be a poor choice to impose de facto
standardization by incorporating a particular third-party metadata scheme into the
USIN Global Database.

Nevertheless, it is reasonable to consider a limited extension of the USIN Global
Database to support one additional form of metadata, namely citation metadata. A
requirement of UPP could be that the USINs of cited references be supplied as part
of the publication process. If, as suggested previously, scholars use USINs in
writing their documents, it should not be difficult to provide them in the publication
process. If this were done, it could support the development of a universal citation
database that would in turn be a valuable tool for literature research and a potential
catalyst for reform in scholarly communication [6].
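
If cited USINs were collected at publication time, a "cited by" index could be
derived by simple inversion, as in the following sketch. The submission data is
illustrative: the first entry uses possible eventual USINs from this paper's own
reference list, the second citing document (X.EXAMPLE/J:1@1) is hypothetical, and
the function name build_cited_by is not part of UPP.

```python
from collections import defaultdict


def build_cited_by(cites):
    """Invert a citing -> [cited] mapping into cited -> sorted [citing]."""
    cited_by = defaultdict(set)
    for citing, cited_list in cites.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return {usin: sorted(sources) for usin, sources in cited_by.items()}


# Citation lists as they might be supplied during registration.
submitted = {
    "S.BCS/JoDI:1(3)$1": ["I.ISOC/RFC:2141", "S.ACM/CACM:30@933"],
    "X.EXAMPLE/J:1@1": ["S.ACM/CACM:30@933"],
}

index = build_cited_by(submitted)
```

The forward mapping supports "articles cited by this article" links on a metadata
page, and the inverted mapping supports "articles that cite this article" links.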

VI. Conclusion
The USIN scheme is a proposed system for the global and persistent identification
of the publications in organized serial collections. Ultimately some global
identification scheme is likely to be developed for interoperation of various article
citation applications. Scholars should seize the opportunity that now exists to ensure
that the scheme that succeeds is the one that is designed primarily to meet the long-
term needs of people (authors and readers), not the short-term needs of particular
present-day computer systems belonging to vendors, libraries or document delivery
services.

This paper has presented a vision for a scholar-friendly universal identification
system for serially published works. It has also presented a number of concrete
design proposals for USIN syntax and technological components that can support a
global USIN system. In particular, a uniform naming model has been presented
based on hierarchical naming of serial publications and hierarchical numbering of
serial items. Two important systems in support of the USIN concept have been
proposed, specifically, a USIN Global Registry and a USIN Global Database.
Designs for each of these systems have been presented at a level that illustrates how
specific architectural features can interact to meet the requirements of publishers,
librarians and scholars.

There is a great deal more work required to fully realize the USIN concept. The
author would be most appreciative of your help.

Acknowledgements
Andrew Walenstein has helped greatly by providing valuable feedback on several
drafts of this paper. Jim Cole, while still questioning some issues from a serials
cataloguing perspective, has been a source of considerable encouragement. I am also
grateful to the anonymous referees for many constructive criticisms and helpful
suggestions.

References
[1]
American Chemical Society, American Institute of Physics, American
Mathematical Society, American Physical Society, Elsevier Science, IEEE,
"Publisher Item Identifier as a means of document identification", updated
October 9, 1997. Archived publication unknown. Available at
http://www.elsevier.nl/inca/homepage/about/pii/.

With no other formal denotation known for this work, it might only be
denotable by reference to this paper. Possible eventual USIN:
S.BCS/JoDI:1(3)$1|ref(1). This assumes that BCS becomes assigned to the
British Computer Society in the international domain of scholarly societies,
and that JoDI is reserved by BCS to denote the Journal of Digital
Information.

[2]
American National Standards Committee on Library and Information
Sciences and Related Publishing Practices, Z39, Subcommittee E: Serials
Holding Statements. American National Standard for Information Sciences -
Serial Holdings Statements. ANSI Z39.44-1986. Approved August 14, 1985.
American National Standards Institute, New York, 1986.

Suggested initial USIN: ISSN.8756-0860/Z39.44-1986. Possible eventual
form US.ANSI/ANS:Z39.44-1986.

[3]
T. Berners-Lee. "Universal Resource Identifiers in WWW: A Unifying Syntax
for the Expression of Names and Addresses of Objects on the Network as
used in the World-Wide Web", RFC 1630, RFC Editor, Internet Society, June
1994. Available at URL: http://ds.internic.net/rfc/rfc1630.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:1630. Possible eventual form
I.ISOC/RFC:1630.

[4]
T. Berners-Lee, L. Masinter, M. McCahill (Eds.), "Uniform Resource
Locators", RFC 1738, RFC Editor, Internet Society, December 1994.
Available at URL: http://ds.internic.net/rfc/rfc1738.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:1738. Possible eventual form
I.ISOC/RFC:1738, where ISOC might uniquely denote the Internet Society in
a domain I of International organizations. Here, RFCs are identified in the
domain for the Internet Society, the principal sponsor of the series.
Technically, the "RFC Editor", chartered by the Internet Society, is said to be
the publisher. However, it seems clear enough that RFC will remain an
unambiguous code for this series in the context of Internet Society sponsored
publications.

[5]
Robert D. Cameron. "To Link or To Copy?-Four Principles for Materials
Acquisition in Internet Electronic Libraries", Technical Report TR 94-08,
School of Computing Science, Simon Fraser University, December 1994.
Available at http://elib.cs.sfu.ca/project/papers/e-lib-links.html.

Suggested initial USIN: RDNS."sfu.ca".CMPT/TR:94-08. Possible eventual
form CA.SFU.CMPT/TR:94-08.

[6]
Robert D. Cameron. "A Universal Citation Database as a Catalyst for Reform
in Scholarly Communication", First Monday 2(4), April 1997. Available at
URL: http://www.firstmonday.dk/issues/issue2_4/cameron/index.html

Suggested initial USIN: RDNS/"firstmonday.dk":2(4)$4. Here, the article
number ($4) is determined by counting. Eventually, the form
P.Munksgaard/FirstMonday:2(4)$4 may be used, where Munksgaard is the
code for Munksgaard International Publishers in an international publishers
domain. Another possibility is J.FirstMonday:2(4)$4 based on the concept
of a global journal domain J operated by a publisher consortium.

[7]
James H. Coombs, Allen H. Renear, and Steven J. DeRose. "Markup Systems
and the Future of Scholarly Text Processing." Communications of the ACM,
30(11), Nov. 1987, pages 933-947. Available at URL:
http://www.sil.org/sgml/coombs.html.

Suggested initial USINs: ISSN/0001-0782:30@933,
RDNS."acm.org"/CACM:30@933. Possible eventual form
S.ACM/CACM:30@933. An interesting point to note is that issue numbers are
not required for CACM prior to volume 33.

[8]
R. Daniel. "A Trivial Convention for using HTTP in URN Resolution", RFC
2169, RFC Editor, Internet Society, June 1997. Available at URL:
http://ds.internic.net/rfc/rfc2169.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:2169. Possible eventual form
I.ISOC/RFC:2169.

[9]
R. Daniel and M. Mealling. "Resolution of Uniform Resource Identifiers
using the Domain Name System", RFC 2168, RFC Editor, Internet Society,
June 1997. Available at URL: http://ds.internic.net/rfc/rfc2168.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:2168. Possible eventual form
I.ISOC/RFC:2168.

[10]
Nachum Dershowitz and Edward M. Reingold. Calendrical Calculations,
Cambridge University Press, Cambridge, UK, 1997.

Suggested USINs: ISBN/0-521-56413-1 and ISBN/0-521-56474-3. These codes use
ISBNs for the hardback and paperback versions, respectively. Choosing the code
for the hardback version as canonical may be appropriate.

[11]
DOI Foundation, "A Guide to Using Digital Object Identifiers", October 10,
1997. Archived publication unknown. Available at
http://www.doi.org/guidebook/guidebook.html.

Possible eventual USIN: S.BCS/JoDI:1(3)$1|ref(11).

[12]
Douglas C. Engelbart, "Authorship Provisions in AUGMENT", Digest of
Papers - Compcon Spring 84 - Twenty-Eighth IEEE Computer Society
International Conference, San Francisco, February 27--March 1, 1984, pp.
465-472.

Initial USINs: ISBN/0-8186-0525-1@465 (paper), ISBN/0-8186-4525-3@465
(microfiche), ISBN/0-8186-8525-5@465 (casebound). Possible eventual form
I.IEEE/Compcon:28@465.

[13]
Roy T. Fielding. "Maintaining Distributed Hypertext Infostructures: Welcome
to MOMspider's Web", Computer Networks and ISDN Systems 27(2),
November 1994, Special Issue Selected Papers of the First World-Wide Web
Conference, pp. 193-204. On-line paper and software distribution available at
http://www.ics.uci.edu/WebSoft/MOMspider/.

Suggested initial USIN: ISSN/0169-7552:27@193. Possible eventual form
P.Elsevier/COMNET:27@193. Here, the code COMNET is used by Elsevier for
this journal.

[14]
Brian Green and Mark Bide. "Unique Identifiers: A Brief Introduction", Book
Industry Communication, London, 1997. Archived publication unknown.
Available at URL http://www.bic.org.uk/bic/uniquid.

Possible eventual USIN: S.BCS/JoDI:1(3)$1|ref(14).

[15]
Frank Halasz and Mayer Schwartz. "The Dexter Hypertext Reference Model",
Communications of the ACM 37(2), February 1994, pp. 30-39.

Suggested initial USIN: ISSN/0001-0782:37(2)@30. Possible eventual form
S.ACM/CACM:37(2)@30.

[16]
Claudia Houk McNellis. "A Serial Pattern Scheme for a Value-Based
Predictive Check-in System", Serials Review, Vol 22, No. 4, Winter 1996,
pages 1-11.

Suggested initial USIN: ISSN/0098-7913:22(4)@1,
RDNS."jaipress.com"/SR:22(4)@1. The code SR is speculative. Possible
eventual form P.JAI/SR:22(4)@1.

[17]
R. Moats. "URN Syntax", RFC 2141, RFC Editor, Internet Society, May
1997. Available at URL: http://ds.internic.net/rfc/rfc2141.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:2141. Possible eventual form
I.ISOC/RFC:2141.

[18]
P. Mockapetris, "Domain Names: Concepts and Facilities", RFC 1034, RFC
Editor, Internet Society, November, 1987. Available at URL:
http://ds.internic.net/rfc/rfc1034.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:1034. Possible eventual form
I.ISOC/RFC:1034.

[19]
National Information Standards Organization. Serial Item and Contribution
Identifier (SICI): An American National Standard Developed by the National
Information Standards Organization: Approved August 14, 1996 by the
American National Standards Institute. National Information Standards series
ANSI/NISO Z39.56-1996 (Version 2). NISO Press, Bethesda, Maryland,
1997. Available at URL: http://sunsite.Berkeley.EDU/SICI/.

This is an interesting case which is published in the National Information
Standards series (ISSN 1041-5653) of NISO. It has also been given an ISBN.
But the code Z39.56-1996 represents its numbering as an American National
Standard. Suggested initial USIN: ISSN.1041-5653/Z39.56-1996. Possible
eventual form US.ANSI/ANS:Z39.56-1996.

[20]
Theodor Holm Nelson. Literary Machines, Edition 87.1, 1987.

Initial USIN: ISBN/0-89347-055-4.

[21]
Norman Paskin. "Information Identifiers", Learned Publishing, Vol 10, No. 2,
April 1997, pages 135-156. Available at URL
http://www.elsevier.com/inca/homepage/about/infoident/Menu.shtml.

Suggested initial USIN: ISSN/0953-1513:10@135. Learned Publishing is
published by the Association of Learned and Professional Society Publishers.
On the path towards mnemonic identification, the USIN form
RDNS."alpsp.org.uk"/LP:10@135 may temporarily be used before an
international domain structure is in place. Eventually, the canonical form may
become S.ALPSP/LP:10@135 based on a domain S of scholarly societies.

[22]
Fritz Schwarz and Cindy Hepfer. "Changes to the Serial Item and
Contribution Identifier and the Effects of Those on Publishers and Libraries",
The Serials Librarian 28(3/4), 1996, pp. 367-70.

Suggested initial USINs: ISSN/0361-526X:28@367 and
RDNS."haworth.com"/SL:28@367. Possible eventual form
P.Haworth/SL:28@367.

[23]
K. Sollins and L. Masinter. "Functional Requirements for Uniform Resource
Names", RFC 1737, RFC Editor, Internet Society, December 1994. Available
at URL: http://ds.internic.net/rfc/rfc1737.txt.

Suggested initial USIN: RDNS."isoc.org"/RFC:1737. Possible eventual form
I.ISOC/RFC:1737.

[24]
Jennifer Wheary and Bernard F. Schutz, "Living Reviews in Relativity:
Making an Electronic Journal Live", The Journal of Electronic Publishing.
Available at URL: http://www.press.umich.edu:80/jep/03-01/LR.html.

Suggested initial USIN: ISSN/1080-2711:3(1)$5. Possible eventual form
EDU.UMICH.PRESS/JEP:3(1)$5.
