Scholar-Friendly DOI Suffixes with JACC: Journal Article Citation Convention

Robert D. Cameron
School of Computing Science
Simon Fraser University
CMPT TR 1998-08
March, 1998

Copyright 1998, Robert D. Cameron. See the Copyright section.

Contents
Abstract
I. Introduction
II. JACC: The Basics
III. Multiple Articles Per Page
IV. Articles in Unpaginated Electronic Journals
V. Conclusion

Abstract
JACC (Journal Article Citation Convention) is proposed as an alternative to SICI (Serial Item and Contribution Identifier) as a convention for specifying journal articles in DOI (Digital Object Identifier) suffixes. JACC is intended to provide a very simple tool for scholars to easily create Web links to DOIs and to also support interoperability between legacy article citation systems and DOI-based services. The simplicity of JACC in comparison to SICI should be a boon both to the scholar and to the implementor of DOI responders.

I. Introduction
The Digital Object Identifier (DOI) system is a developing new standard for the globally unique identification of published digital content based on a publisher-oriented model. The first part or prefix of a DOI identifies a content publisher using a numeric code assigned by a central registry, while the second part or suffix of a DOI identifies a particular content item using a publisher-specified string [DOI]. To avoid unnecessary limitation on possible applications of the DOI as well as to allow publishers maximum flexibility in developing their own DOI processing systems, very few restrictions have initially been placed on the nature of the publisher-specified string. In this regard, the DOI suffix deliberately follows the "simple" or "dumb" identifier model, as opposed to the "compound" or "intelligent" identifier model that would impose a highly-structured coding system for content identification [Paskin][GreenBide].
A significant drawback of the simple suffix model of the DOI is that it greatly limits interoperability between newly deployed DOI systems and legacy systems that identify content in some other way. In recognition of this point, there is a developing convention that the suffix of a DOI may be a compound string consisting of the name of an existing identification standard in square brackets (for example, [ISBN] for International Standard Book Number or [SICI] for Serial Item and Contribution Identifier) followed by a legal identifier of that standard [Unicorn].
In the important area of journal article citation, however, the use of the existing SICI standard in the DOI suffix may be costly and of limited value. In general, SICI uses a complex coding scheme involving ISSN, chronology, enumeration, pagination, title code, derivative part identifier, medium format identifier, standard version and check digit. Coding for chronology and title code is particularly complex and error-prone in a variety of special cases. Depending on the available information, many different (correct and incorrect) SICIs could potentially be generated for any particular article. SICIs may also be ambiguous, at an estimated rate of one duplication per million assigned SICIs in the 1996 version of the standard [SICI].
Compounding these difficulties is the fact that SICI has failed to achieve widespread use in article citation applications generally. Hence, even assuming that a SICI-based DOI suffix is implemented, the interoperability achieved thereby may be quite limited.
At the same time, there is another notion of interoperability that ought to be considered, namely the ability for researchers, students, bibliographers and others to easily create World-Wide Web links to articles. In this regard, one might consider anyone who is citing a work to be acting in the role of a scholar and hence the greatest interoperability might be achieved by citation conventions that are "scholar-friendly". The notion of scholar-friendliness as a requirement for document identifiers has been considered at length in the proposal for Universal Serial Item Names [USIN]. In essence, the argument is that scholars will prefer short, mnemonic identifiers that are easy to generate from standard bibliographic information and that employ a common and conventional syntax. The Link Manager of the American Physical Society is an example of one publisher implicitly recognizing the value of a scholar-friendly approach to linking [APS-LM].
Such scholar-friendly linking conventions may be of direct benefit to publishers in a number of ways. First of all, direct linking to DOI response pages from web pages of various kinds may be a valuable form of free advertising. Secondly, many publishers are considering the incorporation of active citations as an important value-added feature in the electronic versions of papers [Hunter]. In this regard, the ability to create links in a simple, scholar-friendly manner may also reduce publisher costs. Thirdly, publisher efforts to support the needs of scholars in linking activities may have a positive and commercially valuable effect on image or prestige. Finally, in the implementation of DOI response technology, a simple scholar-friendly convention for DOI suffixes may be far less expensive than an alternative based on SICI.
The purpose of this paper then is to propose Journal Article Citation Convention (JACC) as a limited convention for citation of journal articles in a DOI suffix. It is limited in the sense that it is proposed specifically as a convention that applies in the most common cases, but not as a standard with all the complexity necessary for a universal citation identifier scheme. In this regard, JACC is designed to be a handy tool that a publisher may use if it does the job. If not, the publisher still has the freedom to implement the DOI suffix using any other scheme, conventional or not. In accepting limits on its applicability, JACC retains a simplicity that maximizes scholar-friendliness and minimizes implementation difficulties for a preponderance of journals.
Section II of this paper introduces the basic JACC structures for identification of legacy articles by page of occurrence in a print journal. Section III addresses the primary problem with page-based identification: the possibility of ambiguity when more than one article starts on a page. The JACC notation for publications in unpaginated electronic journals is discussed in section IV. Section V concludes.

II. JACC: The Basics
The initial focus of JACC is identification of an article by its appearance in a print publication. This is perhaps ironic in support of "digital object" identification, but is almost universally appropriate for legacy objects and will continue to be applicable for a great many newly published objects that are either published in both print and digital form or are published in a digital form that retains the concept of pagination. The following syntax represents the basic JACC method of article identification for journals paginated by volume.
[JACC]<journal-code>:<volume>@<page>

Here <journal-code> may either be a journal ISSN or a mnemonic code for the journal specified by the publisher, <volume> is the volume number in which the article appears and <page> specifies the first page number of the article.
Consider, for example, citation of the article "A Behavioral Notion of Subtyping" by Barbara H. Liskov and Jeannette M. Wing appearing in ACM Transactions on Programming Languages and Systems, volume 16, number 6, (November 1994), pages 1811-1841. This journal is widely known in the computing science community by the acronym TOPLAS, and that acronym uniquely denotes this journal in the space of ACM publications. As publisher, the ACM might then designate that TOPLAS is to be used as the preferred <journal-code>. In this event, the full DOI suffix for this citation would be [JACC]TOPLAS:16@1811.
This citation is scholar-friendly in several ways. It uses the commonly known mnemonic for the journal and the mnemonic value of the at-sign to indicate the page number at which the article starts. It is brief, avoiding any redundant information entry. It uses the publication numbering that uniquely specifies this article and that would be found in any correct bibliographic citation of the article.
Although it is recommended that publishers specify mnemonic codes for journals wherever possible, the JACC convention is that journal specification by ISSN is always acceptable. For example, the ACM should also accept [JACC]01640925:16@1811 as a DOI suffix equivalent to the more mnemonic form above. The use of the ISSN in this way should permit almost universal interoperability with legacy bibliographic database systems. Of course, if the publisher does not define a mnemonic code for the journal, then the ISSN must be used.
In the event of print publications that are paginated by issue instead of by volume, JACC uses issue numbers in parentheses.
[JACC]<journal-code>:<volume>(<issue>)@<page>

Parentheses have mnemonic value for designating issue numbers because of their use in some common bibliographic styles. Furthermore, the grouping action of parentheses is beneficial to avoid misinterpretation of combined enumeration. For example, the DOI suffix [JACC]SL:32(3/4)@17 might be used by Haworth for the article "Govzines on the Web: A Preachment" by Joe Morehead, appearing in Serials Librarian, volume 32, nos. 3/4, 1997, pp. 17-30.
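The two print syntaxes above can be assembled mechanically from ordinary citation fields. The following is a minimal sketch in Python, not part of the JACC proposal itself; the function name jacc_print_suffix and the restriction of journal codes to letters and digits are assumptions made only for illustration.

```python
import re

# Assumption: a journal code is an unhyphenated ISSN or a short
# alphanumeric mnemonic. The paper does not fix a character set.
_CODE = re.compile(r"^[A-Za-z0-9]+$")


def jacc_print_suffix(journal_code, volume, page, issue=None):
    """Build a JACC DOI suffix for an article in a paginated journal."""
    if not _CODE.match(journal_code):
        raise ValueError(f"unexpected journal code: {journal_code!r}")
    # The issue, when given, is written in parentheses after the volume.
    issue_part = f"({issue})" if issue is not None else ""
    return f"[JACC]{journal_code}:{volume}{issue_part}@{page}"


print(jacc_print_suffix("TOPLAS", 16, 1811))         # mnemonic journal code
print(jacc_print_suffix("01640925", 16, 1811))       # same article, by ISSN
print(jacc_print_suffix("SL", 32, 17, issue="3/4"))  # paginated by issue
```

The three calls reproduce the TOPLAS, ISSN and Serials Librarian suffixes used as examples in this section.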

In the case of journals paginated by volume, JACC allows inclusion of issue numbers, but does not require it. Scholars may well specify issue numbers even in cases where they are not required. Indeed, it may not always be clear from a bibliographic citation whether the issue number is required or not. Accepting but not requiring issue numbers in this case should maximize both scholar-friendliness and interoperability with legacy systems.
In this initial version, JACC is deliberately focussed on those serials which follow a conventional two-level (volume, issue) enumeration structure. The premise is that a simple-to-implement system that is widely applicable will result in the fastest possible deployment of scholar-friendly DOI responder systems. Furthermore, the goal is to freeze the syntax for two-level serials so that publishers may confidently deploy responders without fear of future incompatibility problems. Ultimately, compatible conventions for serials with more complex and/or less conventional enumeration structures should be developed. However, for those that follow the conventional two-level structure, the burden of implementation of more complex DOI responder technologies can be avoided.
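To suggest how little machinery a responder might need for the two-level case, the sketch below parses a print-form suffix with a single regular expression that accepts, but does not require, an issue number. The pattern, the PrintCitation type and the assumption that volumes and pages are plain decimal numbers are illustrative choices, not rules stated by JACC.

```python
import re
from typing import NamedTuple, Optional


class PrintCitation(NamedTuple):
    journal_code: str
    volume: str
    issue: Optional[str]  # None when the scholar omitted the issue
    page: str


# Assumptions: alphanumeric journal code, decimal volume and page,
# and an issue that is any text without parentheses (e.g. "3/4").
_JACC_PRINT = re.compile(
    r"^\[JACC\](?P<code>[A-Za-z0-9]+)"
    r":(?P<volume>\d+)"
    r"(?:\((?P<issue>[^()]+)\))?"
    r"@(?P<page>\d+)$"
)


def parse_print_suffix(suffix):
    """Return a PrintCitation, or None if the suffix is not in this form."""
    m = _JACC_PRINT.match(suffix)
    if m is None:
        return None
    return PrintCitation(m["code"], m["volume"], m["issue"], m["page"])


print(parse_print_suffix("[JACC]TOPLAS:16@1811"))
print(parse_print_suffix("[JACC]TOPLAS:16(6)@1811"))
```

Both suffixes yield the same journal, volume and page, so a responder for a journal paginated by volume could simply ignore the issue field when looking up the article.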

III. Multiple Articles Per Page
The issue of identification ambiguity that arises when two or more articles start on a single page is a small but thorny one. The SICI standard addresses this problem through the use of a title code, consisting of the initial letter of each of the first six title words. However, complexities arise in the presence of punctuation, foreign character sets and special symbols. The rules can be easily misinterpreted, resulting in incorrect SICI codes. Furthermore, the use of a title code does not actually solve the problem in all cases; consider the possibility of two articles on the same page both with a single title word starting with the same letter.
Under JACC, distinction between multiple articles on a page is handled by the simple device of appending a lower-case alphabetic designation, a to denote the first article on the page, b for the second article, and so on. For example, following this convention, the Association for Computing Machinery could specify the two DOI suffixes [JACC]CACM:38(1)@43a and [JACC]CACM:38(1)@43b, respectively, for the two short articles "Women and Computing in the UK" by Alison Adam and "Announcing a New Resource: The WCAR List" by Laura L. Downey, both appearing on page 43 of Communications of the ACM, volume 38, number 1 (January 1995). For completeness, the remote possibility of more than 26 articles per page is accommodated under JACC by using the specifications aa for the 27th article, ab for the 28th article, aaa for the 703rd article, and so on. Furthermore, in order to have a uniform rule for enumerating articles on a page, JACC specifies column-major order, first enumerating all articles top-to-bottom in column 1, then moving on to column 2 and so on. In essence, the use of the alphabetic designation under JACC is equivalent to the use of sequence numbers in the APS Link Manager [APS-LM].
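The alphabetic designation can be read as a letters-only numbering with no zero digit, in which z is followed by aa. The sketch below converts between an article's position on the page and its designation under that reading; the function names are hypothetical and the code is offered only as an illustration of the rule.

```python
def article_letters(position):
    """Alphabetic designation for the article at a 1-based position."""
    if position < 1:
        raise ValueError("positions are counted from 1")
    letters = []
    n = position
    while n > 0:
        # Subtracting 1 first makes 'a' stand for 1 rather than 0.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def article_position(letters):
    """Inverse of article_letters: 'a' -> 1, 'z' -> 26, 'aa' -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n


for pos in (1, 2, 26, 27, 28):
    print(pos, article_letters(pos))
```

For the CACM example, positions 1 and 2 on page 43 give the designations a and b.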
The principal problem with disambiguation of articles within a page by sequential position is that standard bibliographic citations do not provide the information necessary to determine article count within a page. This problem should be considered from the two perspectives of its effect on link construction by scholars and interoperability concerns for legacy bibliographic systems. In linking to an article by volume and page number, a scholar will not normally know when an alphabetic suffix is needed and hence will typically omit it in the first instance. However, a DOI responder can provide almost full functionality in this case by simply returning a response page that lists all of the articles that begin on the cited journal page. If the scholar checks the action of the link, the ambiguity can be caught and resolved. If not, when a reader of the scholar's work traverses the link, they can nevertheless be given a useful response page which leads to the desired article with only one step of indirection.

When a citation from a legacy bibliographic system is converted to a DOI using the JACC notation, it will also be typical that any required alphabetic suffix is omitted. However, if this conversion occurs in an on-line article access system, then it imposes at worst one additional step of indirection as above. If the conversion is part of an automatic document delivery system, a reasonable response may be to simply return the set of all articles starting on that page. Other applications may follow similar approaches, interpreting the DOI either as an incomplete article specifier that requires interactive resolution or as an article set specifier for all the articles starting on that page. In this regard, the most important thing is to avoid errors caused by an inappropriate default assumption, for example, that an omitted suffix designates the first article on the given page.
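The responder behaviour described here can be sketched as a lookup that treats a suffix without an alphabetic designation as naming the set of all articles starting on the page. The in-memory INDEX table and the resolve function below are hypothetical stand-ins for a publisher's database, shown only to make the rule concrete.

```python
# Hypothetical index: (journal, volume, issue, page) -> article titles,
# listed in their order of appearance on the page.
INDEX = {
    ("CACM", "38", "1", "43"): [
        "Women and Computing in the UK",
        "Announcing a New Resource: The WCAR List",
    ],
}


def resolve(journal, volume, issue, page, letters=None):
    """Return the list of articles matching a JACC citation."""
    articles = INDEX.get((journal, volume, issue, page), [])
    if letters is None:
        # No designation: do not assume the first article.
        # Return every article that starts on the page.
        return list(articles)
    # Convert 'a', 'b', ..., 'z', 'aa', ... to a 1-based position.
    position = 0
    for ch in letters:
        position = position * 26 + (ord(ch) - ord("a") + 1)
    # Slicing yields an empty list when the position is out of range.
    return articles[position - 1:position]


print(resolve("CACM", "38", "1", "43"))       # both articles
print(resolve("CACM", "38", "1", "43", "b"))  # only the second article
```

An on-line system could render the multi-article result as a response page of links, while a document delivery system could return the whole set.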

IV. Articles in Unpaginated Electronic Journals
In the present period of experimentation with a variety of formats for e-journal publication, it may be somewhat risky to propose a "convention" for citation of e-articles. Nevertheless, JACC does include a notation for e-journals based on the observation that e-journals commonly have "contents pages" that list articles either by volume or by issue. Thus, the JACC notation uses sequential article number in the contents page as the basis of identification.
[JACC]<journal-code>:<volume>$<article-number>
[JACC]<journal-code>:<volume>(<issue>)$<article-number>

For example, the University of Michigan Press might choose to use the DOI suffix [JACC]JEP:3(2)$4 to denote the fourth article on the "contents page" of Volume 3, Number 2 of the Journal of Electronic Publishing (http://www.press.umich.edu/jep/03-02/), namely the article "Solving the Dilemma of Copyright Protection Online" by Bill Rosenblatt.
Although the JACC choice of the dollar sign ($) to denote <article-number> is somewhat arbitrary, one can expect that scholars will learn to mnemonically associate this symbol with the concept of numbering from the contents page. For mnemonic value, the number sign (#) might have been a better initial choice, but this symbol requires a rather scholar-unfriendly encoding when DOIs are used in URLs.
From a publisher's perspective, adoption of the JACC e-journal notation ought to be a fairly low-cost solution. The major implication is that journal contents pages need to be maintained in a stable and well-organized fashion. Article numbers should either be shown directly on the page or the page should be formatted so that counting is straightforward. If a publisher chooses to incrementally update contents pages as new articles are published, the JACC notation can still be used so long as new articles are always added at the end of the list. The principal benefits in adopting the JACC convention will be ease of implementation of DOI responders and the intangibles that flow from supporting the linking activities of scholars.
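The e-journal form can be parsed in the same style as the print form, and the difficulty with the number sign can be seen by splitting a URL that contains it. In the sketch below the resolver address resolver.example and the DOI prefix 10.9999 are hypothetical, as are the function name and the assumption of an alphanumeric journal code and decimal numbering.

```python
import re
from urllib.parse import urlsplit

# Assumptions: alphanumeric journal code, decimal volume and article number.
_JACC_EJOURNAL = re.compile(
    r"^\[JACC\](?P<code>[A-Za-z0-9]+)"
    r":(?P<volume>\d+)"
    r"(?:\((?P<issue>[^()]+)\))?"
    r"\$(?P<number>\d+)$"
)


def parse_ejournal_suffix(suffix):
    """Return (code, volume, issue, article_number), or None if no match."""
    m = _JACC_EJOURNAL.match(suffix)
    if m is None:
        return None
    return (m["code"], m["volume"], m["issue"], int(m["number"]))


print(parse_ejournal_suffix("[JACC]JEP:3(2)$4"))

# Hypothetical resolver address and DOI prefix, for illustration only.
BASE = "http://resolver.example/10.9999/"

with_dollar = urlsplit(BASE + "[JACC]JEP:3(2)$4")
with_hash = urlsplit(BASE + "[JACC]JEP:3(2)#4")

# The dollar sign stays in the path that is sent to the server.
print(with_dollar.path)
# An unencoded number sign starts a URL fragment, so the article
# number is split off from the path.
print(with_hash.path, with_hash.fragment)
```

With the number sign, the article number ends up in the fragment rather than the path, which is why a scholar would have to write it in percent-encoded form as %23.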

From the scholar's point of view, article identification by contents-page count will be an initially unfamiliar concept. Other schemes, such as article identification by author surname, might be considered more mnemonic and are already being used by some e-journals in URL construction. However, these schemes vary by journal and always leave open potential problems of ambiguity. The JACC viewpoint is that the most scholar-friendly approach is the one that requires scholars to learn only a single identification concept for the great majority of cases.

V. Conclusion
JACC is a simple tool proposed to help publishers create scholar-friendly linking support in their DOI suffix systems. Although simple, the conventions of JACC are applicable to virtually all (print or electronic) journals that use standard volume or (volume,issue) based enumeration. The simplicity of JACC, particularly in comparison to the SICI alternative, should be a boon both to publishers in the ease of implementation of DOI responders and to scholars in the ease of learning and remembering the linking conventions. Further development of JACC may be contemplated along the lines proposed by the Universal Serial Item Name (USIN) scheme [USIN], of which JACC is essentially a small, fixed subset. However, those developments should be carried out in an upwards-compatible fashion so that no additional implementation burdens are placed on publishers who adopt this version of JACC.

Copyright
Copyright 1998, Robert D. Cameron. Permission to copy for individual use is permitted. Multiple copies may be made for use in classrooms, discussion groups, or committee meetings, provided that notice of the intent and extent of the copying is sent to the author (e-mail is satisfactory). Reproduction or republication in any general distribution medium is not permitted without explicit consent of the author. All copying requires that the integrity of the paper be preserved and that this copyright notice be reproduced in full.

References
[DOI] DOI Foundation, "A Guide to Using Digital Object Identifiers", October 10, 1997. URL: http://www.doi.org/guidebook/guidebook.html.
[Paskin] Norman Paskin, "Information Identifiers", Learned Publishing, Vol. 10, No. 2, April 1997, pages 135-156. URL: http://www.elsevier.com/inca/homepage/about/infoident/Menu.shtml.
[GreenBide] Brian Green and Mark Bide, "Unique Identifiers: A Brief Introduction", Book Industry Communication, London, 1997. URL: http://www.bic.org.uk/bic/uniquid.
[Unicorn] Mark Bide, "In Search of the Unicorn: The Digital Object Identifier from a User Perspective", British National Bibliography Research Fund Report, Book Industry Communication, London, November 1997. URL: http://www.bic.org.uk/bic/uncorn2.pdf.
[SICI] National Information Standards Organization, Serial Item and Contribution Identifier (SICI): An American National Standard Developed by the National Information Standards Organization: Approved August 14, 1996 by the American National Standards Institute. National Information Standards series ANSI/NISO Z39.56-1996 (Version 2). NISO Press, Bethesda, Maryland, 1997. URL: http://sunsite.Berkeley.EDU/SICI/.
[USIN] Robert D. Cameron, "Towards Universal Serial Item Names", Technical Report TR 97-16, School of Computing Science, Simon Fraser University, December 3, 1997. URL: http://elib.cs.sfu.ca/USIN/USIN.html.
[APS-LM] American Physical Society, "Frequently Asked Questions about the APS link manager", 1998. URL: http://publish.aps.org/linkfaq.html.
[Hunter] Karen Hunter, "Adding Value by Adding Links", Journal of Electronic Publishing, Volume 3, Number 3, 1998. URL: http://www.press.umich.edu/jep/03-03/hunter.html.