Gwen Williams
Professors Hendersons
LIS 437 Technical Services Functions
9 May 2004

University Institutional Repositories and a Library’s Mission

The opening statement on MIT’s DSpace home page reads:

“DSpace is a groundbreaking digital library system to capture, store, index, preserve, and redistribute the intellectual output of a university’s research faculty in digital formats. Developed jointly by MIT Libraries and Hewlett-Packard (HP), DSpace is now freely available to research institutions world-wide as an open source system that can be customized and extended.” [1]

[1] DSpace Federation Home, online 9 May 2004, http://dspace.org/index.html

At first glance, DSpace might appear to be a traditional library that has been scanned into digital format. As such, we might expect the defining features of the traditional library: storage and preservation of (for the most part) published items; cataloging records that contain descriptive information and subject access points; classification schemes that bring the entire library into an ensemble of works through the subject approach to publications, such as the Library of Congress Classification scheme, the SuDocs Classification scheme, or some combination of a few classification schemes; indexes that subordinate and collocate like items for patrons, by author, by uniform title, or by subject heading, for example; open stacks arrangements for authorized patrons, which tend to span the entire ensemble of works in a library’s holdings (the possible exception being rare book and special collections); and mechanisms for circulating items to authorized patrons, the holders of library cards.

Based on the brief description of DSpace provided by its creators, it would appear that DSpace is much like a traditional library. Indeed, some of the aspects we typically associate with traditional libraries are present in DSpace, a university institutional repository. We can, for example, provisionally conclude that DSpace (and by extension, institutional repositories in general) does include some form of cataloging records, as it indicates the presence of indexes; it does include some mechanisms for circulating, or redistributing, resources; and it does include some mechanisms for storing and preserving digital resources. Moreover, DSpace was a joint project between MIT Libraries and Hewlett-Packard, which suggests that to think about DSpace as a traditional library scanned into digital format is not too far from the mark.

But is this really the case? Clifford Lynch advises that the institutional repository is not “a collection of journals, and should not be managed like one. . . That’s not the point of an institutional repository.” [2] So if DSpace is not to be managed as a library manages a collection of published resources, such as journals or monographs, then what exactly is an institutional repository such as DSpace? And what roles will libraries and librarians play in the management of institutional repositories? I aim to explore these questions in this essay. I believe that the most fruitful way to approach these questions about libraries, roles for librarians, institutional repositories, and the management of these newly emerging digital spaces is to compare the institutional repository with the traditional library. The differences that such a comparison makes evident could prove useful for addressing the most important question for any librarian or future librarian to think about: what roles will libraries and librarians play in the management of institutional repositories?

Features of institutional repositories that are library-like features

There seem to be three main features of DSpace that are library-like: cataloging records of some kind, associated with specific digital resources, that enable indexing; some mechanisms for circulating, or redistributing, resources; and some mechanisms for storing and preserving digital resources.
With respect to the mechanisms for circulating, or redistributing, resources, and the mechanisms for storing and preserving digital resources, Lynch suggests that these two traditional library concerns are the very reasons for the creation of institutional repository software such as DSpace. He writes:

“In my view, a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.” [3]

As such, the very point of building and maintaining institutional repositories would be to preserve and provide access to a breadth of resources produced by institutionally-sanctioned members. This seems, to me, to be a succinct and useful way to summarize the very point of building and maintaining traditional libraries, with one caveat: traditional libraries do not limit their collections to those resources produced exclusively by their home institution, in this case the university to which any given academic library is subordinated. But if one looks at the published authors in any given academic library, especially the authors of those resources most particular and most important to academic libraries, one could say that the majority of these authors of monographs and journal articles are, in fact, institutionally-sanctioned scholars, researchers, teachers, and administrators by virtue of the organization of university faculty and its corresponding publishing regime across the various disciplines. In short, providing stewardship over intellectual resources and managing the mechanisms for distributing such resources are not new to libraries and librarians.

[2] Lynch, Clifford, “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age,” portal: Libraries and the Academy, vol. 3, no. 2 (2003): 333.
The “DSpace Internal Reference Specification: Functionality” manual indicates that this particular institutional repository software includes mechanisms for browsing and searching author, subject, and title indexes, which suggests that a catalog record, or a catalog-like record, has been associated with each particular digital resource. [4] I conclude that catalog records, or catalog-like records, exist because of how indexes are built: indexes are essentially lists, generally alphabetically arranged. Machine indexes associated with web search engines, for example, which point toward the full text (retrieval based on keyword searching), first arrange the words found into an alphabetical listing, whose entries point toward the web resources that contain the particular words searched. These indexes could be built on the fly and discarded or, as seems more likely, stored in index files. But as this paper is not a paper on the mechanics of natural language and full text search and retrieval, I will say only that, based on this rudimentary functioning of indexes, I can conclude two things. One, isolated record fields such as “author,” “title,” and “subject” must exist somewhere in DSpace, because DSpace is able to create indexes to such fields. Two, such isolated record fields must exist within a catalog record, or catalog-like record, somewhere in the institutional repository.

[3] Ibid., p. 328.
[4] “DSpace Internal Reference Specification: Functionality,” version 2002-03-01, online 9 May 2004, http://dspace.org/technology/features.html

As it turns out, the catalog-like records associated with each digital resource stored in DSpace do, indeed, use a data structure encoding scheme that has components similar to those of the MARC format used to structure traditional library catalog records: “The baseline metadata requested for each submitted item [digital resource] is based upon the qualified Dublin Core Metadata Scheme, adapted to DSpace requirements by MIT Libraries.” [5] I would emphasize that the components are similar to those of traditional library catalog records, for, as we know, the Dublin Core Metadata Scheme is not an equivalent replacement for the MARC format.
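The inference above, that browsable author, title, and subject indexes presuppose records with isolated fields, can be illustrated with a minimal sketch. The records, field names, and the function `build_field_index` are hypothetical and only loosely modeled on Dublin Core elements; nothing here is taken from DSpace’s actual implementation.

```python
from collections import defaultdict

# Hypothetical catalog-like records. The field names are assumptions that
# loosely follow Dublin Core elements; they are not DSpace's real schema.
records = [
    {"id": 1, "author": "Lynch, Clifford", "title": "Institutional Repositories",
     "subject": ["digital preservation", "scholarly communication"]},
    {"id": 2, "author": "Clemens, Samuel", "title": "Life on the Mississippi",
     "subject": ["rivers"]},
    # No author: the field is optional, so this record cannot appear
    # in the author index at all.
    {"id": 3, "title": "Annual Report", "subject": ["digital preservation"]},
]


def build_field_index(records, field):
    """Return an alphabetically ordered mapping from field value to record ids."""
    index = defaultdict(list)
    for record in records:
        values = record.get(field)
        if values is None:
            continue  # a record without the field contributes no entry
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value].append(record["id"])
    # An index is essentially an alphabetically arranged list of entries.
    return dict(sorted(index.items(), key=lambda entry: entry[0].lower()))


author_index = build_field_index(records, "author")
subject_index = build_field_index(records, "subject")
```

Each index is simply an ordered list of headings pointing back to records, which is only possible because every record isolates the field being indexed. The third record also shows the consequence of an optional field: it is invisible to anyone browsing by author.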
The two encoding formats are not equivalent; rather, they share some fields, such as “author,” “title,” “alternative titles,” “date of issue [or ‘publication’],” “ISBN, ISSN, if applicable,” “subject words,” and further description of the entity [digital resource or monograph]. [6] Likewise, software similar to DSpace for building digital collections, or digital libraries, such as Greenstone, generally provides mechanisms for establishing catalog-like records, and oftentimes for creating them using the Dublin Core Metadata Scheme and its elements, or designated fields, if you will. In other words, institutional repository software seems generally to include an encoding structure for the creation of catalog-like records, and in particular for records that contain elements similar to the eight ISBD elements for descriptive cataloging.

Features of libraries that are not integral to institutional repositories

A perusal of DSpace and its functionality manual, as well as of a university-wide forum such as The Chronicle of Higher Education, suggests that several features of traditional libraries are not very integral to the establishment and management of institutional repositories like MIT’s DSpace. Broadly speaking, these library features fall into four categories: (1) key aspects of the library catalog record and library catalog; (2) responsibility for acquisition and collection management of resources; (3) open access across all library collections and holdings for patrons; and (4) top managerial responsibility. I will consider these four categories in order.

Fundamental aspects of the library catalog record and library catalog are not part of MIT’s DSpace. For example, DSpace does not use classification schemes to provide patrons with browsing structures based on a subject approach to resources. In fact, the functionality manual does not even address whether resources should be classified in some manner, although a kind of classification is implied by its functional features for creating communities, collections, and authorized users (more on this in the next section). Another key feature of the library catalog record and library catalog that does not appear in DSpace is what I would call a controlled- and authority-based indexing capability. Let me explain. Above I indicated that DSpace has the capability to build indexes on record fields such as “author,” “title,” and “subject.” This is true. However, the Dublin Core Metadata Scheme as adapted by DSpace is missing two crucial aspects for enabling a controlled- and authority-based indexing capability.
That is, while author is a possible DSpace record field, it is not a required record field; moreover, there is “currently no authority control for authors (i.e. DSpace does not currently know that ‘Samuel Clemens’ and ‘Mark Twain’ are the same author, nor does it distinguish well between two authors that share the same name).” [7] In addition, while subject is a possible DSpace record field, it is not a required record field; moreover, there is “currently no thesauri or authority control for subject keywords,” much less grand controlled vocabulary schemas such as LCSH or Sears. [8] It would be interesting to see logs on precision and recall in DSpace; in other words, it will prove interesting when researchers begin to study the relevance [9] of resources retrieved in the absence of what appears to me to be any sound indexing capability (that is, a controlled- and authority-based indexing capability).

Related to the lack of classification schemes and of a capability for controlled- and authority-based indexing is the second broad way in which traditional libraries differ from institutional repositories such as DSpace. That is, the responsibility for the acquisition and collection management of digital resources in DSpace differs dramatically from the acquisition and collection management functions as performed in libraries. In libraries, librarians and library workers are responsible for coordinating the acquisition and management of the library collections, with, of course, input from university faculty. In DSpace, and in institutional repositories in general, the responsibility for acquiring and collecting resources (or “capturing” resources) resides with university faculty and university personnel outside of the library. That is to say, the university-community-centered aspect of institutional repositories places the acquisition and collection management of digital resources in the hands of the authors themselves.
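To make the earlier authority-control point concrete, the following sketch contrasts an author index built from names exactly as submitted with one built through a small name-authority file. The authority file, the item list, and the function `author_index` are all hypothetical illustrations, not DSpace features; the authorized heading is modeled on the usual library form of the name.

```python
from collections import defaultdict

# Hypothetical name-authority file: variant form -> authorized heading.
# Illustrative only; DSpace as described above has no such file.
AUTHORITY = {
    "Mark Twain": "Twain, Mark, 1835-1910",
    "Samuel Clemens": "Twain, Mark, 1835-1910",
}

# Hypothetical submissions: (item id, author name as typed by the submitter).
items = [(1, "Mark Twain"), (2, "Samuel Clemens"), (3, "Clifford Lynch")]


def author_index(items, authority=None):
    """Group item ids by author heading, optionally normalizing via an authority file."""
    index = defaultdict(list)
    for item_id, name in items:
        # Without an authority file the submitted string is the heading.
        heading = authority.get(name, name) if authority else name
        index[heading].append(item_id)
    return dict(index)


uncontrolled = author_index(items)            # names exactly as submitted
controlled = author_index(items, AUTHORITY)   # variants collocated
```

In the uncontrolled index, a search under “Mark Twain” retrieves item 1 only, although item 2 is by the same person; in this toy collection that is a recall of one half. The controlled index collocates both items under a single heading. The sketch does not address the converse problem noted in the quotation, namely distinguishing two different authors who share the same name, which would require unique identifiers or qualifiers in the authority file.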
On the stewardship that falls to authors, Lynch writes that a faculty member “must exercise stewardship over the actual content and its metadata: migrating the content to new formats as they evolve over time, creating metadata describing the content, and ensuring the metadata is available in the appropriate schemas and formats and through appropriate protocol interfaces such as open archives metadata harvesting.” [10] When one considers where the acquisition and collection management functions are placed with respect to developing the institutional repository, the very flexibility of the Dublin Core Metadata Scheme and its lack of controlled vocabularies, authority files, and classification schemes become somewhat more understandable.

[9] By relevance, I mean what Don Swanson and Patrick Wilson might take relevance to be: subject-based relevance, or knowledge-based relevance as the searcher defines it.
[10] Lynch, “Institutional. . .,” p. 330.

Academic libraries tend toward providing open access across all library collections and holdings for authorized patrons. I am not sure what percentage of academic libraries have partially closed stacks areas such as UIUC’s Main Bookstacks, but even given the “closed stacks” that may be present, one can, I think, safely assume that access for authorized patrons (the university community of faculty, staff, and students) is somehow provided across all library collections and holdings. Institutional repositories such as DSpace operate on a fundamentally different basis with respect to access, and to access across all collections within an institution’s repository. That is, DSpace places control over access within the domain of whoever is authorized to establish a community of collections and authorized users, and particular collections within particular communities. In other words, as with course management tools such as WebCT and Blackboard, a faculty member can authorize the list of persons who have exclusive access to the resources collected and managed. This implies something very different from the library providing access across the whole of its collections. It suggests that DSpace can be thought of as a space which is in fact multiple spaces: many locked rooms with a limited number of keys to any given room, dispensed as the faculty member charged with managing the room determines.

Finally, the last major difference between traditional libraries and institutional repositories concerns top managerial responsibility for each type of space. Nowhere in the literature have I found that a University Librarian sits atop the managerial chain of an institutional repository.
Indeed, as Lynch points out:

“While operational responsibility for these services [an institutional repository] may reasonably be situated in different organizational units at different universities, an effective institutional repository of necessity represents a collaboration among librarians, information technologists, archives and records managers, faculty, and university administration and policymakers.” [11]

[11] Ibid., p. 328.
Lest librarians and future librarians be alarmed by this, I should say that any look at the literature on what kinds of resources could potentially be stored within an institutional repository such as DSpace indicates that a vast number of them are kinds of resources that libraries currently do not manage: administrative records such as payroll and student transcripts; teaching resources such as syllabi, handouts, assignment descriptions, and tests; pre-publication drafts of scholarly research; databases of primary research; department communications about curriculum, course offerings, committee meetings, and teaching symposiums; student work including papers, lab reports, and artistic creations; documents and schedules pertaining to the coordination of K-12 pre-service teacher training with local K-12 schools; personal collections of papers, emails, letters, proposals, and what-not by individual faculty members; and so on.

Features of institutional repositories that are beyond the traditional library

In addition to the vast number of potential institutional repository resources mentioned above, there are other key features of institutional repositories that are beyond the managerial domain of the traditional library. Two of the most crucial differences are the use of archiving techniques that most librarians do not use in their daily work, and the community-centered (or user-centered, if you like) control over defining and managing communities of collections and users.

The Dublin Core Metadata Scheme for DSpace includes the element called “Series Name and Report Number.” Moreover, as the functionality manual indicates, searchers of institutional repository collections can access the resources by the date the items were placed within the repository. This seems to me to be a searching and browsing capability that DSpace offers community members and that traditional library catalogs and tools do not.
Also, the submission of resources to the institutional repository draws on archival concepts for organizing archival resources: the inclusion of provenance information and the serialization of resources. These seem fundamental and important components of an institutional repository of unpublished resources generated by various university members, for these archival concepts and functional features enable something that the traditional organization of resources in libraries does not:

“DSpace offers history functionality to provide an audit trail of the administration of the archive, to provide data supporting root-cause analysis, and to support human-moderated rollbacks.” [12]

Secondly, to reiterate somewhat what I have previously stated, institutional repositories and their collections are determined by university community members beyond the library. In other words, institutional repository software such as DSpace does not require, nor was it intended to require, the mediation of librarians in defining and managing the collections of resources. Communities of university-sanctioned users are responsible for determining and managing their particular slices, or rooms and keys, of DSpace. Subordinate to every community are the collections associated with that community. Authorization to access any given collection or any given community of collections resides with each community administrator; in theory, this could mean every member of the teaching and research faculty.

An institutional repository is not a traditional library manifested in the digital

I believe that I have sufficiently demonstrated that an institutional repository, which can be built from software such as MIT’s DSpace, is not a traditional library that has simply been digitized. An institutional repository is something much larger than a traditional library manifested in the digital. While it is useful to compare features of the institutional repository to features of the traditional library, it does not seem wise to equate the two conceptually. In other words, it does not appear wise to conceive of the institutional repository as simply the traditional print-bound library gone digital.
But Lynch tells us as much when he suggests that institutional repositories developed in institutions of higher education will probably move into the broader social realm:

“University institutional repositories have some very interesting and unexplored extensions to what we might think of as community or public repositories; this may in fact be another case of a concept developed within higher education moving more broadly into our society. Public libraries might join forces with local government, local historical societies, local museums and archives, and members of their local communities to establish community repositories. Public broadcasting might also have a role here.” [13]

[12] “DSpace Internal. . .”
[13] Lynch, “Institutional. . .,” p. 336.

Implications for libraries and librarians: a question of university-wide organization and the mission of libraries

So what roles will libraries and librarians play in the management of institutional repositories? I believe, and I have written extensively on this issue in my LIS 437 Think Piece assignment, that it comes down to considering the library’s mission as being characterized mostly by stewardship, which includes the providing of service. It seems rather clear to me that the mission of university libraries is subordinated to the larger university-wide organization and its mission. That is, libraries play a stewardship role for universities, providing service and access to resources for university members. The conception and functional features of institutional repositories such as DSpace seem to emphasize this subordinated, but of course quite crucial, mission of libraries and librarians. There are also opportunities for librarians to collaborate with university faculty in devising library-like features that could improve any community’s organization of collections, such as various thesauri, classification schemes of some kind, and name-authority files of some fashion. Librarians could even play a role in facilitating searching and browsing across communities and collections, should community managers so desire. Moreover, I think it would be very librarian-like to do so.
Carlson, Scott, “Cornell Tries a New Publishing Model,” Chronicle of Higher Education, (5 March 2004): A29.
____, “Penn State Program to Allow Sharing of Course Materials and Research Data,” Chronicle of Higher Education, (312 October 2003): A32.
____, “The Uncertain Fate of Scholarly Artifacts in a Digital Age,” Chronicle of Higher Education, (30 January 2004): A25-A27.
Carnevale, Dan, “Colleges are Relieved as PeopleSoft Rejects Latest Oracle Takeover,” Chronicle of Higher Education, (20 February 2004): A30.
____, “A New Technology Lets Colleges Spread Information to People Who Want It,” Chronicle of Higher Education, (13 February 2004): A31-A32.
“DSpace Internal Reference Specification: Functionality,” version 2002-03-01, online 9 May 2004, http://dspace.org/technology/features.html
DSpace Federation Home, online 9 May 2004, http://dspace.org/index.html
Kiernan, Vincent, “Company to Track Citations of Online Scholarship,” Chronicle of Higher Education, (19 March 2004): A31.
____, “Killing Bytes, Not Trees,” Chronicle of Higher Education, (9 April 2004): A31-A33.
Lynch, Clifford, “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age,” portal: Libraries and the Academy, vol. 3, no. 2 (2003): 327-336.
____, Interview with Clifford Lynch, Ubiquity, vol. 4, no. 23 (July 30 - August 5, 2003).
Marcum, Deanna, “Requirements for the Future Digital Library,” Address to the Elsevier Digital Libraries Symposium, Pennsylvania: 25 January 2003.
Milstead, Jessica and Susan Feldman, “Metadata: Cataloging by Any Other Name,” Online, January 1999.
Read, Brock, “New Digital Library Offers Alternative to Slides,” Chronicle of Higher Education, (16 April 2004): A34.
____, “Planning With Pixels, Not Pencils,” Chronicle of Higher Education, (14 November 2003): A29.
____, “Science Library Stages Avant-Garde Plays, One Viewer at a Time,” Chronicle of Higher Education, (28 November 2003): A35.
Short, Edmund C., “Knowledge and the Educational Purposes of Higher Education: Implications for the Design of a Classification Scheme,” Cataloging and Classification Quarterly, vol. 19, no. 3/4 (1995): 59-66.
Unsworth, John M., “The Next Wave: Liberation Technology,” The Chronicle Review, (23 January 2004): B16-B20.
Vest, Charles M., “Why MIT Decided to Give Away All Its Course Materials via the Internet,” The Chronicle Review, (23 January 2004): B20-B21.
Young, Jeffrey R., “Google Tests Search Engine for Colleges’ Scholarly Materials,” The Chronicle of Higher Education, vol. L, no. 33 (23 April 2004): A36.
____, “Will Colleges Miss the Next Big Thing?,” The Chronicle of Higher Education, vol. L, no. 33 (23 April 2004): A35-A36.