Implementing FAST Facets in ContentDM

Jody Perkins, Metadata Librarian, Miami University Libraries
John Millard, Head of Digital Initiatives, Miami University Libraries

Overview
This project explores the use of FAST facets for a ContentDM collection. Miami Stories, a collection of oral histories, will serve as a test case. A tag set based on FAST facets will be used to mark up transcribed text. Tagged data will be parsed into ContentDM fields that mirror FAST facets. Terms will be edited and checked against FAST authority files.

Miami Stories, a continuing oral history project focused on Miami University history, was established as a permanent program of the Miami University Libraries and operates out of the University Archives through the Department of Special Collections and Archives. The purpose of this program is threefold: 1) to initiate professionally conducted oral history projects of relevance to the documentation of University history, 2) to store, preserve, and provide access to oral history recordings and ensure that the content is maintained in an accessible format, and 3) to coordinate and support independent projects on campus that capture aspects of the University’s oral history.

Faceted Application of Subject Terminology (FAST)
From the FAST Project page at OCLC

http://www.oclc.org/research/projects/fast/

“The Library of Congress Subject Headings schema (LCSH) is by far the most commonly used and widely accepted subject vocabulary for general application. The purpose of adapting the LCSH with a simplified syntax to create FAST is to retain the very rich vocabulary of LCSH while making the schema easier to understand, control, apply, and use.”

This is a complete list of FAST facets as listed on the OCLC website. Not all facets will be relevant for all collections. FAST is still in beta, and authority files for every facet are not yet complete. Application guidelines are also still in development. At present, our use of FAST facets is an attempt to standardize subject-level access points across collections in a form that can easily be mapped to Dublin Core. In the future we hope to develop an automated process for checking the collected terms against FAST.

Facet Identification and ContentDM
Project Workflow
The digitizing and metadata workflow created for the project facilitates coordination of activities, as shown above, in two departments at opposite ends of the Miami University Oxford Campus.

Tagging Interface

As staff type the transcript, they are asked to identify words, phrases, or concepts that are mentioned in the recording. The transcript-tagging interface allows for simple identification of words or phrases that are likely candidates for FAST facets. The transcriber highlights the word or phrase and presses the corresponding FAST facet button. A pair of simple XML-like tags is then inserted around the selected text.
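The tagging step can be sketched as a small function that wraps a selected span in an opening and closing tag. This is only an illustration in Python, not the Quicktags-based code used in the project; the facet tag names in FACET_TAGS and the function name wrap_selection are assumptions, since the actual tag set is not reproduced here.

```python
# Hypothetical facet tag names; the project's actual tag set is not listed.
FACET_TAGS = {"topical", "geographic", "personal", "corporate", "period"}


def wrap_selection(text: str, start: int, end: int, facet: str) -> str:
    """Return text with the span text[start:end] wrapped in <facet>...</facet>."""
    if facet not in FACET_TAGS:
        raise ValueError(f"unknown facet: {facet}")
    if not (0 <= start < end <= len(text)):
        raise ValueError("selection is empty or out of range")
    return f"{text[:start]}<{facet}>{text[start:end]}</{facet}>{text[end:]}"


# Example: the transcriber highlights "Oxford" and presses a geographic button.
line = "I arrived in Oxford in 1952."
start = line.index("Oxford")
tagged = wrap_selection(line, start, start + len("Oxford"), "geographic")
print(tagged)
```

Keeping the tags this simple means the transcript stays readable as plain text while still carrying the facet assignments.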

The editing interface is built using the open source JavaScript Quicktags library developed by Alex King (http://alexking.org/projects/js-quicktags) with the HTML tagset replaced with custom tags representing the FAST facets.

After interviews are videotaped, project staff enter basic metadata into an online object tracking and metadata system. The system generates a printed transmission receipt which is then attached to the original recording and sent to Digital Initiatives for processing and digital conversion. After digital processing, project staff create transcripts of the recordings which are then added to the tracking system record created earlier.
Extracted facets and possible expanded terms based on context

Once a text is transcribed, the tagged facets can be extracted and become a source of semantic data for the metadata cataloger to use in the construction of valid FAST headings.
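One possible way to pull the tagged terms out of a transcript is a regular expression that matches each opening tag with its closing tag. The sketch below is an assumption about how this could be done, not the project's own extraction code; the tag names in the sample transcript and the function name extract_facets are hypothetical.

```python
import re
from collections import defaultdict

# Matches <facet>term</facet>, where the closing tag must repeat the opening name.
TAG_PATTERN = re.compile(r"<(?P<facet>[a-z]+)>(?P<term>.*?)</(?P=facet)>", re.DOTALL)


def extract_facets(transcript: str) -> dict:
    """Group tagged terms by facet, collapsing whitespace and dropping repeats."""
    found = defaultdict(list)
    for match in TAG_PATTERN.finditer(transcript):
        facet = match.group("facet")
        term = " ".join(match.group("term").split())
        if term and term not in found[facet]:
            found[facet].append(term)
    # Drop facets that only ever had empty terms.
    return {facet: terms for facet, terms in found.items() if terms}


# Sample transcript text, invented for illustration.
transcript = (
    "My father taught at <corporate>Miami University</corporate> in "
    "<geographic>Oxford</geographic>. We left <geographic>Oxford</geographic> "
    "in the <period>1950s</period>."
)
facets = extract_facets(transcript)
print(facets)
```

The extracted terms are raw candidates only; the cataloger would still need to edit them into valid headings.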

FAST Facets in a ContentDM Schema
Metadata will be gathered from a variety of sources: the digital services submission form, a marked-up transcript, automated and batch processes, and direct keying by a metadata editor. Terms in the FAST-designated fields will come directly from a tagged transcript. As a final step, the parsed terms will be checked against the appropriate FAST authority file, ideally as part of an automated process. Terms not found in FAST will be parsed into separate keyword fields, retaining the same facets, and may be used later in developing local controlled vocabularies. Facets are implemented in ContentDM as individual fields in the collection schema as illustrated below. The example record to the right illustrates potential sources of field values.
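The routing of terms into FAST fields or keyword fields might look like the following sketch. It assumes the authority data is available locally as a set of headings per facet; it does not call any FAST service. The field names (fast_geographic, keyword_geographic, and so on), the function name assign_fields, and the sample headings are all hypothetical and are not taken from the collection schema or from FAST authority files.

```python
def assign_fields(extracted: dict, authority: dict) -> dict:
    """Place each term in a FAST field if it matches the authority list for its
    facet (ignoring case), otherwise in a keyword field for the same facet."""
    record = {}
    for facet, terms in extracted.items():
        # Map case-folded headings back to their authorized form.
        known = {heading.casefold(): heading for heading in authority.get(facet, ())}
        for term in terms:
            heading = known.get(term.casefold())
            if heading is not None:
                record.setdefault(f"fast_{facet}", []).append(heading)
            else:
                record.setdefault(f"keyword_{facet}", []).append(term)
    return record


# Sample data, invented for illustration only.
authority = {
    "geographic": {"Ohio--Oxford"},
    "corporate": {"Miami University"},
}
extracted = {
    "geographic": ["ohio--oxford", "Slant Walk"],
    "corporate": ["Miami University"],
}
record = assign_fields(extracted, authority)
print(record)
```

Because unmatched terms keep their facet in the keyword field name, they remain usable later as raw material for a local controlled vocabulary.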