
Adrian Salas

MIAS 298

Basic User Functionality: Getty Research Institute DAM System


The Getty Research Institute (GRI) is a large research library specializing in the
multiple facets of research and exchange that can be linked to art history. At its core, the
GRI's most obvious assets are its hundreds of thousands of books, periodicals, and
collections of paper-based archives. The GRI, much like the Getty Museum it is tied to, is
also an ever-evolving collecting institution concentrated on expanding its
holdings to take in collections of items relevant to the study of the art world but which fall
outside the traditional library scope of the bound book. As such, audio-visual material is
becoming an important section of the collection, as is the management of the digital assets
which this collection will necessarily create in the course of its integration into the GRI.
These collections can take the shape of audio, video, and still images. As most of this
material is also rare or unique to its particular collection, the GRI needs to be
concerned with several responsibilities to the material beyond just housing physical
assets. Cataloging is important to the organization, discovery, tracking, and accessibility
of the various items. The descriptive metadata generated by cataloging feeds directly into the
next important responsibility of the institute, which is preservation. This is where the
rubber hits the road, so to speak, because much of the video and audio the Getty acquires
is on formats, such as 3/4-inch U-matic videotape, that are well into their limited life spans.
Digital conversion is necessary to save the content of these assets once their container
format reaches the end of its serviceable life. After the initial digitization, the continued
maintenance and tracking of these assets becomes a paramount concern. In addition,
providing accessibility for the users of these collections, both internal to the organization
and external, will be an important concern.
1. One of the first steps in establishing the user requirements for a digital asset
management system relevant to the needs of the Getty Research Institute is to identify the
user base for said system. The primary users of the back end of the system will be the
digital services department. This is the department which is responsible for creating and
managing digital content from the Getty Research Institute's holdings. This can involve
photographing objects and images to create digital surrogates, digitizing video and audio,
and managing born-digital files and documents which come into the collection. Once
these files are acquired, the department then is also in charge of digital preservation and
creating access versions. Furthermore, other departments would also need to be able to
look at the back end. Technical services staff such as catalogers may need access to the
descriptive metadata in the files, and reference staff would also need to be able to access
rights metadata. The GRI is also becoming increasingly invested in participating in
collaborative online initiatives to showcase collections. This means that there are now
metadata specialists on staff who wrangle metadata and as such may need to be able to
look at file naming structures and technical and descriptive metadata to make
decisions on how to best map internal information to outside systems such as the Getty
Research Portal and OCLC.
On the front end most of the users will encounter the content managed through the
filter of the GRI's discovery system, Primo, which runs off the Ex Libris Alma platform.
These users will be researchers utilizing the GRI's physical location, but also staff across
the Getty's two campuses. Furthermore, more restricted access to content will need to be
facilitated for off-site researchers exploring the Getty's holdings. Most front-end users'
primary need in interfacing with the system will be to search catalog records and
determine what items have accessible digital surrogates (most of which will only be
accessible on the Getty's premises).
2 & 3. Formats currently held are:
Image/TIFF - 272,913
Image/JPEG - 184,706
PDF - 2,964
Audio/MPEG - 2,840
Audio/WAV - 2,672
Video/MP4 - 604
Video/MXF - 130
Video/MPEG - 19
Audio/MIDI - 16
Video/Quicktime - 14
There are a total of 466,935 items currently in the GRI's Rosetta repository. A
few miscellaneous items not listed above make up the remainder of the
collection, such as Word and PowerPoint documents.
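As a quick check on the figures above, the listed format counts can be summed and compared against the repository total. This is only a sketch of the arithmetic; the dictionary and variable names are illustrative and do not come from any GRI system.

```python
# Item counts per format, copied from the list above.
format_counts = {
    "Image/TIFF": 272_913,
    "Image/JPEG": 184_706,
    "PDF": 2_964,
    "Audio/MPEG": 2_840,
    "Audio/WAV": 2_672,
    "Video/MP4": 604,
    "Video/MXF": 130,
    "Video/MPEG": 19,
    "Audio/MIDI": 16,
    "Video/Quicktime": 14,
}

# Total number of items reported for the Rosetta repository.
repository_total = 466_935

# Items accounted for by the listed formats, and whatever is left over.
listed_total = sum(format_counts.values())
unlisted = repository_total - listed_total

print(f"Items in listed formats: {listed_total:,}")
print(f"Items in other formats:  {unlisted:,}")
```

The small remainder is consistent with the handful of miscellaneous documents mentioned above.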
Storage needs break down as follows:
Below 1 MB - 79,173 files
Between 1.01 MB and 5 MB - 29,416 files
Between 5.01 MB and 10 MB - 2,024 files
Between 10.01 MB and 50 MB - 115,792 files
Between 50.01 MB and 100 MB - 159,873 files
Between 100.01 MB and 1000 MB - 79,538 files
Above 1000 MB - 1,427 files
A rough, conservative estimate based on these ranges puts
storage needs at 18,654,945 MB (18,218 GB or 17.8 TB). The Getty is
relatively new to making AV a large part of its collection, but by the same token, it has
snapped up large collections when it can. It is also constantly ramping up digitizing
initiatives for its collections as a whole. As the Long Beach collection making up the bulk
of the GRI's video collection is a little under 10 years old, it is not unreasonable to expect
that digital storage needs may double in the next 5 years, as more video comes online
due to internal mandates to ramp up digital conversion efforts, increased hiring of
dedicated digitization staff, and more efficient workflows. Furthermore, the Research
Institute continues to acquire collections. A rough tripling of storage to a capacity of
around 54 TB in 5 years does not seem an unreasonable goal to allow the collection to
grow.
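The storage estimate above can be reproduced with a short calculation. The paper does not state how each size range was weighted, so the per-file sizes below are an assumption: 1 MB for the smallest range and the lower bound of every other range. That choice happens to reproduce the stated total, which is also why the figure is best read as a floor rather than a ceiling. All names in the sketch are illustrative.

```python
# (assumed size per file in MB, number of files) for each range above.
# ASSUMPTION: 1 MB for the "below 1" range, the lower bound for all others.
SIZE_BINS = [
    (1.0, 79_173),
    (1.01, 29_416),
    (5.01, 2_024),
    (10.01, 115_792),
    (50.01, 159_873),
    (100.01, 79_538),
    (1000.0, 1_427),
]


def estimate_storage_mb(bins):
    """Sum the assumed per-file size times the file count over all ranges."""
    return sum(size_mb * count for size_mb, count in bins)


total_mb = estimate_storage_mb(SIZE_BINS)
total_gb = total_mb / 1024
total_tb = total_gb / 1024

print(f"Estimated storage: {total_mb:,.0f} MB")
print(f"                   {total_gb:,.0f} GB")
print(f"                   {total_tb:.1f} TB")

# Growth scenarios discussed above: doubling and tripling over five years.
for label, factor in (("doubled", 2), ("tripled", 3)):
    print(f"Capacity if {label}: {total_tb * factor:.1f} TB")
```

Under these assumptions the tripled figure comes out a little over 53 TB, in line with the target of around 54 given above.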
The video content held by the GRI is primarily artist videos from various collections,
such as the Long Beach Museum of Art's video collection and The Kitchen in New York
City. There is also a smaller but still notable number of videos recorded of various
events sponsored by the Getty, and related recordings such as artist interviews. Audio
mainly consists of artist interviews and oral histories. The TIFFs and JPEGs that go in the
system are complete objects such as manuscript and print collections. One-off objects
such as digitized photos will go into another system such as TEAMS.
4. There are file naming standards in place. There are 4-5 standards for A/V material
depending on the format and, in one case, the specific collection (Long Beach Museum of
Art video). In general, the main rule that all the standards share is to remove all spaces
and symbols in names and replace them with underscores (_). Furthermore, all alphabetical
characters are normalized to lower case. The general convention for the physical assets is
to first establish an object's root based on who has the holdings, then to enter either the
item's accession number or the unique ID number that is assigned at cataloging, and finally an
alphabetical identifier that establishes format. Here is an example culled from the
guidelines: gri_990074_v01 (for a full video). For METS records, the file naming
is based on identifying the file as METS after the accession or identification number of
the original object. For example: 2248_246_mets.xml.
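The shared naming rules described above could be expressed roughly as follows. This is a minimal sketch and not the GRI's actual tooling: the function names are hypothetical, and collapsing a run of several spaces or symbols into a single underscore is an assumption, since the guidelines quoted here do not say how such runs are handled.

```python
import re


def normalize_component(text: str) -> str:
    """Lower-case the text and replace spaces and symbols with underscores."""
    # ASSUMPTION: consecutive spaces/symbols collapse into one underscore,
    # and underscores left at either end are dropped.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def asset_filename(root: str, identifier: str, format_code: str) -> str:
    """Join holder root, accession or ID number, and format identifier."""
    parts = (root, identifier, format_code)
    return "_".join(normalize_component(part) for part in parts)


def mets_filename(identifier: str) -> str:
    """Name a METS record after the identifier of the original object."""
    return f"{normalize_component(identifier)}_mets.xml"


print(asset_filename("GRI", "990074", "v01"))
print(mets_filename("2248_246"))
```

With the inputs shown, the two calls produce the example names quoted from the guidelines.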
5. The metadata standard used for AV material is METS. Dublin Core is the
descriptive metadata standard. Metadata for objects should be imported from existing
MARC or EAD catalog records when they exist, but due to the backlog in video
processing, much of the A/V has to be cataloged on the fly as it is processed. Every item
entering the system will be cataloged before it is digitized, though. Much of the
descriptive metadata's controlled language ideally would be pulled from the Getty's many
vocabularies for the arts. As for technical metadata, it should be extracted from the files.
Ideally, a process to embed metadata directly into MXF file wrappers would be
undertaken.
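To illustrate what importing descriptive metadata from a catalog record might involve, the sketch below maps a few MARC fields to Dublin Core elements. It is a simplified illustration under stated assumptions: the MARC record is represented as a plain dictionary keyed by tag and subfield (a hypothetical flattening, not a real MARC parser), only four commonly crosswalked fields are covered, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# ASSUMPTION: a deliberately small crosswalk of MARC tag+subfield to DC element.
MARC_TO_DC = {
    "245a": "title",    # title statement
    "100a": "creator",  # main entry, personal name
    "260c": "date",     # date of publication
    "650a": "subject",  # topical subject heading
}


def marc_to_dublin_core(marc_record: dict) -> ET.Element:
    """Build a flat record element holding Dublin Core descriptive metadata."""
    root = ET.Element("record")
    for marc_key, dc_name in MARC_TO_DC.items():
        for value in marc_record.get(marc_key, []):
            element = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            element.text = value
    return root


# Invented sample values, for illustration only.
sample_record = {
    "245a": ["Untitled video work"],
    "100a": ["Example, Artist"],
    "650a": ["Video art", "Performance art"],
}

dc_record = marc_to_dublin_core(sample_record)
print(ET.tostring(dc_record, encoding="unicode"))
```

In a METS record, a block like this would sit inside the descriptive metadata section, and the subject terms would ideally be drawn from the Getty vocabularies mentioned above.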
6. The desired search function would be quick, Google-style searches off
a master index. This would allow for searches on categories such as title,
author, dates, and keyword, as long as the METS record has the proper metadata. In
essence, one should be able to use the same search techniques and refinements that are
useful in the GRI discovery system on the front end. As the video collections have been
acquired in large chunks and are subject to tight staffing limitations in the cataloging and
digitization workflow, time-based indexing of video content is not a primary concern, or
indeed even being considered for implementation in the foreseeable future.
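A master index of this kind can be pictured as a simple inverted index over the descriptive fields. The sketch below is only meant to show the idea and makes no claim about how Rosetta or Primo index content; the field names, record identifiers, and sample records are all hypothetical.

```python
from collections import defaultdict

# Fields assumed to be available from the descriptive metadata.
SEARCH_FIELDS = ("title", "author", "date", "keyword")


def build_index(records: dict) -> dict:
    """Map each word (and each field:word pair) to the ids of matching records."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in SEARCH_FIELDS:
            for token in str(record.get(field, "")).lower().split():
                index[token].add(record_id)             # search across all fields
                index[f"{field}:{token}"].add(record_id)  # search within one field
    return index


def search(index: dict, query: str) -> set:
    """Return the ids of records that match every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)


# Invented records, for illustration only.
records = {
    "gri_0001_v01": {"title": "Artist interview", "author": "Example Studio",
                     "date": "1978", "keyword": "oral history"},
    "gri_0002_v01": {"title": "Studio performance", "author": "Another Artist",
                     "date": "1982", "keyword": "video art"},
}

index = build_index(records)
print(search(index, "artist"))        # any field
print(search(index, "title:artist"))  # title field only
print(search(index, "studio 1982"))   # all terms must match
```

A plain word searches every field, while a `field:word` term narrows the search to one field, which roughly mirrors the keyword and fielded searches a discovery system offers.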
7. The ideal access structure, for the audio and video collections in particular, is a
multi-tiered approach to digital content creation. Audio should be created in two levels:
lossless WAV for creating archival master copies, and lossy MP3 for accessible user
copies. Video should be in three levels: an archival master copy which consists of
JPEG 2000 video and WAV audio wrapped in an MXF wrapper, an MP4 mezzanine copy, and a
use copy optimized for streaming (should it be required, a higher quality version can
be put on DVD from the mezzanine file). The access files should be pushed out to the
Getty's catalog/discovery system, but only in a limited manner for on-site streaming on
one of the Getty's campuses. The archival master will go into dark storage for
preservation.
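The tiers described above can be written down as a small lookup table, which is roughly what a DAM would need in order to know which derivatives to generate for each media type. This is an illustrative sketch only: the class and field names are hypothetical, and the destination of the mezzanine file is left unspecified because the text does not state one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivative:
    tier: str         # role of the file in the workflow
    format: str       # encoding and wrapper
    destination: str  # where the file is meant to end up


# Tiers per media type, following the description above.
DERIVATIVE_PLAN = {
    "audio": (
        Derivative("archival master", "WAV (lossless)", "dark storage"),
        Derivative("access copy", "MP3 (lossy)", "on-site streaming"),
    ),
    "video": (
        Derivative("archival master", "JPEG 2000 + WAV in MXF", "dark storage"),
        # ASSUMPTION: the text names no storage location for the mezzanine.
        Derivative("mezzanine", "MP4", "not specified"),
        Derivative("access copy", "streaming-optimized file", "on-site streaming"),
    ),
}


def derivatives_for(media_type: str) -> tuple:
    """Return the derivatives that should exist for a given media type."""
    return DERIVATIVE_PLAN[media_type.lower()]


for item in derivatives_for("video"):
    print(f"{item.tier}: {item.format} -> {item.destination}")
```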
There are materials in the collections spanning languages, so multi-language support
is a necessity. This should include support for non-Western characters too, as there are
objects containing metadata in languages such as Chinese. Enabling features to allow
annotations and notes from users is not a particularly pressing concern. Ideally, though,
customizable interfaces for the playback of video would be a much appreciated, although
not strictly needed, feature. For instance, the ability to design multiple sizes of playback
window and playback control schemes for the embedded video player would be very
useful in ensuring optimized video streaming for end users.
8. For the workflow, some very important things the system can do to streamline the
process are to import existing descriptive metadata from items such as MARC records and
EAD finding aids, allow intuitive and robust entry and embedding of rights metadata,
extract technical metadata from source files, and smoothly incorporate the SAMMA system's
conservation (cleaning and playback) and preservation (digitization) focused capabilities.
As these are the steps that are important in digital content creation, the asset that is
created will then need to have a clear path towards leading a bifurcated life as a largely
untouched and storage-intensive preservation asset, and as more flexible use copies.
Furthermore, the physical asset that is the source of these digital surrogates will still need
to be maintained in GRI information systems so that it can be accessed at later dates
should a need arise.
9. The system ideally will push content to the Getty's multiple web platforms as seen
fit. These include the Ex Libris Alma cataloging/index system, which feeds the GRI's
circulation capabilities, as well as Primo, which is the front-end discovery system used by
most researchers utilizing the collections. Furthermore, there are other platforms, such as
the Getty-led open content initiative, the Getty Research Portal, which currently does not
host digitized AV materials such as videos, but the ability to push publications to areas
like this should be maintained.
10. The digital collections are currently held on site in a RAID storage system, but the
collection will be migrated to a hierarchical storage management system with tiered file
storage if things go according to plan. This system should ideally be mirrored in at least
one redundant storage system in another location. In addition to the physical makeup of
the storage system, the preservation elements of the system would ideally encompass five
main functions: (1) running checksums on assets; (2) virus checking; (3) identifying
formats that are ingested into the system; (4) extracting technical metadata; and (5) running a
risk analysis script which in essence will check files for software compatibility
issues, so that all files in the system can be assured of being able to be played back.
Checksums are something that the current system is fully capable of, but it currently
carries out three different types of checks, as a single standard has not yet been settled on.
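A fixity check of the kind listed first among the preservation functions might look something like the sketch below. It is not a description of how Rosetta works. The three algorithms are placeholders chosen for illustration, since the text does not say which three checks are currently run, and the function names are hypothetical.

```python
import hashlib

# ASSUMPTION: stand-ins for the three unnamed checks mentioned above.
ALGORITHMS = ("md5", "sha1", "sha256")


def compute_checksums(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Read the file once in chunks and return one hex digest per algorithm."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat, which matters for large video files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify_fixity(path: str, recorded: dict) -> bool:
    """Compare the file's current digests with those recorded at ingest."""
    current = compute_checksums(path)
    return all(current[name] == value for name, value in recorded.items())
```

At ingest the digests from `compute_checksums` would be stored with the asset's metadata, and `verify_fixity` would be run on a schedule to confirm the file has not changed. Settling on a single algorithm would simply mean shortening the `ALGORITHMS` tuple.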
11. The staff should all have individual logons to the DAM and the applications
which interact with it. Levels of access to systems can then be assigned by department
heads and system administrators to those who need them for their job assignments. The
most hands-on users of the DAM would be members of the digital services department,
but others such as cataloging staff and reference librarians could conceivably need to
access the system and be granted some degree of editing privileges too. Other
departments such as circulation and vocabularies may also need access, but it would
probably be on a more limited basis with perhaps less metadata editing capability.
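One way to picture these access levels is as a mapping from department to a set of permissions. The sketch below is hypothetical throughout: the permission names are invented, and the specific grants (for example, view-only access for circulation and vocabularies) are assumptions made for illustration, since in practice department heads and system administrators would assign them per user.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW = auto()
    EDIT_METADATA = auto()
    MANAGE_ASSETS = auto()


# ASSUMPTION: illustrative grants only, loosely following the description above.
DEPARTMENT_PERMISSIONS = {
    "digital services": Permission.VIEW | Permission.EDIT_METADATA | Permission.MANAGE_ASSETS,
    "cataloging": Permission.VIEW | Permission.EDIT_METADATA,
    "reference": Permission.VIEW | Permission.EDIT_METADATA,
    "circulation": Permission.VIEW,
    "vocabularies": Permission.VIEW,
}


def can(department: str, permission: Permission) -> bool:
    """Check whether a department has been granted the given permission."""
    granted = DEPARTMENT_PERMISSIONS.get(department.lower(), Permission(0))
    return permission in granted


print(can("Digital Services", Permission.MANAGE_ASSETS))
print(can("circulation", Permission.EDIT_METADATA))
```

Departments that are not listed receive no permissions at all, which keeps access opt-in.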
12. The DAM should be able to interact with Alma, the back end of the GRI's
catalog. This system contains records structured in MARC and EAD. This is ideally how
users would be able to search for content that has been digitized. This would also be the
portal that researchers would be able to use for viewing or listening to access copies of
digitized material when they are on-site at a Getty location. As the DAM would manage
still photos too, it would need to be able to push these to other host areas the Getty may
be using, like its Getty Research Portal. The system may also have to interact with
other DAM systems across the Getty such as The Museum System (TMS) and TEAMS.
There are also legacy systems that the system may need to import items from, such as
Digitool.
13. The Getty primarily runs on Windows computers for most of its workflows.
However, the digital services department does use a Mac station with Adobe Premiere
to edit and prepare sound and video mezzanine files. The database which underlies the
Getty's digital resources is Oracle. The servers are Linux-based. Generally, open source
software is shied away from in favor of licensed enterprise software, although the
handling of born-digital content is a notable exception.
Gaps: The GRI currently funnels most of its A/V assets, along with a large number of still
images, through the Ex Libris Rosetta system for management and preservation. While the
system currently does a lot as far as asset management is concerned, it still is not
perfect. Rosetta is very much geared towards digital preservation, which is why this system was
chosen over the previous asset system, Digitool, which was lacking in that area. As such, though,
Rosetta sometimes functions in ways which make it clear that its primary purpose is to
function as a digital repository. For instance, it was remarked that the media sharing and
playback functions of the system could be improved. The system's ingest also has trouble
when it comes to versioning files in other formats, such as the mezzanine files created for
videos. In short, the DAM system could use a little more of the multimedia-friendly
functionality of a Media Asset Management (MAM) system.
