Abstract. Modern scholarship increasingly relies on sophisticated computerized analyses of copyrighted works. Technological access control schemes that prevent novel computerized analyses of works prevent fair use and impede scholarship, and are therefore counter to the goals of copyright law.

Introduction. Scholarship would be impeded if scholars lost the ability to use computer programs of their own devising to analyze the full digitized versions of copyrighted works. We provide specific examples of scholarly projects that rely on this ability. The examples apply to works that are in the form of text documents, musical scores, audio, video, and computer programs.

These facts justify a finding that scholarship is impeded by the anti-circumvention prohibitions in the Digital Millennium Copyright Act, with respect to works in the form of text, musical scores, audio, video, and computer programs.

This is a response to the Copyright Office’s request for comments [CO99] on what classes of works should be exempted from the Digital Millennium Copyright Act’s prohibition on circumventing technological measures that control access to copyrighted works.

† The views expressed in this document are those of the authors, not necessarily those of Princeton University. Affiliation is listed only to identify the authors.

Simple search of books. Suppose that Alice, a scholar who owns a roomful of books, wants to search all of the books looking for references to Francis Bacon, accumulating a list of citations. Alice may employ an assistant to skim through the books and collect this information. Similarly, if Alice owns a collection of copyrighted books on digital media, she may want to perform similar searches electronically. Whether a human assistant or a computer program searches the books is legally immaterial; employing a computer program to search the books is fair use.

Computers offer many practical advantages for search applications. It might be prohibitively expensive to search a large collection by hand, but doing the same search on an inexpensive computer might provide an instant result. Thus manual searches cannot substitute for computerized searches.

Laws that prohibit scholars from using computerized “assistants” artificially impede the progress of scholarship and science. If the digital works are technically protected in such a way that they can be viewed on the screen but not electronically searched, then the technical protection interferes with noninfringing uses.

In this scenario, the publisher may meet Alice’s needs by providing a generic text search facility. Alice could search for the words “Francis Bacon”,
or perhaps “Bacon” and sort through the results of the search manually. Although a generic publisher-provided search facility can satisfy Alice, we will see below that such a facility fails to meet the needs of many other scholars.

Thematic search of musical scores. Suppose that Bob, a scholar who owns a collection of musical scores, wants to search the collection looking for the occurrence of a particular musical theme. Copyright law permits Bob to do this; whether a human assistant or a computer program performs this search is legally immaterial. Technical protections on digital works that prevent computerized searches (on privately owned copies) interfere with noninfringing uses of copyrighted works.

Searches of this type have many research uses in musicology. Indeed, entire research centers, such as the Center for Computer Assisted Research in the Humanities (at Stanford University), focus on technological search and analysis of music. There is a great deal of active research on how to encode musical scores for computerized analysis and how to perform the analyses. (Selfridge-Field’s book [SF97] summarizes research in this area and provides many citations to the research literature.)

Musicology researchers perform several kinds of operations on musical scores. They translate the scores into different electronic formats to facilitate analysis. They develop novel search and analysis criteria to represent abstract concepts such as “musical themes”. They develop novel search techniques to efficiently find certain patterns in encoded musical scores.

These activities all require the ability to write computer programs that analyze a score directly. Unless the publisher of an electronic musical score provides scholars with the ability to write computer programs that directly access the score, scholars will lose the ability to perform these kinds of analyses.

Note that generic publisher-provided search facilities cannot possibly meet this need. Researchers are constantly developing new and better search methodologies. Confining scholars to any particular search facility will impede research on new search methods.

Thematic search of a musical work. Suppose that Claire, a scholar who owns a collection of musical recordings, wants to search the collection looking for a particular musical theme. Like Alice and Bob, Claire has the right under copyright law to do this, using either a human assistant or a computer program.

Claire faces a harder task than Bob does. Effective searching through audio recordings of music is a very difficult research problem that has seen steady but slow progress over the last twenty years, for example in the research on “structured audio” [VGS98]. Active research groups in this area need access to a wide variety of recorded musical works in order to prototype, test, and improve their technology. Like Bob, Claire needs to write computer programs that access the original work directly.

Video. Suppose that David, a public-health researcher who owns a collection of recorded movies, wants to search the collection looking for depictions of cigarettes and related paraphernalia. David has the right under copyright law to do this.

The algorithms for doing this automatically are not yet mature, but an active and robust discipline of “video content analysis” [SZ99] or “object-based video coding” [PCK 99] is seeking to provide tools for this kind of query. Research in these areas progresses by devising computer programs that take video content as input. The research would be severely inhibited if scientists could not access the actual video content of the works they purchase and were instead limited by restrictive interface mechanisms to on-screen viewing or specific kinds of searches.

Innovative Text Searches and Analysis. Modern scholars of Shakespeare analyze the frequency of word usage in the different plays. Shakespeare is known to have acted the role of the ghost in Hamlet. Donald Foster of Vassar College used statistical computations to notice that specific words that the ghost speaks appear more frequently in the next play that Shakespeare wrote, as if they were on his mind while writing the next play. In each play, there seems to be one role whose words appear more frequently in all roles of the next play [Dol91].
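As a rough illustration of what such a word-frequency comparison involves, the following Python sketch ranks the roles of one play by how often their distinctive words recur in a later text. It is not Foster's actual procedure, which [Dol91] describes only in general terms. The function names, the four-letter cutoff, the definition of "distinctive" (words that no other role in the same play speaks), and the toy texts are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def role_vocabulary(speeches_by_role, role, min_len=4):
    """Words spoken by `role` and by no other role in the same play.

    The `min_len` cutoff is an assumed, crude filter for short function words.
    """
    own = set(tokenize(" ".join(speeches_by_role[role])))
    others = set()
    for name, speeches in speeches_by_role.items():
        if name != role:
            others.update(tokenize(" ".join(speeches)))
    return {word for word in own - others if len(word) >= min_len}


def usage_rate(vocabulary, text):
    """Fraction of the tokens in `text` that belong to `vocabulary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[word] for word in vocabulary) / len(tokens)


def rank_roles(speeches_by_role, next_play_text):
    """Rank roles by how often their distinctive words recur in the next play."""
    rates = {
        role: usage_rate(role_vocabulary(speeches_by_role, role), next_play_text)
        for role in speeches_by_role
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented placeholder texts, not quotations from any real play.
toy_play = {
    "ghost": ["remember the orchard and the serpent"],
    "guard": ["the night is bitter cold"],
}
toy_next_play = "a serpent slept in the orchard while the night passed"

if __name__ == "__main__":
    for role, rate in rank_roles(toy_play, toy_next_play):
        print(role, round(rate, 3))
```

Every step in this sketch reads the complete text of both works: it has to see every word of every role in order to decide which words are distinctive, and every word of the later text in order to count them. A keyword search box that returns only matching passages does not expose that information.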
This particular kind of statistical analysis could not be foreseen by a publisher of the texts of Shakespeare’s plays. Almost any generic search-engine interface would be too limited to calculate the specific correlations necessary for this analysis. To efficiently perform a computerized test of this theory that Shakespeare acted in all his own plays, the full text of the plays must be readable by a computer program.

References

[CO99] Library of Congress Copyright Office. Exemption to prohibition on circumvention of copyright protection systems for access control technologies. Federal Register, 64(226):66139–66143, November 1999. http://lcweb.loc.gov/copyright/fedreg/64fr66139.pdf.

[Dol91] Edward Dolnick. The ghost’s vocabulary. The Atlantic Monthly, 268(4):82–86, October 1991. http://www.theatlantic.com/unbound/flashbks/shakes/d