S06 - Samuelson 2023 - Generative AI Meets Copyright

SPECIAL SECTION: A MACHINE-INTELLIGENT WORLD

ing the relative (and perhaps absolute) income of the previously highly-paid workers whose skills have been automated.

It is too early to determine with much certainty how this will play out for AI, whether the impact on any particular job will be positive or negative. Research is beginning to emphasize which jobs are most likely to be affected rather than lost (8). For example, that classification tasks such as image recognition can be done with AI will affect workers whose jobs involve classification tasks, such as radiologists (7). Recent work examining differences between generative AI (specifically, LLMs) and nongenerative AI [as described in (7)] shows that millions of jobs have the potential to be affected by LLMs. Notably, these studies emphasize that “affect” does not mean “replace.” For many jobs, automating some aspects of the workflow might increase productivity, the wages of workers who have that job, and the number of workers hired to do that job. Even when some jobs get automated, that might complement the tasks done by other workers. Many empirical exercises [for example, (7–9)] emphasize the direct impact on jobs, but they do not explore the jobs that might be enhanced through complementary production processes. For example, in January 2023, there were 186,417 job postings in the United States that specified language skills (such as Spanish Language or American Sign Language), or about 5% of the total job postings (see SM). Automating language translation would directly affect many of these jobs. At the same time, many other jobs that do not require language skills would also be affected. For example, a recent study showed that small businesses that used a rudimentary automated language translation tool on eBay experienced a 17.5% increase in exports to markets where that language is used (15). Automation of some jobs could create opportunities for those whose work would appear to be unaffected, as measured with the tasks and skills involved in current workflows.

CONCLUSION

Many economists who have studied the impact of automation on labor markets have argued recently that the direction of AI research needs to be changed away from automating tasks to focusing on overall job augmentation. The implicit argument is that a focus on augmentation will lead to more complementarity with lower-wage labor and more new tasks. However, many recent advances in AI that have been developed with the explicit goal of task automation have appeared to increase worker productivity; that is, task automation has been labor augmenting. Furthermore, AI technology may disproportionately augment lower-skilled labor, reducing income inequality. This, at the very least, calls into question whether a change in the innovator’s mindset is needed: Task automation may be a path to substantially improved labor productivity.

This potential to reverse the recent trend toward skill-biased technical change does not mean that AI is without risk. Other concerns remain, including those related to privacy, liberty, democracy, and monopoly power (3). Our emphasis is on understanding that one person’s automation is another’s augmentation, and that it is difficult for engineers or policy-makers to pick which particular innovation will increase or reduce inequality overall. We believe that both regulators and engineers should be careful in shutting down a particular technology trajectory because it appears to automate human work. In the process of automating some work, other work can be augmented.

Often, our analysis suggests that such augmentation from AI will increase the job productivity of less-skilled workers who can now perform at levels achieved by their skilled counterparts. This suggests that skill premia that have contributed to widening inequality may be eroded. Thus, it is quite plausible that the use of AI to automate tasks will both increase productivity and decrease income inequality. If so, then we may want more automation, not less.

REFERENCES AND NOTES
1. E. Brynjolfsson, Daedalus 151, 272 (2022).
2. J. S. Gans, A. Leigh, Innovation + Equality: Creating a Future that is more Star Trek than Terminator (MIT Press, 2019).
3. D. Acemoglu, S. Johnson, Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity (PublicAffairs, 2023).
4. D. Acemoglu, P. Restrepo, Am. Econ. Rev. 108, 1488 (2018).
5. C. Goldin, L. Katz, The Race Between Education and Technology (Harvard Univ. Press, 2008).
6. D. H. Autor, Science 344, 843 (2014).
7. E. Brynjolfsson, T. Mitchell, Science 358, 1530 (2017).
8. E. Felten, M. Raj, R. Seamans, Strateg. Manage. J. 42, 2195 (2021).
9. T. Eloundou, S. Manning, P. Mishkin, D. Rock, arXiv:2303.10130 [econ.GN] (2023).
10. A. Agrawal, J. S. Gans, A. Goldfarb, The Economics of Artificial Intelligence: An Agenda (Univ. Chicago Press, 2019).
11. K. Kanazawa, D. Kawaguchi, H. Shigeoka, Y. Watanabe, “AI, skill, and productivity: The case of taxi drivers,” no. w30612, National Bureau of Economic Research (2022).
12. E. Brynjolfsson, D. Li, L. Raymond, “Generative AI at work,” National Bureau of Economic Research (NBER) working paper 31161 (NBER, 2023).
13. D. H. Autor, L. Katz, M. Kearney, Rev. Econ. Stat. 90, 300 (2008).
14. A. Agrawal, J. S. Gans, A. Goldfarb, Power and Prediction (Harvard Business Review Press, 2022).
15. E. Brynjolfsson, X. Hui, M. Liu, Manage. Sci. 65, 5449 (2019).

SUPPLEMENTARY MATERIALS
science.org/doi/10.1126/science.adh9429

10.1126/science.adh9429

POLICY FORUM

Generative AI meets copyright
Ongoing lawsuits could affect everyone who uses generative AI

By Pamela Samuelson

Berkeley Law School, University of California Berkeley, Berkeley, CA, USA. Email: psamuelson@berkeley.edu

Generative artificial intelligence (AI) is a disruptive technology that is widely adopted by members of the general public as well as scientists and technologists who are enthusiastic about the potential to accelerate research in a wide variety of fields. But some professional artists, writers, and programmers fiercely object to the use of their creations as training data for generative AI systems and to outputs that may compete with or displace their works (1, 2). Lack of attribution and compensation for use of their original creations are other sources of aggravation to critics of generative AI. Copyright lawsuits that are now underway in the United States have substantial implications for the future of generative AI systems. If the plaintiffs prevail, the only generative AI systems that may be lawful in the United States would be those trained on public domain works or under licenses, which will affect everyone who deploys generative AI, integrates it into their products, and uses it for scientific research.

Conflicts between creators of copyrighted works and developers of technologies that enable the use of those creations in unexpected and innovative ways is nothing new. In the early 20th century, the disruptive technology of the day was player pianos. Music copyright owners sued the makers of piano rolls, claiming that rolls of their musical compositions were infringements. Subsequent copyright-disruptive technologies have included cable television, photocopiers, videotape recording machines, and MP3 players, each of which (except photocopiers) attracted copyright industry challenges (all of which failed in the courts, although Congress sometimes later extended protections in the aftermath of failed lawsuits).
158 14 JULY 2023 • VOL 381 ISSUE 6654 science.org SCIENCE

When new technologies pose new copyright questions that Congress did not anticipate, courts typically consider which outcome is most consistent with the constitutional purposes of copyright. The Constitution gives Congress the power “to promote the progress of Science and useful Arts,” that is, to foster the creation and dissemination of knowledge for the public good. This requires balancing the legitimate interests of copyright owners to prevent misappropriations of their works that undermine incentives to create with the legitimate interests of developers of innovative technologies and follow-on creators who need some breathing space in which they, too, can innovate.

What makes generative AI more disruptive than previous technologies? One factor is certainly the exceptionally rapid pace at which generative AI technologies have been launched, adopted, and adapted. Evolution in the fields of law and policy, by contrast and of necessity, is much slower. It is, moreover, not easy to assess how to calibrate balances among competing copyright interests in the early stages of new-technology evolutions. Generative AI seems poised to have substantial impacts on the careers of professional writers and artists. During the 2023 Writers Guild of America strike, for instance, uses of generative AI are one focus of negotiations. Screenwriters are understandably worried that these technologies will displace them or diminish their compensation.

Stability AI is defending two copyright infringement lawsuits in the United States that are focused on Stable Diffusion, a widely used image generator. Getty Images is the plaintiff in one of these lawsuits. The other is a class-action lawsuit on behalf of visual artists on whose images Stable Diffusion was trained. Both complaints assert that Stability AI made unlawful copies of the plaintiffs’ images when ingesting them as inputs for training Stable Diffusion’s model and that output images produced by Stable Diffusion in response to user prompts are infringing derivative works.

A third generative AI lawsuit (Doe v. Github, Inc.) challenges OpenAI’s development of Codex, a large language model (LLM) trained upon billions of lines of open-source software code. Also challenged is GitHub and OpenAI’s collaborative development of Copilot, a coding assistant tool that draws upon the Codex LLM to suggest lines of code for specific functions in response to user prompts. (Microsoft, which owns GitHub and has invested heavily in OpenAI, is a fellow defendant.)

Rulings in favor of plaintiffs might trigger “innovation arbitrage,” causing developers of generative AI systems to move their bases of operation to countries that regard the ingestion of copyrighted works as training data as fair use, like Israel’s Ministry of Justice did in early 2023. Other countries that want to attract AI innovations may follow suit. If courts uphold the Stability AI plaintiffs’ claims, OpenAI’s GPT4 and Google’s BARD may also be in jeopardy. Their developers would be very attractive targets of follow-on lawsuits.

INGESTING TRAINING DATA

Stability AI has yet to articulate its main defenses to the copyright charges. Insofar as the complaints allege that Stable Diffusion contains copies of in-copyright images used as training data, the claims are factually and technically inaccurate. Stable Diffusion contains an extremely large number of parameters that mathematically represent concepts embodied in the training data, but the images as such are not embodied in its model.

Training a model begins by tokenizing the contents of works ingested as training data into component elements. The model uses these tokens to discern statistical correlations, often at staggeringly large scales, among features of the content on which the model is being trained. In essence, the model is extracting and analyzing precise facts about, and correlations between, discrete elements of the works to ascertain which other discrete elements either do or do not follow or are proximate to these elements and the frequency with which the correlations do or do not exist in varying contexts.

The complaints against Stability AI overlook the intentionally porous nature of copyrights. What copyright law protects is only the original expression that authors contribute (such as sequences of words in a poem or the melody of music). Copyright’s scope never extends to any ideas, facts, or methods embodied in works nor to elements common in works of that kind (under copyright’s “scenes a faire” doctrine), elements capable of being expressed in very few ways (under the “merger” doctrine), or the underlying subjects depicted in protected works. Photographs of cats, for instance, do not give the photographer exclusive rights to characteristic features of cats, such as their noses or facial expressions. Nor does copyright’s scope extend to inferences that readers might draw from reviewing an author’s works, such as insights about patterns of connections among concepts or how works of that kind are constructed.

Moreover, Stability AI did not prepare the dataset on which the Stable Diffusion model was trained. This was done by a nonprofit German research organization known as LAION (Large-Scale Artificial Intelligence Open Network). LAION initially developed LAION-5B, a dataset consisting of 5.85 billion hyperlinks that pair images and text descriptions from the open internet. LAION makes this dataset available to the public for free for use as training data for those who want to use it to build generative models. LAION also developed a subset of LAION-5B, known as LAION-Aesthetics, that consists of hyperlinks to 600 million images selected by some human testers for their visual appeal and by a machine-learning analysis of human aesthetic ratings. The Stable Diffusion model was trained on the LAION-Aesthetics dataset.

LAION’s creation of this dataset was very likely lawful because the European Union (EU) adopted an exemption allowing nonprofit research organizations to make copies of in-copyright works for text and data mining (TDM) purposes. The EU created this exception in recognition of the societal value of TDM as a means by which researchers can create new knowledge. This exemption cannot be overridden by contract. (A second EU exemption authorizes commercial actors to engage in TDM, although copyright owners can opt out of this exemption, as some have done.)

Stability AI makes Stable Diffusion available on an open-source basis. However, it also provides a subscription service so that those who lack resources or the inclination to host the open-source version can have access to Stable Diffusion to generate images in response to text prompts. Yet, insofar as ingesting in-copyright images to train a generative model requires making at least temporary or incidental copies of them, Stability AI is likely to argue that this is a fair use under US copyright law.

FAIR USE

Under US law, fair uses of in-copyright works do not infringe copyrights. Courts consider four factors when assessing fair use defenses: (i) the purpose of the challenged use, (ii) the nature of the copyrighted works, (iii) the amount and substantiality of the taking, and (iv) the effect of the challenged use on the market for or value of the
copyrighted work. The purpose and market effects factors are generally the most important determinants in fair-use cases, but all four factors must be weighed together in a holistic analysis.

Research, scholarship, and teaching are among the favored fair-use purposes, as are criticism, comment, and news reporting. Noncommercial uses are generally favored more than commercial uses. Since 1994, when the Supreme Court considered the fairness of 2 Live Crew’s rap parody of a popular Roy Orbison song in Campbell v. Acuff-Rose Music, Inc., courts have given considerable weight to whether the purpose of a challenged use was “transformative.” The Court defined this term as uses that “add[] something new, with a further purpose or different character, altering the first with new expression, meaning, or message” [(3), p. 579]. Transformative uses are also less likely than nontransformative uses to harm the market for the first work. People are, for instance, unlikely to purchase 2 Live Crew’s parody if they want to listen to Roy Orbison’s rendition.

The Stability AI plaintiffs will likely argue that the ingestion of their works as training data was nontransformative and commercial. Both considerations would, if accepted, tip against fair use. However, several court decisions have ruled that analogous digital uses of in-copyright works qualified as transformative fair uses. For example, in Authors Guild v. Google, Inc., a court ruled that Google’s digitization of millions of books from research library collections to index their contents and serve a few snippets of book contents in response to user search queries was a “highly transformative” fair use. Although Google’s purpose was commercial, it was very different from the purposes for which the books were marketed. Google’s use facilitated greater public access to knowledge as well as enabling TDM research and the creation of new research tools. In Field v. Google, Inc., a court found that Google’s cache copying of contents from Field’s website was a transformative fair use.

The nature-of-the-work factor often has little importance in fair-use cases. The Stability AI plaintiffs may argue that because works of visual art lie at the very core of copyright, fair use should be thinner for these works than for the old library books at issue in the Authors Guild case. A countervailing consideration is that the visual artists on whose works Stable Diffusion trained made their works available on the open internet, as did Field in the Google, Inc., case.

ILLUSTRATION: A. MASTIN/SCIENCE

Transformative purposes tend to have spillover effects on other fair-use factors, especially the amount factor. As in the Authors Guild case, the Stability AI plaintiffs may emphasize that the defendant made exact copies of the entirety of many millions of works without permission or compensation. However, courts typically inquire whether such copying was necessary to achieve a transformative purpose. In the Authors Guild case, the court recognized that Google could not index book contents and serve up snippets in response to search queries unless it copied the books’ contents. Stability AI will likely make a similar necessity argument about training-data usages of images.

The market effect of a challenged use is sometimes said to be the most important fair-use factor. The Getty complaint against Stability AI emphasizes that Getty has established a licensing market for use of its premium photographs as training data for generative AI. That bolsters Getty’s argument that Stability AI’s appropriation of 12 million images from Getty websites has harmed a licensing market. The class-action claim against Stability AI is weaker because Stability AI could not have gotten a license from the class of visual artists whose works were ingested to construct the Stable Diffusion model.

The existence of a licensing market (or an intent to establish one) is not, however, a consideration that by itself can resolve a dispute in transformative fair use cases. In its 2021 Google LLC v. Oracle America, Inc., decision, the Supreme Court rejected Oracle’s argument that Google’s use of parts of the Java application programming interface (API) had deprived Oracle of license revenues to which it claimed an entitlement. The Court stated that courts should consider the public benefits of a challenged use as well as potential lost revenues and how much creativity a challenged use has enabled and balance this against potential losses.

This consideration was very relevant in the Oracle case. Not only was Google’s Android smartphone platform, in which the Java API was used, a highly innovative new software product, but it enabled millions of programmers to use their familiarity with the Java API to create many millions of programs. The Court thought this use was consistent with the constitutional objective of copyright to promote creative progress. The public greatly benefited from Android’s existence and the availability of large numbers of apps that ran on that platform.

Stability AI will almost certainly channel
the public-benefit and creative-impacts statements in the Oracle decision and point to the exceptional creativity embodied in Stable Diffusion as well as the hundreds of millions of creative uses of this generative AI system, including those by graphic artists who use it to generate ideas or refine creations.

The Stability AI plaintiffs will likely counter this argument with the Supreme Court’s 2023 ruling in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, which somewhat narrowed the conception of transformative purposes. It no longer suffices for challenged works to have a new meaning or message. More important now are whether the challenged use has a different purpose than the first work and how commercial the use is. Stability AI will argue that ingesting copyrighted materials as training data had a very different purpose than the works as first published.

What might tip the scale against Stability AI’s fair-use defense is whether images produced by Stable Diffusion infringe the derivative work right of the authors of the images on which its LLM was trained. A relevant precedent is Sega Enterprises Ltd. v. Accolade, Inc., in which an appellate court decided that Accolade had made fair use of Sega software when making reverse-engineered copies for the legitimate purpose of extracting information about how to make its videogames compatible with the Sega platform. Had Accolade reverse-engineered for an illegitimate purpose, such as to appropriate expression from the Sega games, its fair-use defense would have faltered. The Accolade games competed with Sega’s games, but the court thought that this was the kind of competition among noninfringing works that copyright is supposed to foster.

OUTPUTS AS INFRINGEMENTS

The class-action complaint against Stability AI asserts that all images produced by Stable Diffusion are infringing derivative works because all are derived from the images on which its model trained. It characterizes Stable Diffusion as a “collage tool” whose outputs compete against the artists’ own works and thereby harm their markets. Users of Stable Diffusion, moreover, can submit prompts requesting the generation of an image of a particular subject “in the style of” a specific named artist.

However, courts have long held that to infringe copyright’s derivative work right, it is not enough to show that a second work was “based upon” an earlier work or some of its elements. The second work must have appropriated a substantial quantum of the first work’s original expression. So, unless a court decides to overturn decades of precedents interpreting the derivative work right and broaden it substantially, the class action’s output infringement claim is likely to fail.

The class-action complaint acknowledges that “[i]n general, none of the Stable Diffusion output images provided in response to a particular Text Prompt is likely to be a close match for any specific image in the training data” [(4), p. 23]. Even “in the style of” claims seem weak because copyright law does not protect styles as such. Infringement can be found only if there is a close resemblance between expressive elements of a stylistically similar work and original expression in particular works by that artist.

Stable Diffusion outputs are highly unlikely to be substantially similar to particular images on which its model was trained because of how Stable Diffusion assembles them. Constructing a model for an image-generating AI requires processing enormous quantities of input data to produce abstract representations of image elements (such as cats playing with a ball on a linoleum floor). Diffusion adds noise to image elements when encoding them. The pairing of text descriptions and images allows the model to cluster the abstract representations so that similar representations will be in proximity (representations of cats near other cat representations). When a user enters a prompt directing the software to generate a specific type of output, the generative AI system uses complex statistical calculations to assemble an output that the system predicts will match what the user requested.

It is, however, possible for generative AI outputs to infringe copyrights. If the same input image (say, of Mickey Mouse) is present in many works on which the model was trained and its developer did not follow industry best practices by eliminating duplicates and using output filters to prevent infringements, user prompts could result in infringing outputs (although the user, not the developer of the generative AI system, may be the infringer). Ironically, the larger and more diverse the dataset on which a generative model was trained, the less likely infringing outputs are.

The Getty complaint against Stability AI is more modest in its infringing output claims. Yet Getty, too, may find it difficult to prove that particular Stable Diffusion outputs are substantially similar to particular photographs to which it owns copyrights. In general, Stable Diffusion outputs will be distinguishably different from the images on which the model was trained.

The Stability AI plaintiffs will likely emphasize that the images produced by Stable Diffusion compete with their works in the marketplace. They can point to the Supreme Court’s Goldsmith decision, which treated competing uses as weighing against fairness. Yet Goldsmith involved two works that were substantially similar in their expressions (Goldsmith’s photo of Prince and Warhol’s print derived from Goldsmith’s photo) and that competed in the same licensing market for magazines. Stability AI will be relying on differences in Stable Diffusion’s outputs relative to plaintiffs’ works in order to distinguish their case’s context from that of Goldsmith.

CONCLUDING THOUGHTS

Based on existing precedents and an understanding about how Stable Diffusion was trained and how it generates images in response to prompts, Stability AI seemingly has a reasonable chance of prevailing on the copyright claims. (Both the Getty and the class-action complaints raise other claims that cannot be addressed in this brief article.) The lawsuits are, however, in very early stages, and it may be years before courts render decisions.

In mid-May 2023, Congress held its first hearing about generative AI and copyright issues, during which witnesses expressed divergent views. The US Copyright Office is well aware of the consternation that generative AI has fomented in copyright-dependent communities. The Office hosted “listening sessions” in spring 2023 to provide stakeholders with opportunities to explain their perspectives on the two principal questions posed in the Stability AI cases: Is the use of in-copyright works as training data for generative AI systems an infringement of copyright? Are the outputs of generative AI systems infringing derivative works?

During the summer of 2023, the Office plans to allow interested parties to submit written comments expressing their perspectives and analyses on these and related questions. The Office intends to write a report setting forth its conclusions, which may include legislative recommendations. Scientists who have an interest in the future of generative AI would be well advised to submit comments.

REFERENCES AND NOTES
1. N. Klein, “AI machines aren’t ‘hallucinating.’ But their makers are,” The Guardian, 8 May 2023.
2. M. Cuenco, “We must declare jihad against AI,” Compact, 28 April 2023.
3. Campbell v. Acuff-Rose Music, Inc., 510 US 569, 579 (1994).
4. Complaint, Anderson v. Stability AI, Ltd., case no. 3:23-cv-00201 para. 93 (N.D. Cal. Jan. 13, 2023).

10.1126/science.adi0656

Copyright 2023 American Association for the Advancement of Science. All rights reserved.
