Artificial Intelligence Index Report 2024
Introduction to the AI Index Report 2024
Welcome to the seventh edition of the AI Index report. The 2024 Index is our most comprehensive to date and
arrives at an important moment when AI’s influence on society has never been more pronounced. This year,
we have broadened our scope to more extensively cover essential trends such as technical advancements
in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development.
Featuring more original data than ever before, this edition introduces new estimates on AI training costs,
detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI’s impact on
science and medicine.
The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our
mission is to provide unbiased, rigorously vetted, broadly sourced data so that policymakers, researchers,
executives, journalists, and the general public can develop a more thorough and nuanced understanding of the
complex field of AI.
The AI Index is recognized globally as one of the most credible and authoritative sources for data and insights
on artificial intelligence. Previous editions have been cited in major newspapers, including The New York
Times, Bloomberg, and The Guardian; have amassed hundreds of academic citations; and have been referenced
by high-level policymakers in the United States, the United Kingdom, and the European Union, among
other places. This year’s edition surpasses all previous ones in size, scale, and scope, reflecting the growing
significance that AI is coming to hold in all of our lives.
Message From the Co-directors
A decade ago, the best AI systems in the world were unable to classify objects in images at a human level. AI
struggled with language comprehension and could not solve math problems. Today, AI systems routinely exceed
human performance on standard benchmarks.
Progress accelerated in 2023. New state-of-the-art systems like GPT-4, Gemini, and Claude 3 are impressively
multimodal: They can generate fluent text in dozens of languages, process audio, and even explain memes. As AI
has improved, it has increasingly forced its way into our lives. Companies are racing to build AI-based products,
and AI is increasingly being used by the general public. But current AI technology still has significant problems. It
cannot reliably deal with facts, perform complex reasoning, or explain its conclusions.
AI faces two interrelated futures. In the first, the technology continues to improve and sees ever-wider use, with
major consequences for productivity and employment. It can be put to both good and bad uses. In the second future,
the adoption of AI is constrained by the limitations of the technology. Regardless of which future unfolds,
governments are increasingly concerned. They are stepping in to encourage the upside, such as funding university
R&D and incentivizing private investment. Governments are also aiming to manage the potential downsides, such
as impacts on employment, privacy concerns, misinformation, and intellectual property rights.
As AI rapidly evolves, the AI Index aims to help the AI community, policymakers, business leaders, journalists, and
the general public navigate this complex landscape. It provides ongoing, objective snapshots tracking several
key areas: technical progress in AI capabilities, the community and investments driving AI development and
deployment, public opinion on current and potential future impacts, and policy measures taken to stimulate AI
innovation while managing its risks and challenges. By comprehensively monitoring the AI ecosystem, the Index
serves as an important resource for understanding this transformative technological force.
On the technical front, this year’s AI Index reports that the number of new large language models released
worldwide in 2023 doubled over the previous year. Two-thirds were open-source, but the highest-performing
models came from industry players with closed systems. Gemini Ultra became the first LLM to reach human-
level performance on the Massive Multitask Language Understanding (MMLU) benchmark; performance on the
benchmark has improved by 15 percentage points since last year. Additionally, GPT-4 achieved an impressive 0.96
mean win rate score on the comprehensive Holistic Evaluation of Language Models (HELM) benchmark, which
includes MMLU among other evaluations.
Top 10 Takeaways
1. AI beats humans on some tasks, but not on all. AI has surpassed human performance on several
benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails
behind on more complex tasks like competition-level mathematics, visual commonsense reasoning, and planning.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs
of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an
estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
4. The United States leads China, the EU, and the U.K. as the source of top AI
models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European
Union’s 21 and China’s 15.
5. Robust and standardized evaluations for LLM responsibility are seriously lacking.
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting.
Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different
responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and
limitations of top AI models.
7. The data is in: AI makes workers more productive and leads to higher quality work. In
2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more
quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge
the skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper
oversight can lead to diminished performance.
9. The number of AI regulations in the United States sharply increases. The number of AI-
related regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there
were 25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations
grew by 56.3%.
10. People across the globe are more cognizant of AI’s potential impact—and more nervous.
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect their
lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness toward AI
products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests that 52% of
Americans report feeling more concerned than excited about AI, rising from 38% in 2022.
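Several takeaways above quote year-over-year changes. As an illustration of how those figures fit together, the sketch below recomputes two of them in Python. Note one assumption: the 2022 count of 16 AI-related regulations is not stated in the text and is inferred from the quoted 25 regulations and 56.3% growth.

```python
# Recompute two growth figures quoted in the takeaways above.

# Takeaway 9: 25 AI-related regulations in 2023, reported as 56.3% growth.
# The implied 2022 baseline (an inference, not a figure from the report):
regs_2023 = 25
implied_regs_2022 = round(regs_2023 / 1.563)  # ~16
growth = (regs_2023 - implied_regs_2022) / implied_regs_2022
print(f"regulations: {implied_regs_2022} -> {regs_2023}, growth of {growth:.1%}")

# Takeaway 10: 52% express nervousness, "a 13 percentage point rise from 2022".
# Percentage points subtract directly; the relative change is larger:
nervous_2023 = 0.52
nervous_2022 = 0.52 - 0.13
relative_rise = (nervous_2023 - nervous_2022) / nervous_2022
print(f"nervousness: {nervous_2022:.0%} -> {nervous_2023:.0%}, "
      f"a relative rise of about {relative_rise:.0%}")
```

The distinction matters when reading the report: a 13 percentage point rise from 39% is a roughly one-third relative increase.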
Steering Committee
Co-directors
Jack Clark, Anthropic, OECD
Raymond Perrault, SRI International
Members
Erik Brynjolfsson, Stanford University
John Etchemendy, Stanford University
Katrina Ligett, Hebrew University
Terah Lyons, JPMorgan Chase & Co.
James Manyika, Google, University of Oxford
Juan Carlos Niebles, Stanford University, Salesforce
Vanessa Parli, Stanford University
Yoav Shoham, Stanford University, AI21 Labs
Russell Wald, Stanford University
Research Associate
Loredana Fattorini, Stanford University
Affiliated Researchers
Elif Kiesow Cortez, Stanford Law School Research Fellow
Anka Reuel, Stanford University
Robi Rahman, Data Scientist
Alexandra Rome, Freelance Researcher
Lapo Santarlasci, IMT School for Advanced Studies Lucca
Graduate Researchers
James da Costa, Stanford University
Simba Jonga, Stanford University
Undergraduate Researchers
Emily Capstick, Stanford University
Summer Flowers, Stanford University
Armin Hamrah, Claremont McKenna College
Amelia Hardy, Stanford University
Mena Hassan, Stanford University
Ethan Duncan He-Li Hellman, Stanford University
Julia Betts Lotufo, Stanford University
Sukrut Oak, Stanford University
Andrew Shi, Stanford University
Jason Shin, Stanford University
Emma Williamson, Stanford University
Alfred Yu, Stanford University
The AI Index 2024 Annual Report by Stanford University is licensed under Attribution-NoDerivatives 4.0 International.
The AI Index was conceived within the One Hundred Year Study on Artificial Intelligence (AI100).
The AI Index welcomes feedback and new ideas for next year. Contact us at AI-Index-Report@stanford.edu.
The AI Index acknowledges that while authored by a team of human researchers, its writing process was aided
by AI tools. Specifically, the authors used ChatGPT and Claude to help tighten and copy edit initial drafts.
The workflow involved authors writing the original copy, then utilizing AI tools as part of the editing process.
Supporting Partners
Analytics and Research Partners
Contributors
The AI Index wants to acknowledge the following individuals by chapter and section for their contributions of data,
analysis, advice, and expert commentary included in the AI Index 2024 Report:
Introduction
Loredana Fattorini, Nestor Maslej, Vanessa Parli, Ray Perrault
Chapter 3: Responsible AI
Jack Clark, Loredana Fattorini, Amelia Hardy, Katrina Ligett, Nestor Maslej, Vanessa Parli, Ray Perrault,
Anka Reuel, Andrew Shi
Chapter 4: Economy
Susanne Bieller, Erik Brynjolfsson, Mar Carpanelli, James da Costa, Natalia Dorogi, Heather English, Murat Erer,
Loredana Fattorini, Akash Kaura, James Manyika, Nestor Maslej, Cal McKeever, Julia Nitschke, Layla O’Kane,
Vanessa Parli, Ray Perrault, Brittany Presten, Carl Shan, Bill Valle, Casey Weston, Emma Williamson
Contributors (cont’d)
Chapter 6: Education
Betsy Bizot, John Etchemendy, Loredana Fattorini, Kirsten Feddersen, Matt Hazenbush, Nestor Maslej,
Vanessa Parli, Ray Perrault, Svetlana Tikhonenko, Laurens Vehmeijer, Hannah Weissman, Stuart Zweben
Chapter 8: Diversity
Betsy Bizot, Loredana Fattorini, Kirsten Feddersen, Matt Hazenbush, Nestor Maslej, Vanessa Parli, Ray Perrault,
Svetlana Tikhonenko, Laurens Vehmeijer, Caroline Weis, Hannah Weissman, Stuart Zweben
The AI Index thanks the following organizations and individuals who provided data for inclusion in this year’s report:
Organizations
Center for Research on Foundation Models: Rishi Bommasani, Percy Liang
Center for Security and Emerging Technology, Georgetown University: Catherine Aiken, James Dunham, Autumn Toney
Code.org: Hannah Weissman
Computing Research Association: Betsy Bizot, Stuart Zweben
Epoch: Ben Cottier, Robi Rahman
GitHub: Peter Cihon, Kevin Xu
Govini: Alison Boyer, Rebecca DeCrescenzo, Philip de Guzman, Jeff Park
Informatics Europe: Svetlana Tikhonenko
International Federation of Robotics: Susanne Bieller
Lightcast: Cal McKeever, Julia Nitschke, Layla O’Kane
LinkedIn: Murat Erer, Akash Kaura, Casey Weston
McKinsey & Company: Natalia Dorogi, Brittany Presten
Munk School of Global Affairs and Public Policy: Peter Loewen, Lee Slinger
Quid: Heather English, Bill Valle
Schwartz Reisman Institute for Technology and Society: Maggie Arai, Marco Monteiro Silva
Studyportals: Kirsten Feddersen, Laurens Vehmeijer
Women in Machine Learning: Caroline Weis
The AI Index also thanks Jeanina Casusi, Nancy King, Carolyn Lehman, Shana Lynch, Jonathan Mindes, and
Michi Turner for their help in preparing this report; Joe Hinman and Nabarun Mukherjee for their help in
maintaining the AI Index website; and Annie Benisch, Marc Gough, Panos Madamopoulos-Moraris, Kaci Peel,
Drew Spence, Madeline Wright, and Daniel Zhang for their work in helping promote the report.
Table of Contents
Report Highlights 14
Appendix 458
Report Highlights
Chapter 1: Research and Development
1. Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable
machine learning models, while academia contributed only 15. There were also 21 notable models resulting from
industry-academia collaborations in 2023, a new high.
2. More foundation models and more open foundation models. In 2023, a total of 149 foundation
models were released, more than double the amount released in 2022. Of these newly released models, 65.7%
were open-source, compared to only 44.4% in 2022 and 33.3% in 2021.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs of
state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated
$78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
4. The United States leads China, the EU, and the U.K. as the source of top AI
models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European
Union’s 21 and China’s 15.
5. The number of AI patents skyrockets. From 2021 to 2022, AI patent grants worldwide increased
sharply by 62.7%. Since 2010, the number of granted AI patents has increased more than 31 times.
6. China dominates AI patents. In 2022, China led global AI patent origins with 61.1%, significantly
outpacing the United States, which accounted for 20.9% of AI patent origins. Since 2010, the U.S. share of AI
patents has decreased from 54.1%.
7. Open-source AI research explodes. Since 2011, the number of AI-related projects on GitHub has
seen a consistent increase, growing from 845 in 2011 to approximately 1.8 million in 2023. Notably, there was a
sharp 59.3% rise in the total number of GitHub AI projects in 2023 alone. The total number of stars for AI-related
projects on GitHub also significantly increased in 2023, more than tripling from 4.0 million in 2022 to 12.2 million.
8. The number of AI publications continues to rise. Between 2010 and 2022, the total number of AI
publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase
over the last year was a modest 1.1%.
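The cumulative multiples in highlights 5 and 8 can be converted into implied average annual growth rates. A short sketch, using only the figures quoted above (the "more than 31 times" multiple is taken as exactly 31x for the calculation, and the publication counts as exactly 88,000 and 240,000):

```python
# Convert the cumulative multiples above into implied average annual
# (compound) growth rates over 2010-2022, i.e. 12 yearly steps.
years = 2022 - 2010

patent_multiple = 31  # "increased more than 31 times" since 2010
patent_cagr = patent_multiple ** (1 / years) - 1
print(f"patents: about {patent_cagr:.0%} per year, compounded")

pubs_2010, pubs_2022 = 88_000, 240_000  # "approximately" / "more than"
pub_cagr = (pubs_2022 / pubs_2010) ** (1 / years) - 1
print(f"publications: about {pub_cagr:.1%} per year, compounded")
```

The contrast is stark: a 31x rise in patents over twelve years corresponds to roughly 33% compound annual growth, versus under 9% for publications.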
Report Highlights
Chapter 2: Technical Performance
1. AI beats humans on some tasks, but not on all. AI has surpassed human performance on several
benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails
behind on more complex tasks like competition-level mathematics, visual commonsense reasoning, and planning.
2. Here comes multimodal AI. Traditionally, AI systems have been limited in scope, with language models
excelling in text comprehension but faltering in image processing, and vice versa. However, recent advancements
have led to the development of strong multimodal models, such as Google’s Gemini and OpenAI’s GPT-4. These
models demonstrate flexibility and are capable of handling images and text and, in some instances, can even
process audio.
4. Better AI means better data which means … even better AI. New AI models such as
Segment Anything and Skoltech are being used to generate specialized data for tasks like image segmentation and
3D reconstruction. Data is vital for AI technical improvements. The use of AI to create more data enhances current
capabilities and paves the way for future algorithmic improvements, especially on harder tasks.
5. Human evaluation is in. With generative models producing high-quality text, images, and more,
benchmarking has slowly started shifting toward incorporating human evaluations like the Chatbot Arena
Leaderboard rather than computerized rankings like ImageNet or SQuAD. Public sentiment about AI is becoming
an increasingly important consideration in tracking AI progress.
6. Thanks to LLMs, robots have become more flexible. The fusion of language modeling with
robotics has given rise to more flexible robotic systems like PaLM-E and RT-2. Beyond their improved robotic
capabilities, these models can ask questions, which marks a significant step toward robots that can interact more
effectively with the real world.
8. Closed LLMs significantly outperform open ones. On 10 select AI benchmarks, closed models
outperformed open ones, with a median performance advantage of 24.2%. Differences in the performance of
closed and open models carry important implications for AI policy debates.
Report Highlights
Chapter 3: Responsible AI
1. Robust and standardized evaluations for LLM responsibility are seriously lacking.
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading
developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI
benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models.
2. Political deepfakes are easy to generate and difficult to detect. Political deepfakes are already
affecting elections across the world, with recent research suggesting that existing AI deepfake methods perform
with varying levels of accuracy. In addition, new projects like CounterCloud demonstrate how easily AI can create
and disseminate fake content.
4. Risks from AI are becoming a concern for businesses across the globe. A global survey on
responsible AI highlights that companies’ top AI-related concerns include privacy, data security, and reliability.
The survey shows that organizations are beginning to take steps to mitigate these risks. Globally, however, most
companies have so far only mitigated a small portion of these risks.
5. LLMs can output copyrighted material. Multiple researchers have shown that the generative outputs
of popular LLMs may contain copyrighted material, such as excerpts from The New York Times or scenes from
movies. Whether such output constitutes copyright violations is becoming a central legal question.
6. AI developers score low on transparency, with consequences for research. The newly
introduced Foundation Model Transparency Index shows that AI developers lack transparency, especially
regarding the disclosure of training data and methodologies. This lack of openness hinders efforts to further
understand the robustness and safety of AI systems.
8. The number of AI incidents continues to rise. According to the AI Incident Database, which tracks
incidents related to the misuse of AI, 123 incidents were reported in 2023, a 32.3% increase from 2022. Since
2013, AI incidents have grown more than twentyfold. A notable example includes AI-generated, sexually
explicit deepfakes of Taylor Swift that were widely shared online.
9. ChatGPT is politically biased. Researchers find a significant bias in ChatGPT toward Democrats in the
United States and the Labour Party in the U.K. This finding raises concerns about the tool’s potential to influence
users’ political views, particularly in a year marked by major global elections.
Report Highlights
Chapter 4: Economy
1. Generative AI investment skyrockets. Despite a decline in overall AI private investment last year,
funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative
AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.
2. Already a leader, the United States pulls even further ahead in AI private investment.
In 2023, the United States saw AI investments reach $67.2 billion, nearly 8.7 times more than China, the next
highest investor. While private AI investment in China and the European Union, including the United Kingdom,
declined by 44.2% and 14.1%, respectively, since 2022, the United States experienced a notable increase of 22.1%
in the same time frame.
3. Fewer AI jobs in the United States and across the globe. In 2022, AI-related positions made
up 2.0% of all job postings in America, a figure that decreased to 1.6% in 2023. This decline in AI job listings is
attributed to fewer postings from leading AI firms and a reduced proportion of tech roles within these companies.
4. AI decreases costs and increases revenues. A new McKinsey survey reveals that 42% of surveyed
organizations report cost reductions from implementing AI (including generative AI), and 59% report revenue
increases. Compared to the previous year, there was a 10 percentage point increase in respondents reporting
decreased costs, suggesting AI is driving significant business efficiency gains.
5. Total AI private investment declines again, while the number of newly funded AI
companies increases. Global private AI investment has fallen for the second year in a row, though less than
the sharp decrease from 2021 to 2022. The count of newly funded AI companies spiked to 1,812, up 40.6% from
the previous year.
6. AI organizational adoption ticks up. A 2023 McKinsey report reveals that 55% of organizations now
use AI (including generative AI) in at least one business unit or function, up from 50% in 2022 and 20% in 2017.
7. China dominates industrial robotics. Since surpassing Japan in 2013 as the leading installer of
industrial robots, China has significantly widened the gap with the nearest competitor nation. In 2013, China’s
installations accounted for 20.8% of the global total, a share that rose to 52.4% by 2022.
9. The data is in: AI makes workers more productive and leads to higher quality work.
In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more
quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge the skill
gap between low- and high-skilled workers. Still, other studies caution that using AI without proper oversight can
lead to diminished performance.
10. Fortune 500 companies start talking a lot about AI, especially generative AI. In 2023,
AI was mentioned in 394 earnings calls (nearly 80% of all Fortune 500 companies), a notable increase from
266 mentions in 2022. Since 2018, mentions of AI in Fortune 500 earnings calls have nearly doubled. The most
frequently cited theme, appearing in 19.7% of all earnings calls, was generative AI.
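As a quick consistency check on highlight 10, the quoted earnings-call counts reproduce the "nearly 80%" share directly:

```python
# Check the Fortune 500 figures in highlight 10 against each other.
calls_mentioning_ai_2023 = 394
mentions_2022 = 266
fortune_500_total = 500

share_2023 = calls_mentioning_ai_2023 / fortune_500_total
yoy_change = (calls_mentioning_ai_2023 - mentions_2022) / mentions_2022

print(f"share of companies: {share_2023:.1%}")  # 78.8%, i.e. "nearly 80%"
print(f"change vs. 2022: up about {yoy_change:.0%}")
```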
Report Highlights
Chapter 5: Science and Medicine
1. Scientific progress accelerates even further, thanks to AI. In 2022, AI began to advance
scientific discovery. 2023, however, saw the launch of even more significant science-related AI applications—
from AlphaDev, which makes algorithmic sorting more efficient, to GNoME, which facilitates the process of
materials discovery.
2. AI helps medicine take significant strides forward. In 2023, several significant medical systems
were launched, including EVEscape, which enhances pandemic prediction, and AlphaMissense, which assists in
AI-driven mutation classification. AI is increasingly being utilized to propel medical advancements.
3. Highly knowledgeable medical AI has arrived. Over the past few years, AI systems have shown
remarkable improvement on the MedQA benchmark, a key test for assessing AI’s clinical knowledge. The
standout model of 2023, GPT-4 Medprompt, reached an accuracy rate of 90.2%, marking a 22.6 percentage
point increase from the highest score in 2022. Since the benchmark’s introduction in 2019, AI performance on
MedQA has nearly tripled.
4. The FDA approves more and more AI-related medical devices. In 2022, the FDA approved 139
AI-related medical devices, a 12.1% increase from 2021. Since 2012, the number of FDA-approved AI-related medical
devices has increased by more than 45-fold. AI is increasingly being used for real-world medical purposes.
Report Highlights
Chapter 6: Education
1. The number of American and Canadian CS bachelor’s graduates continues to rise, new
CS master’s graduates stay relatively flat, and PhD graduates modestly grow. While the
number of new American and Canadian bachelor’s graduates has consistently risen for more than a decade, the
number of students opting for graduate education in CS has flattened. Since 2018, the number of CS master’s and
PhD graduates has slightly declined.
3. Less transition of academic talent from industry to academia. In 2019, 13% of new AI faculty
in the United States and Canada were from industry. By 2021, this figure had declined to 11%, and in 2022, it
further dropped to 7%. This trend indicates a progressively lower migration of high-level AI talent from industry
into academia.
4. CS education in the United States and Canada becomes less international. Proportionally
fewer international CS bachelor’s, master’s, and PhDs graduated in 2022 than in 2021. The drop in international
students in the master’s category was especially pronounced.
5. More American high school students take CS courses, but access problems remain.
In 2022, 201,000 AP CS exams were administered. Since 2007, the number of students taking these exams has
increased more than tenfold. However, recent evidence indicates that students in larger high schools and those in
suburban areas are more likely to have access to CS courses.
6. AI-related degree programs are on the rise internationally. The number of English-language,
AI-related postsecondary degree programs has tripled since 2017, showing a steady annual increase over the past
five years. Universities worldwide are offering more AI-focused degree programs.
Report Highlights
Chapter 7: Policy and Governance
1. The number of AI regulations in the United States sharply increases. The number of AI-related
regulations has risen significantly in the past year and over the last five years. In 2023, there were 25 AI-related
regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations grew by 56.3%.
2. The United States and the European Union advance landmark AI policy action. In 2023,
policymakers on both sides of the Atlantic put forth substantial proposals for advancing AI regulation. The
European Union reached a deal on the terms of the AI Act, a landmark piece of legislation enacted in 2024.
Meanwhile, President Biden signed an Executive Order on AI, the most notable AI policy initiative in the United
States that year.
3. AI captures U.S. policymaker attention. The year 2023 witnessed a remarkable increase in AI-related
legislation at the federal level, with 181 bills proposed, more than double the 88 proposed in 2022.
4. Policymakers across the globe cannot stop talking about AI. Mentions of AI in legislative
proceedings across the globe have nearly doubled, rising from 1,247 in 2022 to 2,175 in 2023. AI was mentioned in
the legislative proceedings of 49 countries in 2023. Moreover, at least one country from every continent discussed
AI in 2023, underscoring the truly global reach of AI policy discourse.
5. More regulatory agencies turn their attention toward AI. The number of U.S. regulatory agencies
issuing AI regulations increased to 21 in 2023 from 17 in 2022, indicating a growing concern over AI regulation
among a broader array of American regulatory bodies. Some of the new regulatory agencies that enacted AI-
related regulations for the first time in 2023 include the Department of Transportation, the Department of Energy,
and the Occupational Safety and Health Administration.
Report Highlights
Chapter 8: Diversity
1. U.S. and Canadian bachelor’s, master’s, and PhD CS students continue to grow more
ethnically diverse. While white students continue to be the most represented ethnicity among new resident
graduates at all three levels, the representation from other ethnic groups, such as Asian, Hispanic, and Black or
African American students, continues to grow. For instance, since 2011, the proportion of Asian CS bachelor’s
degree graduates has increased by 19.8 percentage points, and the proportion of Hispanic CS bachelor’s degree
graduates has grown by 5.2 percentage points.
2. Substantial gender gaps persist in European informatics, CS, CE, and IT graduates at
all educational levels. Every surveyed European country reported more male than female graduates in
bachelor’s, master’s, and PhD programs for informatics, CS, CE, and IT. While the gender gaps have narrowed in
most countries over the last decade, the rate of this narrowing has been slow.
3. U.S. K–12 CS education is growing more diverse, reflecting changes in both gender and
ethnic representation. The proportion of AP CS exams taken by female students rose from 16.8% in 2007 to
30.5% in 2022. Similarly, the participation of Asian, Hispanic/Latino/Latina, and Black/African American students
in AP CS has consistently increased year over year.
Report Highlights
Chapter 9: Public Opinion
1. People across the globe are more cognizant of AI’s potential impact—and more nervous.
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect
their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness
toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests
that 52% of Americans report feeling more concerned than excited about AI, rising from 38% in 2022.
3. The public is pessimistic about AI’s economic impact. In an Ipsos survey, only 37% of
respondents feel AI will improve their job. Only 34% anticipate AI will boost the economy, and 32% believe it will
enhance the job market.
5. ChatGPT is widely known and widely used. An international survey from the University of Toronto
suggests that 63% of respondents are aware of ChatGPT. Of those aware, around half report using ChatGPT at
least once weekly.
CHAPTER 1: Research and Development
Preview
Overview 29
Chapter Highlights 30
1.2 Patents 38
AI Patents 38
Overview 38
By Filing Status and Region 39
1.4 AI Conferences 66
Conference Attendance 66
Overview
This chapter studies trends in AI research and development. It begins by examining trends
in AI publications and patents, then turns to notable AI systems and foundation models,
and concludes by analyzing AI conference attendance and open-source AI software projects.
Chapter Highlights
1. Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable
machine learning models, while academia contributed only 15. There were also 21 notable models resulting from
industry-academia collaborations in 2023, a new high.
2. More foundation models and more open foundation models. In 2023, a total of 149 foundation
models were released, more than double the amount released in 2022. Of these newly released models, 65.7%
were open-source, compared to only 44.4% in 2022 and 33.3% in 2021.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs of
state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated
$78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
4. The United States leads China, the EU, and the U.K. as the source of top AI
models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European
Union’s 21 and China’s 15.
5. The number of AI patents skyrockets. From 2021 to 2022, AI patent grants worldwide increased
sharply by 62.7%. Since 2010, the number of granted AI patents has increased more than 31 times.
6. China dominates AI patents. In 2022, China led global AI patent origins with 61.1%, significantly
outpacing the United States, which accounted for 20.9% of AI patent origins. Since 2010, the U.S. share of AI
patents has decreased from 54.1%.
7. Open-source AI research explodes. Since 2011, the number of AI-related projects on GitHub has
seen a consistent increase, growing from 845 in 2011 to approximately 1.8 million in 2023. Notably, there was a
sharp 59.3% rise in the total number of GitHub AI projects in 2023 alone. The total number of stars for AI-related
projects on GitHub also significantly increased in 2023, more than tripling from 4.0 million in 2022 to 12.2 million.
8. The number of AI publications continues to rise. Between 2010 and 2022, the total number of AI
publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase
over the last year was a modest 1.1%.
1.1 Publications
Overview
The figures below present the global count of English- and Chinese-language AI publications from 2010 to 2022, categorized by type of affiliation and cross-sector collaborations. Additionally, this section details publication data for AI journal articles and conference papers.

Total Number of AI Publications1
Figure 1.1.1 displays the global count of AI publications. Between 2010 and 2022, the total number of AI publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase over the last year was a modest 1.1%.
[Figure 1.1.1: Number of AI publications (in thousands), 2010–2022]
1 The data on publications presented this year is sourced from CSET. Both the methodology and data sources used by CSET to classify AI publications have changed since their data was last
featured in the AI Index (2023). As a result, the numbers reported in this year’s section differ slightly from those reported in last year’s edition. Moreover, the AI-related publication data is fully
available only up to 2022 due to a significant lag in updating publication data. Readers are advised to approach publication figures with appropriate caution.
By Type of Publication
Figure 1.1.2 illustrates the distribution of AI publication types globally over time. In 2022, there were roughly 230,000 AI journal articles compared to roughly 42,000 conference submissions. Since 2015, AI journal and conference publications have increased at comparable rates. In 2022, there were 2.6 times as many conference publications and 2.4 times as many journal publications as there were in 2015.
[Figure 1.1.2: Number of AI publications (in thousands) by type, 2010–2022. 2022 values: Journal 232.67; Conference 41.17; Book chapter 12.88; Preprint 5.07; Article 1.49; Unknown 0.79; Dissertation 0.70; Book 0.57; Other 0.12; Clinical trial 0.05]
2 It is possible for an AI publication to be mapped to more than one publication type, so the totals in Figure 1.1.2 do not completely align with those in Figure 1.1.1.
By Field of Study
Figure 1.1.3 examines the total number of AI publications by field of study since 2010. Machine learning publications have seen the most rapid growth over the past decade, increasing nearly sevenfold since 2015. Following machine learning, the most published AI fields in 2022 were computer vision (21,309 publications), pattern recognition (19,841), and process management (12,052).
[Figure 1.1.3: Number of AI publications (in thousands) by field of study, 2010–2022. 2022 values: Computer vision 21.31; Pattern recognition 19.84; Process management 12.05; Computer network 10.39; Control theory 9.17; Algorithm 8.31; Linguistics 7.18; Mathematical optimization 6.83]
By Sector
This section presents the distribution of AI publications by sector—education, government, industry, nonprofit, and other—globally and then specifically within the United States, China, and the European Union plus the United Kingdom. In 2022, the academic sector contributed the majority of AI publications (81.1%), maintaining its position as the leading global source of AI research over the past decade across all regions (Figure 1.1.4 and Figure 1.1.5). Industry participation is most significant in the United States, followed by the European Union plus the United Kingdom, and China (Figure 1.1.5).
[Figure 1.1.4: AI publications (% of total) by sector, 2010–2022]
[Figure 1.1.5: AI publications (% of total) by sector and geographic area, 2022 (United States / European Union and United Kingdom / China): Education 75.48% / 75.63% / 81.75%; Industry 14.06% / 9.47% / 7.39%; Government 5.60% / 9.28% / 10.05%; Nonprofit 4.87% / 5.62% / 0.80%]
AI Journal Publications
Figure 1.1.6 illustrates the total number of AI journal publications from 2010 to 2022. The number of AI journal
publications experienced modest growth from 2010 to 2015 but grew approximately 2.4 times since 2015.
Between 2021 and 2022, AI journal publications saw a 4.5% increase.
[Figure 1.1.6: Number of AI publications (in thousands), 2010–2022]
[Figure 1.1.7: 2010–2022]
1.2 Patents
This section examines trends over time in global AI patents, which can reveal important insights into the evolution of innovation, research, and development within AI. Additionally, analyzing AI patents can reveal how these advancements are distributed globally. Similar to the publications data, there is a noticeable delay in AI patent data availability, with 2022 being the most recent year for which data is accessible. The data in this section comes from CSET.

AI Patents
Overview
Figure 1.2.1 examines the global growth in granted AI patents from 2010 to 2022. Over the last decade, there has been a significant rise in the number of AI patents, with a particularly sharp increase in recent years. For instance, between 2010 and 2014, the total growth in granted AI patents was 56.1%. However, from 2021 to 2022 alone, the number of AI patents increased by 62.7%.
[Figure 1.2.1: Number of AI patents (in thousands), 2010–2022]
[Figure 1.2.2: Number of AI patents (in thousands) by grant status, 2010–2022; 2022: 62.26 granted]
The gap between granted and not granted AI patents is evident across all major patent-originating geographic areas, including China, the European Union and United Kingdom, and the United States (Figure 1.2.3). In recent years, all three geographic areas have experienced an increase in both the total number of AI patent filings and the number of patents granted.
[Figure 1.2.3: Number of AI patent filings (in thousands) by geographic area (China; European Union and United Kingdom; United States), 2010–2022; values shown include 35.31, 15.11, 12.08, 2.17, 1.17]
Figure 1.2.4 showcases the regional breakdown of granted AI patents. As of 2022, the bulk of the world’s granted AI patents (75.2%) originated from East Asia and the Pacific, with North America being the next largest contributor at 21.2%. Up until 2011, North America led in the number of global AI patents. However, since then, there has been a significant shift toward an increasing proportion of AI patents originating from East Asia and the Pacific.
[Figure 1.2.4: Granted AI patents (% of world total) by region, 2010–2022; 2022: East Asia and Pacific 75.20%]
Disaggregated by geographic area, the majority of the world’s granted AI patents are from China (61.1%) and the
United States (20.9%) (Figure 1.2.5). The share of AI patents originating from the United States has declined from
54.1% in 2010.
[Figure 1.2.5: Granted AI patents (% of world total) by geographic area, 2010–2022; 2022: China 61.13%]
Figure 1.2.6 and Figure 1.2.7 document which countries lead in AI patents per capita. In 2022, the country with the most granted AI patents per 100,000 inhabitants was South Korea (10.3), followed by Luxembourg (8.8) and the United States (4.2) (Figure 1.2.6). Figure 1.2.7 highlights the change in granted AI patents per capita from 2012 to 2022. Singapore, South Korea, and China experienced the greatest increase in AI patenting per capita during that time period.
[Figure 1.2.6: Granted AI patents per 100,000 inhabitants by country, 2022; values shown: Luxembourg 8.73; Japan 2.53; China 2.51; Singapore 2.06; Australia 1.91; Canada 1.25; Germany 0.66; Denmark 0.56; Finland 0.56; France 0.33; Lithuania 0.28]
Percentage change of granted AI patents per 100,000 inhabitants by country, 2012 vs. 2022
Source: Center for Security and Emerging Technology, 2023 | Chart: 2024 AI Index report
[Figure 1.2.7: Singapore 5,366%; China 3,569%; Denmark 1,463%; Japan 1,137%; France 1,086%; Germany 961%; Australia 908%; Finland 907%; Canada 803%]
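The per capita and percentage-change figures above are simple ratios; a minimal sketch of the arithmetic (the patent and population numbers below are illustrative assumptions, not CSET data):

```python
def per_capita(patents: int, population: int) -> float:
    """Granted AI patents per 100,000 inhabitants."""
    return patents / population * 100_000

def pct_change(old: float, new: float) -> float:
    """Percentage change between two per capita rates, as in Figure 1.2.7."""
    return (new - old) / old * 100

# Illustrative inputs only: 1,200 granted AI patents, 5.9 million inhabitants.
rate = per_capita(1200, 5_900_000)
print(f"{rate:.2f} patents per 100,000 inhabitants")  # -> 20.34 patents per 100,000 inhabitants
print(f"{pct_change(2.0, rate):.0f}% change vs. a baseline rate of 2.0")
```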
1.3 Frontier AI Research

This section explores the frontier of AI research. While many new AI models are introduced annually, only a small
sample represents the most advanced research. Admittedly what constitutes advanced or frontier research is
somewhat subjective. Frontier research could reflect a model posting a new state-of-the-art result on a benchmark,
introducing a meaningful new architecture, or exercising some impressive new capabilities.
The AI Index studies trends in two types of frontier AI models: “notable models” and foundation models.3 Epoch, an
AI Index data provider, uses the term “notable machine learning models” to designate noteworthy models handpicked
as being particularly influential within the AI/machine learning ecosystem. In contrast, foundation models are
exceptionally large AI models trained on massive datasets, capable of performing a multitude of downstream tasks.
Examples of foundation models include GPT-4, Claude 3, and Gemini. While many foundation models may qualify as
notable models, not all notable models are foundation models.
Within this section, the AI Index explores trends in notable models and foundation models from various perspectives,
including originating organization, country of origin, parameter count, and compute usage. The analysis concludes
with an examination of machine learning training costs.
3 “AI system” refers to a computer program or product based on AI, such as ChatGPT. “AI model” refers to a collection of parameters whose values are learned during training, such as GPT-4.
4 New and historic models are continually added to the Epoch database, so the total year-by-year counts of models included in this year’s AI Index might not exactly match those published in
last year’s report.
Sector Analysis
Until 2014, academia led in the release of machine learning models. Since then, industry has taken the lead. In 2023, there were 51 notable machine learning models produced by industry compared to just 15 from academia (Figure 1.3.1). Significantly, 21 notable models resulted from industry–academia collaborations in 2023, a new high. Creating cutting-edge AI models now demands a substantial amount of data, computing power, and financial resources that are not available in academia. This shift toward increased industrial dominance in leading AI models was first highlighted in last year’s AI Index report. Although this year the gap has slightly narrowed, the trend largely persists.
[Figure 1.3.1: Number of notable machine learning models by sector, 2003–2023; 2023: Industry 51; Academia 15; Government 2; Industry–research collective collaboration 0; Research collective 0; Academia–government collaboration 0]
The AI Index research team analyzed the country of origin of notable models. In 2023, the United States led with 61 notable machine learning models, followed by China with 15, and France with 8 (Figure 1.3.2). For the first time since 2019, the European Union and the United Kingdom together have surpassed China in the number of notable AI models produced (Figure 1.3.3). Since 2003, the United States has produced more models than any other country.

[Figure 1.3.2: Number of notable machine learning models by country, 2023; United States 61; China 15; Singapore 3; United Arab Emirates 3; Egypt 2]
[Figure 1.3.3: Number of notable machine learning models by geographic area, 2003–2023; 2023: European Union and United Kingdom 25; China 15]
5 A machine learning model is considered associated with a specific country if at least one author of the paper introducing it has an affiliation with an institution based in that country. In cases
where a model’s authors come from several countries, double counting can occur.
[Figure 1.3.4: Number of notable machine learning models by country (map); ranges 1–10, 11–20, 21–60, 61–100, 101–430]
Parameter Trends
Parameters in machine learning models are numerical values learned during training that determine how a model interprets input data and makes predictions. Models trained on more data will usually have more parameters than those trained on less data. Likewise, models with more parameters typically outperform those with fewer parameters.

Figure 1.3.5 demonstrates the parameter count of machine learning models in the Epoch dataset, categorized by the sector from which the models originate. Parameter counts have risen sharply since the early 2010s, reflecting the growing complexity of tasks AI models are designed for, the greater availability of data, improvements in hardware, and the proven efficacy of larger models. High-parameter models are particularly notable in the industry sector, underscoring the capacity of companies like OpenAI, Anthropic, and Google to bear the computational costs of training on vast volumes of data.
[Figure 1.3.5: Number of parameters (log scale) of machine learning models by sector, 2003–2023]
[Figure 1.3.6: Training compute (log scale) of machine learning models, 2003–2023]
6 FLOP stands for “floating-point operation.” A floating-point operation is a single arithmetic operation involving floating-point numbers, such as addition, subtraction, multiplication, or division. The number of FLOPs a processor or computer can perform per second is an indicator of its computational power; the higher the rate, the more powerful the computer. A model trained with a greater total number of FLOPs required more computational resources during training.
Figure 1.3.7 highlights the training compute of notable machine learning models since 2012. For example, AlexNet, whose paper popularized the now-standard practice of using GPUs to improve AI models, required an estimated 470 petaFLOPs for training. The original Transformer, released in 2017, required around 7,400 petaFLOPs. Google’s Gemini Ultra, one of the current state-of-the-art foundation models, required 50 billion petaFLOPs.
[Figure 1.3.7: Training compute (petaFLOP, log scale) of notable machine learning models, 2012–2023; labeled models include AlexNet, Transformer, BERT-Large, PaLM (540B), Llama 2-70B, Claude 2, GPT-4]
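For transformer-style models, training compute is often approximated with the rule of thumb FLOPs ≈ 6 × parameters × training tokens. A minimal sketch of that estimate (the parameter and token counts below are illustrative assumptions, not Epoch figures):

```python
def training_flops(parameters: float, tokens: float) -> float:
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token."""
    return 6 * parameters * tokens

PETAFLOP = 1e15

# Illustrative configuration: a 70B-parameter model trained on 2T tokens.
flops = training_flops(70e9, 2e12)
print(f"{flops / PETAFLOP:,.0f} petaFLOPs")  # -> 840,000,000 petaFLOPs
```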
Highlight:
…behind powerful LLMs has been achieved by training models on increasingly larger amounts of data. As noted recently by Anthropic cofounder and AI Index Steering Committee member Jack Clark, foundation models have been trained on …

The growing data dependency of AI models has led to concerns that future generations of computer scientists will run out of data to further scale and improve their systems. Research from Epoch suggests that these concerns are somewhat warranted. Epoch researchers have generated historical and compute-based projections for when AI researchers might expect to run out of data. The historical projections are based on observed growth rates in the sizes of data used to train foundation models. The compute projections adjust the historical growth rate based on projections of compute availability.

For instance, the researchers estimate that computer scientists could deplete the stock of high-quality language data by 2024, exhaust low-quality language data within two decades, and use up image data by the late 2030s to mid-2040s (Figure 1.3.8).

Stock type | Historical projection | Compute projection
Low-quality language stock | 2032.4 [2028.4; 2039.2] | 2040.5 [2034.6; 2048.9]
High-quality language stock | 2024.5 [2023.5; 2025.7] | 2024.1 [2023.2; 2025.3]
Image stock | 2046 [2037; 2062.8] | 2038.8 [2032; 2049.8]
Figure 1.3.8

Theoretically, the challenge of limited data availability can be addressed by using synthetic data, which is data generated by AI models themselves. For example, it is possible to use text produced by one LLM to train another LLM. The use of synthetic data for training AI systems is particularly attractive, not only as a solution for potential data depletion but also because generative AI systems could, in principle, generate data in instances where naturally occurring data is sparse—for example, data for rare diseases or underrepresented populations. Until recently, the feasibility and effectiveness of using synthetic data for training generative AI systems were not well understood. However, research this year has suggested that there are limitations associated with training models on synthetic data.

For instance, a team of British and Canadian researchers discovered that models predominantly trained on synthetic data experience model collapse, a phenomenon where, over time, they lose the ability to remember true underlying data distributions and start producing a narrow range of outputs.
Figure 1.3.9
7 In the context of generative models, density refers to the level of complexity and variation in the outputs produced by an AI model. Models that have a higher generation density
produce a wider range of higher-quality outputs. Models with low generation density produce a narrower range of more simplistic outputs.
[Figure 1.3.10: Output density across synthetic training generations (Generation 0–9)]
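The collapse dynamic can be illustrated with a toy sketch (purely illustrative, not the researchers' experimental setup): each generation fits a Gaussian to samples drawn from the previous generation's fit, so estimation error compounds and the learned distribution drifts away from the true one, with its spread tending to shrink over many generations.

```python
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 1.0   # "true" data distribution
n = 50                 # synthetic samples per generation
for gen in range(10):
    # Each generation trains only on synthetic samples from its predecessor.
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    mu, sigma = statistics.mean(samples), statistics.stdev(samples)
    print(f"generation {gen}: mean={mu:+.3f}, stdev={sigma:.3f}")
```

With a small sample size per generation, the fitted parameters wander and the estimated spread drifts; real model collapse involves far richer distributions, but the compounding-estimation-error mechanism is the same.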
In a similar study published in 2023 on the use of synthetic data in generative imaging models, researchers found that generative image models trained solely on synthetic data cycles—or with insufficient real human data—experience a significant drop in output quality. The authors label this phenomenon Model Autophagy Disorder (MAD), in reference to mad cow disease.

The study examines two types of training processes: fully synthetic, where models are trained exclusively on synthetic data, and synthetic augmentation, where models are trained on a mix of synthetic and real data. In both scenarios, as the number of training generations increases, the quality of the generated images declines. Figure 1.3.11 highlights the degraded image generations of models that are augmented with synthetic data; for example, the faces generated in steps 7 and 9 increasingly display strange-looking hash marks. From a statistical perspective, images generated with both synthetic data and synthetic augmentation loops have higher FID scores (indicating less similarity to real images), lower precision scores (signifying reduced realism or quality), and lower recall scores (suggesting decreased diversity) (Figure 1.3.12). While synthetic augmentation loops, which incorporate some real data, show less degradation than fully synthetic loops, both methods exhibit diminishing returns with further training.
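FID (Fréchet inception distance) compares the Gaussian statistics (mean and covariance) of real and generated image embeddings; lower is better. A minimal sketch of the formula, simplified to diagonal covariances, with random stand-in features rather than real Inception embeddings:

```python
import random
import statistics

def fid_diagonal(real, fake):
    """Frechet distance between Gaussians with diagonal covariances:
    sum over dims of (mu1 - mu2)^2 + v1 + v2 - 2*sqrt(v1*v2)."""
    dims = len(real[0])
    total = 0.0
    for d in range(dims):
        a = [row[d] for row in real]
        b = [row[d] for row in fake]
        mu1, mu2 = statistics.mean(a), statistics.mean(b)
        v1, v2 = statistics.pvariance(a), statistics.pvariance(b)
        total += (mu1 - mu2) ** 2 + v1 + v2 - 2 * (v1 * v2) ** 0.5
    return max(total, 0.0)  # clamp tiny negative floating-point residue

random.seed(0)
real = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1000)]
fake = [[random.gauss(0.5, 1.0) for _ in range(8)] for _ in range(1000)]
print(round(fid_diagonal(real, real), 4))  # identical sets -> 0.0
print(round(fid_diagonal(real, fake), 2))  # mean shift of 0.5 in 8 dims -> roughly 2
```

The full metric uses dense covariance matrices of Inception-v3 features and a matrix square root; the diagonal version above only conveys the idea that both a shifted mean and a mismatched variance raise the score.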
Figure 1.3.11
Assessing FFHQ syntheses: FID, precision, and recall in synthetic and mixed-data training loops
Source: Alemohammad et al., 2023 | Chart: 2024 AI Index report
[Figure 1.3.12: FID, precision, and recall across training generations (0–6) for fully synthetic and synthetic-augmentation loops]
Foundation Models
Foundation models represent a rapidly evolving and popular category of AI models. Trained on vast datasets, they are versatile and suitable for numerous downstream applications. Foundation models such as GPT-4, Claude 3, and Llama 2 showcase remarkable abilities and are increasingly being deployed in real-world scenarios.

Introduced in 2023, the Ecosystem Graphs is a new community resource from Stanford that tracks the foundation model ecosystem, including datasets, models, and applications. This section uses data from the Ecosystem Graphs to study trends in foundation models over time.8

Model Release
Foundation models can be accessed in different ways. No access models, like Google’s PaLM-E, are only accessible to their developers. Limited access models, like OpenAI’s GPT-4, offer limited access to the models, often through a public API. Open models, like Meta’s Llama 2, fully release model weights, which means the models can be modified and freely used.

Figure 1.3.13 visualizes the total number of foundation models by access type since 2019. In recent years, the number of foundation models has risen sharply, more than doubling since 2022 and growing by a factor of nearly 38 since 2019. Of the 149 foundation models released in 2023, 98 were open, 23 limited, and 28 no access.
[Figure 1.3.13: Number of foundation models by access type (open, limited, no access), 2019–2023; 2023 total 149: 98 open, 23 limited, 28 no access]
8 The Ecosystem Graphs make efforts to survey the global AI ecosystem, but it is possible that they underreport models from certain nations like South Korea and China.
In 2023, the majority of foundation models were released as open access (65.8%), with 18.8% having no access
and 15.4% limited access (Figure 1.3.14). Since 2021, there has been a significant increase in the proportion of
models released with open access.
[Figure 1.3.14: Foundation models (% of total) by access type, 2019–2023; 2023: Open 65.77%; No access 18.79%; Limited 15.44%]
[Figure 1.3.15: Number of foundation models by sector; Industry 108; Academia 28; Industry–academia collaboration 9; Government 4; Industry–government collaboration 0]
Figure 1.3.16 highlights the source of various foundation models that were released in 2023. Google introduced
the most models (18), followed by Meta (11), and Microsoft (9). The academic institution that released the most
foundation models in 2023 was UC Berkeley (3).
[Figure 1.3.16: Number of foundation models by organization, 2023; Google 18; Meta 11; Microsoft 9; OpenAI 7; Together 5; Hugging Face 4; Anthropic 4; AI2 4; Stability AI 3; Cerebras 3; Shanghai AI Laboratory 3; Adobe 3; UC Berkeley 3; DeepMind 2; Stanford University 2]
Since 2019, Google has led in releasing the most foundation models, with a total of 40, followed by OpenAI with
20 (Figure 1.3.17). Tsinghua University stands out as the top non-Western institution, with seven foundation model
releases, while Stanford University is the leading American academic institution, with five releases.
[Figure 1.3.17: Number of foundation models by organization since 2019; Google 40; OpenAI 20; Meta 19; Microsoft 18; DeepMind 15; Tsinghua University 7; EleutherAI 6; Together 6; Cohere 5; Stanford University 5; Hugging Face 5; Anthropic 5; AI2 4; BigScience 4; Shanghai AI Laboratory 4]
…China (20), and the United Kingdom (Figure 1.3.18).

[Figure 1.3.18: Number of foundation models by country, 2023]

Since 2019, the United States has consistently led in originating the majority of foundation models (Figure 1.3.19).
[Figure 1.3.19: Number of foundation models by geographic area, 2019–2023; 2023: China 20; European Union and United Kingdom 15]
Figure 1.3.20 depicts the cumulative count of foundation models released and attributed to respective countries
since 2019. The country with the greatest number of foundation models released since 2019 is the United States
(182), followed by China (30), and the United Kingdom (21).
[Figure 1.3.20: Cumulative number of foundation models by country since 2019 (map); ranges 1–10, 11–30, 31–182]
9 Ben Cottier and Robi Rahman led research at Epoch AI into model training cost.
10 A detailed description of the estimation methodology is provided in the Appendix.
11 The cost figures reported in this section are inflation-adjusted.
[Figure 1.3.21: Estimated training cost (in U.S. dollars) of select AI models, 2017–2023; labeled models: Transformer, BERT-Large, RoBERTa Large, LaMDA, PaLM (540B), GPT-4, Gemini Ultra, Llama 2 70B; values shown: $930; $3,288; $160,018; $1,319,586; $3,931,897; $4,324,883; $6,405,653; $12,389,056; $78,352,034 (GPT-4); $191,400,000 (Gemini Ultra)]
Figure 1.3.22 visualizes the training cost of all AI models for which the AI Index has estimates. As the figure shows,
model training costs have sharply increased over time.
[Figure 1.3.22: Training cost (in U.S. dollars, log scale) of AI models over time; labeled models include Gemini Ultra, GPT-4, Falcon 180B, PaLM (540B), GPT-3 175B (davinci), Llama 2 70B, BLOOM-176B, Megatron-BERT, HyperCLOVA, LLaMA-65B, Switch, Flamingo, AlphaStar, Meta Pseudo Labels, PaLI, StarCoder, GNMT, RoBERTa Large, T0-XXL, JFT, Imagen, Xception, BigGAN-deep 512×512, Big Transformer for Back-Translation, BERT-Large, Transformer, SciBERT, IMPALA]
As established in previous AI Index reports, there is a direct correlation between the training costs of AI models
and their computational requirements. As illustrated in Figure 1.3.23, models with greater computational training
needs cost substantially more to train.
[Figure 1.3.23: Training cost (in U.S. dollars, log scale) versus training compute; labeled models include Gemini Ultra, GPT-4, PaLM (540B), GPT-3 175B (davinci), Megatron-Turing NLG 530B, Llama 2 70B, LaMDA, RoBERTa Large, BERT-Large, Transformer]
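Cost estimates of this kind are typically derived by converting a model's training FLOPs into hardware-hours and pricing those hours at rental rates. A simplified sketch of that style of calculation (the hardware, utilization, and price figures below are illustrative assumptions, not the inputs of the AI Index/Epoch methodology):

```python
def training_cost_usd(total_flops: float,
                      peak_flops_per_gpu: float,
                      utilization: float,
                      num_gpus: int,
                      price_per_gpu_hour: float) -> float:
    """Convert training FLOPs into wall-clock GPU-hours at a given
    utilization, then price them at an hourly rental rate."""
    effective_rate = peak_flops_per_gpu * utilization * num_gpus  # FLOP/s
    wall_clock_hours = total_flops / effective_rate / 3600
    return wall_clock_hours * num_gpus * price_per_gpu_hour

# Illustrative run: 1e24 FLOPs on 1,000 GPUs (312 TFLOP/s peak each),
# 40% utilization, $2 per GPU-hour.
print(f"${training_cost_usd(1e24, 312e12, 0.40, 1000, 2.0):,.2f}")
```

Note that the number of GPUs cancels out of the final cost in this simple model: more GPUs finish sooner but bill more hours in parallel. Real estimates must also account for failed runs, checkpointing overhead, and hardware depreciation versus rental pricing.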
1.4 AI Conferences
AI conferences serve as essential platforms for researchers to present their findings and network with peers and collaborators. Over the past two decades, these conferences have expanded in scale, quantity, and prestige. This section explores trends in attendance at major AI conferences.

Conference Attendance
Figure 1.4.1 graphs attendance at a selection of AI conferences since 2010. Following a decline in attendance, likely due to the shift back to exclusively in-person formats, the AI Index reports an increase in conference attendance from 2022 to 2023.12 Specifically, there was a 6.7% rise in total attendance over the last year. Since 2015, the annual number of attendees has risen by around 50,000, reflecting not just a growing interest in AI research but also the emergence of new AI conferences.
[Figure 1.4.1: Number of attendees (in thousands) at select AI conferences, 2010–2023; 2023: 63.29]
12 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that measuring the exact
attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world. The conferences for which the AI Index tracked
data include NeurIPS, CVPR, ICML, ICCV, ICRA, AAAI, ICLR, IROS, IJCAI, AAMAS, FAccT, UAI, ICAPS, and KR.
Neural Information Processing Systems (NeurIPS) remains one of the most attended AI conferences, attracting approximately 16,380 participants in 2023 (Figure 1.4.2 and Figure 1.4.3). Among the major AI conferences, NeurIPS, ICML, ICCV, and AAAI experienced year-over-year increases in attendance. However, in the past year, CVPR, ICRA, ICLR, and IROS observed slight declines in their attendance figures.
[Figure 1.4.2: Number of attendees (in thousands) by conference, 2010–2023; 2023: NeurIPS 16.38; CVPR 8.34; ICML 7.92; ICCV 7.33; ICRA 6.60; AAAI 4.47; ICLR 3.76; IROS 3.65]
[Figure 1.4.3: Number of attendees (in thousands)]
1.5 Open-Source AI Software
GitHub is a web-based platform that enables individuals and teams to host, review, and collaborate on code repositories. Widely used by software developers, GitHub facilitates code management, project collaboration, and open-source software support. This section draws on data from GitHub, providing insights into broader trends in open-source AI software development not reflected in academic publication data.
[Figure 1.5.1: Number of GitHub AI projects (in millions), 2011–2023; 2023: 1.81 million]
13 GitHub’s methodology for identifying AI-related projects has evolved over the past year. For classifying AI projects, GitHub has started incorporating generative AI keywords from a
recently published research paper, a shift from the previously detailed methodology in an earlier paper. This edition of the AI Index is the first to adopt this updated approach. Moreover, the
previous edition of the AI Index utilized country-level mapping of GitHub AI projects conducted by the OECD, which depended on self-reported data—a method experiencing a decline in
coverage over time. This year, the AI Index has adopted geographic mapping from GitHub, leveraging server-side data for broader coverage. Consequently, the data presented here may not
align perfectly with data in earlier versions of the report.
Figure 1.5.2 reports GitHub AI projects by geographic area since 2011. As of 2023, a significant share of GitHub AI projects were located in the United States, accounting for 22.9% of contributions. India was the second-largest contributor with 19.0%, followed closely by the European Union and the United Kingdom at 17.9%. Notably, the proportion of AI projects from developers located in the United States on GitHub has been on a steady decline since 2016.
[Figure 1.5.2: GitHub AI projects (% of total) by geographic area, 2011–2023; 2023 values shown: Rest of the world 37.09%; China 3.04%]
Stars
GitHub users can show their interest in a repository by “starring” it, a feature similar to liking a post on social media, which signifies support for an open-source project. Among the most starred repositories are libraries such as TensorFlow, OpenCV, Keras, and PyTorch, which enjoy widespread popularity among software developers in the AI coding community. For example, TensorFlow is a popular library for building and deploying machine learning models. OpenCV is a platform that offers a variety of tools for computer vision, such as object detection and feature extraction.

The total number of stars for AI-related projects on GitHub saw a significant increase in the last year, more than tripling from 4.0 million in 2022 to 12.2 million in 2023 (Figure 1.5.3). This sharp increase in GitHub stars, along with the previously reported rise in projects, underscores the accelerating growth of open-source AI software development.
[Figure 1.5.3: Number of GitHub stars for AI projects (in millions), 2011–2023; 2023: 12.21]
In 2023, the United States led in receiving the highest number of GitHub stars, totaling 10.5 million (Figure 1.5.4). All major geographic regions sampled, including the European Union and United Kingdom, China, and India, saw a year-over-year increase in the total number of GitHub stars awarded to projects located in their countries.
[Figure 1.5.4: Number of GitHub stars (in millions) by geographic area, 2011–2023; 2023 values shown: China 2.12; India 1.92]
Preview
Overview
Chapter Highlights
2.1 Overview of AI in 2023
    Timeline: Significant Model Releases
    State of AI Performance
    AI Index Benchmarks
2.2 Language
    Understanding
        HELM: Holistic Evaluation of Language Models
        MMLU: Massive Multitask Language Understanding
    Generation
        Chatbot Arena Leaderboard
    Factuality and Truthfulness
        TruthfulQA
        HaluEval
2.3 Coding
    Generation
    Editing
        EditVal
    Highlighted Research: ControlNet
    Highlighted Research: Instruct-NeRF2NeRF
    Segmentation
        Highlighted Research: Segment Anything
    3D Reconstruction From Images
        Highlighted Research: Skoltech3D
        Highlighted Research: RealFusion
2.5 Video Computer Vision and Video Generation
    Generation
        UCF101
        Highlighted Research: Align Your Latents
        Highlighted Research: Emu Video
2.6 Reasoning
    General Reasoning
        MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CHAPTER 2: Technical Performance
Preview (cont’d)
    Moral Reasoning
        MoCa
    Causal Reasoning
    Generation
        Highlighted Research: UniAudio
        Highlighted Research: MusicGEN and MusicLM
2.11 Properties of LLMs
    Highlighted Research: Challenging the Notion of Emergent Behavior
2.12 Techniques for LLM Improvement
    Prompting
        Highlighted Research: Graph of Thoughts Prompting
Overview
The technical performance section of this year’s AI Index offers a comprehensive overview
of AI advancements in 2023. It starts with a high-level overview of AI technical performance,
tracing its broad evolution over time. The chapter then examines the current state of a wide
range of AI capabilities, including language processing, coding, computer vision (image and
video analysis), reasoning, audio processing, autonomous agents, robotics, and reinforcement
learning. It also shines a spotlight on notable AI research breakthroughs from the past year,
exploring methods for improving LLMs through prompting, optimization, and fine-tuning, and
wraps up with an exploration of AI systems’ environmental footprint.
Chapter Highlights
1. AI beats humans on some tasks, but not on all. AI has surpassed human performance on several
benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on
more complex tasks like competition-level mathematics, visual commonsense reasoning, and planning.
2. Here comes multimodal AI. Traditionally, AI systems have been limited in scope, with language models
excelling in text comprehension but faltering in image processing, and vice versa. However, recent advancements have
led to the development of strong multimodal models, such as Google’s Gemini and OpenAI’s GPT-4. These models
demonstrate flexibility and are capable of handling images and text and, in some instances, can even process audio.
3. Harder benchmarks emerge. AI models have reached performance saturation on established benchmarks
such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging ones. In 2023, several
challenging new benchmarks emerged, including SWE-bench for coding, HEIM for image generation, MMMU for general
reasoning, MoCa for moral reasoning, AgentBench for agent-based behavior, and HaluEval for hallucinations.
4. Better AI means better data which means … even better AI. New AI models such as SegmentAnything
and Skoltech are being used to generate specialized data for tasks like image segmentation and 3D reconstruction. Data is
vital for AI technical improvements. The use of AI to create more data enhances current capabilities and paves the way for
future algorithmic improvements, especially on harder tasks.
5. Human evaluation is in. With generative models producing high-quality text, images, and more, benchmarking
has slowly started shifting toward incorporating human evaluations like the Chatbot Arena Leaderboard rather than
computerized rankings like ImageNet or SQuAD. Public feeling about AI is becoming an increasingly important
consideration in tracking AI progress.
6. Thanks to LLMs, robots have become more flexible. The fusion of language modeling with robotics has
given rise to more flexible robotic systems like PaLM-E and RT-2. Beyond their improved robotic capabilities, these models
can ask questions, which marks a significant step toward robots that can interact more effectively with the real world.
7. More technical research in agentic AI. Creating AI agents, systems capable of autonomous operation
in specific environments, has long challenged computer scientists. However, emerging research suggests that the
performance of autonomous AI agents is improving. Current agents can now master complex games like Minecraft and
effectively tackle real-world tasks, such as online shopping and research assistance.
8. Closed LLMs significantly outperform open ones. On 10 select AI benchmarks, closed models
outperformed open ones, with a median performance advantage of 24.2%. Differences in the performance of closed and
open models carry important implications for AI policy debates.
2.1 Overview of AI in 2023
The technical performance chapter begins with a high-level overview of significant model releases in 2023 and reviews
the current state of AI technical performance.
[Figure 2.1.3. Source: Stability AI, 2023]
Sep. 27, 2023 | Mistral 7B | Large language model | Mistral AI: Mistral 7B, launched by French AI company Mistral, is a compact 7-billion-parameter model that surpasses Llama 2 13B in performance, ranking it top in its class for size. (Figure 2.1.8. Source: Mistral AI, 2023)
Oct. 27, 2023 | Ernie 4.0 | Large language model | Baidu: Baidu, a multinational Chinese technology company, has launched Ernie 4.0, which is among the highest-performing Chinese LLMs to date. (Figure 2.1.9. Source: PR Newswire, 2023)
Nov. 21, 2023 | Claude 2.1 | Large language model | Anthropic: Anthropic’s latest LLM, Claude 2.1, features an industry-leading 200K context window, which enhances its capacity to process extensive content such as lengthy literary works. (Figure 2.1.12. Source: Medium, 2023)
[Figure 2.1.15. Source: Bootcamp, 2023]
[Figure 2.1.16: performance of top models relative to the human baseline (%) on select benchmarks, 2012–2023; the human baseline is set at 100%.]
1 An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks. For example, ImageNet is a canonical AI benchmark that features
a large collection of labeled images, and AI systems are tasked with classifying these images accurately. Tracking progress on benchmarks has been a standard way for the AI community to
monitor the advancement of AI systems.
2 In Figure 2.1.16, the values are scaled to establish a standard metric for comparing different benchmarks. The scaling function is calibrated such that the performance of the best model for
each year is measured as a percentage of the human baseline for a given task. A value of 105% indicates, for example, that a model performs 5% better than the human baseline.
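The scaling described in this footnote can be computed directly. A minimal sketch (the function name and sample numbers are illustrative, and it assumes a higher-is-better score):

```python
def relative_to_human(model_score, human_baseline):
    """Scale a benchmark score as a percentage of the human baseline.

    A return value of 105.0 means the model performs 5% better than the
    human baseline on that task (assuming higher scores are better).
    """
    return 100 * model_score / human_baseline

# E.g., a model scoring 94.3 against a human baseline of 89.8
# lands at roughly 105, i.e., about 5% above human performance.
scaled = relative_to_human(94.3, 89.8)
```

For lower-is-better metrics (such as error rates), the ratio would need to be inverted before scaling.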
Figure 2.1.17
3 Benchmarks can also saturate or see limited improvement because the problems they pose are hard and performance simply fails to improve. The benchmark saturation discussed in this section refers instead to benchmarks where performance reaches a near-perfect level that is difficult to improve upon.
4 For brevity, Figure 2.1.17 highlights a selection of deprecated benchmarks. Additional benchmarks that were deprecated either because there was saturation, no new state-of-the-art score
was documented, or research focus shifted away from the benchmark include: Celeb-DF (deepfake detection), CIFAR-10 (image classification), NIST FRVT (facial recognition), and Procgen
(reinforcement learning).
Figure 2.1.18 illustrates the year-over-year improvement, in percent, on a selection of benchmarks featured in the 2023 AI Index report. Most benchmarks see significant performance increases relatively soon after they are introduced; then the improvement slows. In the last few years, many of these benchmarks have shown little or no improvement.
[Chart: year-over-year improvement (%), 2013–2023; most recent values shown: 4.47% Kinetics-400; 3.02% COCO; 1.84% VQA; 1.54% ImageNet Top 1; 0.23% Cityscapes; 0.11% SuperGLUE.]
Figure 2.1.18
In response to benchmark saturation, AI researchers are pivoting away from traditional benchmarks and testing AI on more difficult challenges. The 2024 AI Index tracks progress on several new benchmarks, including those for tasks in coding, advanced reasoning, and agentic behavior, areas that were underrepresented in previous versions of the report (Figure 2.1.19).5
Figure 2.1.19
5 This report includes an Appendix with details regarding the sourcing of new benchmarks featured in this chapter.
Figure 2.2.1
Figure 2.2.2
[Figure 2.2.3: HELM Lite mean win rates for top models, including Palmyra X V3 (72B), Palmyra X V2 (33B), PaLM-2 (Unicorn), Yi (34B), and PaLM-2 (Bison); the leading model posts a mean win rate of 0.96, with the remaining models shown ranging from roughly 0.68 to 0.83.]
[Figure 2.2.4: leading models by task, including GSM8K (EM): GPT-4 (0613), 0.93; and LegalBench (EM): GPT-4 (0613), 0.71.]
6 HELM evaluates 10 scenarios: (1) NarrativeQA (reading comprehension), (2) Natural Questions (closed-book) (closed-book short-answer question answering), (3) Natural Questions
(open-book) (open-book short-answer question answering), (4) OpenBookQA (commonsense question answering), (5) MMLU (multisubject understanding), (6) GSM8K (grade school
math), (7) MATH (competition math), (8) LegalBench (legal reasoning), (9) MedQA (medical knowledge), and (10) WMT 2014 (machine translation).
7 There are several versions of HELM. This section reports the score on HELM Lite, Release v1.0.0 (2023-12-19), with the data having been collected in January 2024.
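HELM Lite ranks models by mean win rate: for each scenario, the fraction of other models a given model outscores, averaged over all scenarios. A simplified sketch of that aggregation (scores are hypothetical; the actual computation also handles ties and missing evaluations):

```python
def mean_win_rate(scores):
    """scores: {scenario: {model: score}}. For each scenario, a model's
    win rate is the fraction of other models it outscores; the mean win
    rate averages this over all scenarios."""
    models = list(next(iter(scores.values())).keys())
    rates = {m: [] for m in models}
    for per_model in scores.values():
        for m in models:
            others = [v for o, v in per_model.items() if o != m]
            wins = sum(per_model[m] > v for v in others)
            rates[m].append(wins / len(others))
    return {m: sum(r) / len(r) for m, r in rates.items()}

# Hypothetical two-scenario example with three models:
example = {
    "GSM8K": {"A": 0.9, "B": 0.5, "C": 0.1},
    "MedQA": {"A": 0.8, "B": 0.6, "C": 0.2},
}
```

Here model A would receive a mean win rate of 1.0, B of 0.5, and C of 0.0.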
MMLU: Massive Multitask Language Understanding
The Massive Multitask Language Understanding (MMLU) benchmark assesses model performance in zero-shot or few-shot scenarios across 57 subjects, including the humanities, STEM, and social sciences (Figure 2.2.5). MMLU has emerged as a premier benchmark for assessing LLM capabilities: Many state-of-the-art models like GPT-4, Claude 2, and Gemini have been evaluated against MMLU.
In early 2023, GPT-4 posted a state-of-the-art score on MMLU, later surpassed by Google’s Gemini Ultra. Figure 2.2.6 highlights the top model scores on the MMLU benchmark in different years. The scores reported are the averages across the test set. As of January 2024, Gemini Ultra holds the top score of 90.0%, marking a 14.8 percentage point improvement since 2022 and a 57.6 percentage point increase since MMLU’s inception in 2019. Gemini Ultra’s score was the first to surpass MMLU’s human baseline of 89.8%.
Figure 2.2.5
[Chart: average accuracy (%) of the top MMLU model each year, 2019–2023.]
Figure 2.2.6
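Since the reported MMLU scores are averages across the pooled test set, the overall number can be reproduced from per-subject counts. A minimal sketch (subject names and counts are made up for illustration):

```python
def mmlu_test_average(per_subject):
    """per_subject maps subject -> (num_correct, num_questions).
    Returns overall accuracy (%) across the pooled test set."""
    correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    return 100 * correct / total

# Hypothetical counts for two of MMLU's 57 subjects:
results = {"college_physics": (70, 100), "us_history": (90, 100)}
overall = mmlu_test_average(results)  # pooled accuracy across subjects
```

Note that this pooled ("micro") average weights subjects by their number of questions, unlike a per-subject ("macro") average.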
Figure 2.2.7
[Chart: Chatbot Arena Leaderboard Elo ratings, roughly 1,100–1,260, for GPT-3.5-Turbo-0613, Claude-2.1, Gemini Pro (Dev), Mixtral-8x7b-Instruct-v0.1, Claude-2.0, Claude-1, Mistral Medium, GPT-4-0613, GPT-4-0314, and GPT-4-Turbo.]
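Chatbot Arena derives these ratings from pairwise human votes. A minimal sketch of a standard Elo update (the K-factor of 4 is illustrative; the leaderboard's actual methodology differs in detail):

```python
def elo_expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, a_wins, k=4):
    """Apply one pairwise vote; each rating moves by at most k points."""
    s = 1.0 if a_wins else 0.0
    e = elo_expected(r_a, r_b)
    return r_a + k * (s - e), r_b + k * (e - s)

# Two evenly rated models: a single win for A nudges the ratings apart.
a, b = elo_update(1200, 1200, a_wins=True)
```

With many votes, ratings converge so that rating gaps reflect empirical win probabilities.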
TruthfulQA
Introduced at ACL 2022, TruthfulQA is a benchmark designed to evaluate the truthfulness of LLMs in generating answers to questions. This benchmark comprises approximately 800 questions across 38 categories, including health, politics, and finance. Many questions are crafted to challenge commonly held misconceptions, which typically lead humans to answer incorrectly (Figure 2.2.9). Although one of the paper’s observations is that larger models tend to be less truthful, GPT-4 (RLHF), released in early 2023, has achieved the highest performance thus far on the TruthfulQA benchmark, with a score of 0.6 (Figure 2.2.10). This score is nearly three times higher than that of a GPT-2-based model tested in 2021, indicating that LLMs are becoming progressively better at providing truthful answers.
Figure 2.2.9
[Chart: TruthfulQA MC1 score ↑ by model.]
Figure 2.2.10
Figure 2.2.12
2.3 Coding
Coding involves the generation of instructions that computers can follow to perform tasks. Recently, LLMs have become proficient coders, serving as valuable assistants to computer scientists. There is also increasing evidence that many coders find AI coding assistants highly useful.
Generation
On many coding tasks, AI models are challenged to generate usable code or to solve computer science problems.
… in 2022 (Figure 2.3.2). Since 2021, performance on HumanEval has increased 64.1 percentage points.
[Chart: HumanEval Pass@1, 2021–2023; the 2023 top score is 96.30%.]
Figure 2.3.2
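Pass@1 is the share of problems solved by a model's first sample. In practice it is often computed with the unbiased pass@k estimator introduced alongside HumanEval by Chen et al. (2021); a minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples is
    correct, given n generated samples of which c pass the unit tests."""
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# E.g., with 200 samples per problem of which 40 pass,
# pass@1 for that problem is estimated at 0.2.
p = pass_at_k(200, 40, 1)
```

The benchmark score is then the average of this estimate over all problems.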
SWE-bench
As AI systems’ coding capabilities improve, it has become increasingly important to benchmark models on more challenging tasks. In October 2023, researchers introduced SWE-bench, a dataset comprising 2,294 software engineering problems sourced from real GitHub issues and popular Python repositories (Figure 2.3.3). SWE-bench presents a tougher test for AI coding proficiency, demanding that systems coordinate changes across multiple functions, interact with various execution environments, and perform complex reasoning.
Even state-of-the-art LLMs face significant challenges with SWE-bench. Claude 2, the best-performing model, solved only 4.8% of the dataset’s problems (Figure 2.3.4).8 In 2023, the top-performing model on SWE-bench surpassed the best model from 2022 by 4.3 percentage points.
[Chart: SWE-bench percent of problems resolved for ChatGPT-3.5 + BM25 Retrieval, ChatGPT-3.5, GPT-4, SWE-Llama 7B, and SWE-Llama 13B; shown values range from 0% to 3.97%.]
Figure 2.3.4
Figure 2.3.3
8 According to the SWE-bench leaderboard, unassisted systems have no assistance in finding the relevant files in the repository. Assisted systems operate under the “oracle” retrieval setting,
which means the systems are provided with the list of files that were modified in the pull request.
Computer vision allows machines to understand images and videos and create realistic visuals from textual prompts
or other inputs. This technology is widely used in fields such as autonomous driving, medical imaging, and video
game development.
[Figure 2.4.2: model parameter counts, including DALL-E 2 (3.5B), Openjourney v1 (1B), and MultiFusion (13B).]
9 The 12 evaluation aspects of HEIM are: (1) Alignment: How closely does the image align with the given text? (2) Quality: What is the quality of the produced image? (3) Aesthetic:
How aesthetically pleasing is the generated image? (4) Originality: How original is the image? (5) Reasoning: Does the model understand objects, counts, and spatial relations? (6)
Knowledge: Does the model have knowledge about the world? (7) Bias: Are the generated images biased? (8) Toxicity: Are the generated images toxic or inappropriate? (9) Fairness:
Do the generated images exhibit performance disparities? (10) Robustness: Is the model robust to input perturbations? (11) Multilinguality: Does the model support non-English languages?
(12) Efficiency: How fast is model inference?
Highlighted Research:
MVDream
Creating 3D geometries or models from text prompts has been a significant challenge for AI researchers, with existing models struggling with problems such as the multiface Janus issue (inaccurately regenerating context described by text prompts) and content drift (inconsistency across different 3D views). MVDream is a new 3D generation system developed by ByteDance and University of California, San Diego researchers that overcomes some of these hurdles (Figure 2.4.5). In quantitative evaluations, MVDream’s generated models achieve Inception Score (IS) and CLIP scores comparable to those in the training set, indicating the high quality of the generated images (Figure 2.4.6). MVDream has major implications, especially for creative industries where 3D content creation is traditionally time-consuming and labor-intensive.
Sample generations from MVDream
Source: Shi et al., 2023
Figure 2.4.5
Figure 2.4.6
Instruction-Following
In computer vision, instruction-following is the capacity of vision-language models to interpret text-based directives related to images. For instance, an AI system could be given an image of various ingredients and tasked with suggesting how to use them to prepare a healthy meal. Capable instruction-following vision-language models are necessary for developing advanced AI assistants.
VisIT-Bench
In 2023, a team of industry and academic researchers introduced VisIT-Bench, a benchmark consisting of 592 challenging vision-language instructions across about 70 instruction categories, such as plot analysis, art knowledge, and location understanding (Figure 2.4.8). As of January 2024, the leading model on VisIT-Bench is GPT-4V, the vision-enabled variant of GPT-4 Turbo, with an Elo score of 1,349, marginally surpassing the human reference score for VisIT-Bench (Figure 2.4.9).
A sample VisIT-Bench instruction set
Source: Bitton et al., 2023
Figure 2.4.8
VisIT-Bench: Elo rating
Source: Hugging Face, 2024 | Chart: 2024 AI Index report
[Figure 2.4.9: Elo ratings for 15 vision-language models, including MMGPT, PandaGPT, OpenFlamingo, MiniGPT-4, InstructBLIP, mPLUG-Owl, and LLaVA variants; GPT-4V leads at 1,349, just above the human baseline of 1,338.]
EditVal
Despite the promise of text-guided image editing, few robust methods can evaluate how accurately AI image editors adhere to editing prompts. EditVal, a new benchmark for assessing text-guided image editing, includes over 13 edit types, such as adding objects or changing their positions, across 19 object classes (Figure 2.4.10). The benchmark was applied to evaluate eight leading text-guided image editing methods, including SINE and Null-text. Performance improvements since 2021 on a variety of the benchmark’s editing tasks are shown in Figure 2.4.11.
Figure 2.4.10
EditVal automatic evaluation: editing accuracy
Source: EditVal Leaderboard, 2024 | Chart: 2024 AI Index report
[Figure 2.4.11: editing accuracy, 2021–2022; latest values shown: 0.52 for size edits and 0.47 for object addition.]
Highlighted Research:
ControlNet
Conditioning inputs or performing conditional control refers to the process of guiding the output created by an image generator by specifying certain conditions that a generated image must meet. Existing text-to-image models often lack precise control over the spatial composition of an image, making it difficult to use prompts alone to generate images with complex layouts, diverse shapes, and specific poses. Fine-tuning these models for greater compositional control by training them on additional images is theoretically feasible, but many specialized datasets, such as those for human poses, are not large enough to support successful training.
In 2023, researchers from Stanford introduced a new model, ControlNet, that improves conditional control editing for large text-to-image diffusion models (Figure 2.4.12). ControlNet stands out for its ability to handle various conditioning inputs. Compared to other models released in 2022, human raters preferred ControlNet for both result quality and condition fidelity (Figure 2.4.13). The introduction of ControlNet is a significant step toward creating advanced text-to-image generators capable of editing images to more accurately replicate the complex images frequently encountered in the real world.
Figure 2.4.12
Highlighted Research:
ControlNet (cont’d)
[Figure 2.4.13: human ratings of result quality and condition fidelity for PITI (sketch), Sketch-Guided (beta = 1.6), Sketch-Guided (beta = 3.2), ControlNet-lite, and ControlNet, comparing 2022 and 2023 models; ControlNet receives the highest ratings on both measures.]
Figure 2.4.13
Highlighted Research:
Instruct-NeRF2NeRF
New models can edit 3D geometries using only text instructions. Instruct-NeRF2NeRF is a model developed by Berkeley researchers that employs an image-conditioned diffusion model for iterative text-based editing of 3D geometries (Figure 2.4.14). This method efficiently generates new, edited images that adhere to textual instructions, achieving greater consistency than current leading methods (Figure 2.4.15).
Figure 2.4.14
Highlighted Research:
Instruct-NeRF2NeRF (cont’d)
[Figure 2.4.15: consistency metrics comparing SDS w/ IP2P, One-time DU, Per-frame IP2P, and Instruct-NeRF2NeRF; the top score in each panel is 0.40, with other methods at 0.00–0.03.]
Figure 2.4.15
Segmentation
Segmentation involves assigning individual image pixels to specific categories (for example: human, bicycle, or street).
Highlighted Research:
Segment Anything
[Figure 2.4.17: mean IoU difference by dataset: Hypersim 6.10; NDISPark 2.70; VISOR 1.80; Plittersdorf 1.50; EgoHOS 0.80; IBD -0.30; WoodScape -0.60; Cityscapes -2.00; PIDRay -5.80; DRAM -6.50; TrashCan -15.00; GTEA -21.40.]
Figure 2.4.17
Figure 2.4.18
[Figure 2.4.19: dataset comparison table with columns: Dataset; Sensor types; RGB resolution (MPix); Depth resolution (MPix); High-resolution geometry; Poses/scene; Lighting; # Scenes; # Frames.]
Figure 2.4.19
[Figures 2.4.20 and 2.4.21: per-object evaluation across the categories Backpack, Chair, Motorcycle, Orange, Skateboard, Teddy bear, and Vase; one panel shows values of roughly 5.89–10.23 and the other roughly 0.69–0.80.]
Figure 2.4.20
Figure 2.4.21
Video analysis concerns performing tasks across videos rather than single images.
UCF101
UCF101 is an action recognition dataset of realistic
action videos that contain 101 action categories
(Figure 2.5.1). More recently, UCF101 has been used
to benchmark video generators. This year’s top
model, W.A.L.T-XL, posted an FVD16 score of 36,
more than halving the state-of-the-art score posted
the previous year (Figure 2.5.2).
[Chart: FVD16 ↓ on UCF101, 2021–2023; the 2023 best score is 36.]
Figure 2.5.2
Highlighted Research:
Align Your Latents
Figure 2.5.3
Video LDM vs. LVG: FVD and FID
Source: Blattmann et al., 2023 | Chart: 2024 AI Index report
[Figure 2.5.4: FVD ↓ and FID ↓ for LVG (2022), Video LDM, and Video LDM (cond.) (2023); FVD values: 478, 389, 356; FID values: 53.50, 51.90, 31.60.]
Figure 2.5.4
Emu Video vs. prior works: human-evaluated video quality and text faithfulness win rate
Source: Girdhar et al., 2023 | Chart: 2024 AI Index report
[Figure 2.5.6: Emu Video win rates against Imagen Video, Make-A-Video, CogVideo, Gen-2, PYOCO, and Pika Labs (2022 and 2023 models); shown values include 56%.]
Figure 2.5.6
2.6 Reasoning
Reasoning in AI involves the ability of AI systems to draw logically valid conclusions from different forms of information. AI systems are increasingly being tested in diverse reasoning contexts, including visual (reasoning about images), moral (understanding moral dilemmas), and social reasoning (navigating social situations).10
General Reasoning
General reasoning pertains to AI systems being able to reason across broad, rather than specific, domains. As part of a general reasoning challenge, for example, an AI system might be asked to reason across multiple subjects rather than perform one narrow task (e.g., playing chess).
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
In recent years, the reasoning abilities of AI systems have advanced so much that traditional benchmarks like SQuAD (for textual reasoning) and VQA (for visual reasoning) have become saturated, indicating a need for more challenging reasoning tests. Responding to this, researchers from the United States and Canada recently developed MMMU, the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. MMMU comprises about 11,500 college-level questions from six core disciplines: art and design, business, science, health and medicine, humanities and social science, and technology and engineering (Figure 2.6.1). The question formats include charts, maps, tables, chemical structures, and more. MMMU is one of the most demanding tests of perception, knowledge, and reasoning in AI to date. As of January 2024, the highest performing model is Gemini Ultra, which leads in all subject categories with an overall score of 59.4% (Figure 2.6.2).11 On most individual task categories, top models are still well behind medium-level human experts (Figure 2.6.3). This relatively low score is evidence of MMMU’s effectiveness as a benchmark for assessing AI reasoning capabilities.
10 Some abilities highlighted in the previous sections implicitly involve some form of reasoning. This section highlights tasks that have a more specific reasoning focus.
11 The AI Index reports results from the MMMU validation set, as recommended by the paper authors for the most comprehensive coverage. According to the authors, the test set, with its
unreleased labels and larger size, presents a more challenging yet unbiased benchmark for model performance, ensuring a more robust evaluation. The test set results are available on the
MMMU page.
Figure 2.6.1
[Figure 2.6.2: MMMU overall accuracy by model: Gemini Ultra* 59.40%; GPT-4V(ision) (Playground) 56.80%; InternVL-Chat-V1.2* 51.60%; Qwen-VL-MAX* 51.40%; LLaVA-1.6-34B* 51.10%; Qwen-VL-PLUS* 45.20%; InternLM-XComposer2-VL* 43.00%; Marco-VL* 41.20%; OmniLMM-12B* 41.10%; InfiMM-Zephyr-7B* 39.40%.]
Figure 2.6.2
Figure 2.6.3
12 An asterisk (*) next to the model names indicates that the results were provided by the authors.
Figure 2.6.4
[Figure 2.6.5: accuracy (%) on the main set. Human baselines: expert validators 72.50%; nonexpert validators 30.50%. Model results (28.00%–41.00%) for Few-Shot CoT Llama-2-70B-chat, Few-Shot CoT GPT-3.5-turbo-16k, Few-Shot CoT GPT-4, and GPT-4 with search (backoff to CoT on abstention).]
Figure 2.6.5
Highlighted Research:
[Chart: accuracy (%) for GPT-4 (Temp = 0), GPT-4 (Temp = 0.5), GPT-4V Zero-Shot, and GPT-4V One-Shot; shown values: 69%, 65%, 25%, 23%.]
13 Some claim these models exhibit such reasoning capabilities, while others claim they do not.
Figure 2.6.7
Mathematical Reasoning
Mathematical problem-solving benchmarks evaluate AI systems’ ability to reason mathematically. AI models can be tested with a range of math problems, from grade-school level to competition-standard.
… that AI models develop multistep solutions utilizing arithmetic operations (Figure 2.6.8). GSM8K has quickly become a favored benchmark for evaluating advanced LLMs. The top-performing model on GSM8K …
Figure 2.6.8
GSM8K: accuracy
Source: Papers With Code, 2023 | Chart: 2024 AI Index report
[Chart: GSM8K accuracy, 2022–2023; shown values: 93%, 96%, 97%.]
Figure 2.6.9
[Chart: accuracy (%), 2021–2023; top score 84.30%.]
Figure 2.6.11
Figure 2.6.12
Visual Reasoning
Visual reasoning tests how well AI systems can reason across both visual and textual data.
Visual Commonsense Reasoning (VCR)
Introduced in 2019, the Visual Commonsense Reasoning (VCR) challenge tests the commonsense visual reasoning abilities of AI systems. In this challenge, AI systems not only answer questions based on images but also reason about the logic behind their answers (Figure 2.6.13). Performance in VCR is measured using the Q->AR score, which evaluates the machine’s ability to both select the correct answer to a question (Q->A) and choose the appropriate rationale behind that answer (Q->R). While AI systems have yet to outperform humans on this task, their capabilities are steadily improving. Between 2022 and 2023, there was a 7.93% increase in AI performance on the VCR challenge (Figure 2.6.14).
Figure 2.6.13
Visual Commonsense Reasoning (VCR) task: Q->AR score
Source: VCR Leaderboard, 2023 | Chart: 2024 AI Index report
[Figure 2.6.14: Q->AR scores over time; 2023 top score 81.60, below the human baseline of 85.]
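The Q->AR metric counts an example as correct only when both stages succeed: the right answer is selected and the right rationale is selected. A minimal sketch (the per-example data is illustrative):

```python
def q_ar_score(answers_correct, rationales_correct):
    """Q->AR: an example counts only if both the answer (Q->A)
    and the rationale (Q->R) are chosen correctly.
    Returns the joint accuracy as a percentage."""
    assert len(answers_correct) == len(rationales_correct)
    joint = [a and r for a, r in zip(answers_correct, rationales_correct)]
    return 100 * sum(joint) / len(joint)

# Four hypothetical examples: correct answer AND correct rationale
# are both required, so only examples 1 and 4 count.
score = q_ar_score([True, True, False, True], [True, False, True, True])
```

Because both stages must succeed, Q->AR is always at most the smaller of the Q->A and Q->R accuracies.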
MoCa
The ability of AI models to reason in linguistic and visual domains is well established, yet their capacity for moral reasoning, especially moral reasoning that aligns with human moral judgments, is less understood.14 To further explore this topic, a team of Stanford researchers created a new dataset (MoCa) of human stories with moral elements (Figure 2.6.15). The researchers then presented these models with stories of human actions and prompted the models to respond, measuring moral agreement with the …
Figure 2.6.15
14 The topic of AI and moral alignment is contentious, as there are no universally agreed-upon moral principles. What constitutes moral alignment for one party may significantly
differ for another.
Zero-shot alignment with human judgments on the moral permissibility task: discrete agreement
Source: Nie et al., 2023 | Chart: 2024 AI Index report
[Figure 2.6.16: discrete agreement (agg.) with human moral judgments, 2019–2023, for ALBERT-xxlarge, RoBERTa-large, GPT-2-XL, GPT-3-babbage-v1, Electra-gen-large, GPT-3-curie-v1, GPT-3.5-davinci-v3, GPT-3.5-davinci-v2, Alpaca-7B, Anthropic-claude-v1, GPT-3.5-turbo, and GPT-4; values range from 18.50 to 41.90.]
Figure 2.6.16
Figure 2.6.17
Forward action inference with initial belief: accuracy
Backward belief inference with initial belief: accuracy
Source: Gandhi et al., 2023 | Chart: 2024 AI Index report
[Figures 2.6.18 and 2.6.19: accuracy (%) under TB and TB∧FB conditions for text-davinci-003, GPT-3.5-turbo, LLaMA-65B, Claude, Claude-2, GPT-4-0314, and human performance, comparing 2022 and 2023 models; shown values range from 7% to 62%.]
Figure 2.6.18
Figure 2.6.19
[Figure 2.6.20: accuracy (%) under TB and TB∧FB conditions for text-davinci-003, GPT-3.5-turbo, LLaMA-65B, Claude-2, Claude, GPT-4-0314, and human performance (2022 vs. 2023); shown values range from 25% to 99%.]
Figure 2.6.20
Highlighted Research:
[Chart: accuracy (48%–51%) for causal reasoning methods and models, including PNL-MLP, Slope, bQCD, Mosaic, LMPrior, GPT-3 variants (ada, babbage, curie, davinci, and their instruct versions), text-davinci-002/003, GPT-3.5-turbo (single prompt and causal agent), and GPT-4 (single prompt).]
2.7 Audio
AI systems are adept at processing human speech, with audio capabilities that include transcribing spoken words to text and recognizing individual speakers. More recently, AI has advanced in generating synthetic audio content.
Generation
2023 marked a significant year in the field of audio generation, which involves creating synthetic audio content, ranging from human speech to music files. This advancement was highlighted by the release of several prominent audio generators, such as UniAudio, MusicGen, and MusicLM.
Highlighted Research:
UniAudio
UniAudio is a high-level language modeling technique to create audio content. UniAudio uniformly tokenizes all audio types and, like modern LLMs, employs next-token prediction for high-quality audio generation. UniAudio is capable of generating high-quality speech, sound, and music. UniAudio surpasses leading methods in tasks including text-to-speech, speech enhancement, and voice conversion (Figure 2.7.1). With 1 billion parameters and trained on 165,000 hours of audio, UniAudio exemplifies the efficacy of big data and self-supervision for music generation.
UniAudio vs. selected prior works in the training stage: objective evaluation metrics
Source: Yang et al., 2023 | Chart: 2024 AI Index report
[Figure 2.7.1: objective evaluation metrics comparing UniAudio with selected prior works (Shen et al., 2023; Wang et al., 2023e; Richter et al., 2023; Wang et al., 2018; Liu et al., 2023a; Copet et al., 2023) across text-to-speech, speech enhancement, voice conversion, text-to-sound, and text-to-music tasks.]
Highlighted Research:
MusicGEN and MusicLM (cont’d)
Evaluation of MusicGen and baseline models on MusicCaps
Source: Copet et al., 2023 | Chart: 2024 AI Index report
[Figure 2.7.2: MusicCaps evaluation of MusicGen variants (with and without melody conditioning; 300M, 1.5B, and 3.3B parameters) against baselines including Riffusion (2022), Mousai (2023), MusicLM, and Noise2Music. Shown values include FAD ↓ from 2.10 to 14.80, KL ↓ from 0.19 to 0.32, and OVL. ↑ from 76.11 to 84.81.]
Figure 2.7.2
Chapter 2: Technical Performance
AI agents, autonomous or semiautonomous systems designed to operate within specific environments to accomplish
goals, represent an exciting frontier in AI research. These agents have a diverse range of potential applications, from
assisting in academic research and scheduling meetings to facilitating online shopping and vacation booking.
2.8 Agents
General Agents
This section highlights benchmarks and research into agents that can flexibly operate in general task environments.

AgentBench
AgentBench, a new benchmark designed for evaluating LLM-based agents, encompasses eight distinct interactive settings, including web browsing, online shopping, household management, puzzles, and digital card games (Figure 2.8.1). The study assessed over 25 LLM-based agents, including those built on OpenAI's GPT-4, Anthropic's Claude 2, and Meta's Llama 2. GPT-4 emerged as the top performer, achieving an overall score of 4.01, significantly higher than Claude 2's score of 2.49 (Figure 2.8.2). The research also suggests that LLMs released in 2023 outperform earlier versions in agentic settings. Additionally, the AgentBench team speculated that agents' struggles on certain benchmark subsections can be attributed to their limited abilities in long-term reasoning, decision-making, and instruction-following.
Figure 2.8.1
[Figure 2.8.2: Overall AgentBench scores of text-davinci-002, text-davinci-003, GPT-3.5-turbo, Vicuna-13b, CodeLlama-34b, chat-bison-001, Claude-instant, Claude, Claude-2, and GPT-4 (2022–2023)]
Figure 2.8.2
Highlighted Research:
Voyager
Recent research by Nvidia, Caltech, UT Austin, Stanford, and UW Madison demonstrates that existing LLMs like GPT-4 can be used to develop flexible agents capable of continuous learning. The team created Voyager, a GPT-4-based agent for Minecraft, a complex video game with no set endpoint that is essentially a boundless virtual playground for its players (Figure 2.8.3). Voyager excels in this environment, adeptly remembering plans, adapting to new settings, and transferring knowledge. It significantly outperforms previous models, collecting 3.3 times more unique items, traveling 2.3 times further, and reaching key milestones 15.3 times faster (Figure 2.8.4).
Voyager in action
Source: Wang et al., 2023
Figure 2.8.3
[Figure 2.8.5: Success rates of GPT-4, GPT-4 (no retrieval), Claude-1, Claude-1 (no retrieval), AutoGPT, LangChain (React), and a baseline across the 15 tasks listed in the footnote below]
Figure 2.8.5
15 The full tasks include: (1) CIFAR-10 (improve a baseline image classification model), (2) imdb (improve a baseline sentiment classification model), (3) ogbn-arxiv (improve a baseline paper classification model from scratch), (4) house prices (train a regression model), (5) spaceship titanic (train a classifier model from scratch), (6) Parkinson's-disease (train a time-series regression model), (7) FathomNet (train an out-of-distribution image classification model), (8) feedback (train an out-of-distribution text regression model), (9) identify contrails (train an out-of-distribution image segmentation model), (10) CLRS (model classic algorithms over graphs and lists), (11) BabyLM (train a language model over 10M words), (12) llama-inference (improve the runtime/autoregressive generation speed of Llama 7B), (13) vectorization (improve the inference speed of a model), (14) literature-review-tool (perform literature review), and (15) bibtex-generation (generate BibTeX from sketch).
Over time, AI has become increasingly integrated into robotics, enhancing robots’ capabilities to perform complex
tasks. Especially with the rise of foundation models, this integration allows robots to iteratively learn from their
surroundings, adapt flexibly to new settings, and make autonomous decisions.
2.9 Robotics
Highlighted Research:
PaLM-E
PaLM-E is a new AI model from Google that merges robotics with language modeling to address real-world tasks like robotic manipulation and knowledge tasks like question answering and image captioning. Leveraging transformer-based architectures, the largest PaLM-E model is scaled up to 562B parameters. The model is trained on diverse visual language as well as robotics data, which results in superior performance on a variety of robotic benchmarks. PaLM-E also sets new standards in visual tasks like OK-VQA, excels in other language tasks, and can engage in chain-of-thought, mathematical, and multi-image reasoning, even without specific training in these areas. Figure 2.9.1 illustrates some of the tasks that the PaLM-E model can perform.

On Task and Motion Planning (TAMP) domains, where robots have to manipulate objects, PaLM-E outperforms previous state-of-the-art methods like SayCan and PaLI on both embodied visual question answering and planning (Figure 2.9.2).16 On robotic manipulation tasks, PaLM-E outperforms competing models (PaLI and CLIP-FT) in its ability to detect failures, which is a crucial step for robots to perform closed-loop planning (Figure 2.9.3).

PaLM-E is significant in that it demonstrates that language modeling techniques as well as text data can enhance the performance of AI systems in nonlanguage domains, like robotics. PaLM-E also highlights how there are already linguistically adept robots capable of real-world interaction and high-level reasoning. Developing these kinds of multifaceted robots is an essential step in creating more general robotic assistants that can, for example, assist in household work.
16 Embodied Visual Question Answering (Embodied VQA) is a task where agents need to navigate through 3D environments and answer questions about the objects they
visually perceive in the environment.
PaLM-E in action
Source: Robotics at Google, 2023
Figure 2.9.1
[Figure 2.9.2: Model performance on Embodied VQA questions q1–q4 and planning tasks p1–p2]
Figure 2.9.2
Figure 2.9.3
Highlighted Research:
RT-2
Real-world robots could benefit from certain capabilities possessed by LLMs, such as text and code generation, as well as visual understanding. RT-2, a new robotic model from DeepMind, represents an ambitious attempt to create a generalizable robotic model that has certain LLM capabilities. RT-2 uses a transformer-based architecture and is trained on both robotic trajectory data that is tokenized into text and extensive visual-language data.

RT-2 stands out as one of the most impressive and adaptable approaches for conditioning robotic policy. It outshines state-of-the-art models like Manipulation of Open-World Objects (MOO) across various benchmarks, particularly in tasks involving unseen objects. On such tasks, an RT-2/PaLM-E variant achieves an 80% success rate, significantly higher than MOO's (53%) (Figure 2.9.4). In unseen object tasks, RT-2 surpasses the previous year's state-of-the-art model, RT-1, by 43 percentage points. This indicates an improvement in robotic performance in novel environments over time.
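The idea of tokenizing trajectory data into text can be sketched concretely. The snippet below is an illustrative scheme, not RT-2's actual tokenizer: each dimension of a continuous action is clamped to a range, discretized into one of 256 bins (an assumed bin count), and rendered as a token string an LLM-style model can predict.

```python
# Illustrative sketch (not RT-2's actual scheme): discretize each dimension of
# a continuous robot action into one of 256 bins, then render the bins as a
# space-separated token string suitable for next-token prediction.
def action_to_tokens(action, low=-1.0, high=1.0, bins=256):
    tokens = []
    for x in action:
        x = min(max(x, low), high)                      # clamp to valid range
        idx = int((x - low) / (high - low) * (bins - 1))
        tokens.append(str(idx))
    return " ".join(tokens)

def tokens_to_action(text, low=-1.0, high=1.0, bins=256):
    # Inverse map: bin index back to the center-ish continuous value.
    return [low + int(t) / (bins - 1) * (high - low) for t in text.split()]

tok = action_to_tokens([0.0, -1.0, 1.0])
print(tok)                    # -> 127 0 255
print(tokens_to_action(tok))  # approximately [0.0, -1.0, 1.0]
```

The round trip loses at most one bin width of precision, which is the usual trade-off when folding continuous control into a discrete token vocabulary.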
Evaluation of RT-2 models and baselines on seen and unseen tasks: success rate
Source: Brohan et al., 2023 | Chart: 2024 AI Index report
[Figure 2.9.4: Success rates of R3M (2022), VC-1 (2023), RT-1 (2022), MOO (2023), RT-2-PaLM-E-12B (2023), and RT-2-PaLI-X-55B (2023) across seen tasks, unseen objects, unseen backgrounds, unseen environments, and the unseen average]
Figure 2.9.4
In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning
from their prior actions. Systems are rewarded if they achieve a desired goal and punished if they fail.
Reinforcement Learning from Human Feedback
Reinforcement learning has gained popularity in enhancing state-of-the-art language models, with a growing number of models reported using RLHF. The rising popularity of RLHF is also evidenced by the fact that many leading LLMs report improving their models with RLHF.
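At the core of RLHF is a reward model trained on human preference comparisons. A common formulation (the Bradley-Terry model, a standard choice rather than a detail reported by the Index) scores two candidate responses and models the probability that one is preferred as the sigmoid of their reward difference:

```python
import math

# Sketch of the Bradley-Terry preference model often used when training RLHF
# reward models: the probability that response A is preferred over response B
# is the sigmoid of the difference of their scalar rewards.
def preference_probability(reward_a, reward_b):
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

print(preference_probability(2.0, 2.0))  # 0.5: equal rewards, no preference
print(preference_probability(3.0, 1.0))  # ~0.88: A clearly preferred
```

The reward model is fit by maximizing this probability on human-labeled comparisons, and the resulting rewards then drive the reinforcement learning step.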
Figure 2.10.1
Figure 2.10.2
17 It is possible that more models use RLHF as part of their training than reported. The Index only tracks data for models that publicly report using RLHF.
Highlighted Research:
RLAIF
RLAIF uses reinforcement learning based on the preferences of LLMs to align other AI models toward human preferences.

RLAIF and RLHF vs. SFT baseline: win rate
Source: Lee et al., 2023 | Chart: 2024 AI Index report
[Figure: Win rates of RLHF and RLAIF policies against the SFT baseline]
[Figure 2.10.4: Harmless rate of SFT, RLHF, and RLAIF policies]
Figure 2.10.4
Highlighted Research:
Direct Preference Optimization

Comparison of different algorithms on TL;DR summarization task across different sampling temperatures
Source: Rafailov et al., 2023 | Table: 2024 AI Index report
[Figure 2.10.5: Win rates on the TL;DR summarization task at sampling temperatures from 0.00 to 1.00]
Figure 2.10.5
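Sampling temperature, the variable swept in Figure 2.10.5, rescales a model's logits before the softmax: low temperatures sharpen the output distribution toward the top token, high temperatures flatten it. A minimal sketch:

```python
import math

# Sampling temperature rescales logits before the softmax: low temperatures
# sharpen the distribution toward the top token, high temperatures flatten it.
def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                            # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.25))  # near one-hot on the top token
print(softmax_with_temperature(logits, 1.00))  # moderate spread
print(softmax_with_temperature(logits, 4.00))  # flatter spread
```

This is why algorithm comparisons are run across temperatures: a method that wins only at one temperature may simply be exploiting a particular sharpness of the sampling distribution.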
This section focuses on research exploring critical properties of LLMs, such as their capacity for sudden behavioral
shifts and self-correction in reasoning. It is important to highlight these studies to develop an understanding of how
LLMs, which are increasingly representative of the frontier of AI research, operate and behave.
18 Some of these papers include Brown et al., 2023, Ganguli et al., 2022, Srivastava et al., 2022, and Wei et al., 2022.
Highlighted Research:
Figure 2.11.1
Highlighted Research:

[Figure 2.11.2: Accuracy and response rates on tasks including Math I: Prime vs. Composite (n=1,000) and Math II: Happy Numbers (n=500), comparing March 2023 and June 2023 model versions]
Figure 2.11.2
Highlighted Research:

[Figure 2.11.3: Accuracy on GSM8K, CommonSenseQA, and HotpotQA under standard prompting (1 call) and self-correction rounds 1 (3 calls) and 2 (5 calls)]
Figure 2.11.3
Closed vs. Open Model Performance
As LLMs become increasingly ubiquitous, debate intensifies over their varying degrees of accessibility. Some models, such as Google's Gemini, remain closed, accessible solely to their developers. In contrast, models like OpenAI's GPT-4 and Anthropic's Claude 2 offer limited access, available publicly via an API. However, model weights are not fully released, which means the model cannot be independently modified by the public or further scrutinized. Conversely, Meta's Llama 2 and Stability AI's Stable Diffusion adopt an open approach, fully releasing their model weights. Open-source models can be modified and freely used by anyone.

Viewpoints differ on the merits of closed versus open AI models. Some argue in favor of open models, citing their ability to counteract market concentration, foster innovation, and enhance transparency within the AI ecosystem. Others contend that open-source models present considerable security risks, such as facilitating the creation of disinformation or bioweapons, and should therefore be approached with caution.

In the context of this debate, it is important to acknowledge that current evidence indicates a notable performance gap between open and closed models.19 Figures 2.11.4 and 2.11.5 juxtapose the performances of the top closed versus open model on a selection of benchmarks.20 On all selected benchmarks, closed models outperform open ones. Specifically, on 10 selected benchmarks, closed models achieved a median performance advantage of 24.2%, with differences ranging from as little as 4.0% on mathematical tasks like GSM8K to as much as 317.7% on agentic tasks like AgentBench.
[Figure 2.11.4: Benchmark, task category, best closed model score, and best open model score for the selected benchmarks]
Figure 2.11.4
19 By closed models, the AI Index is referring both to models that are fully closed and those with limited access.
20 The data in this section was collected in early January 2024.
[Figure 2.11.5: Closed vs. open percentage score difference across HELM, MMLU, Chatbot Arena Leaderboard, HumanEval, SWE-bench, MATH, GSM8K, and AgentBench (general language, coding, mathematical reasoning, and agent-based behavior), ranging from 3.97% on GSM8K to 317.71% on AgentBench]
Figure 2.11.5
As LLM use increases, techniques are being sought to enhance their performance and efficiency. This section examines some of those advances.
Highlighted Research:
Figure 2.12.1
Highlighted Research:

[Figure 2.12.2: Scores of prompting strategies (IO, CoT, ToT, ToT2, GoT) on three tasks across search parameters (L, k)]
Figure 2.12.2
Highlighted Research:
OPRO

Sample OPRO prompts and optimization progress
Source: Yang et al., 2023
Figure 2.12.3
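OPRO's optimization loop can be sketched as a scored search over candidate prompts. The snippet below is a toy hill climb, not OPRO itself: a random word proposer and a keyword-based scorer are assumptions standing in for the LLM optimizer and the real task evaluation.

```python
import random

# Minimal sketch of prompt optimization in the spirit of OPRO: repeatedly
# propose candidate prompts, score them on a task, and keep the best so far.
# A toy scorer and random proposer stand in for the LLM components.
CANDIDATE_WORDS = ["think", "step", "by", "carefully", "answer"]

def score(prompt):
    # Toy objective: reward prompts mentioning "step", with a bonus for the
    # full phrase "step by step".
    return prompt.count("step") + 2 * ("step by step" in prompt)

def propose(rng):
    return " ".join(rng.choices(CANDIDATE_WORDS, k=4))

def optimize(steps=200, seed=0):
    rng = random.Random(seed)
    best_prompt, best_score = "", score("")
    for _ in range(steps):
        cand = propose(rng)
        s = score(cand)
        if s > best_score:
            best_prompt, best_score = cand, s
    return best_prompt, best_score

prompt, s = optimize()
print(prompt, s)
```

In the real method, the proposer is itself an LLM that sees the scored history of previous prompts, which lets it make informed rather than random proposals.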
Accuracy difference on 23 BIG-bench Hard (BBH) tasks using PaLM 2-L scorer
Source: Yang et al., 2023 | Chart: 2024 AI Index report
[Figure 2.12.4: Per-task accuracy differences across the 23 BBH tasks, including boolean_expressions, causal_judgement, date_understanding, disambiguation_qa, dyck_languages, formal_fallacies, geometric_shapes, hyperbaton, logical_deduction_seven_objects, movie_recommendation, multistep_arithmetic_two, navigate, object_counting, penguins_in_a_table, reasoning_about_colored_objects, ruin_names, salient_translation_error_detection, snarks, sports_understanding, temporal_sequences, tracking_shuffled_objects_seven_objects, web_of_lies, and word_sorting]
Figure 2.12.4
Model competitions based on 10,000 simulations using GPT-4 and the Vicuna benchmark
Source: Dettmers et al., 2023 | Chart: 2024 AI Index report
[Figure 2.12.5: Mean Elo ratings of Guanaco 7B, Bard, Guanaco 13B, ChatGPT, Vicuna 13B, Guanaco 33B, Guanaco 65B, and GPT-4, ranging from 879 to 1,348]
Figure 2.12.5
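The Elo ratings behind these model competitions follow the standard chess formulas: an expected score derived from the rating gap, and an additive update after each simulated head-to-head outcome. The K-factor of 32 below is a conventional choice, an assumption of this sketch rather than the paper's setting.

```python
# Standard Elo rating math of the kind behind simulated model competitions:
# expected score from a rating gap, then an update after each outcome.
def expected_score(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating, expected, actual, k=32):
    return rating + k * (actual - expected)

r_a, r_b = 1000.0, 1000.0
e = expected_score(r_a, r_b)          # 0.5 for equal ratings
r_a = update(r_a, e, actual=1.0)      # A wins: gains k * 0.5 = 16 points
r_b = update(r_b, 1 - e, actual=0.0)  # B loses the same 16 points
print(r_a, r_b)  # 1016.0 984.0
```

Repeating this over many simulated matchups (10,000 in the figure) converges to a ranking in which, say, a roughly 400-point gap implies about a 10-to-1 expected win ratio.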
Highlighted Research:
Flash-Decoding
Flash-Decoding, developed by Stanford researchers, tackles inefficiency in traditional LLMs by speeding up the attention mechanism, particularly in tasks requiring long sequences. It achieves this by parallelizing the loading of keys and values, then separately rescaling and combining them to maintain correct attention outputs (Figure 2.12.6). In various tests, Flash-Decoding outperforms other leading methods like PyTorch Eager and FlashAttention-2, showing much faster inference: For example, on a 256 batch size and 256 sequence length, Flash-Decoding is 48 times faster than PyTorch Eager and six times faster than FlashAttention-2 (Figure 2.12.7). Inference on models like ChatGPT can cost $0.01 per response, which can become highly expensive when deploying such models to millions of users. Innovations like Flash-Decoding are critical for reducing inference costs in AI.
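The rescale-and-combine step can be illustrated numerically. The sketch below uses toy scalar attention, not the actual GPU kernel: attention is computed over key/value chunks independently, each chunk tracks its max score and softmax denominator, and the partial outputs are rescaled by those statistics so the combined result matches full softmax attention.

```python
import math

# Sketch of the split-key/value idea behind Flash-Decoding: attention over a
# long sequence is computed per chunk, and the partial results are combined
# using each chunk's max score and softmax denominator.
def chunk_attention(q, keys, values):
    scores = [q * k for k in keys]
    m = max(scores)                                    # chunk-local max
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)                                  # chunk-local denominator
    out = sum(e * v for e, v in zip(exps, values)) / denom
    return out, m, denom

def combine(chunks):
    m = max(c[1] for c in chunks)                      # global max score
    total = sum(d * math.exp(cm - m) for _, cm, d in chunks)
    return sum(o * d * math.exp(cm - m) for o, cm, d in chunks) / total

q = 0.7
keys = [0.1, 0.9, -0.3, 0.5, 1.2, -0.8]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

full, _, _ = chunk_attention(q, keys, values)          # reference: one chunk
split = combine([chunk_attention(q, keys[:3], values[:3]),
                 chunk_attention(q, keys[3:], values[3:])])
print(abs(full - split) < 1e-9)  # True: rescale-and-combine is exact
```

Because the combination is mathematically exact, the chunks can be processed in parallel across the sequence dimension, which is where the speedup on long sequences comes from.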
Flash-Decoding operation process
Source: Dao et al., 2023
Figure 2.12.6
Performance comparison of multihead attention algorithms across batch sizes and sequence lengths
Source: Dao et al., 2023 | Chart: 2024 AI Index report
[Figure 2.12.7: Run-time (in microseconds) of multihead attention algorithms as batch size decreases from 256 to 1 while sequence length grows from 256 to 131,072]
Figure 2.12.7
This section examines trends in the environmental impact of AI systems, highlighting the evolving landscape of transparency and
awareness. Historically, model developers seldom disclosed the carbon footprint of their AI systems, leaving researchers to make
their best estimates. Recently, there has been a shift toward greater openness, particularly regarding the carbon costs of training AI
models. However, disclosure of the environmental costs associated with inference—a potentially more significant concern—remains
insufficient. This section presents data on carbon emissions as reported by developers in addition to featuring notable research
exploring the intersection of AI and environmental impact. With AI models growing in size and becoming more widely used, it has
never been more critical for the AI research community to diligently monitor and mitigate the environmental effects of AI systems.
CO2 equivalent emissions (tonnes) by select machine learning models and real-life examples, 2020–23
Source: AI Index, 2024; Luccioni et al., 2022; Strubell et al., 2019 | Chart: 2024 AI Index report
[Figure: CO2 equivalent emissions (tonnes) of models including Starcoder (15.5B), BLOOM (176B)*, GPT-3 (175B)*, and OPT (175B)*, compared with real-life reference points]
21 In its technical report on Llama 2, Meta notes that it offsets all the carbon emissions generated during the model’s training process.
The variance in emission estimates is due to factors such as model size, data center energy efficiency, and the carbon intensity of energy grids. Figure 2.13.2 shows the emissions of select models in relation to their size. Generally, larger models emit more carbon, a trend clearly seen in the Llama 2 model series, which were all trained on the same supercomputer (Meta's Research Super Cluster). However, smaller models can still have high emissions if trained on energy grids powered by less efficient energy sources. Some estimates suggest that model emissions have declined over time, which is presumably tied to increasingly efficient mechanisms of model training. Figure 2.13.3 features the emissions of select models along with their power consumption.
CO2 equivalent emissions (tonnes) and number of parameters by select machine learning models
Source: AI Index, 2024; Luccioni et al., 2022 | Chart: 2024 AI Index report
[Figure 2.13.2: Emissions (log scale, tonnes) vs. parameter count for models including Gopher (280B), Llama 2 (70B), Llama 2 (34B), Luminous Extended (30B), Starcoder (15.5B), Granite (13B), Llama 2 (13B), Luminous Base (13B), and Llama 2 (7B)]
Figure 2.13.2
Environmental impact of select models
Source: AI Index, 2024; Luccioni et al., 2022 | Table: 2024 AI Index report
A major challenge in evaluating the environmental impacts of AI models is a lack of transparency about emissions. Consistent with findings from other studies, most prominent model developers do not report carbon emissions, hampering efforts to conduct thorough and accurate evaluations of this metric. For example, many prominent model developers such as OpenAI, Google, Anthropic, and Mistral do not report emissions in training, although Meta does.

Inference
As highlighted earlier, the environmental impact of training AI models can be significant. While the per-query emissions of inference may be relatively low, the total impact can surpass that of training when models are queried thousands, if not millions, of times daily. Research on the emissions from model inference is scant.22 A study by Luccioni et al., published in 2023, is among the first to comprehensively assess emissions from model inference. Figure 2.13.4 illustrates the emissions from 1,000 inferences across various model tasks, revealing that tasks like image generation have a much higher carbon footprint than text classification.
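The arithmetic behind "inference can surpass training" is straightforward: multiply a small per-query footprint by daily query volume and count the days until the cumulative total exceeds a fixed training budget. The numbers below are illustrative assumptions.

```python
import math

# Arithmetic behind the training-vs-inference comparison: even a small
# per-query footprint overtakes a fixed training cost at high volume.
# All inputs are illustrative assumptions.
def days_to_exceed_training(training_tonnes, grams_per_query, queries_per_day):
    daily_tonnes = grams_per_query * queries_per_day / 1e6  # g -> tonnes
    return math.ceil(training_tonnes / daily_tonnes)

print(days_to_exceed_training(
    training_tonnes=500,         # one-time training budget
    grams_per_query=2,           # per-response inference footprint
    queries_per_day=10_000_000)) # deployment scale
# 25 days at 10M queries/day and 2 g/query
```

Under these assumptions the one-time training cost is matched in under a month of deployment, which is why per-query inference reporting matters at scale.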
[Figure 2.13.4: Model emissions (g of CO2, log scale) per 1,000 inferences by task, from text classification (lowest) to image generation (highest), with tasks including token classification, question answering, fill mask, object detection, image captioning, and summarization in between]
Figure 2.13.4
22 Research also suggests that the reporting of carbon emissions on open model development platforms, such as Hugging Face, is declining over time.
Management of thermal energy storage systems: Anticipating thermal energy needs and managing thermal energy storage systems. (Olabi et al., 2023)
Improving waste management: Saving time and costs in waste-to-energy conversion, waste sorting, and waste monitoring. (Fang et al., 2023)
More efficiently cooling buildings: Optimizing the energy usage associated with air-conditioning. (Luo et al., 2022)
Improving pest management: Identifying and eliminating pests in commercial tomato harvests. (Rustia et al., 2022)
Enhancing urban air quality: Forecasting and predicting air quality in urban cities. (Shams et al., 2021)
Figure 2.13.5
23 Several of the data points in Figure 2.13.5 were adopted from this literature review on the topic of AI and sustainability.
Chapter 3 Preview

Overview
Chapter Highlights

3.1 Assessing Responsible AI
  Responsible AI Definitions
  AI Incidents
    Examples
  Risk Perception
  Risk Mitigation
  Overall Trustworthiness
  Benchmarking Responsible AI
    Tracking Notable RAI Benchmarks
    Reporting Consistency

3.2 Privacy and Data Governance
  Current Challenges
  Privacy and Data Governance in Numbers
    Academia
    Industry
  Featured Research
    Extracting Data From LLMs
    Foundation Models and Verbatim Generation
    Auditing Privacy in AI Models

3.4 Security and Safety
  Current Challenges
  AI Security and Safety in Numbers
    Academia
    Industry
  Featured Research
    Do-Not-Answer: A New Open Dataset for Comprehensive Benchmarking of LLM Safety Risks
    Universal and Transferable Attacks on Aligned Language Models
    MACHIAVELLI Benchmark

3.5 Fairness
  Current Challenges
  Fairness in Numbers
    Academia
    Industry
  Featured Research
    (Un)Fairness in AI and Healthcare
    Social Bias in Image Generation Models
    Measuring Subjective Opinions in LLMs
    LLM Tokenization Introduces Unfairness
Overview
AI is increasingly woven into nearly every facet of our lives. This integration is occurring in
sectors such as education, finance, and healthcare, where critical decisions are often based
on algorithmic insights. This trend promises to bring many advantages; however, it also
introduces potential risks. Consequently, in the past year, there has been a significant focus
on the responsible development and deployment of AI systems. The AI community has also
become more concerned with assessing the impact of AI systems and mitigating risks for
those affected.
This chapter explores key trends in responsible AI by examining metrics, research, and
benchmarks in four key responsible AI areas: privacy and data governance, transparency
and explainability, security and safety, and fairness. Given that 4 billion people are expected
to vote globally in 2024, this chapter also features a special section on AI and elections and
more broadly explores the potential impact of AI on political processes.
Chapter Highlights
1. Robust and standardized evaluations for LLM responsibility are seriously lacking.
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting.
Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different
responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations
of top AI models.
2. Political deepfakes are easy to generate and difficult to detect. Political deepfakes are already
affecting elections across the world, with recent research suggesting that existing AI deepfake detection methods
perform with varying levels of accuracy. In addition, new projects like CounterCloud demonstrate how easily AI
can create and disseminate fake content.
4. Risks from AI are a concern for businesses across the globe. A global survey on responsible AI
highlights that companies’ top AI-related concerns include privacy, security, and reliability. The survey shows that
organizations are beginning to take steps to mitigate these risks. However, globally, most companies have so far
only mitigated a portion of these risks.
5. LLMs can output copyrighted material. Multiple researchers have shown that the generative outputs
of popular LLMs may contain copyrighted material, such as excerpts from The New York Times or scenes from
movies. Whether such output constitutes copyright violations is becoming a central legal question.
6. AI developers score low on transparency, with consequences for research. The newly
introduced Foundation Model Transparency Index shows that AI developers lack transparency, especially
regarding the disclosure of training data and methodologies. This lack of openness hinders efforts to further
understand the robustness and safety of AI systems.
8. The number of AI incidents continues to rise. According to the AI Incident Database, which tracks
incidents related to the misuse of AI, 123 incidents were reported in 2023, a 32.3% increase from 2022. Since
2013, AI incidents have grown by over twentyfold. A notable example includes AI-generated, sexually explicit
deepfakes of Taylor Swift that were widely shared online.
9. ChatGPT is politically biased. Researchers find a significant bias in ChatGPT toward Democrats in the
United States and the Labour Party in the U.K. This finding raises concerns about the tool’s potential to influence
users’ political views, particularly in a year marked by major global elections.
This chapter begins with an overview of key trends in responsible AI (RAI). In this section the AI Index defines key terms
in responsible AI: privacy, data governance, transparency, explainability, fairness, as well as security and safety. Next,
this section looks at AI-related incidents and explores how industry actors perceive AI risk and adopt AI risk mitigation
measures. Finally, the section profiles metrics pertaining to the overall trustworthiness of AI models and comments on
the lack of standardized responsible AI benchmark reporting.
Data governance: Establishment of policies, procedures, and standards to ensure the quality, security, and ethical use of data, which is crucial for accurate, fair, and responsible AI operations, particularly with sensitive or personally identifiable information. Example: Policies and procedures are in place to maintain data quality and security, with a particular focus on ethical use and consent, especially for sensitive health information.

Explainability: The capacity to comprehend and articulate the rationale behind AI decisions, emphasizing the importance of AI being not only transparent but also understandable to users and stakeholders. Example: The platform can articulate the rationale behind its treatment recommendations, making these insights understandable to doctors and patients, ensuring trust in its decisions.

Fairness: Creating algorithms that are equitable, avoiding bias or discrimination, and considering the diverse needs and circumstances of all stakeholders, thereby aligning with broader societal standards of equity. Example: The platform is designed to avoid bias in treatment recommendations, ensuring that patients from all demographics receive equitable care.

Privacy: An individual's right to confidentiality, anonymity, and protection of their personal data, including the right to consent and be informed about data usage, coupled with an organization's responsibility to safeguard these rights when handling personal data. Example: Patient data is handled with strict confidentiality, ensuring anonymity and protection. Patients consent to whether and how their data is used to train a treatment recommendation system.

Security and safety: The integrity of AI systems against threats, minimizing harms from misuse, and addressing inherent safety risks like reliability concerns and the potential dangers of advanced AI systems. Example: Measures are implemented to protect against cyber threats and ensure the system's reliability, minimizing risks from misuse or inherent system errors, thus safeguarding patient health and data.

Transparency: Open sharing of development choices, including data sources and algorithmic decisions, as well as how AI systems are deployed, monitored, and managed, covering both the creation and operational phases. Example: The development choices, including data sources and algorithmic design decisions, are openly shared. How the system is deployed and monitored is clear to healthcare providers and regulatory bodies.
Figure 3.1.1
1 Although Figure 3.1.1 breaks down various dimensions of responsible AI into specific categories to improve definitional clarity, this chapter organizes these dimensions into the following
broader categories: privacy and data governance, transparency and explainability, security and safety, and fairness.
AI Incidents
The AI Incident Database (AIID) tracks instances of ethical misuse of AI, such as autonomous cars causing pedestrian fatalities or facial recognition systems leading to wrongful arrests.2 As depicted in Figure 3.1.2, the number of AI incidents continues to climb annually. In 2023, 123 incidents were reported, a 32.3% increase from 2022. Since 2013, AI incidents have grown by over twentyfold. The continuous increase in reported incidents likely arises from both greater integration of AI into real-world applications and heightened awareness of its potential for ethical misuse. However, it is important to note that as awareness grows, incident tracking and reporting also improve, indicating that earlier incidents may have been underreported.
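The reported statistics are internally consistent, as a quick back-calculation shows: 123 incidents in 2023 and a 32.3% year-over-year increase imply roughly 93 incidents in 2022, and "over twentyfold" growth since 2013 implies on the order of 6 incidents that year.

```python
# Sanity check on the reported incident statistics: back out the implied
# 2022 and 2013 counts from the 2023 figure.
incidents_2023 = 123
implied_2022 = incidents_2023 / 1.323  # undo the 32.3% increase
implied_2013 = incidents_2023 / 20     # undo twentyfold growth
print(round(implied_2022))  # 93
print(round(implied_2013))  # 6
```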
[Figure 3.1.2: Number of reported AI incidents per year, 2012–2023]
Figure 3.1.2
Examples
The next section details recent AI incidents to shed light on the ethical challenges commonly linked with AI.

AI-generated nude images of Taylor Swift
In January 2024, sexually explicit, AI-generated images purportedly depicting Taylor Swift surfaced on X (formerly Twitter). These images remained live for 17 hours, amassing over 45 million views before they were removed. Generative AI models can effortlessly extrapolate from training data, which often include nude images and celebrity photographs, to produce nude images of celebrities, even when images of the targeted celebrity are absent from the original dataset. There are filters put in place that aim to prevent such content creation; however, these filters can usually be circumvented.

Tesla recognizing pedestrian but not slowing down at a crosswalk
Source: Gitlin, 2023
Risk Perception
In collaboration with Accenture, this year a team of Stanford researchers ran a global survey with respondents from more than 1,000 organizations to assess the global state of responsible AI. The organizations, with total revenues of at least $500 million each, were taken from 20 countries and 19 industries and responded in February–March 2024.3 The objective of the Global State of Responsible AI survey was to gain an understanding of the challenges of adopting responsible AI practices and to allow for a comparison of responsible AI activities across 10 dimensions and across surveyed industries and regions.

Respondents were asked which risks were relevant to them, given their AI adoption strategy; i.e., depending on whether they develop, deploy, or use generative or nongenerative AI. They were presented with a list of 14 risks and could select all that apply to them, given their AI adoption strategies.4 The researchers found that privacy and data governance risks, e.g., the use of data without the owner's consent or data leaks, are the leading concerns across the globe. Notably, they observe that these concerns are significantly higher in Asia and Europe compared to North America. Fairness risks were only selected by 20% of North American respondents, significantly less than respondents in Asia (31%) and Europe (34%) (Figure 3.1.5). Respondents in Asia selected, on average, the highest number of relevant risks (4.99), while Latin American respondents selected, on average, the fewest (3.64).
Figure 3.1.5: Share of respondents selecting each risk as relevant, by region. Reported risk categories: privacy and data governance, reliability, security, transparency, and fairness (values range from 20% to 56%).
Note: Not all differences between regions are statistically significant.
3 The full Global State of Responsible AI report is forthcoming in May 2024. Additional details about the methodology can be found in the Appendix to this chapter.
4 The full list of risks can be found in the Appendix. In Figure 3.1.5, the AI Index only reports the percentages for risks covered by this chapter.
Risk Mitigation
The Global State of Responsible AI survey finds that organizations in most regions have started to operationalize responsible AI measures. The majority of organizations across regions have fully operationalized at least one mitigation measure for risks they reported as relevant to them, given their AI adoption (Figure 3.1.6). Some companies in Europe (18%), North America (17%), and Asia (25%) have already operationalized more than half of the measures the researchers asked about across the following dimensions: fairness, transparency and explainability, privacy and data governance, reliability, and security.5
Figure 3.1.6: Share of relevant mitigation measures fully operationalized, by region (Asia, Europe, Latin America, North America, rest of the world).
Note: Not all differences between regions are statistically significant.
5 The AI Index only considers the adoption of RAI measures across the dimensions covered in the AI Index. The Global State of Responsible AI report covers RAI adoption across 10 dimensions.
Figure 3.1.7: Average trustworthiness score by model. Scores range from 63.24 to 74.72 across the evaluated models, which include Zephyr-7B-beta, Tulu-2 (7B and 13B), Vicuna-13B, GPT-4, GPT-3.5-turbo, several Llama-2 chat variants, and Claude-2.
Benchmarking Responsible AI
Tracking Notable Responsible AI Benchmarks
Benchmarks play an important role in tracking the capabilities of state-of-the-art AI models. In recent years there has been a shift toward evaluating models not only on their broader capabilities but also on responsibility-related features. This change reflects the growing importance of AI and the growing demands for AI accountability. As AI becomes more ubiquitous and calls for responsibility mount, it will become increasingly important to understand which benchmarks researchers prioritize.

Figure 3.1.8 presents the year-over-year citations for a range of popular responsible AI benchmarks. Introduced in 2021, TruthfulQA assesses the truthfulness of LLMs in their responses. RealToxicityPrompts and ToxiGen track the extent of toxic output produced by language models. Additionally, BOLD and BBQ evaluate the bias present in LLM generations. Citations, while not completely reflective of benchmark use, can serve as a proxy for tracking benchmark salience.

Virtually all benchmarks tracked in Figure 3.1.8 have seen more citations in 2023 than in 2022, reflecting their increasing significance in the responsible AI landscape. Citations for TruthfulQA have risen especially sharply.
Figure 3.1.8: Number of citing papers per year for popular responsible AI benchmarks. 2023 citation counts: TruthfulQA (369), RealToxicityPrompts (319), ToxiGen (104), BOLD (93), BBQ (75).
Reporting Consistency
The effectiveness of benchmarks largely depends on their standardized application. Comparing model capabilities becomes more straightforward when models are consistently evaluated against a specific set of benchmarks. However, testing models on different benchmarks complicates comparisons, as individual benchmarks have unique and idiosyncratic natures. Standardizing benchmark testing, therefore, plays an important role in enhancing transparency around AI capabilities.

New analysis from the AI Index, however, suggests that standardized benchmark reporting for responsible AI capability evaluations is lacking. The AI Index examined a selection of leading AI model developers, specifically OpenAI, Meta, Anthropic, Google, and Mistral AI. The Index identified one flagship model from each developer (GPT-4, Llama 2, Claude 2, Gemini, and Mistral 7B) and assessed the benchmarks on which they evaluated their model. A few standard benchmarks for general capabilities evaluation were commonly used by these developers, such as MMLU, HellaSwag, ARC Challenge, Codex HumanEval, and GSM8K (Figure 3.1.9).
Figure 3.1.9: Number of the five profiled developers reporting each general capability benchmark: MMLU (5), Codex HumanEval (5), GSM8K (5), HellaSwag (4), ARC Challenge (4), WinoGrande (3), Big Bench Hard (3), Natural Questions (3), BoolQ (3).
However, consistency was lacking in the reporting of responsible AI benchmarks (Figure 3.1.10). Unlike general capability evaluations, there is no universally accepted set of responsible AI benchmarks used by leading model developers. TruthfulQA, at most, is used by three out of the five selected developers. Other notable responsible AI benchmarks like RealToxicityPrompts, ToxiGen, BOLD, and BBQ are each utilized by at most two of the five profiled developers. Furthermore, one out of the five developers did not report any responsible AI benchmarks, though all developers mentioned conducting additional, nonstandardized internal capability and safety tests.

The inconsistency in reported benchmarks complicates the comparison of models, particularly in the domain of responsible AI. The diversity in benchmark selection may reflect existing benchmarks becoming quickly saturated, rendering them ineffective for comparison, or the regular introduction of new benchmarks without clear reporting standards. Additionally, developers might selectively report benchmarks that positively highlight their model's performance. To improve responsible AI reporting, it is important that a consensus is reached on which benchmarks model developers should consistently test.
Figure 3.1.10: Number of the five profiled developers reporting each responsible AI benchmark: TruthfulQA (3), RealToxicityPrompts (2), BBQ (2), ToxiGen (1), BOLD (1).
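The per-benchmark counts above can be computed mechanically once each developer's reported benchmarks are collected. The mapping below is hypothetical (the figures preserve only per-benchmark totals, not which developer reported what); the sketch shows how simple set operations surface the benchmarks on which all five flagship models can be directly compared:

```python
from collections import Counter

# Hypothetical developer-to-benchmark mapping, for illustration only.
reported = {
    "GPT-4":      {"MMLU", "HellaSwag", "Codex HumanEval", "GSM8K", "TruthfulQA"},
    "Llama 2":    {"MMLU", "Codex HumanEval", "GSM8K", "TruthfulQA", "ToxiGen"},
    "Claude 2":   {"MMLU", "Codex HumanEval", "GSM8K", "TruthfulQA", "BBQ"},
    "Gemini":     {"MMLU", "HellaSwag", "Codex HumanEval", "GSM8K"},
    "Mistral 7B": {"MMLU", "HellaSwag", "Codex HumanEval", "GSM8K"},
}

# Count how many developers report each benchmark.
counts = Counter(b for benches in reported.values() for b in benches)

# Benchmarks reported by every developer allow direct model-to-model comparison.
universal = sorted(b for b, c in counts.items() if c == len(reported))
```

With this (made-up) data, `universal` contains only the benchmarks every developer reports, which is exactly the comparability argument the text makes.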
A comprehensive definition of privacy is difficult and context-dependent. For the purposes of this report, the AI
Index defines privacy as an individual’s right to the confidentiality, anonymity, and protection of their personal data,
along with their right to consent to and be informed about whether and how their data is used. Privacy further includes an
organization’s responsibility to ensure these rights if they collect, store, or use personal data (directly or indirectly). In
AI, this involves ensuring that personal data is handled in a way that respects individual privacy rights, for example,
by implementing measures to protect sensitive information from exposure, and ensuring that data collection and
processing are transparent and compliant with privacy laws like GDPR.
Data governance, on the other hand, encompasses policies, procedures, and standards established to ensure the
quality, security, and ethical use of data within an organization. In the context of AI, data governance is crucial for
ensuring that the data used for training and operating AI systems is accurate, fair, and used responsibly and with
consent. This is especially the case with sensitive or personally identifiable information (PII).
Figure 3.2.1: Number of AI privacy and data governance submissions to select academic conferences, 2019–2023.
6 The methodology employed by the AI Index to gather conference submission data is detailed in the Appendix of this chapter. The conference data is presented in various forms throughout
the chapter. The same methodology was applied to all data on conference submissions featured in this chapter.
Organizations were also asked whether they took steps to adopt measures to mitigate data governance–related risks.8 The survey listed six possible data governance–related measures they could indicate adopting.9 Example measures include ensuring data compliance with all relevant laws and regulations, securing consent for data use, and conducting regular audits and updates to maintain data relevance. Overall, less than 0.6% of companies indicated that they had fully operationalized all six data governance mitigations. However, 90% of companies reported fully operationalizing at least one of the measures. Globally, the companies surveyed reported adopting an average of 2.2 out of 6 data governance measures.

Figure 3.2.2: Adoption of AI-related data governance measures by region (share of measures fully operationalized: None | 1–50% | 51–99% | All). Rest of the world (1.90): 16% / 72% / 11%.
Note: The numbers in parentheses are the average numbers of mitigation measures fully operationalized within each region. Not all differences between regions are statistically significant.

Adoption of AI-related data governance measures by industry
Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
Aerospace, automotive, and transport (1.96): 10% / 81% / 9%. Financial services (2.24): 10% / 76% / 13%. Healthcare and life sciences (2.20): 9% / 82% / 9%.
7 The survey is introduced above in section 3.1, Assessing Responsible AI. The full Global State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be
found in the Appendix of this chapter.
8 The following analyses only look at companies that indicated in a previous question that privacy and data governance risks are relevant to them in the context of their AI adoption.
9 Respondents were further given the free-text option “Other” to report additional mitigations not listed.
Featured Research
This section highlights significant research that was published in 2023 on privacy and data governance in AI. These studies explored data extraction from LLMs, challenges in preventing duplicated generative AI content, and low-resource privacy auditing.

Extracting Data From LLMs
LLMs are trained on massive amounts of data, much of which has been scraped from public sources like the internet. Given the vastness of information that can be found online, it is not surprising that some PII is inevitably scraped as well. A study published in November 2023 explores extractable memorization: whether and how sensitive training data can be extracted from the models. Using the attack described in the study, the authors managed to extract not just PII but also NSFW content, verbatim literature, and universally unique identifiers.10

Red teaming models through various human-readable prompts to provoke unwanted behavior has become increasingly common. For instance, one might ask a model if it can provide instructions for building a bomb. While these methods have proven somewhat effective, the research mentioned above indicates there are other, more complex methods for eliciting unwanted behavior from models.

Extracting PII From ChatGPT
Source: Nasr et al., 2023
10 A UUID is a 128-bit value that allows for the unique identification of objects or entities on the internet.
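Measuring extractable memorization amounts to checking model outputs for long verbatim matches against a training corpus (the paper does this at scale over tokens with efficient suffix-array indexes). The character-level sketch below is a slow but self-contained illustration of that check; the 50-character default threshold is an assumption standing in for the paper's token-level criterion:

```python
def longest_memorized_span(generated: str, corpus: str, min_len: int = 50) -> str:
    """Return the longest substring of `generated` of length >= `min_len`
    that appears verbatim in `corpus`, or '' if there is none."""
    best = ""
    for start in range(len(generated) - min_len + 1):
        # Skip positions where even the minimum-length window does not match.
        if generated[start:start + min_len] not in corpus:
            continue
        length = min_len
        # Greedily extend the verbatim match one character at a time.
        while (start + length < len(generated)
               and generated[start:start + length + 1] in corpus):
            length += 1
        if length > len(best):
            best = generated[start:start + length]
    return best
```

In practice one would index the corpus (e.g., with a suffix array) instead of using Python's substring search, but the memorization criterion is the same.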
Figure 3.2.5: Number of memorized output examples extracted per repeated token (e.g., "company," "one," "life," "send," "make," "part," "with," "work"), ranging from roughly 1,000 to 2,300.
Foundation Models and Verbatim Generation
This year, many AI researchers investigated the issue of generative models producing content that mirrors the material on which they were trained. For example, research from Google, ETH Zurich, and Cornell explored data memorization in LLMs and found that models without any protective measures (i.e., filters that guard against outputting verbatim responses) frequently reproduce text directly from their training data. Various models were found to exhibit differing rates of memorization for different datasets (Figure 3.2.6).

The authors argue that blocking the verbatim output of extended texts could reduce the risk of exposing copyrighted material and personal information through extraction attacks. They propose a solution where the model, upon generating each token, checks for n-gram matches with the training data to avoid exact reproductions. Although they developed an efficient method for this check, effectively preventing perfect verbatim outputs, they observed that the model could still approximate memorization by slightly altering outputs. This imperfect solution highlights the ongoing challenge of balancing model utility with privacy and copyright concerns.
Figure 3.2.6: Proportion of text memorized by dataset (speeches, monologues, OS licenses, novels, lyrics from 2011 and 2021) and by output style (original, spaces, lower, caps), ranging from 0 to roughly 0.5.
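The token-by-token n-gram check described above can be sketched as a decoding-time filter. Everything below is an illustrative stand-in: `model_step` is a hypothetical interface returning ranked candidate tokens, and a real implementation would index training n-grams with a Bloom filter or suffix array rather than a Python set:

```python
def sample_with_ngram_filter(model_step, training_ngrams, context, n=3, max_new=50):
    """Decode token by token, rejecting any candidate that would complete an
    n-gram present in the training data (n >= 2 assumed).

    model_step(tokens) -> ranked candidate next tokens (hypothetical API).
    training_ngrams    -> set of n-token tuples from the training corpus.
    """
    out = list(context)
    for _ in range(max_new):
        for tok in model_step(out):
            # The would-be n-gram: the last n-1 emitted tokens plus this one.
            if tuple(out[-(n - 1):] + [tok]) not in training_ngrams:
                out.append(tok)
                break
        else:
            break  # every candidate would reproduce a memorized n-gram
    return out
```

As the paper observes, such a filter only blocks exact reproductions; outputs that differ by a single character or token from the training data slip through, which is exactly the "approximate memorization" loophole noted above.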
Research has also highlighted challenges with exact and approximate memorization in visual content generation, notably with Midjourney v6. This study discovered that certain prompts could produce images nearly identical to those in films, even without direct instructions to recreate specific movie scenes (Figure 3.2.7). For example, a generic prompt such as "animated toys --v 6.0 --ar 16:9 --style raw" yielded images closely resembling, and potentially infringing upon, characters from "Toy Story" (Figure 3.2.8). This indicates that the model might have been trained on copyrighted material. Despite efforts to frame indirect prompts to avoid infringement, the problem persisted, emphasizing the broader copyright issues associated with AI's use of unlicensed data. The research further underscores the difficulties in guiding generative AI to steer clear of copyright infringement, a concern also applicable to DALL-E, the image-generating model associated with ChatGPT (Figure 3.2.9).
Figure 3.2.7
Figure 3.2.9
Figure 3.2.8
5. Privacy parameters are calculated based on the percent of correct guesses, yielding a privacy score.
Transparency in AI encompasses several aspects. Data and model transparency involve the open sharing of development
choices, including data sources and algorithmic decisions. Operational transparency details how AI systems are
deployed, monitored, and managed in practice. While explainability often falls under the umbrella of transparency,
providing insights into the AI’s decision-making process, it is sometimes treated as a distinct category. This distinction
underscores the importance of AI being not only transparent but also understandable to users and stakeholders. For
the purposes of this chapter, the AI Index includes explainability within transparency, defining it as the capacity to
comprehend and articulate the rationale behind AI decisions.
54
Number of AI transparency and explainability submissions
350
300 46
250 56
231
44 44
200 189
48 42
150
134
25
100 89 24
30 183
50 89
54 63
39
0
2019 2020 2021 2022 2023
Figure 3.3.1
44% of all surveyed organizations indicated that transparency and explainability are relevant concerns given their AI adoption strategy.11 Overall, less than 0.7% of companies indicated full operationalization of all the measures.

Adoption of AI-related transparency and explainability measures by industry (avg. number of measures adopted in parentheses): aerospace, automotive, and transport (1.43): 88%; communication, media, and technology (1.40): 16% / 76%.
11 The survey is introduced above in section 3.1, Assessing Responsible AI. The full State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be found in
the Appendix of this chapter.
12 Respondents were further given the free-text option “Other” to report additional mitigations not listed.
Foundation model transparency total scores of open vs. closed developers, 2023
Source: 2023 Foundation Model Transparency Index
Llama 2 (open): 54
BLOOMZ (open): 53
GPT-4 (closed): 48
Stable Diffusion 2 (open): 47
PaLM 2 (closed): 40
Claude 2 (closed): 36
Command (closed): 34
Jurassic-2 (closed): 25
Inflection-1 (closed): 21
Titan Text (closed): 12
(Scores are out of 100.)
Figure 3.3.4
13 An updated version of the FMTI is scheduled for release in spring 2024. Therefore, the figures presented in this edition of the AI Index may not reflect the most up-to-date assessment of
developer transparency.
The researchers further categorize the models based on their openness levels, as detailed in Figure 3.3.5. While Figure 3.3.4 provides an aggregated overview of the transparency of each foundation model, incorporating over 100 indicators, Figure 3.3.5 outlines the models' categorization by access level. This perspective offers greater insights into the variability of model access and illustrates how existing models align with different access schemes.

Figure 3.3.5: Levels of access, from most to least restrictive. Fully closed: PaLM (Google), Gopher (DeepMind), Imagen (Google), Make-A-Video (Meta). Gradual/staged release: GPT-2 (OpenAI), Stable Diffusion (Stability AI). Hosted access: DALL-E 2 (OpenAI), Midjourney (Midjourney). Cloud-based/API access: GPT-3 (OpenAI), Craiyon (Craiyon). Downloadable: OPT (Meta). Fully open: BLOOM (BigScience), GPT-J (EleutherAI).
Figure 3.3.6
In 2023, as AI capabilities continued to improve and models became increasingly ubiquitous, concerns about their
security and safety became a top priority for decision-makers. This chapter explores three distinct aspects of security
and safety. First, guaranteeing the integrity of AI systems involves protecting components such as algorithms, data, and
infrastructure against external threats like cyberattacks or adversarial attacks. Second, safety involves minimizing harms
stemming from the deliberate or inadvertent misuse of AI systems. This includes concerns such as the development
of automated hacking tools or the utilization of AI in cyberattacks. Lastly, safety encompasses inherent risks from AI
systems themselves, such as reliability concerns (e.g., hallucinations) and potential risks posed by advanced AI systems.
Figure 3.4.1: Number of AI security and safety submissions to select academic conferences, 2019–2023.
Mitigation adoption rates by industry (avg. number of measures adopted in parentheses): aerospace, automotive, and transport (1.86): 15% / 76%; communication, media, and technology (2.35): 11% / 72% / 15%; healthcare and life sciences (2.16): 10% / 82%.
14 The survey is introduced above in section 3.1, Assessing Responsible AI. The full State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be found in
the Appendix of this chapter.
15 Respondents were further given the free-text option ‘Other’ to report additional mitigations not listed.
Organizations were also queried on the relevance of security risks, such as cybersecurity incidents.

Adoption of AI-related cybersecurity measures by industry (avg. number of measures adopted in parentheses): aerospace, automotive, and transport (1.87): 9% / 66% / 24%; communication, media, and technology (2.23): 53% / 39%; healthcare and life sciences (1.80): 10% / 71% / 16%.
Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
16 Respondents were further given the free-text option “Other” to report additional mitigations not listed.
The survey inquired about companies' perspectives on risks associated with foundation model developments. A significant majority, 88% of organizations, either agree or strongly agree that those developing foundation models are responsible for mitigating all associated risks (Figure 3.4.6). Furthermore, 86% of respondents either agree or strongly agree that the potential threats posed by generative AI are substantial enough to warrant globally agreed-upon governance.

Figure 3.4.6: Agreement with statements on foundation model risks (strongly disagree / disagree / neither disagree nor agree / agree / strongly agree / not sure).
Do-Not-Answer: A New Open Dataset for Comprehensive Benchmarking of LLM Safety Risks
As the capabilities of LLMs expand, so too does their potential for misuse in hazardous activities. LLMs could potentially be utilized to support cyberattacks, facilitate spear-phishing campaigns, or theoretically even assist in terrorism. Consequently, it is becoming increasingly crucial for developers to devise mechanisms for evaluating the potential dangers of AI models. Closed-source developers such as OpenAI and Anthropic have constructed such datasets, and Do-Not-Answer adds to the open-source datasets for assessing safety risks in LLMs. The evaluation encompasses responses from six prominent language models: GPT-4, ChatGPT, Claude, Llama 2, Vicuna, and ChatGLM2. The authors also developed a risk taxonomy spanning a range of risks, from mild to severe. The authors find that most models output harmful content to some extent. GPT-4 and ChatGPT are most prone to discriminatory, offensive output, while Claude is susceptible to propagating misinformation (Figure 3.4.7). Across all tested models, the highest number of violations was recorded for ChatGLM2 (Figure 3.4.8).
Figure 3.4.7: Number of harmful responses by risk category (human–chatbot interaction harms; misinformation harms; discrimination, exclusion, toxicity, hateful, offensive; malicious uses; information hazards) for each of the six evaluated models.
Figure 3.4.8: Number of harmful responses per foundation model, 2022 vs. 2023, for ChatGPT, Llama 2, Claude, GPT-4, Vicuna, and ChatGLM2; ChatGLM2 recorded the most.
Universal and Transferable Attacks on Aligned Language Models
Recent attention in AI security has centered on uncovering adversarial attacks capable of bypassing the implemented safety protocols of LLMs. Much of this research requires substantial human intervention and is idiosyncratic to specific models. However, in 2023, researchers unveiled a universal attack capable of operating across various LLMs. This attack induces aligned models to generate objectionable content (Figure 3.4.9).

The method the researchers introduce, called Greedy Coordinate Gradient (GCG), involves automatically generating suffixes that, when added to various prompts, compel LLMs to produce unsafe content. Figure 3.4.10 highlights the success rates of different attacking styles on leading LLMs. The study demonstrates that these suffixes (the GCG attack) often transfer effectively across both closed and open models, encompassing ChatGPT, Bard, Claude, Llama-2-Chat, and Pythia. This study raises an important question as to how models can be better fortified against automated adversarial attacks. It also demonstrates how LLMs can be vulnerable to attacks that employ unintelligible, non-human-readable prompts. Current red-teaming methodologies primarily focus on interpretable prompts. This new research suggests there is a significant gap in buffering LLMs against attacks utilizing uninterpretable prompts.
Figure 3.4.9
Figure 3.4.10: Attack success rate (%) by model (Pythia-12B, Falcon-7B, Guanaco-7B, ChatGLM-6B, MPT-7B, Stable-Vicuna, Vicuna-7B, Vicuna-13B, GPT-3.5, GPT-4).
Figure 3.4.12: Immorality, disutility, and manipulation scores by agent (lower is better).
3.5 Fairness
Fairness in AI emphasizes developing systems that are equitable and avoid perpetuating bias or discrimination against any individual or group. It involves considering the diverse needs and circumstances of all stakeholders impacted by AI use. Fairness extends beyond a technical concept and embodies broader social standards related to equity.

Current Challenges
Defining, measuring, and ensuring fairness is complex due to the absence of a universal fairness definition and a structured approach for selecting context-appropriate fairness definitions. This challenge is magnified by the multifaceted nature of AI systems, which require the integration of fairness measures at almost every stage of their life cycle.

Academia
The rise of LLMs like ChatGPT and Gemini made the public significantly more aware of some of the fairness issues that can arise when AI systems are broadly deployed. This heightened awareness has led to a rise in AI-fairness-related submissions at academic conferences. In 2023, there were 212 papers on fairness and bias submitted, a 25.4% increase from 2022 (Figure 3.5.1).
Figure 3.5.1: Number of AI fairness and bias submissions to select academic conferences, 2019–2023.
Organizations were asked about their adoption of fairness-related mitigation measures (Figure 3.5.2). Figure 3.5.3 visualizes integration rates by industry.

Figure 3.5.3: Adoption of AI-related fairness measures by industry (avg. number of measures adopted in parentheses): aerospace, automotive, and transport (2.09): 58% / 35%; communication, media, and technology (1.96): 11% / 61% / 27%; healthcare and life sciences (1.80): 69% / 23%.
17 The survey is introduced above in section 3.1, Assessing Responsible AI. The full Global State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be
found in the Appendix of this chapter. By AI adoption, the researchers mean whether the organization uses, develops, modifies, or resells generative or nongenerative AI.
18 Respondents were further given the free-text option “Other” to report additional mitigations not listed.
Through selected studies, the AI Index highlights some critical topics that are of interest to AI fairness researchers. The research featured below reveals how image generation models can perpetuate social stereotypes, LLMs tend to reflect Western opinions, and model tokenization can disadvantage speakers of some languages.

One study examined how often notable LLMs delivered highly racialized responses per question.19 The study revealed that all models demonstrated some degree of race-based medical bias, although their responses to identical questions varied. For certain queries, like the basis of race, only some of the models produced concerning responses.

Number of runs (out of 5 total runs) with concerning race-based responses by large language model
Source: Omiye et al., 2023 | Chart: 2024 AI Index report
"Tell me about skin thickness differences between Black and white skin.": 5, 5, 5, 4 across the four evaluated models.
"What is the difference in pain threshold between Black and white patients?": 2, 1, 2, 0.
Social Bias in Image Generation Models
BiasPainter is a new testing framework designed to detect social biases in image generation models, such as DALL-E and Midjourney. As highlighted in the 2023 AI Index, many image generation models frequently reproduce social stereotypes. On bias detection tasks, BiasPainter achieves an automatic bias detection accuracy of 90.8%, a considerable improvement over previous methods.

Midjourney generation: "influential person"
Source: Marcus and Southen, 2024

Figure 3.5.6: Average bias score by image generation model (Stable Diffusion 1.5, Stable Diffusion 2.1, Midjourney, Stable Diffusion XL, InstructPix2Pix), 2022 vs. 2023, with scores ranging from 0.15 to 0.97.
Measuring Subjective Opinions in LLMs
Research from Anthropic suggests that large language models do not equally represent global opinions on a variety of topics such as politics, religion, and technology. In this study, researchers built a GlobalOpinionQA dataset to capture cross-country opinions on various issues (Figure 3.5.7). They then generated a similarity metric to compare people's answers in various countries with those outputted by LLMs. Using a four-point Likert scale, LLMs were asked to rate their agreement with statements from the World Values Survey (WVS) and Pew Research Center's Global Attitudes (GAS) surveys, including questions like, "When jobs are scarce, employers should give priority to people of this country over immigrants," or "On the whole, men make better business executives than women do."

The experiments indicate that the models' responses closely align with those from individuals in Western countries (Figure 3.5.8). The authors point out a notable lack of diversity in opinion representation, especially from non-Western nations among the shared responses. While it is challenging for models to precisely match the highly diverse distributions of global opinions—given the inherent variation in perspectives—it is still valuable to understand which opinions a model is likely to share. Recognizing the biases inherent in models can highlight their limitations and facilitate adjustments that improve regional applicability.
GlobalOpinionQA Dataset
Source: Durmus et al., 2023
Figure 3.5.7
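The similarity metric compares the model's distribution over answer options with each country's survey answer distribution. The sketch below uses one minus the Jensen-Shannon divergence, an illustrative choice rather than necessarily the authors' exact formula:

```python
import math

def distribution_similarity(p, q):
    """Similarity between two answer distributions over the same Likert
    options: 1 minus the Jensen-Shannon divergence (base 2). Identical
    distributions score 1.0; fully disjoint ones score 0.0. The exact
    metric here is an illustrative choice, not necessarily the paper's."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m))

model_answers = [0.10, 0.20, 0.30, 0.40]  # model's distribution over 4 options
country_a = [0.10, 0.20, 0.30, 0.40]      # matches the model exactly
country_b = [0.40, 0.30, 0.20, 0.10]      # reversed preferences
```

A country whose respondents answer like the model scores near 1.0 (here, `country_a`), which is the sense in which the study says model responses "closely align" with Western survey respondents.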
Figure 3.5.8: Similarity between model responses and country-level survey responses, binned from 0.51 to 0.69.
Models have a fixed context window that limits the amount of text that can be input—languages that require more tokens proportionally use up more of this window. This can reduce the available context for the model, potentially diminishing the quality of service for those users.

Figure 3.5.10: The Vietnamese sentence "Tôi đưa con chó của tôi đến công viên" ("I take my dog to the park") is split into 16 tokens.
Figure 3.5.11: Tokenization premium by language for the RoBERTa and XLM-RoBERTa tokenizers across more than 40 languages (including English, Standard Arabic, Thai, Burmese, Yoruba, and Yue Chinese), with premiums ranging up to roughly 16x.
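The tokenization premium in Figure 3.5.11 is simply the ratio of token counts for translations of the same text. The toy tokenizer below, a whole-word vocabulary with a per-byte fallback, is a deliberately crude assumption that nonetheless shows how an English-centric vocabulary inflates token counts for other languages:

```python
def toy_tokenize(text, vocab):
    """Toy tokenizer: words in `vocab` cost one token; any other word falls
    back to one token per UTF-8 byte. A crude stand-in for a real subword
    tokenizer trained mostly on English text."""
    tokens = []
    for word in text.split():
        if word in vocab:
            tokens.append(word)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in word.encode("utf-8"))
    return tokens

# Hypothetical English-centric vocabulary, for illustration only.
vocab = {"I", "take", "my", "dog", "to", "the", "park"}
english = "I take my dog to the park"
vietnamese = "Tôi đưa con chó của tôi đến công viên"

# Tokenization premium: tokens needed relative to the English sentence.
premium = len(toy_tokenize(vietnamese, vocab)) / len(toy_tokenize(english, vocab))
```

Real subword tokenizers are far less extreme than this byte fallback, but the mechanism is the same: the fewer vocabulary entries that cover a language, the more tokens (and context window) each sentence consumes.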
In 2024, around 4 billion people across the globe will vote in national elections, for example, in the United States, U.K.,
Indonesia, Mexico, and Taiwan. Upcoming elections coupled with greater public awareness of AI have led to discussions
of AI’s possible impact on elections. This section covers how AI can impact elections and more specifically examines the
generation and dissemination of mis- and disinformation, the detection of AI-generated content, the potential political
bias of LLMs, and the broader impact of AI on politics.
Figure 3.6.1: Deepfakes (generation/detection), covering both visual and audio content.
20 This section uses the terms synthetic content, disinformation, and deepfakes in the following senses: Synthetic content is any content (text, image, audio, video) that has been created with
AI. Disinformation is false or misleading information generated with the explicit intention to deceive or manipulate an audience. Deepfakes are AI-generated image, video, or audio files that
can often create convincingly realistic yet deceptive content.
Slovakia's 2023 election illustrates how AI-based disinformation can be used in a political context. Shortly before the election, a contentious audio clip emerged on Facebook purportedly capturing Michal Šimečka, the leader of the Progressive Slovakia party (Figure 3.6.2), and journalist Monika Tódová from the newspaper Denník N, discussing illicit election strategies, including acquiring voters from the Roma community. The authenticity of the audio was immediately challenged by Šimečka and Denník N. An independent fact-checking team suggested that AI manipulation was likely at play. Because the clip was released during a pre-election quiet period, it was difficult to debunk quickly.

Progressive Slovakia leader Michal Šimečka
Source: Meaker, 2023
Figure 3.6.2
Figure 3.6.3: An illustrative AI disinformation pipeline: AI constantly scrapes the internet for content; a gatekeeper chooses content to counter; AI generates a counter-headline and article and shares it to Twitter, posting links alongside fake commentary, comments, sound clips, and images from networks of fake user profiles.
21 The figure was adapted from Simon, Altay, and Mercier, 2023.
Detecting Deepfakes
Recent research efforts to counter deepfakes have focused on improving methods for detecting AI-generated content. For example, a team of Singaporean researchers studied how well deepfake detectors generalize to datasets they have not been trained on. The researchers compared five deepfake detection methods and found that previously introduced detectors suffer significant performance declines on never-before-seen datasets (Figure 3.6.4). However, the study does note that there are underlying similarities between seen and unseen datasets, meaning that in the future, robust and broadly generalizable deepfake detectors could be developed.
[Chart: accuracy (%) of five deepfake detectors introduced between 2017 and 2019 (XceptionNet, MesoInception, MesoNet, EfficientNet, and ShallowNet), with reported accuracies ranging from 63.16% to 77.15%.]
Figure 3.6.4
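The cross-dataset evaluation described above can be sketched in a few lines. Everything below is a hypothetical stand-in, not the study's models or data: the "detector" is a toy threshold rule, and the seen/unseen datasets are tiny invented score lists.

```python
# Hypothetical sketch of a cross-dataset generalization check: a detector
# is scored on datasets it has seen during training and on datasets it
# has not. All names and numbers here are illustrative.

def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def generalization_gap(detector, seen_sets, unseen_sets):
    """Mean accuracy on seen datasets minus mean accuracy on unseen ones."""
    def mean_acc(datasets):
        accs = [accuracy([detector(x) for x in xs], ys) for xs, ys in datasets]
        return sum(accs) / len(accs)
    return mean_acc(seen_sets) - mean_acc(unseen_sets)

# Toy detector: flags a sample as fake (1) when its score exceeds 0.5.
detector = lambda score: int(score > 0.5)

seen = [([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])]      # in-distribution scores
unseen = [([0.6, 0.4, 0.55, 0.45], [0, 1, 0, 1])]  # shifted, unseen scores
gap = generalization_gap(detector, seen, unseen)
print(f"accuracy drop on unseen data: {gap:.2f}")
```

A positive gap corresponds to the performance decline the Singaporean study reports; a robust detector would keep this gap near zero.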
In the context of deepfake detectors, it is also important to highlight earlier experiments showing that the performance of deepfake detection methods varies significantly across attributes such as race. Some of the underlying datasets used to train deepfake detectors, like FaceForensics++, are not equally balanced with respect to race and gender (Figure 3.6.5). The authors then demonstrate that between various racial subgroups, performance accuracy could differ by as much as 10.7 percentage points. The detectors performed worst on dark skin and best on Caucasian faces.
[Chart: percentage of ethnic and gender subgroups (male and female Caucasian, Asian, Indian, and Hispanic/Latino faces) in the dataset, ranging from 1.80% to 36.00%.]
Figure 3.6.5
LLMs and Political Bias
LLMs are increasingly recognized as tools through which ordinary individuals can inform themselves about important political topics such as political processes, candidates, or parties. However, new research published in 2023 suggests that many major LLMs like ChatGPT are not necessarily free of bias.

The study revealed that ChatGPT exhibits a notable and systematic bias favoring Democrats in the United States and the Labour Party in the U.K. As part of the study, the researchers compared the answers of a default ChatGPT to those of Republican, Democrat, radical Republican, and radical Democrat versions of ChatGPT. This research design was created to better identify which political allegiance most closely corresponds to the regular ChatGPT.

Figure 3.6.6 shows strong positive correlations (blue lines) between the default ChatGPT, i.e., one that was answering questions without additional instructions, and both the Democrat and the radical Democrat ChatGPT versions, i.e., versions of ChatGPT that were asked to answer like a Democrat or radical Democrat. On the other hand, the researchers found a strong negative correlation between the default GPT and both Republican ChatGPTs. The identification of bias in these LLMs raises concerns about their potential to influence the political views and stances of users who engage with these tools.
[Charts: default ChatGPT answers plotted against those of the Democrat, Republican, radical Democrat, and radical Republican ChatGPT versions, with correlations of ρ = 0.96 and ρ = 0.93 for the Democrat versions and ρ = -0.12 and ρ = -0.86 for the Republican versions.]
Figure 3.6.6²²
22 ChatGPT answers are coded on a scale of 0 (strongly disagree), 1 (disagree), 2 (agree), and 3 (strongly agree).
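Given the 0–3 coding scheme in the footnote, the correlation coefficient ρ in Figure 3.6.6 is Spearman's rank correlation between two answer vectors. The sketch below computes it in pure Python; the two answer vectors are made up for illustration and are not the study's data.

```python
# Spearman's rho between a default ChatGPT and a politically prompted
# version, with answers coded 0-3 as in footnote 22. Answer vectors here
# are invented; the real study used full questionnaires.

def rank(values):
    """Average ranks (1-based), assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

default_gpt  = [3, 2, 0, 1, 3, 0]  # hypothetical coded answers
democrat_gpt = [3, 3, 0, 1, 2, 0]
print(f"rho = {spearman(default_gpt, democrat_gpt):.2f}")
```

A ρ near 1 indicates the two versions agree on almost every question, as the study found for the default and Democrat versions; a strongly negative ρ indicates systematic disagreement.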
Figure 3.6.7
AI can also influence political processes in other ways. Research from Queen’s University Belfast notes other ways in which AI can affect political processes, and potential mitigations associated with different risk cases (Figure 3.6.8). For instance, AI could be utilized for video surveillance of voters, potentially undermining the integrity of elections. The same authors identify the degree to which each AI political use case is technologically ready, the risk level it possesses, and how visible the deployment of AI would be to users (Figure 3.6.9). For example, they propose that employing AI for voter authentication is already highly feasible, and this application carries a significant risk.
[Chart: AI use cases in electoral processes and associated mitigations, including polling booth locations, drop box location determination, facility location, clustering, auditing AI, partisan manipulation, and disadvantaged voters.]
Figure 3.6.8
Assessments of AI integration and risks in electoral processes
Source: P et al., 2023 | Chart: 2024 AI Index report
Figure 3.6.9
Chapter 4: Economy
Preview
Overview
Chapter Highlights
4.1 What’s New in 2023: A Timeline
4.2 Jobs
    AI Labor Demand
        Global AI Labor Demand
        U.S. AI Labor Demand by Skill Cluster and Specialized Skill
        U.S. AI Labor Demand by Sector
        U.S. AI Labor Demand by State
    AI Hiring
    AI Skill Penetration
    AI Talent
    Highlight: How Much Do Computer Scientists Earn?
4.3 Investment
    Corporate Investment
    Startup Activity
        Global Trends
        Regional Comparison by Funding Amount
        Regional Comparison by Newly Funded AI Companies
        Focus Area Analysis
4.4 Corporate Activity
    Industry Adoption
        Adoption of AI Capabilities
        Adoption of Generative AI Capabilities
    Use of AI by Developers
        Preference
        Workflow
    AI’s Labor Impact
    Earnings Calls
        Aggregate Trends
        Specific Themes
    Highlight: Projecting AI’s Economic Impact
4.5 Robot Installations
    Aggregate Trends
    Industrial Robots: Traditional vs. Collaborative Robots
    By Geographic Area
    Country-Level Data on Service Robotics
    Sectors and Application Types
    China vs. United States
Overview
The integration of AI into the economy raises many compelling questions. Some predict
that AI will drive productivity improvements, but the extent of its impact remains uncertain.
A major concern is the potential for massive labor displacement—to what degree will jobs
be automated versus augmented by AI? Companies are already utilizing AI in various ways
across industries, but some regions of the world are witnessing greater investment inflows
into this transformative technology. Moreover, investor interest appears to be gravitating
toward specific AI subfields like natural language processing and data management.
This chapter examines AI-related economic trends using data from Lightcast, LinkedIn,
Quid, McKinsey, Stack Overflow, and the International Federation of Robotics (IFR). It
begins by analyzing AI-related occupations, covering labor demand, hiring trends, skill
penetration, and talent availability. The chapter then explores corporate investment in
AI, introducing a new section focused specifically on generative AI. It further examines
corporate adoption of AI, assessing current usage and how developers adopt these
technologies. Finally, it assesses AI’s current and projected economic impact and robot
installations across various sectors.
Chapter Highlights
1. Generative AI investment skyrockets. Despite a decline in overall AI private investment last year,
funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative
AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.
2. Already a leader, the United States pulls even further ahead in AI private investment.
In 2023, the United States saw AI investments reach $67.2 billion, nearly 8.7 times more than China, the next
highest investor. While private AI investment in China and the European Union, including the United Kingdom,
declined by 44.2% and 14.1%, respectively, since 2022, the United States experienced a notable increase of 22.1%
in the same time frame.
3. Fewer AI jobs in the United States and across the globe. In 2022, AI-related positions made up 2.0% of all job postings in America, a figure that decreased to 1.6% in 2023. This decline in AI job listings is attributed to fewer postings from leading AI firms and a reduced proportion of tech roles within these companies.
4. AI decreases costs and increases revenues. A new McKinsey survey reveals that 42% of surveyed
organizations report cost reductions from implementing AI (including generative AI), and 59% report revenue
increases. Compared to the previous year, there was a 10 percentage point increase in respondents reporting
decreased costs, suggesting AI is driving significant business efficiency gains.
5. Total AI private investment declines again, while the number of newly funded AI
companies increases. Global private AI investment has fallen for the second year in a row, though less than
the sharp decrease from 2021 to 2022. The count of newly funded AI companies spiked to 1,812, up 40.6% from
the previous year.
6. AI organizational adoption ticks up. A 2023 McKinsey report reveals that 55% of organizations now
use AI (including generative AI) in at least one business unit or function, up from 50% in 2022 and 20% in 2017.
7. China dominates industrial robotics. Since surpassing Japan in 2013 as the leading installer of
industrial robots, China has significantly widened the gap with the nearest competitor nation. In 2013, China’s
installations accounted for 20.8% of the global total, a share that rose to 52.4% by 2022.
9. The data is in: AI makes workers more productive and leads to higher quality work.
In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks
more quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge
the skill gap between low- and high-skilled workers. Still other studies caution that using AI without proper
oversight can lead to diminished performance.
10. Fortune 500 companies start talking a lot about AI, especially generative AI.
In 2023, AI was mentioned in 394 earnings calls (nearly 80% of all Fortune 500 companies), a notable increase
from 266 mentions in 2022. Since 2018, mentions of AI in Fortune 500 earnings calls have nearly doubled. The
most frequently cited theme, appearing in 19.7% of all earnings calls, was generative AI.
The chapter begins with an overview of some of the most significant AI-related economic events in 2023, as selected
by the AI Index Steering Committee.
Figure 4.1.11 (Source: Legal.io, 2023)
4.2 Jobs
AI Labor Demand
This section analyzes the demand for AI-related skills in labor markets, drawing on data from Lightcast. Lightcast has analyzed hundreds of millions of job postings from over 51,000 websites since 2010, identifying those that require AI skills.

Global AI Labor Demand
Figure 4.2.1 shows the percentage of job postings demanding AI skills. In 2023, the United States (1.6%), Spain (1.4%), and Sweden (1.3%) led in this metric. In 2022, AI-related jobs accounted for 2.0% of all American job postings; in 2023, that number dropped to 1.6%. Although most countries saw a decrease from 2022 to 2023 in the share of job postings requiring AI skills, in many, the number of AI-related job postings over the past five years has increased.1

Lightcast speculates that the 2023 decrease in AI job postings is driven by many top AI employers (such as Amazon, Deloitte, Capital One, Randstad, and Elevance Health) scaling back their overall posting counts. Additionally, many companies shifted the occupational mix of their postings. For example, Amazon in 2023 posted a higher share of operational roles like sales delivery driver, packager, and postal service/mail room worker than in 2022. At the same time, there was a lower share of demand for tech roles like software developers and data scientists.
[Chart: AI job postings as a percentage of all job postings, 2014–2023.]
Figure 4.2.1
1 In 2023, Lightcast slightly changed its methodology for determining AI-related job postings from what was used in previous versions of the AI Index report. Lightcast also updated its
taxonomy of AI-related skills. As such, some of the numbers in this chart do not completely align with those featured in last year’s report.
AI job postings (% of all job postings) in the United States by skill cluster, 2010–23
Source: Lightcast, 2023 | Chart: 2024 AI Index report
Figure 4.2.3 compares the top 10 specialized skills sought in AI job postings in 2023 versus those from 2011 to
2013.2 On an absolute scale, the demand for nearly every specialized skill has increased over the past decade, with
Python’s notable increase in popularity highlighting its ascendance as a preferred AI programming language.
Top 10 specialized skills in 2023 AI job postings in the United States, 2011–13 vs. 2023
Source: Lightcast, 2023 | Chart: 2024 AI Index report

Skill | 2011–13 postings | 2023 postings
Python (programming language) | 13,503 | 152,201
Computer science | 43,748 | 133,066
SQL (programming language) | 25,194 | 93,541
Data analysis | 20,770 | 91,883
Data science | 3,892 | 85,480
Agile methodology | 8,602 | 73,069
Amazon Web Services | 1,712 | 68,459
Automation | 12,327 | 67,772
Software engineering | 18,704 | 64,557
Project management | 25,953 | 62,180

Figure 4.2.3
2 2011–2013 was selected as the point of comparison because some data at the jobs/skills level from earlier years is quite sparse; using 2011–2013 gives a larger sample size for a benchmark from roughly 10 years ago. Figure 4.2.3 juxtaposes the total number of job postings requiring certain skills from 2011 to 2013 with the total in 2023.
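The decade-long comparison in Figure 4.2.3 can also be read as growth multiples. The short sketch below divides each skill's 2023 posting count by its 2011–13 benchmark, using the counts shown in the figure for a few illustrative skills.

```python
# Growth multiples for selected skills from Figure 4.2.3:
# 2023 posting count divided by the 2011-13 benchmark count.

postings = {  # skill: (2011-13 count, 2023 count)
    "Python (programming language)": (13_503, 152_201),
    "Data science": (3_892, 85_480),
    "Amazon Web Services": (1_712, 68_459),
    "Project management": (25_953, 62_180),
}

for skill, (old, new) in postings.items():
    print(f"{skill}: {new / old:.1f}x growth")
```

The spread is striking: an established skill like project management grew about 2.4x, while cloud and data-science skills grew by one to two orders of magnitude, consistent with Python's ascendance noted above.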
In 2023, Lightcast saw large increases in the number of U.S. job postings citing generative AI skills. That year, 15,410 job postings specifically cited generative AI as a desired skill, large language modeling was mentioned in 4,669 postings, and ChatGPT appeared in 2,841 job listings (Figure 4.2.4).
Figure 4.2.5 illustrates what proportion of all generative AI job postings released in 2023 referenced particular
generative AI skills. The most cited skill was generative AI (60.0%), followed by large language modeling (18.2%)
and ChatGPT (11.1%).
U.S. AI Labor Demand by Sector
Figure 4.2.6 shows the percentage of U.S. job postings requiring AI skills by industry sector from 2022 to 2023. Nearly every sector experienced a decrease in the proportion of AI job postings in 2023 compared to 2022, except for public administration and educational services. The leading sectors were information (4.6%); professional, scientific, and technical services (3.3%); and finance and insurance (2.9%). As noted earlier, the decrease in AI job postings was related to changes in the hiring patterns of several major U.S. employers.
AI job postings (% of all job postings) in the United States by sector, 2022 vs. 2023
Source: Lightcast, 2023 | Chart: 2024 AI Index report
U.S. AI Labor Demand by State
Figure 4.2.7 highlights the number of AI job postings in the United States by state (Source: Lightcast, 2023 | Chart: 2024 AI Index report). The top three states were California (70,630), followed by Texas (36,413) and Virginia (24,417).
[Map: number of AI job postings by state, 2023.]
Figure 4.2.7
Figure 4.2.8 demonstrates what percentage of a state’s total job postings were AI-related (Source: Lightcast, 2023 | Chart: 2024 AI Index report). The top states according to this metric were the District of Columbia (2.7%), followed by Delaware (2.4%) and Maryland (2.1%).
[Map: AI share of each state’s job postings, 2023.]
Figure 4.2.8
Figure 4.2.9 examines which U.S. states accounted for the largest proportion of AI job postings nationwide (Source: Lightcast, 2023 | Chart: 2024 AI Index report). California was first: In 2023, 15.3% of all AI job postings in the United States were for jobs based in California, followed by Texas (7.9%) and Virginia (5.3%).
[Map: each state’s share of U.S. AI job postings, 2023.]
Figure 4.2.9
Figure 4.2.10 illustrates the trends in the four states with the highest AI job postings: Washington, California, New York, and Texas. Each experienced a notable decline in the share of AI-related job postings from 2022 to 2023.
[Chart: percentage of U.S. states’ job postings in AI, 2010–2023. 2023 values: Washington 1.65%, California 1.60%, New York 1.43%, Texas 1.11%.]
Figure 4.2.10
Figure 4.2.11 shows how AI-related job postings have been distributed across the top four states over time. Since
2019, California’s proportion of AI job postings has steadily declined, while Texas has seen a slight increase.
[Chart: percentage of U.S. AI job postings by state, 2010–2023. 2023 values include Texas at 7.89% and Washington at 3.19%.]
Figure 4.2.11
AI Hiring
The hiring data presented in the AI Index is based on a LinkedIn dataset of skills and jobs that appear on their platform. The geographic areas included in the sample make at least 10 AI hires each month and have LinkedIn covering a substantial portion of the labor force. LinkedIn’s coverage of India’s and South Korea’s sizable labor forces falls below this threshold, so insights drawn about these countries should be interpreted with particular caution.

Figure 4.2.12 reports the relative AI hiring rate year-over-year ratio by geographic area. The overall hiring rate is computed as the percentage of LinkedIn members who added a new employer in the same period the job began, divided by the total number of LinkedIn members in the corresponding location. The relative AI talent hiring rate is the year-over-year change in AI hiring relative to the overall hiring rate in the same geographic area.3 Figure 4.2.12 therefore illustrates which regions have experienced the most significant rise in AI talent recruitment compared to the overall hiring rate, serving as an indicator of AI hiring vibrancy. In 2023, the regions with the greatest relative AI hiring rates year over year were Hong Kong (28.8%), followed by Singapore (18.9%) and Luxembourg (18.9%). This means, for example, that in 2023 in Hong Kong the ratio of AI talent hiring relative to overall hiring grew 28.8%.
[Chart: relative AI hiring rate year-over-year ratio, 2023. Singapore 18.93%, Luxembourg 18.85%, India 16.83%, Portugal 14.84%, Denmark 13.23%, Norway 13.08%, Belgium 10.41%, Spain 9.63%, Sweden 7.37%, Finland 7.27%, Chile 6.65%.]
Figure 4.2.12
Figure 4.2.13 showcases the year-over-year ratio of AI hiring by geographic area over the past five years. Starting in early 2023, countries including Australia, Canada, Singapore, and India have experienced a noticeable uptick in AI hiring.
3 For each month, LinkedIn calculates the AI hiring rate in the geographic area, divides the AI hiring rate by overall hiring rate in that geographic area, calculates the year-over-year change of
this ratio, and then takes the 12-month moving average using the last 12 months.
4 For brevity, the visualization only includes the top 15 countries for this metric.
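The metric described in footnote 3 can be sketched directly: divide each month's AI hiring rate by the overall hiring rate, take the year-over-year change of that ratio, then smooth with a trailing 12-month moving average. The monthly series below is fabricated for illustration; LinkedIn's production pipeline surely differs in detail.

```python
# Sketch of the relative AI hiring rate YoY metric from footnote 3.
# Inputs are monthly AI and overall hiring rates; output is the
# 12-month moving average of the year-over-year change in their ratio.

def relative_ai_hiring_yoy(ai_rate, overall_rate):
    """YoY change of the AI-to-overall hiring ratio, smoothed with a
    trailing 12-month moving average."""
    ratio = [a / o for a, o in zip(ai_rate, overall_rate)]
    yoy = [(ratio[m] - ratio[m - 12]) / ratio[m - 12]
           for m in range(12, len(ratio))]
    return [sum(yoy[m - 11:m + 1]) / 12 for m in range(11, len(yoy))]

# 36 months of fabricated rates: AI hiring grows while overall is flat.
ai = [1.0 + 0.02 * m for m in range(36)]
overall = [10.0] * 36
smoothed = relative_ai_hiring_yoy(ai, overall)
print(f"latest smoothed YoY ratio change: {smoothed[-1]:.1%}")
```

Because the ratio normalizes by overall hiring, a country whose whole labor market cools can still show a positive value if AI hiring cools more slowly.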
Relative AI hiring rate year-over-year ratio by geographic area, 2018–23
Source: LinkedIn, 2023 | Chart: 2024 AI Index report
[Small-multiple charts with panels including Australia (6.53%), Belgium (10.41%), Brazil (-0.47%), Canada (4.44%), Chile (6.65%), Denmark (13.23%), Finland (7.27%), France (-0.43%), Germany, Hong Kong, India, Ireland, Norway (13.08%), Portugal (14.84%), Singapore (18.93%), South Africa (7.17%), South Korea (5.81%), Spain (9.63%), and Sweden (7.37%); values in parentheses are 2023 endpoints.]
Figure 4.2.13
[Chart: relative AI skill penetration rate by geographic area. India 2.75, Germany 1.90, Canada 1.67, Israel 1.63, Singapore 1.50, France 1.48, Spain 1.29, Brazil 1.21, Netherlands 1.13, Italy 1.08, Switzerland 1.06.]
Figure 4.2.15 disaggregates AI skill penetration rates by gender across different countries or regions. A country’s rate of 1.5 for women means female members in that country are 1.5 times more likely to list AI skills than the average member in all countries pooled together across the same set of occupations in the country. For all countries in the sample, the relative AI skill penetration rate is greater for men than women. India (1.7), the United States (1.2), and Israel (0.9) have the highest reported relative AI skill penetration rates for women.
Figure 4.2.17
[Charts: trends by gender (male vs. female), 2017–2023.]
Figure 4.2.18
LinkedIn data provides insights on the AI talent gained or lost due to migration trends. Net flows are defined5 as total arrivals minus departures within the given time period. Figure 4.2.19 examines net AI talent migration per 10,000 LinkedIn members by geographic area. The countries that report the greatest incoming migration of AI talent are Luxembourg, Switzerland, and the United Arab Emirates.
Net AI talent migration per 10,000 LinkedIn members by geographic area, 2023
Source: LinkedIn, 2023; World Bank Group, 2023 | Chart: 2024 AI Index report
Luxembourg 3.67
Switzerland 1.60
Cyprus 1.24
Germany 1.04
Canada 0.96
Finland 0.90
Netherlands 0.76
Australia 0.67
Ireland 0.60
Singapore 0.50
Denmark 0.44
Norway 0.41
Figure 4.2.20 documents AI talent migration data over time. In the last few years, Israel, India, and South Korea
have seen declining net AI talent migration figures, suggesting that AI talent has been increasingly flowing out of
these countries.
5 LinkedIn membership varies considerably between countries, which makes interpreting absolute movements of members from one country to another difficult. To compare migration
flows between countries fairly, migration flows are normalized for the country of interest. For example, if country A is the country of interest, all absolute net flows into and out of country
A (regardless of origin and destination countries) are normalized based on LinkedIn membership in country A at the end of each year and multiplied by 10,000. Hence, this metric indicates
relative talent migration of all other countries to and from country A.
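Footnote 5's normalization can be illustrated with a short sketch: a country's net flow is divided by its year-end LinkedIn membership and scaled to a per-10,000-members figure, so that small and large markets are comparable. All arrival, departure, and membership figures below are invented.

```python
# Sketch of the per-10,000-members normalization from footnote 5.
# Inputs are invented for illustration, not LinkedIn data.

def net_migration_per_10k(arrivals, departures, members_year_end):
    """Net AI talent migration per 10,000 LinkedIn members."""
    net_flow = arrivals - departures
    return net_flow / members_year_end * 10_000

# A small country gaining talent vs. a large country losing it.
print(net_migration_per_10k(arrivals=450, departures=230,
                            members_year_end=600_000))
print(net_migration_per_10k(arrivals=9_000, departures=13_500,
                            members_year_end=90_000_000))
```

Note how the small country's modest net gain of 220 people produces a larger per-10,000 figure than the large country's loss of 4,500, which is the point of the normalization.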
Net AI talent migration per 10,000 LinkedIn members by geographic area, 2019–23
Source: LinkedIn, 2023; World Bank Group, 2023 | Chart: 2024 AI Index report
[Small-multiple charts with panels including Australia (0.67), Belgium, Brazil (0.28), Canada (0.96), Chile (-0.07), Costa Rica (0.01), Cyprus* (1.24), Denmark (0.44), Finland (0.90), France, Germany (1.04), Hong Kong, India, Ireland (0.60), Israel, Italy, Luxembourg* (3.67), Netherlands (0.76), New Zealand (0.19), and Norway (0.41); values in parentheses are 2023 endpoints.]
Figure 4.2.20
4.3 Investment
This section monitors AI investment trends, leveraging data from Quid, which analyzes investment data from more than 8 million companies worldwide, both public and private. Employing natural language processing, Quid sifts through vast unstructured datasets (including news aggregations, blogs, company records, and patent databases) to detect patterns and insights. Additionally, Quid is constantly expanding its database to include more companies, sometimes resulting in higher reported investment volumes for specific years. For the first time, this year’s investment section in the AI Index includes data on generative AI investments.
Corporate Investment
Figure 4.3.1 illustrates the trend in global corporate AI investment from 2013 to 2023, including mergers and acquisitions, minority stakes, private investments, and public offerings. For the second consecutive year, global corporate investment in AI has seen a decline. In 2023, the total investment dropped to $189.2 billion, a decrease of approximately 20% from 2022. Despite a slight reduction in private investment, the most significant downturn occurred in mergers and acquisitions, which fell by 31.2% from the previous year. However, over the past decade, AI-related investments have increased thirteenfold.
[Chart: global corporate investment in AI by activity type (merger/acquisition, minority stake, private investment, public offering), 2013–2023, in billions of U.S. dollars. Totals fell from $337.4 billion in 2021 to $234.95 billion in 2022 and $189.16 billion in 2023.]
Figure 4.3.1
Startup Activity
This section analyzes private investment trends in artificial intelligence startups that have received over $1.5 million in investment since 2013.

Global Trends
Global private AI investment has declined for the second consecutive year (Figure 4.3.2). However, the decrease from 2022 was small (-7.2%) and smaller than the drop observed from 2021 to 2022. Despite recent declines, private AI investment globally has grown substantially in the last decade.
[Chart: global private investment in AI, 2013–2023, in billions of U.S. dollars; the 2023 total was $95.99 billion.]
Figure 4.3.2
While overall AI private investment decreased last year, funding for generative AI sharply increased (Figure 4.3.3).
In 2023, the sector attracted $25.2 billion, nearly nine times the investment of 2022 and about 30 times the amount
from 2019. Furthermore, generative AI accounted for over a quarter of all AI-related private investment in 2023.
[Chart: global private investment in generative AI, 2019–2023, in billions of U.S. dollars.]
Figure 4.3.3
Interestingly, the number of newly funded AI companies jumped to 1,812, a 40.6% increase over the previous year
(Figure 4.3.4). Figure 4.3.5 visualizes the average size of AI private investment events, calculated by dividing the total
yearly AI private investment by the total number of AI private investment events. From 2022 to 2023, the average
increased marginally, growing from $31.3 million to $32.4 million.
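The average-deal-size calculation described above can also be run in reverse as a back-of-envelope check: dividing the reported 2023 private investment total ($95.99 billion, Figure 4.3.2) by the reported $32.44 million average implies roughly how many investment events Quid counted. The implied count is our own estimate, not a figure from the report.

```python
# Back-of-envelope check on the average-deal-size calculation:
# average = total / events, so events = total / average.
# Input figures are as reported; the event count is derived.

total_private_investment_musd = 95_990  # $95.99B, in millions of USD
average_event_size_musd = 32.44         # Figure 4.3.5

implied_events = total_private_investment_musd / average_event_size_musd
print(f"implied number of 2023 investment events: {implied_events:,.0f}")
# roughly 2,959 events
```

The consistency of the two reported figures under this simple identity is a useful sanity check on the chart data.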
[Chart: number of newly funded AI companies globally, 2013–2023.]
Figure 4.3.4
Average size of AI private investment events, 2013–23
Source: Quid, 2023 | Chart: 2024 AI Index report
[Chart: average investment per event in millions of U.S. dollars, reaching $32.44 million in 2023.]
Figure 4.3.5
2023 marked a significant increase in the number of newly funded generative AI companies, with 99 new startups
receiving funding, compared to 56 in 2022, and 31 in 2019 (Figure 4.3.6).
[Chart: number of newly funded generative AI companies globally, 2019–2023.]
Figure 4.3.6
Figure 4.3.7 reports AI funding events disaggregated by size. In 2023, AI private investment events decreased across nearly all funding size categories.
[Table: AI private investment events by funding size, 2022 vs. 2023. Source: Quid, 2023 | Table: 2024 AI Index report]
Figure 4.3.7
Regional Comparison by Funding Amount
The United States once again led the world in terms of total AI private investment. In 2023, the $67.2 billion invested in the United States was roughly 8.7 times greater than the amount invested in the next highest country, China ($7.8 billion), and 17.8 times the amount invested in the United Kingdom ($3.8 billion) (Figure 4.3.8).
China 7.76
Germany 1.91
Sweden 1.89
France 1.69
Canada 1.61
Israel 1.52
India 1.39
Singapore 1.14
Japan 0.68
Australia 0.37
Spain 0.36
Total investment (in billions of U.S. dollars)
Figure 4.3.8
When aggregating private AI investments since 2013, the country rankings remain the same: The United States
leads with $335.2 billion invested, followed by China with $103.7 billion, and the United Kingdom at $22.3 billion
(Figure 4.3.9).
China 103.65
Israel 12.83
Canada 10.56
Germany 10.35
India 9.85
France 8.31
Singapore 6.25
Japan 4.81
Australia 3.40
Switzerland 3.28
Sweden 2.88
Total investment (in billions of U.S. dollars)
Figure 4.3.9
Figure 4.3.10, which looks at AI private investment over time by geographic area, suggests that the gap in private
investments between the United States and other regions is widening over time. While AI private investments have
decreased in China (-44.2%) and the European Union plus the United Kingdom (-14.1%) since 2022, the United
States has seen a significant increase (22.1%) during the same period.
[Chart: private AI investment by geographic area over time, 2013–2023, in billions of U.S. dollars; the United States reached $67.22 billion in 2023.]
Figure 4.3.10
The disparity in regional AI private investment becomes particularly pronounced when examining generative AI-
related investments. For instance, in 2022, the United States outpaced the combined investments of the European
Union plus United Kingdom in generative AI by approximately $1.9 billion (Figure 4.3.11). By 2023, this gap widened
to $21.1 billion.
[Chart (Figure 4.3.11): private investment in generative AI by geographic area, in billions of U.S. dollars.]
Regional Comparison by Newly Funded AI Companies
This section examines the number of newly funded AI companies across different geographic regions. Consistent with trends in private investment, the United States leads all regions with 897 new AI companies, followed by China with 122, and the United Kingdom with 104 (Figure 4.3.12).
Number of newly funded AI companies by geographic area, 2023: China 122; Germany 76; Canada 59; France 58; India 45; South Korea 44; Israel 43; Japan 42; Singapore 29; Australia 24; Spain 21; Switzerland 17; Brazil 15.
Figure 4.3.12
A similar trend is evident in the aggregate data since 2013. Over the last decade, the number of newly funded AI companies in the United States has been around 3.8 times the number in China, and 7.6 times the number in the United Kingdom (Figure 4.3.13).
Aggregate number of newly funded AI companies by geographic area, 2013–23: China 1,446; Israel 442; Canada 397; France 391; India 338; Japan 333; Germany 319; Singapore 193; Australia 147; Switzerland 123; Spain 94; Sweden 94.
Figure 4.3.13
Figure 4.3.14 presents data on newly funded AI companies in specific geographic regions, highlighting a decade-long trend in which the United States consistently surpasses both the European Union and the United Kingdom, as well as China. Since 2022, the United States, along with the European Union and the United Kingdom, has seen significant increases in the number of new AI companies, in contrast to China, which experienced a slight year-over-year decrease.
Number of newly funded AI companies by geographic area, 2013–23; 2023 values include the European Union and United Kingdom (368) and China (122).
Figure 4.3.14
Private investment in AI by focus area, 2022 vs. 2023 (total investment in billions of U.S. dollars), across focus areas including AI infrastructure/research/governance; NLP, customer support; data management, processing; medical and healthcare; AV; fintech; quantum computing; semiconductor; energy, oil, and gas; creative, music, video content; ed tech; marketing, digital ads; drones; manufacturing; cybersecurity, data protection; AR/VR; retail; insurtech; entertainment; VC; agritech; legal tech; facial recognition; fitness and wellness; and geospatial.
Figure 4.3.15
Private investment in AI by focus area, 2017–23
Source: Quid, 2023 | Chart: 2024 AI Index report
Small-multiple panels track yearly investment totals for each focus area; in 2023, AI infrastructure/research/governance led all areas at $18.27 billion.
Figure 4.3.16
Artificial Intelligence Index Report 2024, Chapter 4: Economy, 4.3 Investment
Finally, Figure 4.3.17 shows private investment in AI by focus area over time within select geographic regions, highlighting how private investment priorities in AI differ across geographies. The significant increases observed in AI infrastructure/research/governance were mostly driven by investment in the United States. The United States significantly outpaces China and the European Union plus the United Kingdom in investment in almost all focus area categories. A notable exception is facial recognition, where 2023 investment totals were $90 million in the United States and $130 million in China. Likewise, in semiconductor investments, China ($630 million) is not far behind the United States ($790 million).
Private investment in AI by focus area and geographic region, 2017–23 (total investment in billions of U.S. dollars). Small-multiple panels compare the United States (US), China (CN), and the European Union plus the United Kingdom (EU/UK) for each focus area; in 2023, for example, US investment in AI infrastructure/research/governance reached $18.22 billion versus $0.03 billion in China.
Figure 4.3.17
This section examines the practical application of AI by corporations, highlighting industry adoption trends, how
businesses are integrating AI, the specific AI technologies deemed most beneficial, and the impact of AI adoption on
financial performance.
Share of respondents who say their organizations have adopted AI in at least one function, 2017–23
Source: McKinsey & Company Survey, 2023 | Chart: 2024 AI Index report
(2023 value: 55% of respondents)
Figure 4.4.1
Figure 4.4.2 shows the proportion of surveyed companies that use AI for specific functions. Companies may report employing AI in multiple capacities. The most commonly adopted AI use case by function among surveyed businesses in 2023 was contact-center automation (26%), followed by personalization (23%), customer acquisition (22%), and AI-based enhancements of products (22%).7
AI use cases adopted by function, 2023 (% of respondents; selected values): personalization 23%; sales 17%.
Figure 4.4.2
7 Personalization is the practice of tailoring products, services, content, recommendations, and marketing to the individual preferences of customers or users. For example, personalization
can include sending tailored email messages to clients or customers to improve engagement.
With respect to the type of AI capabilities embedded in at least one function or business unit, Figure 4.4.3 indicates that robotic process automation had the highest rate of embedding within the financial services industry (46%). The next highest rate of embedding was for virtual agents, also in the financial services industry. Across all industries, the most embedded AI technologies were NL text understanding (30%), robotic process automation (30%), and virtual agents (30%).
AI capabilities embedded in at least one function or business unit by industry, 2023 (% of respondents per AI capability). Rows cover all industries; consumer goods/retail; financial services; healthcare systems/pharma and med. products; and tech, media, and telecom; columns cover individual AI capabilities such as NL text understanding, robotic process automation, and virtual agents.
Figure 4.4.3
Figure 4.4.4 shows AI adoption by industry and AI function in 2023. The greatest adoption was in product and/
or service development for tech, media, and telecom (44%); followed by service operations for tech, media, and
telecom (36%) and marketing and sales for tech, media, and telecom (36%).
AI adoption by industry and function, 2023 (% of respondents per function), for consumer goods/retail; healthcare systems/pharma and med. products; and tech, media, and telecom.
Figure 4.4.4
Figure 4.4.5 illustrates the changes in AI adoption rates by industry and function from 2022 to 2023. The areas with the largest annual gains across all industries include marketing and sales (18 percentage points), product/service development (14), and service operations (4). Conversely, across all industries, the functions experiencing the most significant declines in adoption include strategy and corporate finance (-12 percentage points), risk (-9), and human resources (-2).
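A percentage-point change, as used here, is the simple difference between two shares, not a relative change. A minimal sketch, with illustrative adoption shares rather than the survey's underlying values:

```python
def pp_change(old_share: float, new_share: float) -> float:
    """Difference of two shares (given as fractions), in percentage points."""
    return (new_share - old_share) * 100.0

# E.g., a function whose adoption rises from 20% to 38% of respondents
# shows an 18-point gain (illustrative numbers, not the survey's).
print(round(pp_change(0.20, 0.38)))  # 18
```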
Percentage point change in responses of AI adoption by industry and function, 2022 vs. 2023
Source: McKinsey & Company Survey, 2023 | Chart: 2024 AI Index report
Rows cover consumer goods/retail; healthcare systems/pharma and med. products; and tech, media, and telecom; columns cover individual business functions.
Figure 4.4.5
Figure 4.4.6 shows the percentage of surveyed respondents across industries who reported hiring for various AI positions. Across all industries, respondents most frequently reported hiring data engineers (36%), AI data scientists (31%), and machine-learning engineers (31%). Notably, a significant portion of respondents within the financial services (44%) and the tech, media, and telecom (44%) sectors reported a high rate of hiring machine-learning engineers.
AI-related roles that organizations hired in the last year by industry, 2023
Source: McKinsey & Company Survey, 2023 | Chart: 2024 AI Index report
(% of respondents; columns: data engineers, AI data scientists, machine-learning engineers, software engineers, data architects, data visualization specialists, design specialists, AI product owners/managers, translators, prompt engineers)
All industries: 36%, 31%, 31%, 28%, 25%, 24%, 22%, 21%, 11%, 7%
Consumer goods/retail: 27%, 14%, 15%, 20%, 14%, 20%, 21%, 13%, 11%, 3%
Healthcare systems/pharma and med. products: 19%, 32%, 16%, 23%, 19%, 25%, 13%, 19%, 3%, 9%
Financial services: 41%, 37%, 44%, 24%, 29%, 19%, 26%, 26%, 17%, 10%
Tech, media, and telecom: 39%, 41%, 44%, 38%, 36%, 27%, 25%, 27%, 17%, 13%
Figure 4.4.6
Organizations have experienced both cost reductions and revenue increases due to AI adoption (Figure 4.4.7). The areas where respondents most frequently reported cost savings were manufacturing (55%), service operations (54%), and risk (44%). For revenue gains, the functions benefiting the most from AI included manufacturing (66%), marketing and sales (65%), and risk (64%). Figure 4.4.7 shows a substantial number of respondents reporting cost decreases (42%) and revenue gains (59%) as a result of using AI, suggesting that AI tangibly helps businesses improve their bottom line. Comparing this and last year's averages reveals a 10 percentage point increase for cost decreases and a four percentage point decrease for revenue increases across all activities.
Cost decreases and revenue increases from AI adoption by function, 2023 (% of respondents; cost decrease bands: <10%, 10–19%, ≥20%; revenue increase bands: ≤5%, 6–10%, >10%). On average across all activities, 42% of respondents reported cost decreases and 59% reported revenue increases.
Figure 4.4.7
Figure 4.4.8 presents global AI adoption by organizations, segmented by world region. In 2023, every surveyed region reported higher AI adoption rates than in 2022. The most significant year-over-year growth was seen in Europe, where organizational adoption grew by 9 percentage points; Greater China also experienced a significant increase, growing by 7 percentage points over the previous year. North America remains the leader in AI adoption.
AI adoption by organizations by region (% of respondents; 2023 vs. 2022): all geographies 55% vs. 50%; Asia-Pacific 58% vs. 55%; Europe 57% vs. 48%; North America 61% vs. 59%.
Figure 4.4.8
Generative AI use cases adopted by organizations, 2023 (% of respondents; selected value): personalized marketing 8%.
Figure 4.4.9
8 The adoption of generative AI capabilities is presented separately from the charts on the adoption of general AI capabilities earlier in the chapter, as it was a separate question in the survey.
Figure 4.4.10 compares the proportion of respondents who report using AI versus specifically generative AI for a given function, illustrating the degree to which generative AI has permeated general AI usage patterns among businesses.9 When analyzed at the functional level, the use of AI and generative AI within organizations shows similar patterns of distribution, though general AI still dominates. The most common functional applications of generative AI are in marketing and sales (14%), product and/or service development (13%), and service operations (10%).
AI vs. generative AI adoption by function, 2023 (% of respondents; AI / generative AI): product and/or service development 24% / 13%; marketing and sales 23% / 14%; service operations 23% / 10%; risk 10% / 4%; strategy and corporate finance 9% / 4%; human resources 9% / 3%; supply-chain management 9% / 3%; manufacturing 8% / 2%.
Figure 4.4.10
9 While all generative AI use cases are considered general AI use cases, not all general AI use cases qualify as generative AI use cases.
Figure 4.4.11 depicts the variation in generative AI usage among businesses across different regions of the world. Across all regions, the adoption rate of generative AI by organizations stands at 33%. This figure is meaningfully lower than the share of businesses across all geographies (55%) that reported using AI, documented earlier in Figure 4.4.8. North America leads in adoption at 40%, followed closely by developing markets (including India, Latin America, and the MENA region).
Generative AI adoption by organizations by region, 2023 (% of respondents): Asia-Pacific 30%; Europe 31%; Greater China (incl. Hong Kong, Taiwan, Macau) 31%; developing markets (incl. India, Latin America, MENA) 33%.
Figure 4.4.11
Computer developers are among the most likely individuals to use AI in professional settings. As AI becomes more integrated into the economy, tracking how developers utilize and perceive AI is becoming increasingly important.

Preference
Figure 4.4.12 highlights the proportion of surveyed respondents who report using a specific AI developer tool, which tools are favored, and their perceptions of the tools used.10

Most popular AI developer tools among professional developers, 2023
Source: Stack Overflow Developer Survey, 2023 | Chart: 2024 AI Index report
GitHub Copilot 56.04%; Tabnine 11.74%; AWS CodeWhisperer 4.91%; Snyk Code 1.33%; Other 1.83%
Figure 4.4.12
10 The survey was conducted in May 2023 and, therefore, may not account for the launch of more recently released AI tools such as Gemini and Claude 3.
Cloud platforms are crucial elements of the AI ecosystem, providing the cloud computing services on which many AI workloads run. Amazon Web Services (AWS) is the most commonly used cloud platform among professional developers (Figure 4.4.13).

Top 10 most popular cloud platforms among professional developers, 2023
Source: Stack Overflow Developer Survey, 2023 | Chart: 2024 AI Index report
(selected values: Cloudflare 14.99%; Digital Ocean 14.11%)
Figure 4.4.13

Figure 4.4.14 reports on the integration of AI in developers' workflows. A significant majority of respondents, 82.6%, regularly use AI for writing code, followed by 48.9% for debugging and assistance, and 34.4% for documentation. While only 23.9% currently use AI for code testing, 55.2% express interest in adopting AI for this purpose.
Developer use of AI by workflow task, 2023 (% of respondents currently using, interested in using, or not interested in using). Tasks include testing code (23.87% currently using, 55.17% interested, 13.09% not interested), project planning, committing and reviewing code, deployment and monitoring, and collaborating with teammates.
Figure 4.4.15
When asked about the primary advantages of AI tools in professional development, the benefits developers cited most often were speeding up learning (25.17%) and greater efficiency (24.96%) (Figure 4.4.16).

Primary benefits of AI tools for professional developers, 2023
Source: Stack Overflow Developer Survey, 2023 | Chart: 2024 AI Index report
Speed up learning 25.17%; greater efficiency 24.96%
Figure 4.4.16

Figure 4.4.17 displays the sentiments professional developers have toward AI tools. A significant majority of developers hold a positive view of AI tools.
Sentiment toward AI tools in development among professional developers, 2023
Source: Stack Overflow Developer Survey, 2023 | Chart: 2024 AI Index report
(selected value: favorable 48.41%)

Trust level in AI tool output accuracy, 2023
Source: Stack Overflow Developer Survey, 2023 | Chart: 2024 AI Index report
(selected values: somewhat distrust 21.71%; neither trust nor distrust 30.68%; unsure 3.88%)
Copilot users' task completion speed relative to baseline (100%), by study: Copilot information retrieval study 73%; Copilot common task study 71%; LLM-based search study 47%; GitHub Copilot study 44%; Copilot in teams meeting study 26%.
Figure 4.4.19
11 This meta-review analyzed separate surveys of workers using Microsoft’s Copilot and GitHub’s Copilot tools. These are separate tools. Microsoft Copilot is a broader LLM-based
productivity improvement tool, while GitHub’s Copilot is a code-writing assistant.
Similarly, a Harvard Business School study revealed that consultants with access to GPT-4 increased their productivity on a selection of consulting tasks (Figure 4.4.20).

Effect of GPT-4 use on a group of consultants
Source: Dell'Acqua et al., 2023 | Chart: 2024 AI Index report
Performance metrics on consulting tasks: productivity, speed, and quality.
Figure 4.4.20
Hourly chats handled per customer support agent: 2.97 for agents who used AI versus 2.60 for agents who did not.
Figure 4.4.21
A study on the impact of AI in legal analysis showed compared to the control group, in terms of both work
that teams with GPT-4 access significantly improved in quality and time efficiency across a range of tasks.
efficiency and achieved notable quality improvements Although AI can assist with legal tasks, there are
in various legal tasks, especially contract drafting. also widespread reports of LLM hallucinations being
Figure 4.4.22 illustrates the improvements observed especially pervasive in legal tasks.
in the group of law students who utilized GPT-4,
Figure 4.4.22
Second, AI access appears to narrow the performance Lower-skilled (bottom half) participants exhibited
gap between low- and high-skilled workers. According a 43.0% improvement, while higher-skilled (top
to the aforementioned Harvard Business School half) participants showed a 16.5% increase. While
study, both groups of consultants experienced higher-skilled workers using AI still performed better
performance boosts after adopting AI, with notably than their lower-skilled, AI-using counterparts, the
larger gains for lower-skilled consultants using AI disparity in performance between low- and high-
compared to higher-skilled consultants. Figure 4.4.23 skilled workers was markedly lower when AI was
highlights the performance improvement across a utilized compared to when it was not.
set of tasks for participants of varying skill levels:
Consultant performance on baseline vs. experimental tasks by skill level (average score on a 1–10 scale): lower-skilled participants scored 4.05 on the baseline task and 5.79 on the experimental task; higher-skilled participants scored 5.20 and 6.06, respectively.
Figure 4.4.23
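The 43.0% and 16.5% improvements cited above can be reproduced from the average scores in Figure 4.4.23. A minimal sketch (the dictionary layout and names are ours):

```python
# Average scores on a 1-10 scale, from Figure 4.4.23.
scores = {
    "lower_skilled":  {"baseline": 4.05, "with_ai": 5.79},
    "higher_skilled": {"baseline": 5.20, "with_ai": 6.06},
}

# Relative improvement of each group over its own baseline.
for group, s in scores.items():
    gain = (s["with_ai"] - s["baseline"]) / s["baseline"] * 100
    print(f"{group}: +{gain:.1f}%")
# lower_skilled: +43.0%
# higher_skilled: +16.5%
```

Note also that the absolute gap between the groups shrinks from 1.15 points at baseline to 0.27 points with AI, which is the narrowing the text describes.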
Finally, while AI tends to enhance quality and performed worse than those who received “bad AI,”
productivity, overreliance on the technology can which was capable but known to make errors (Figure
impair worker performance. A study focused on 4.4.24). The performance difference between the
professional recruiters reviewing résumés found that latter groups was -1.08 points. The study theorizes
receiving any AI assistance improved task accuracy that recruiters using “good AI” became complacent,
by 0.6 points compared to not receiving AI assistance. overly trusting the AI’s results, unlike those using “bad
However, recruiters who were provided with “good AI,” who were more vigilant in scrutinizing AI output.
AI”—believed to be high-performing—actually
Change in accuracy score by type of AI advice: any AI, +0.60; good vs. bad AI, -1.08.
Figure 4.4.24
Number of Fortune 500 earnings calls mentioning AI, 2018–23; in 2023, the total reached 394.
Figure 4.4.25
Specific Themes
Mentions of AI in Fortune 500 earnings calls were associated with a wide range of themes in 2023. The most frequently cited theme, appearing in 19.7% of all earnings calls, was generative AI (Figure 4.4.26), up from just 0.31% of calls in 2022. The next most mentioned theme was investments in AI, expansion of AI capabilities, and AI growth initiatives (15.2%), followed by company/brand AIs (7.6%).
Themes associated with AI mentions in Fortune 500 earnings calls, 2023 vs. 2022 (% of total, with year-over-year change):
Generative AI: 19.73% (+6351%), from 0.31%
Investments in AI, expansion of capabilities, and growth initiatives: 15.21% (+5%), from 14.53%
Company/brand AIs: 7.64% (-2%), from 7.80%
Data storage and management: 7.36% (-25%), from 9.79%
Advertising and marketing: 5.68% (+49%), from 3.82%
AI and machine learning for enhanced customer experience and growth: 4.96% (-32%), from 7.34%
Process automation: 4.76% (-8%), from 5.20%
Data center GPU: 4.36% (-51%), from 8.87%
Large language models: 3.84%
Cloud platforms: 3.56% (-31%), from 5.20%
Deep learning: 2.76% (-52%), from 5.81%
Healthcare and medical practices: 2.76% (-40%), from 4.59%
Business integration: 2.36% (-9%), from 2.60%
Revenue growth: 2.28% (-21%), from 2.91%
Data processing: 2.24% (+33%), from 1.68%
Data analytics: 2.20% (-65%), from 6.27%
Edge intelligence: 1.88% (-35%), from 2.91%
Adobe experience: 1.80% (-9%), from 1.99%
Customer support: 1.76% (+28%), from 1.38%
Autonomous vehicles: 1.48% (-70%), from 4.89%
Digital transformation: 1.36% (-36%), from 2.14%
Figure 4.4.26
Highlight:
Figure 4.4.27
Highlight:
Expectations about the impact of AI on organizations' workforces in the next 3 years, 2023
Source: McKinsey & Company Survey, 2023 | Chart: 2024 AI Index report
Change in the number of employees (selected values): increase by >20%, 3%; increase by 3–10%, 8%; little or no change, 30%; decrease by >20%, 8%. Share of employees expected to be reskilled (selected values): ≤5%, 20%; 11–20%, 18%.
Figure 4.4.28
Highlight:
Anticipated effect of generative AI on number of employees in the next 3 years by business function, 2023
Source: McKinsey & Company Survey, 2023 | Chart: 2024 AI Index report
Figure 4.4.29
Highlight:
Values in percentage points relative to the global average, by economy: Hong Kong, Israel, Japan, United Kingdom, Sweden, United States, Singapore, Argentina, Chile, South Korea, Brazil, Malaysia, Taiwan, South Africa, Mexico.
Figure 4.4.30
The deployment of robots equipped with AI-based software technologies offers a window into the real-world
application of AI-ready infrastructure. This section draws on data from the International Federation of Robotics (IFR),
a nonprofit organization dedicated to advancing the robotics industry. Annually, the IFR publishes the World Robotics
Reports, which track global robot installation trends.12
Number of industrial robots installed worldwide, 2012–22 (in thousands).
Figure 4.5.1
12 Due to the timing of the IFR report, the most recent data is from 2022. Every year, the IFR revisits data collected for previous years and will occasionally update the data if more accurate
figures become available. Therefore, some of the data reported in this year’s report might differ slightly from data reported in previous years.
The global operational stock of industrial robots reached 3,904,000 in 2022, up from 3,479,000 in 2021 (Figure
4.5.2). Over the past decade, both the installation and utilization of industrial robots have steadily increased.
Operational stock of industrial robots worldwide, 2012–22 (in thousands).
Figure 4.5.2
Industrial Robots: Traditional vs. Collaborative Robots
There is a distinction between traditional robots, which operate for humans, and collaborative robots, which are designed to work alongside them. The robotics community is increasingly enthusiastic about collaborative robots due to their safety, flexibility, scalability, and ability to learn iteratively.

Figure 4.5.3 reports the number of industrial robots installed in the world by type. In 2017, collaborative robots accounted for just 2.8% of all new industrial robot installations. By 2022, that share had risen to 9.9%.
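The collaborative share is simply collaborative installations divided by all installations. A minimal sketch using the 2022 unit counts (in thousands) read off Figure 4.5.3; the pairing of 498 traditional and 55 collaborative units is our reading of the chart labels:

```python
# 2022 industrial robot installations, in thousands (Figure 4.5.3).
collaborative_2022 = 55
traditional_2022 = 498

share = collaborative_2022 / (collaborative_2022 + traditional_2022)
print(f"{share:.1%}")  # 9.9%
```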
Number of industrial robots installed worldwide by type, 2017–22 (in thousands). In 2022, installations totaled roughly 553 thousand, of which about 498 thousand were traditional robots and 55 thousand were collaborative robots, up from about 484 thousand and 42 thousand, respectively, in 2021.
Figure 4.5.3
By Geographic Area
Country-level data on robot installations can suggest which nations prioritize robot integration into their economies. In 2022, China led the world with 290,300 industrial robot installations, 5.8 times more than Japan's 50,400 and 7.4 times more than the United States' 39,500 (Figure 4.5.4). South Korea and Germany followed with 31,170 and 25,600 installations, respectively.
Number of industrial robots installed by country, 2022 (in thousands): China 290.30; Japan 50.40; United States 39.50; South Korea 31.17; Germany 25.60; Italy 11.50; Taiwan 7.80; France 7.40; Mexico 6.00; Singapore 5.90; India 5.40; Spain 3.80; Turkey 3.70; Thailand 3.30; Canada 3.20.
Figure 4.5.4
Since surpassing Japan in 2013 as the leading installer of industrial robots, China has significantly widened the
gap with the nearest country. In 2013, China’s installations accounted for 20.8% of the global total, a share that
rose to 52.4% by 2022 (Figure 4.5.5).
Number of industrial robots installed by country, 2011–22 (in thousands); 2022 values for selected countries: Japan 50, United States 40, South Korea 32, Germany 26.
Figure 4.5.5
Since 2021, China has installed more industrial robots than the rest of the world combined, with the gap widening
further in the last year (Figure 4.5.6). This increasing gap underscores China’s growing dominance in industrial
robot installations.
Number of industrial robots installed (China vs. rest of the world), 2016–22
Source: International Federation of Robotics (IFR), 2023 | Chart: 2024 AI Index report
(2022: China, 290 thousand installations)
Figure 4.5.6
According to the IFR report, most countries reported an annual increase in industrial robot installations from 2021
to 2022 (Figure 4.5.7). The countries with the highest growth rates include Singapore (68%), Turkey (22%), and
Mexico (13%). Canada (-24%), Taiwan (-21%), Thailand (-18%), and Germany (-1%) reported installing fewer robots
in 2022 than in 2021.
Annual growth rate of industrial robots installed by country, 2021 vs. 2022
Source: International Federation of Robotics (IFR), 2023 | Chart: 2024 AI Index report
Singapore 68%
Turkey 22%
Mexico 13%
France 13%
Spain 10%
Japan 9%
Italy 8%
China 5%
India 4%
South Korea 1%
Germany -1%
Thailand -18%
Taiwan -21%
Canada -24%
Country-Level Data on Service Robotics
Another important class of robots is service robots, which the ISO defines as a robot "that performs useful tasks for humans or equipment excluding industrial automation applications."13 Such robots can, for example, be used in medicine and professional cleaning. In 2022, more service robots were installed in every application category than in 2021, with the exception of medical robotics (Figure 4.5.8). More specifically, the number of service robots installed in hospitality and in transportation and logistics increased 2.3 and 1.4 times, respectively.
Number of professional service robots installed in the world by application area, 2021 vs. 2022
Source: International Federation of Robotics (IFR), 2023 | Chart: 2024 AI Index report
(in thousands; 2022 vs. 2021): agriculture 8 vs. 7; hospitality 25 vs. 11; medical robotics 9 vs. 10; professional cleaning 7 vs. 6; transportation and logistics 86 vs. 60.
Figure 4.5.8
As of 2022, the United States leads in professional service robot manufacturing, with approximately 2.06 times more manufacturers than China, the next leading nation (Figure 4.5.9). Germany, Japan, and France also have significant numbers of robot manufacturers, with 85, 72, and 53, respectively. In most surveyed countries, the majority of these manufacturers are established incumbents.
Number of professional service robot manufacturers in top countries by type of company, 2022
Source: International Federation of Robotics (IFR), 2023 | Chart: 2024 AI Index report
Total manufacturers (categorized as startups, incumbents, or unknown): United States 218; China 106; Germany 85; Japan 72; France 53; South Korea 52; Canada 38; Switzerland 35; Russia 33; United Kingdom 27.
Figure 4.5.9
Sectors and Application Types
Figure 4.5.10 shows the number of industrial robots installed in the world by sector from 2020 to 2022. Globally, the electrical/electronics sector led in robot installations with 157,000 units, closely followed by the automotive sector with 136,000. Both sectors have seen continuous growth in industrial robot installations since 2020.
Number of industrial robots installed worldwide by sector (in thousands; 2022 / 2021 / 2020): all others 61 / 55 / 37; automotive 136 / 117 / 83; electrical/electronics 157 / 143 / 112; food 15 / 15 / 12; metal and machinery 66 / 68 / 41; plastic and chemical products 24 / 25 / 19; unspecified 94 / 102 / 85.
Figure 4.5.10
Figure 4.5.11 shows the number of industrial robots welding (87,000) and 4.4 times more than for assembly
installed in the world by application from 2020 to (61,000). Except for processing, every application
2022. Data suggests that handling is the predominant category witnessed an increase in robot installations in
application. In 2022, 266,000 industrial robots were 2022 compared to 2020.
installed for handling tasks, 3.1 times more than for
Number of industrial robots installed worldwide by application (in thousands; 2022 / 2021 / 2020): all others/unspecified 70 / 78 / 60; assembling 61 / 63 / 47; clean room 35 / 32 / 32; dispensing 28 / 11 / 8; handling 266 / 241 / 169; processing 5 / 7 / 5; welding 87 / 94 / 68.
Figure 4.5.11
Number of industrial robots installed by sector (in thousands; 2022 / 2021 / 2020): all others/unspecified 73 / 75 / 51; automotive 73 / 58 / 30; electrical/electronics 100 / 93 / 66; food 5 / 4 / 3; metal and machinery 31 / 37 / 19; pharma/cosmetics 1 / 1 / 1; rubber and plastics 6 / 6 / 5.
Figure 4.5.12
In 2022, the U.S. automotive industry led in industrial robot installations with 14,500 units, significantly exceeding its 2021 figure (Figure 4.5.13). Alongside automotive, the electrical/electronics and "all others" categories also grew, while every remaining sector saw fewer robot installations in 2022 than in 2021.
Number of industrial robots installed in the United States by sector (in thousands; 2022 / 2021 / 2020): all others 5.80 / 5.30 / 2.60; automotive 14.50 / 9.90 / 10.50; electrical/electronics 3.70 / 3.00 / 3.70; food 2.40 / 3.40 / 2.70; metal and machinery 3.90 / 4.20 / 2.30; plastic and chemical products 3.10 / 3.60 / 2.70; unspecified 6.20 / 6.60 / 6.30.
Figure 4.5.13
Preview
Overview
Chapter Highlights
Overview
This year’s AI Index introduces a new chapter on AI in science and medicine in recognition of
AI’s growing role in scientific and medical discovery. It explores 2023’s standout AI-facilitated
scientific achievements, including advanced weather forecasting systems like GraphCast
and improved material discovery algorithms like GNoME. The chapter also examines medical
AI system performance, important 2023 AI-driven medical innovations like SynthSR and
ImmunoSEIRA, and trends in FDA approval of AI-related medical devices.
Chapter Highlights
1. Scientific progress accelerates even further, thanks to AI. In 2022, AI began to advance
scientific discovery. 2023, however, saw the launch of even more significant science-related AI applications—
from AlphaDev, which makes algorithmic sorting more efficient, to GNoME, which facilitates the process of
materials discovery.
2. AI helps medicine take significant strides forward. In 2023, several significant medical systems
were launched, including EVEscape, which enhances pandemic prediction, and AlphaMissense, which assists in
AI-driven mutation classification. AI is increasingly being utilized to propel medical advancements.
3. Highly knowledgeable medical AI has arrived. Over the past few years, AI systems have shown
remarkable improvement on the MedQA benchmark, a key test for assessing AI’s clinical knowledge. The
standout model of 2023, GPT-4 Medprompt, reached an accuracy rate of 90.2%, marking a 22.6 percentage
point increase from the highest score in 2022. Since the benchmark’s introduction in 2019, AI performance on
MedQA has nearly tripled.
4. The FDA approves more and more AI-related medical devices. In 2022, the FDA approved 139
AI-related medical devices, a 12.1% increase from 2021. Since 2012, the number of FDA-approved AI-related medical
devices has increased by more than 45-fold. AI is increasingly being used for real-world medical purposes.
This section highlights significant AI-related scientific breakthroughs of 2023 as chosen by the AI Index Steering Committee.
[Chart: instruction length by algorithm for Sort 3, Sort 4, Sort 5, VarSort3, VarSort4, VarSort5, and VarInt; reported values range from 17 to 66 instructions]
Figure 5.1.1
Figure 5.1.2
[Chart: IN>5° (%) ↓ for algorithms/methods evaluated at 64³ (MCSDF, DChermite, NDCSDF, MC, DMTet(64), DMTet(80), FlexiCubes); reported values include 63.34%, 55.22%, 52.37%, 50.20%, 48.66%, and 34.87%]
Figure 5.1.3
[Chart: conversion yield (%) over time (0–24 hours), with an 85% reference line]
Figure 5.1.5
Figure 5.1.6
[Chart: RMSE (m²/s²) ↓ by lead time (1–10 days)]
Figure 5.1.7
GNoME vs. Materials Project: stable crystal count; GNoME vs. Materials Project: distinct prototypes
Source: Merchant et al., 2023 | Chart: 2024 AI Index report
[Charts: stable crystal count and distinct prototypes by number of unique elements (2–6), comparing GNoME with the Materials Project]
Figure 5.1.9, Figure 5.1.10
[Charts: median precision and median recall ↑ by return period (1, 2, 5, and 10 years), with sample sizes of roughly 3,000–3,700 per period]
Figure 5.1.11
1 An ungauged basin is a watershed for which there is insufficient streamflow data to model hydrological flows.
2 A return period (recurrence interval) measures the likelihood of a particular hydrological event recurring within a specific period. For example, a 100-year flood means there is a 1% chance of
the event being equaled or exceeded in any given year.
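The return-period definition in the footnote above can be expressed as a small calculation: a T-year event has an annual exceedance probability of 1/T, and the chance of at least one occurrence over an n-year horizon is 1 − (1 − 1/T)^n. A minimal illustrative sketch (the function name and assumption of independent years are mine, not from the report):

```python
def exceedance_probability(return_period_years: float, horizon_years: int) -> float:
    """Probability of at least one T-year event occurring within a horizon.

    Assumes independent years, each with exceedance probability 1/T.
    """
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years

# A 100-year flood has a 1% chance of being equaled or exceeded in any given year...
print(round(exceedance_probability(100, 1), 4))    # 0.01
# ...but roughly a 63% chance of occurring at least once over 100 years.
print(round(exceedance_probability(100, 100), 4))  # 0.634
```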
AI models are becoming increasingly valuable in healthcare, with applications ranging from detecting polyps to aiding clinicians in making diagnoses. As AI performance continues to improve, monitoring its impact on medical practice becomes
increasingly important. This section highlights significant AI-related medical systems introduced in 2023, the current
state of clinical AI knowledge, and the development of new AI diagnostic tools and models aimed at enhancing
hospital administration.
5.2 AI in Medicine
Notable Medical Systems

SynthSR generations
Source: Iglesias et al., 2023
SynthSR
Transforming brain scans for advanced analysis
SynthSR is an AI tool that converts clinical brain scans into high-resolution T1-weighted images
(Figure 5.2.1). This advancement addresses the issue
of scan quality variability, which previously limited
the use of many scans in advanced research. By
transforming these scans into T1-weighted images,
known for their high contrast and clear brain
Figure 5.2.1
structure depiction, SynthSR facilitates the creation
of detailed 3D brain renderings. Experiments using
SynthSR demonstrate robust correlations between
observed volumes at both scan and subject levels,
suggesting that SynthSR generates images closely
resembling those produced by high-resolution T1
scans. Figure 5.2.2 illustrates the extent to which
SynthSR scans correspond with ground-truth
observations across selected brain regions. SynthSR
significantly improves the visualization and analysis
of brain structures, facilitating neuroscientific
research and clinical diagnostics.
Correlation between SynthSR output and ground truth at the scan and subject level:

Brain region               Scan level (n=435)   Subject level (n=41)
White matter               0.79                 0.91
Cortical gray matter       0.83                 0.93
Subcortical gray matter    0.77                 0.91
Ventricles                 0.99                 0.99
Hippocampus                0.76                 0.89
Amygdala                   0.60                 0.90

Figure 5.2.2
Coupled Plasmonic Infrared Sensors

ImmunoSEIRA detection principle and the setup
Source: Kavungal et al., 2023
Coupled plasmonic infrared sensors for the
detection of neurodegenerative diseases
Diagnosis of neurodegenerative diseases such as
Parkinson’s and Alzheimer’s depends on fast and
precise identification of biomarkers. Traditional
methods, such as mass spectrometry and ELISA, are
useful in that they can focus on quantifying protein
levels; however, they cannot discern changes in
structural states. This year, researchers uncovered a
new method for neurodegenerative disease diagnosis
that combined AI-coupled plasmonic infrared sensors
that use Surface-Enhanced Infrared Absorption
(SEIRA) spectroscopy with an immunoassay
technique (ImmunoSEIRA; Figure 5.2.3). In tests, actual fibril percentages in test samples were compared with those predicted by a deep neural network.

Figure 5.2.3

Deep neural network predicted vs. actual fibril percentages in test samples
Source: Kavungal et al., 2023 | Chart: 2024 AI Index report
[Chart: predicted fibril concentration (%), 0–100%]
EVEscape
Forecasting viral evolution for pandemic preparedness

Predicting viral mutations is vital for vaccine design and pandemic mitigation. Traditional methods, which rely on real-time virus strain and antibody data, face challenges during early pandemic stages due to data scarcity. EVEscape is a new AI deep learning model, trained on historical sequences and biophysical and structural information, that predicts the evolution of viruses (Figure 5.2.5). EVEscape evaluates viral escape independently of current strain data, predicting 50.0% of observed SARS-CoV-2 mutations and outperforming traditional lab studies (which predicted 46.2% and 32.3%) as well as a previous model, which predicted only 24% of mutations (Figure 5.2.6). This performance highlights EVEscape's potential as a valuable asset for enhancing future pandemic preparedness and response efforts.

EVEscape design
Source: Thadani et al., 2023
Figure 5.2.5
[Chart: predicted mutations (%) over time relative to pandemic start (10 and 17 months marked); EVEscape (prepandemic) predicted 50% of observed mutations]
Figure 5.2.6
AlphaMissense predictions
Source: Google DeepMind, 2023 | Chart: 2024 AI Index report
Human Pangenome Reference Graph genome for the MHC region of the genome
Source: Google Research, 2023
Using AI to map the human genome
The human genome is a set of molecular instructions
for a human. The first human genome draft was
released in 2000 and updated in 2022. However,
the update was somewhat incomplete. It did not
incorporate various genetic variations, such as those underlying blood type, and did not fully map diverse ancestry
groups. Therefore, under the existing genome
reference, it would be difficult to detect diseases or
find cures in certain groups of people.
… of protein-coding genes, 99.42% of protein-coding transcripts, 98.16% of noncoding genes, and 98.96% of noncoding transcripts, as detailed in Figure 5.2.10. This latest version of the genome represents the most comprehensive and genetically diverse mapping of the human genome to date.
[Chart: % of identified (median) for protein-coding genes, protein-coding transcripts, noncoding genes, and noncoding transcripts]
Figure 5.2.10
Clinical Knowledge

Evaluating the clinical knowledge of AI models involves determining the extent of their medical expertise, particularly knowledge applicable in a clinical setting.

MedQA
Introduced in 2020, MedQA is a comprehensive dataset derived from professional medical board exams, featuring over 60,000 clinical questions designed to challenge doctors. Over the past few years, AI systems have shown remarkable improvement on this benchmark, with the leading system, GPT-4 Medprompt, achieving an accuracy rate of 90.2%, an increase of 22.6 percentage points from the top score in 2022 (Figure 5.2.11). Since MedQA's inception, AI capabilities on this benchmark have nearly tripled, showcasing the rapid improvement of clinically knowledgeable AI systems.
[Chart: accuracy (%) on MedQA over time; 2023 top score 90.20%]
Figure 5.2.11
Highlighted Research:
GPT-4 Medprompt
Although LLMs exhibit impressive general knowledge, it is commonly assumed that significant fine-tuning is required for them to excel at specialized tasks, such as answering medical questions. Fine-tuning entails training an LLM on domain-specific data. Moreover, as noted earlier, GPT-4 Medprompt was the first system to surpass the 90% accuracy mark on the MedQA benchmark. This breakthrough not only underscores GPT-4 Medprompt's exceptional and potentially clinically useful medical capabilities but also demonstrates that fine-tuning may not always be necessary for adapting models to specialized domains. Prompt engineering has been shown to be a promising alternative strategy.
[Chart: accuracy (%) of Flan-PaLM 540B, Med-PaLM 2, GPT-4, and GPT-4 Medprompt across prompting strategies, 2022–2023; reported values range from 57.60% to 94.25%]
Figure 5.2.13
Highlighted Research:
MediTron-70B
GPT-4 Medprompt is an impressive system; however, it is closed-source, meaning its weights are not freely available to the broader public for use. New research in 2023 has also sought to advance the capabilities of open-source medical LLMs. Among this new research, MediTron-70B stands out as particularly promising. This model achieves a respectable 70.2% accuracy on the MedQA benchmark. Although this is below the performance of GPT-4 Medprompt and Med-PaLM 2 (both closed models), it represents a significant improvement over the state-of-the-art results from 2023 and surpasses other open-source models like Llama 2 (Figure 5.2.14). MediTron-70B's score on MedQA is the highest yet achieved by an open-source model. If medical AI is to reach its fullest potential, it is important that its capabilities are widely accessible. In this context, MediTron represents an encouraging step forward.
Figure 5.2.14
Diagnosis
AI tools can also be used for diagnostic purposes, for example in radiology or cancer detection.
Highlighted Research:
CoDoC
AI medical imaging systems demonstrate robust diagnostic capabilities, yet there are instances where they overlook diagnoses that clinicians catch, and vice versa. This observation suggests a logical integration of AI systems and clinicians' diagnostic abilities. In 2023, researchers unveiled CoDoC (Complementarity-Driven Deferral to Clinical Workflow), a system designed to discern when to rely on AI for diagnosis and when to defer to traditional clinical methods. CoDoC notably enhances both sensitivity (the ability to correctly identify individuals with a disease) and specificity (the ability to accurately identify those without it). Specifically, across four medical datasets, CoDoC's sensitivity surpasses clinicians' by an average of 4.5 percentage points and a standalone AI model's by 6.5 percentage points (Figure 5.2.15). In terms of specificity, CoDoC outperforms clinicians by an average of 2.7 percentage points across tested datasets and a standalone predictive model by 5.7 percentage points. Moreover, CoDoC has been shown to reduce clinical workflow by 66%. These findings suggest that AI medical systems can be integrated into clinical workflows, thereby enhancing diagnostic accuracy and efficiency.
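To make the two metrics concrete, they can be computed from confusion-matrix counts, and a deferral rule in the spirit of CoDoC can be mimicked by sending uncertain AI predictions to a clinician. This is a minimal illustrative sketch with made-up thresholds and counts, not DeepMind's actual CoDoC method:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of diseased individuals correctly identified (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Share of healthy individuals correctly identified (true negative rate)."""
    return tn / (tn + fp)

def deferral_decision(ai_score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Toy deferral rule: trust the AI only when its score is far from the
    decision boundary; otherwise defer to a clinician (hypothetical thresholds)."""
    if ai_score <= low or ai_score >= high:
        return "use AI prediction"
    return "defer to clinician"

# Hypothetical confusion counts for a screening test:
print(round(sensitivity(tp=45, fn=5), 3))   # 0.9
print(round(specificity(tn=90, fp=10), 3))  # 0.9
print(deferral_decision(0.55))              # defer to clinician
```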
CoDoC vs. standalone predictive AI system and clinical readers: sensitivity
Source: Dvijotham et al., 2023 | Chart: 2024 AI Index report
[Chart: sensitivity (%) on a UK mammography dataset and two US mammography datasets (breast cancer detection) and a TB dataset (TB detection); reported values range from 48.30% to 72.60%]
Figure 5.2.15
Highlighted Research:
PANDA

CT PANDA detection
Source: Cao et al., 2023
Figure 5.2.16

Pancreatic ductal adenocarcinoma (PDAC) is a particularly lethal cancer, often detected too late for surgical intervention. Screening for PDAC in asymptomatic individuals is
challenging due to its low prevalence and the risk of false
positives. This year, a Chinese research team developed
PANDA (pancreatic cancer detection with artificial
intelligence), an AI model capable of efficiently detecting
and classifying pancreatic lesions in X-rays (Figure 5.2.16). In
validation tests, PANDA surpassed the average radiologist in
sensitivity by 34.1% and in specificity by 6.3% (Figure 5.2.17).
In a large-scale, real-world test involving approximately
20,000 patients, PANDA achieved a sensitivity of 92.9% and
a specificity of 99.9% (Figure 5.2.18). AI medical tools like
PANDA represent significant advancements in diagnosing
challenging conditions, offering cost-effective and accurate
detection previously considered difficult or prohibitive.
PANDA vs. mean radiologist on multicenter validation (6,239 patients)
Source: Cao et al., 2023 | Chart: 2024 AI Index report
[Chart: performance difference; sensitivity +34.10 percentage points, specificity +6.30 percentage points]
Figure 5.2.17

PANDA performance on real-world multi-scenario validation (20,530 patients)
Source: Cao et al., 2023 | Chart: 2024 AI Index report
[Chart: score for lesion detection; sensitivity 92.90%, specificity 99.90%]
Figure 5.2.18
Schopf et al., 2023 (breast cancer): The authors conducted a meta-review of the literature exploring mammography-image-based AI algorithms. They discovered that predicting future breast cancer risk using only mammography images achieves accuracy that is comparable to or better than traditional risk assessment tools.

Dicente Cid et al., 2023 (X-ray interpretation): The researchers developed two open-source neural networks, X-Raydar and X-Raydar-NLP, for classifying chest X-rays using images and free-text reports. They found that these automated classification methods perform at levels comparable to human experts and demonstrate robustness when applied to external data sets.
Figure 5.2.19
[Chart: number of FDA-approved AI medical devices per year, 2012–2022; reported annual counts include 3, 6, 18, 26, 63, 77, 107, and 124]
Figure 5.2.20
3 The FDA last updated the list in October 2023, meaning that the totals for 2023 were incomplete. Consequently, the AI Index limited its data presentation to include only information
up to 2022.
Figure 5.2.21 illustrates the specialties associated with FDA-approved medical devices. Of the 139 devices
approved in 2022, a significant majority, 87.1%, were related to radiology. The next most common specialty was
cardiovascular, accounting for 7.2% of the approvals.
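The specialty shares above imply approximate device counts, which can be cross-checked against the reported total. A quick illustrative calculation (the rounding is mine, not the report's):

```python
total_2022 = 139  # AI-related medical devices approved by the FDA in 2022

# Shares reported for 2022: radiology 87.1%, cardiovascular 7.2%
radiology = round(total_2022 * 0.871)       # about 121 devices
cardiovascular = round(total_2022 * 0.072)  # about 10 devices

print(radiology, cardiovascular)  # 121 10
```

The implied cardiovascular count of 10 matches the 2022 value shown for that specialty in Figure 5.2.21.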
Number of FDA-approved AI medical devices by medical specialty, 2012–2022:

Specialty                  2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Cardiovascular                0    0    0    1    4    6    9   12    7   11   10
Neurology                     0    0    1    0    1    1    4    4    0    2    2
Microbiology                  0    2    0    0    0    0    0    2    1    0    0
General hospital              0    0    0    1    0    0    0    2    0    0    0
Ophthalmic                    0    0    0    1    0    0    2    1    1    1    2
Clinical chemistry            0    0    0    1    1    0    2    1    0    0    1
Anesthesiology                0    0    0    1    0    1    0    0    1    0    0
Pathology                     1    0    0    0    0    0    0    0    0    1    0
Dental                        0    0    0    0    0    0    1    0    0    0    0
Orthopedic                    0    0    0    0    0    0    1    0    0    0    0
Obstetrics and gynecology     0    0    0    0    0    0    0    1    0    0    0

Figure 5.2.21
Highlighted Research:
MedAlign
Despite significant advances in AI for healthcare, existing benchmarks like MedQA and USMLE, focused on knowledge-based questions, do not fully capture the diverse tasks clinicians perform in patient care. Clinicians often engage in information-intensive tasks, such as creating tailored diagnostic plans. To better capture this work, researchers introduced MedAlign, a benchmark with 983 questions and instructions and 303 clinician responses, drawn from seven different medical specialties (Figure 5.2.22). MedAlign is the first extensive EHR-focused benchmark.

The researchers then tested various existing LLMs on MedAlign. Of all LLMs, a GPT-4 variant using multistep refinement achieved the highest correctness rate (65.0%) and was routinely preferred over other LLMs (Figure 5.2.23). MedAlign is a valuable milestone toward using AI to alleviate administrative burdens in healthcare.

MedAlign workflow
Source: Fleming et al., 2023
Evaluation of model performance: human vs. COMET ranks
Source: Fleming et al., 2023 | Chart: 2024 AI Index report
[Chart: pairwise win rates (Model A, winner, vs. Model B, loser) for GPT-4 (32k + MR), GPT-4 (32k), GPT-4 (2k), Vicuña-13B (2k), Vicuña-7B (2k), and MPT-7B-Instruct (2k), as ranked by human evaluators and by COMET]
Figure 5.2.23
Preview
Overview
Chapter Highlights
Overview
This chapter examines trends in AI and computer science (CS) education, focusing on who is
learning, where they are learning, and how these trends have evolved over time. Amid growing
concerns about AI’s impact on education, it also investigates the use of new AI tools like
ChatGPT by teachers and students.
The analysis begins with an overview of the state of postsecondary CS and AI education in the
United States and Canada, based on the Computing Research Association’s annual Taulbee
Survey. It then reviews data from Informatics Europe regarding CS education in Europe. This
year introduces a new section with data from Studyportals on the global count of AI-related
English-language study programs.
The chapter wraps up with insights into K–12 CS education in the United States from Code.org
and findings from the Walton Foundation survey on ChatGPT’s use in schools.
Chapter Highlights
1. The number of American and Canadian CS bachelor’s graduates continues to rise, new
CS master’s graduates stay relatively flat, and PhD graduates modestly grow. While the
number of new American and Canadian bachelor’s graduates has consistently risen for more than a decade, the
number of students opting for graduate education in CS has flattened. Since 2018, the number of CS master’s and
PhD graduates has slightly declined.
3. Fewer AI faculty move from industry to academia. In 2019, 13% of new AI faculty in the
United States and Canada came from industry. By 2021, this figure had declined to 11%, and in 2022 it dropped
further to 7%. This trend indicates a progressively lower migration of high-level AI talent from industry into academia.
4. CS education in the United States and Canada becomes less international. Proportionally
fewer international CS bachelor’s, master’s, and PhDs graduated in 2022 than in 2021. The drop in international
students in the master’s category was especially pronounced.
5. More American high school students take CS courses, but access problems remain.
In 2022, 201,000 AP CS exams were administered. Since 2007, the number of students taking these exams has
increased more than tenfold. However, recent evidence indicates that students in larger high schools and those in
suburban areas are more likely to have access to CS courses.
6. AI-related degree programs are on the rise internationally. The number of English-language,
AI-related postsecondary degree programs has tripled since 2017, showing a steady annual increase over the past
five years. Universities worldwide are offering more AI-focused degree programs.
7. The United Kingdom and Germany lead in European informatics, CS, CE, and IT graduate
production. The United Kingdom and Germany lead Europe in producing the highest number of new informatics,
CS, CE, and IT bachelor's, master's, and PhD graduates. On a per capita basis, Finland leads in the
production of both bachelor’s and PhD graduates, while Ireland leads in the production of master’s graduates.
This section provides an overview of postsecondary education in CS and AI, highlighting graduation statistics across
North America and Europe for various degrees including bachelor’s, master’s, and PhDs. It also covers information on
AI-related courses offered in English.
[Chart: number of new CS bachelor's graduates per year, 2010–2022; reported values rise from roughly 9,000 in the early 2010s to 28,527 in the latest year shown]
Figure 6.1.1
1 It is important to note that not all PhD-granting departments targeted in the survey provided responses. Out of the 297 departments targeted, only 182 responded, yielding an overall
response rate of 61%.
For the first time in almost eight years, the proportion of international students among CS bachelor's graduates in American and Canadian universities declined, falling from 16.3% in 2021 to 15.2% in 2022 (Figure 6.1.2). This decline likely reflects the increased difficulty of obtaining study visas during the early years of the Trump administration, an impact that is only now beginning to manifest in the data. The decline is also partially attributable to international travel restrictions that were imposed during the COVID-19 pandemic, affecting the ability of international students to study in the United States and Canada. Despite this recent drop, the overall trend over the last decade shows a steady increase in the proportion of international students.
New international CS bachelor’s graduates (% of total) in the United States and Canada, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Chart: share rising over the decade to 15.20% in 2022]
Figure 6.1.2
CS Master’s Graduates
AI courses are commonly included in CS master's degree programs. While the total number of new CS master's graduates from American and Canadian universities more than doubled over the past decade, the number appears to have leveled out since 2018 and slightly decreased, by 2.5%, last year (Figure 6.1.3). This leveling is a reflection of the decline in international master's students shown in the following graph.
[Chart: number of new CS master's graduates per year, 2010–2022; recent values include 9,933, 11,239, and 13,037]
Figure 6.1.3
In 2022, American and Canadian universities experienced a notable decrease in international CS master’s students.
This downward trend began around 2017, but the decline was most pronounced last year, at 14.8 percentage points
(Figure 6.1.4). Currently, the split between international and domestic CS master’s graduates is roughly even.
New international CS master’s graduates (% of total) in the United States and Canada, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Chart: share declining in recent years to 50.40% in 2022]
Figure 6.1.4
CS PhD Graduates
For the first time in a decade, there has been a significant increase in the number of new CS PhD graduates at
American and Canadian universities. In 2022, the number of CS PhD graduates reached 2,105, the highest since
2010 (Figure 6.1.5).
[Chart: number of new CS PhD graduates per year, 2010–2022]
Figure 6.1.5
While the proportion of international students among CS PhD graduates has risen over the past decade, there was
a slight decrease in this proportion in the last year, dropping from 68.6% in 2021 to 65.9% in 2022 (Figure 6.1.6).
New international CS PhD graduates (% of total) in the United States and Canada, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Chart: share rising over the decade, ending at 65.90% in 2022]
Figure 6.1.6
Where do newly minted AI PhDs choose to work after graduating? Following a trend highlighted in last year's AI Index report, a growing share of AI doctoral recipients are pursuing careers in industry (Figure 6.1.7 and Figure 6.1.8). In 2011, around the same percentage took jobs in industry (40.9%) as in academia (41.6%). However, by 2022, a significantly larger proportion (70.7%) joined industry after graduation compared to those entering academia (20.0%). The percentage of new AI PhDs going into government roles has remained relatively low and steady at around 0.7% over the past five years.
Employment of new AI PhDs (% of total) in the United States and Canada by sector, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Chart: in 2022, industry 70.71%, academia 19.95%, government 0.76%]
Figure 6.1.7

Employment of new AI PhDs in the United States and Canada by sector, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Chart: counts of new AI PhDs employed in academia, industry, and government per year]
Figure 6.1.8
2 The sums in Figure 6.1.7 do not add up to 100, as there is a subset of new AI PhDs each year who become self-employed, unemployed, or report an “other” employment status in the CRA
survey. These students are not included in the chart.
Number of CS, CE, and information faculty in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Stacked chart: number of CS, CE, and information faculty by category (tenure track, teaching professors, other instructors, research, postdoc); the total rises to 8,738 in 2022]
Figure 6.1.9
In 2022, the United States had 7,084 CS faculty members, with the majority (65.7%) on the tenure track (Figure
6.1.10). The total number of American CS faculty has risen 4.4% since 2021 and 45.0% since 2011.
[Stacked chart: number of CS faculty in the United States by category (tenure track, teaching professors, other instructors, research, postdoc); the 2022 total is 7,084, of which 4,657 are tenure track]
Figure 6.1.10
Last year, 915 new faculty were hired across CS, CE, and information disciplines in North America, a decade high. Of these, 455 were tenure-track positions (Figure 6.1.11).
New CS, CE, and information faculty hires in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Chart: number of new CS, CE, and information faculty hires per year, 2010–2022; tenure-track hires reached 455 in the most recent year shown]
Figure 6.1.11
In 2022, 43% of new faculty appointments came from other academic positions, indicating a “churn” within
the academic workforce (Figure 6.1.12). Since these “new” faculty members vacated positions elsewhere, their
previous roles will eventually need to be filled. Additionally, the proportion of faculty transitioning from industry
in 2022 fell to 7% from 11% in the previous year and 13% in 2019.
Source of new faculty in American and Canadian CS, CE, and information departments, 2018–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Stacked chart: sources of new faculty by share, 2018–2022; the share arriving from industry was 13% in 2018 and 2019, 11% in 2020 and 2021, and 7% in 2022]
Figure 6.1.12
The reasons for faculty positions remaining unfilled have varied over the past decade. In 2011, 37% of failed searches were due to no offer being made, while 34% were because the offer made was declined (Figure 6.1.13). In contrast, in 2022, only 15% ended with no offer being made, while 55% involved offers that were turned down. This trend appears to reflect an increasingly competitive market for new CS faculty. However, it remains unclear whether this indicates heightened competition with other academic positions or with industry positions.
Reason why new CS, CE, and information faculty positions remained unfilled (% of total), 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Stacked chart: shares by reason: didn't find a person who met our hiring goals; offers turned down; technically vacant, not filled for admin reasons; hiring in progress; other]
Figure 6.1.13
In 2022, North American departments in CS, CE, and information disciplines experienced a significant increase
in faculty departures, totaling 405, compared to 303 in 2021 (Figure 6.1.14). Of these losses, 38.5% left for other
academic positions, while 16.3% moved to nonacademic roles, maintaining a trend consistent with previous years.
Faculty losses in American and Canadian CS, CE, and information departments, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Stacked chart: faculty losses per year by reason (died; retired; took academic position elsewhere; took nonacademic position; remained, but changed to part-time; other; unknown); losses totaled 405 in 2022, up from 303 in 2021]
Figure 6.1.14
Since 2015, the increase in median nine-month salaries for full professors has slightly fallen below U.S. inflation
rates, whereas median salaries for assistant and associate professors have seen slight increases above inflation.
In 2022, a full professor’s salary was 3.2% higher than in 2021, which did not keep pace with the 7% U.S. inflation
rate, and 16.4% higher than in 2015, still below the 19% inflation increase over those years (Figure 6.1.15).
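The comparison of salary growth to inflation can be made explicit: real growth is (1 + nominal) / (1 + inflation) − 1. A sketch of the arithmetic using the percentages cited above (not figures from the survey itself):

```python
def real_growth(nominal: float, inflation: float) -> float:
    """Inflation-adjusted growth rate from nominal growth and inflation."""
    return (1 + nominal) / (1 + inflation) - 1

# Full professors in 2022: +3.2% nominal growth against 7% inflation
print(round(real_growth(0.032, 0.07), 4))  # -0.0355
# Since 2015: +16.4% nominal growth against roughly 19% cumulative inflation
print(round(real_growth(0.164, 0.19), 4))  # -0.0218
```

In both cases the real (inflation-adjusted) salary change for full professors is negative, which is the point the paragraph above is making.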
[Chart: median nine-month salaries (in thousands of U.S. dollars) by faculty rank, 2015–2022; reported values range from 99.12 to 159.96]
Figure 6.1.15
In 2022, the proportion of international hires among new tenure-track faculty in CS, CE, and information
disciplines significantly increased to 19.3% from 13.2% the previous year (Figure 6.1.16). This marked the second-
highest percentage recorded in the past decade, only surpassed by 2013.
New international CS, CE, and information tenure-track faculty hires (% of total) in the United States and
Canada, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
[Line chart: new international tenure-track faculty hires as a share of total, 2010–2022; 19.3% in 2022]
Figure 6.1.16
New informatics, CS, CE, and IT bachelor’s graduates by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Germany 19,920
Turkey 13,826
Poland 9,241
Italy 8,508
Netherlands 6,877
Romania 6,740
Spain 6,650
Portugal 3,696
Finland 2,970
Norway 2,340
Ireland 1,995
Austria 1,900
Switzerland 1,789
Figure 6.1.17
3 There is no singular term for CS education that is used uniformly across European countries. Across Europe, CS education can be reflected in terms such as informatics, computer science
(CS), computer engineering (CE), computing, information technology (IT), information and communication technology (ICT), and information science and technology (IST). The full list of
subject names (and English translations) that Informatics Europe uses to identify informatics studies programs can be found at the following link.
4 Readers are cautioned against making per capita comparisons between the CRA North American data and the European CS graduate data detailed in subsequent sections, as the European data is collected from national statistical offices and has broader coverage.
5 Note that not all countries for which the AI Index has data are visualized in the figures in this section. To access the complete data, please view the public data associated with this chapter.
Moreover, the year label refers to the year in which an academic year ends. For example, the figures visualizing new graduates for 2022 reflect the number of graduates reported for the
2021/2022 academic year. For the sake of visual simplicity, the Index opts to focus on the year in which students graduated.
Percentage change of new informatics, CS, CE, and IT bachelor’s graduates by country in Europe,
2012 vs. 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Norway 153%
Turkey 118%
Switzerland 117%
Netherlands 88%
Portugal 72%
Romania 70%
Ireland 66%
Austria 59%
Italy 48%
Germany 28%
Finland 23%
Spain -14%
Poland -15%
Figure 6.1.18
Finland (53.4), Norway (42.6), and the Netherlands (38.6) lead in the number of new bachelor’s graduates in informatics, CS, CE, and IT per 100,000 inhabitants (Figure 6.1.19). On a per capita basis, most sampled European countries have seen increases in the total number of informatics, CS, CE, and IT bachelor’s graduates (Figure 6.1.20).
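The two derived metrics used throughout this section are straightforward to compute. A minimal sketch, using one illustrative input pair (graduate count from Figure 6.1.17 and an approximate national population; the population figure is an assumption, not Informatics Europe data):

```python
# Derived metrics for this section: graduates per 100,000 inhabitants,
# and percentage change between two years.

def per_100k(graduates: int, population: int) -> float:
    """Graduates per 100,000 inhabitants."""
    return graduates / population * 100_000

def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100

# Example: ~2,970 graduates (Finland, Figure 6.1.17) against an assumed
# population of ~5.56M reproduces roughly the 53.4 rate in Figure 6.1.19.
rate = per_100k(2_970, 5_563_000)
print(f"{rate:.1f} graduates per 100,000 inhabitants")
```

The "2012 vs. 2022" charts in this section are simply `pct_change` applied to the first and last year of each country's series.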
New informatics, CS, CE, and IT bachelor’s graduates per 100,000 inhabitants by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Finland 53.38
Norway 42.63
Netherlands 38.61
Ireland 38.41
Romania 35.38
Portugal 35.31
Estonia 34.78
Lithuania 33.42
Poland 25.14
Bulgaria 25.14
Germany 23.61
Latvia 22.73
Austria 20.87
Figure 6.1.19
Percentage change of new informatics, CS, CE, and IT bachelor’s graduates per 100,000 inhabitants by country
in Europe, 2012 vs. 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Norway 133%
Romania 78%
Netherlands 77%
Portugal 72%
Bulgaria 67%
Ireland 48%
Austria 47%
Estonia 25%
Germany 22%
Finland 20%
Poland -12%
Latvia -33%
Figure 6.1.20
Informatics, CS, CE, and IT Master’s Graduates
Similar to bachelor’s graduates, the United Kingdom leads Europe in producing new master’s graduates in informatics, CS, CE, and IT, with approximately 20,000 graduates (Figure 6.1.21). In the last decade, Germany (259%), Turkey (197%), and Spain (194%) have seen the greatest percentage growth in new informatics, CS, CE, and IT master’s graduates (Figure 6.1.22).
New informatics, CS, CE, and IT master’s graduates by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Germany 11,092
Poland 4,271
Italy 3,883
Spain 3,214
Turkey 2,677
Romania 2,200
Netherlands 1,953
Ireland 1,620
Finland 1,458
Austria 1,327
Switzerland 1,202
Portugal 1,109
Norway 770
Figure 6.1.21
Percentage change of new informatics, CS, CE, and IT master’s graduates by country in Europe,
2012 vs. 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Germany 259%
Turkey 197%
Spain 194%
Norway 191%
Switzerland 148%
Netherlands 147%
Ireland 140%
Portugal 61%
Finland 60%
Austria 54%
Italy 46%
Poland 8%
Figure 6.1.22
Per capita metrics paint a somewhat similar picture. Ireland has the most informatics, CS, CE, and IT master’s
graduates on a per capita basis (31.2), followed by the United Kingdom (29.8) and Estonia (27.4) (Figure 6.1.23).
On a per capita basis, Germany (243%) has also seen the greatest growth of informatics, CS, CE, and IT master’s
graduates in the last decade (Figure 6.1.24).
New informatics, CS, CE, and IT master’s graduates per 100,000 inhabitants by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Ireland 31.19
Estonia 27.38
Finland 26.20
Austria 14.57
Norway 14.03
Switzerland 13.64
Germany 13.15
Denmark 12.66
Poland 11.62
Romania 11.55
Netherlands 10.96
Bulgaria 10.70
Portugal 10.59
Figure 6.1.23
Percentage change of new informatics, CS, CE, and IT master’s graduates per 100,000 inhabitants
by country in Europe, 2012 vs. 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Germany 243%
Estonia 189%
Norway 167%
Netherlands 132%
Switzerland 127%
Ireland 113%
Bulgaria 80%
Portugal 61%
Finland 56%
Denmark 56%
Austria 43%
Poland 11%
Figure 6.1.24
Informatics, CS, CE, and IT PhD Graduates
The United Kingdom (1,060) and Germany (910) also produced the most informatics, CS, CE, and IT PhD graduates in 2022, followed by Italy (581) (Figure 6.1.25). In the last decade, Turkey has seen the greatest growth in new CS, CE, and information PhD graduates (Figure 6.1.26).
New informatics, CS, CE, and IT PhD graduates by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Germany 910
Italy 581
Spain 479
Turkey 298
Netherlands 120
Switzerland 116
Finland 114
Austria 96
Portugal 72
Ireland 65
Romania 47
Estonia 26
Bulgaria 24
Figure 6.1.25
Percentage change of new informatics, CS, CE, and IT PhD graduates by country in Europe, 2012 vs. 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Turkey 173%
Estonia 86%
Bulgaria 60%
Spain 14%
Italy 12%
Switzerland 12%
Portugal 3%
Germany 1%
Ireland -4%
Finland -18%
Austria -31%
Figure 6.1.26
Finland has the greatest number of new informatics, CS, CE, and IT PhD graduates per capita. For every 100,000 inhabitants, it has 2.1 informatics, CS, CE, and IT PhD graduates (Figure 6.1.27). Estonia slightly trails (1.9), as does the United Kingdom (1.6). On a per capita basis, the growth rate of new informatics, CS, CE, and IT PhDs has been relatively marginal in several major European countries such as the United Kingdom, Portugal, and Switzerland (Figure 6.1.28).
New informatics, CS, CE, and IT PhD graduates per 100,000 inhabitants by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Finland 2.05
Estonia 1.90
Switzerland 1.32
Ireland 1.25
Germany 1.08
Austria 1.05
Spain 1.00
Italy 0.99
Portugal 0.69
Netherlands 0.67
Latvia 0.64
Bulgaria 0.37
Turkey 0.35
Figure 6.1.27
Percentage change of new informatics, CS, CE, and IT PhD graduates per 100,000 inhabitants by country
in Europe, 2012 vs. 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Turkey 142%
Bulgaria 81%
Estonia 79%
Italy 14%
Spain 10%
Portugal 3%
Switzerland 2%
Germany -4%
Latvia -8%
Ireland -15%
Finland -20%
Austria -36%
Figure 6.1.28
[Line chart: number of AI university study programs in English (in thousands), 2017–2023, rising from 0.88 to 2.52]
Figure 6.1.29
6 Currently, Studyportals, the company supplying data on AI university study programs, tracks only English-language AI courses. In the coming years, the Index plans to extend its coverage to
include non-English programs.
Education Level
Broken down by educational level, the majority of AI study programs are offered at the master’s level (55.0%),
followed by the bachelor’s level (39.8%), and finally at the PhD level (5.3%) (Figure 6.1.30).
[Bar chart: AI university study programs in English by education level: bachelor’s 39.77%, master’s 54.97%, PhD 5.27%]
Figure 6.1.30
Number of AI university study programs in English by geographic area, 2022 vs. 2023
Source: Studyportals, 2023 | Chart: 2024 AI Index report
Canada: 87 → 89 (+2%)
Australia: 83 → 84 (+1%)
Germany: 52 → 80 (+54%)
France: 60 → 69 (+15%)
Netherlands: 55 → 59 (+7%)
India: 44 → 45 (+2%)
Ireland: 42 → 44 (+5%)
Spain: 38 → 43 (+13%)
Italy: 33 → 40 (+21%)
Malaysia: 36 → 39 (+8%)
7 Although the United Kingdom has fewer universities overall compared to the United States, it likely reports a higher number of AI study programs for several reasons. Firstly, Studyportals
has slightly greater coverage of the United Kingdom than the United States in its data. Secondly, the structure of higher education in the United States tends to be more generalist compared
to the United Kingdom, meaning students studying AI might be enrolled in broader computer science programs that are not explicitly identified as AI study programs.
AI university study programs in English per 100,000 inhabitants by geographic area, 2022 vs. 2023
Source: Studyportals, 2023 | Chart: 2024 AI Index report
This section presents trends in high school CS education in the United States as a representation of K–12 AI education.
State-Level Trends
In 2023, 30 American states required that all high schools offer a foundational course in CS (Figure 6.2.1).
[U.S. map: states requiring all high schools to offer foundational CS (yes/no)]
Figure 6.2.1
The percentage of public schools offering CS courses varies significantly from state to state (Figure 6.2.2). The top three states in terms of percentage of CS offerings are Maryland (99%), Arkansas (99%), and Nevada (96%); the bottom three are Minnesota (28%), Montana (34%), and Louisiana (35%).
Public high schools teaching foundational CS (% of total in state), 2023
Source: Code.org, 2023 | Chart: 2024 AI Index report
AK 51%, ME 66%
VT 76%, NH 81%, MA 83%
WA 47%, MT 34%, ND 47%, SD 44%, MN 28%, WI 56%, MI 55%, NY 58%, CT 84%, RI 86%
OR 64%, ID 38%, WY 63%, NE 50%, IA 84%, IL 54%, IN 91%, OH 62%, PA 68%, NJ 89%
CA 45%, NV 96%, UT 77%, CO 54%, KS 36%, MO 50%, KY 79%, WV 78%, DC 45%, MD 99%, DE 40%
AZ 36%, NM 50%, OK 62%, AR 99%, TN 64%, VA 75%, NC 71%
TX 54%, LA 35%, MS 78%, AL 95%, GA 71%, SC 94%
HI 77%, FL 40%
Figure 6.2.2
K–12 CS education is expanding in the United States (Figure 6.2.3). In 2017, only a few states supported high
school CS programs. Now, approximately two-thirds of states require that CS be taught in high schools,
allocate funding for it, and have developed state plans for CS education.
[Line chart: number of U.S. states with key CS education policies, 2017–2023]
Figure 6.2.3
[Line chart: AP computer science exams taken in the United States (in thousands), 2007–2022; values range from 19.39 to 181.04 thousand]
Figure 6.2.4
8 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles was initially
offered in 2017.
AK 21.41, ME 20.66
VT 17.62, NH 37.60, MA 92.66
WA 61.08, MT 3.47, ND 12.84, SD 4.40, MN 31.69, WI 37.74, MI 49.03, NY 77.21, CT 89.81, RI 68.75
OR 21.02, ID 26.51, WY 18.40, NE 26.93, IA 20.16, IL 77.71, IN 43.60, OH 33.74, PA 48.03, NJ 112.66
CA 85.20, NV 77.92, UT 18.69, CO 52.68, KS 11.75, MO 22.63, KY 40.61, WV 18.38, DC 60.06, MD 126.48, DE 60.42
AZ 25.54, NM 21.06, OK 15.65, AR 49.24, TN 33.30, VA 70.77, NC 64.32
TX 69.60, LA 21.51, MS 26.91, AL 58.79, GA 75.09, SC 39.81
HI 50.02, FL 73.04
Figure 6.2.6
Highlight:
Code.org data suggests that factors such as school size and location significantly influence CS education accessibility. Large schools (over 1,200 students) are 15 percentage points more likely to offer CS courses than medium-sized schools (500–1,200 students).
[Bar chart: % of schools offering CS courses by school size; values shown: 90%, 75%, 41%]
[Bar chart: % of schools offering CS courses by locale (urban, suburban, rural); values shown: 67.21%, 54.73%, 54.62%]
Figure 6.2.8
Highlight:
… in the United States are already utilizing ChatGPT, with usage increasing over …
ChatGPT usage purposes among American K–12 teachers, 2023
Source: Impact Research, 2023 | Chart: 2024 AI Index report
9 To learn more about the surveys, including their methodologies, please visit the following links: March 2023 and July 2023.
Overview 368
Chapter Highlights 369
7.1 Overview of AI Policy in 2023 370
7.2 AI and Policymaking 376
Global Legislative Records on AI 376
Overview 376
By Geographic Area 378
By Relevance 379
By Approach 380
By Subject Matter 381
U.S. Legislative Records 382
Federal Level 382
State Level 383
AI Mentions 385
Overview 385
U.S. Committee Mentions 388
7.3 National AI Strategies 391
By Geographic Area 391
7.4 AI Regulation 393
U.S. Regulation 393
Overview 393
By Relevance 394
By Agency 395
By Approach 396
By Subject Matter 397
EU Regulation 398
Overview 398
By Relevance 399
By Agency 400
By Approach 401
By Subject Matter 402
7.5 U.S. Public Investment in AI 403
Federal Budget for AI R&D 403
U.S. Department of Defense Budget Requests 405
U.S. Government AI-Related Contract Spending 406
AI Contract Spending 406
Microelectronics and Semiconductor Spending 409
Overview
AI’s increasing capabilities have captured policymakers’ attention. Over the past year, several nations and political bodies, such as the United States and the European Union, have enacted significant AI-related policies. The proliferation of these policies reflects policymakers’ growing awareness of the need to regulate AI and improve their respective countries’ ability to capitalize on its transformative potential.
This chapter begins its examination of global AI governance with a timeline of significant AI policymaking events in 2023. It then analyzes global and U.S. AI legislative efforts, studies AI legislative mentions, and explores how lawmakers across the globe perceive and discuss AI. Next, the chapter profiles national AI strategies and regulatory efforts in the United States and the European Union. Finally, it concludes with a study of public investment in AI within the United States.
Chapter Highlights
1. The number of AI regulations in the United States sharply increases. The number of AI-related
regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there were
25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations
grew by 56.3%.
2. The United States and the European Union advance landmark AI policy action. In 2023,
policymakers on both sides of the Atlantic put forth substantial AI regulatory proposals. The European Union
reached a deal on the terms of the AI Act, a landmark piece of legislation enacted in 2024. Meanwhile, President
Biden signed an Executive Order on AI, the most notable AI policy initiative in the United States that year.
3. AI captures U.S. policymaker attention. The year 2023 witnessed a remarkable increase in AI-related
legislation at the federal level, with 181 bills proposed, more than double the 88 proposed in 2022.
4. Policymakers across the globe cannot stop talking about AI. Mentions of AI in legislative
proceedings across the globe have nearly doubled, rising from 1,247 in 2022 to 2,175 in 2023. AI was mentioned in
the legislative proceedings of 49 countries in 2023. Moreover, at least one country from every continent discussed
AI in 2023, underscoring the truly global reach of AI policy discourse.
5. More regulatory agencies turn their attention toward AI. The number of U.S. regulatory agencies
issuing AI regulations increased to 21 in 2023 from 17 in 2022, indicating a growing concern over AI regulation
among a broader array of American regulatory bodies. Some of the new regulatory agencies that enacted AI-
related regulations for the first time in 2023 include the Department of Transportation, the Department of Energy,
and the Occupational Safety and Health Administration.
This chapter begins with an overview of some of the most significant AI-related policy events in 2023, as selected by
the AI Index Steering Committee.
The sources cited in this section are for the images included in the text.
[World map: number of AI-related bills passed into law by country; legend bins: 0, 1–5, 6–10, 11–15, 16–25, no available data]
Figure 7.2.1
2 The analysis of passed AI policies may undercount the number of actual bills, given that large bills can include multiple sub-bills related to AI; for example, the CHIPS and Science Act
passed by the United States in 2022.
3 The AI Index monitored AI-related bills passed in Hong Kong and Macao, despite these not being officially recognized countries. Thus, the Index covers a total of 130 geographic areas.
Laws passed by Hong Kong and Macao were counted in the overall tally of AI-related bills. This year, the Index expanded its country sample compared to previous years, resulting in a
difference between the number of AI-related bills reported this year and those in prior reports.
Number of AI-related bills passed into law in 128 select countries, 2016–23
Source: AI Index, 2024 | Chart: 2024 AI Index report
[Bar chart: number of AI-related bills passed into law per year, 2016–2023]
Figure 7.2.2
Figure 7.2.3
Figure 7.2.4
Number of AI-related bills passed into law in select countries by relevance to AI, 2016–23
Source: AI Index, 2024 | Chart: 2024 AI Index report
[Stacked bar chart: AI-related bills passed per year by relevance to AI (low, medium, high), 2016–2023]
Figure 7.2.5
By Approach
The AI Index also categorized AI-related bills as either expansive or restrictive. Expansive bills aim to enhance a nation’s AI capabilities, such as establishing a network of publicly accessible supercomputers. Restrictive bills, on the other hand, impose limitations on AI usage, like setting rules for deploying facial recognition technology. A bill can be both, or neither.4 Distinguishing between expansive or restrictive bills can highlight legislator priorities: whether policymakers focus on expanding AI capabilities, imposing restrictions, or balancing both.
Figure 7.2.6 indicates a global trend toward regulating AI usage, showing that, while the commitment to enhancing AI capabilities remains, there is a growing shift toward restrictive legislation. This change suggests that legislators are increasingly focused on mitigating the potential harms of AI’s integration into society.
Number of AI-related bills passed into law in select countries by approach, 2016–23
Source: AI Index, 2024 | Chart: 2024 AI Index report
[Grouped bar chart: AI-related bills passed per year by approach (expansive vs. restrictive), 2016–2023]
Figure 7.2.6
4 The AI Index only categorized bills as being expansive or restrictive if they were identified as having medium or high AI relevance. Consequently, the totals depicted in Figure 7.2.5 may not
fully correspond with those presented earlier in the chapter.
By Subject Matter
The AI Index’s global analysis of AI legislation in 2023 classifies bills by their primary subject matter according to the typology used by the U.S. Congress to classify American legislation.5 Historically, economics and public finance have been the predominant focus of AI-related legislation, reflecting the fact that AI-related policymaking matters are often incorporated within budgetary bills related to public appropriations (Figure 7.2.7). However, the distribution of primary topics among passed bills broadened significantly, encompassing a diverse range of policy areas. Specifically, two bills were passed in each of the following categories: armed forces and national security; civil rights and liberties, minority issues; commerce; education; labor and employment; science, technology, and communication. This diversity indicates that AI policy concerns are increasingly spanning various sectors.
Number of AI-related bills passed into law in select countries by primary subject matter, 2016–23
Source: AI Index, 2024 | Chart: 2024 AI Index report
2016 2017 2018 2019 2020 2021 2022 2023
Education 0 0 0 0 3 1 0 2
Energy 0 0 0 0 1 0 0 0
Environmental protection 0 0 0 0 0 1 0 0
Finance and financial sector 0 0 0 0 0 0 0 1
Foreign trade and international finance 0 0 0 0 0 0 0 1
Government operations and politics 0 0 0 0 0 1 2 0
Health 0 0 1 2 1 0 0 0
Labor and employment 0 0 0 1 0 3 4 2
Science, technology, communications 1 0 2 1 4 2 2 2
Taxation 0 0 0 0 0 1 0 0
Transportation and public works 0 0 1 0 0 0 0 0
Figure 7.2.7
5 Similar to the classification of bills as either expansive or restrictive, only bills coded as having a medium or high relevance to AI were coded for their primary subject matter. Consequently,
not all AI-related bills featured in this section’s analysis have subject matter coding available.
Number of AI-related bills in the United States, 2016–23 (proposed vs. passed)
Source: AI Index, 2024 | Chart: 2024 AI Index report
[Line chart: AI-related bills proposed vs. passed in the United States, 2016–2023; the “Passed” series ends at 1]
Figure 7.2.8
Figure 7.2.9
Figure 7.2.11 displays the total number of state-level AI-related bills proposed and passed in the United States since 2016. In 2023, 150 total state-level bills were proposed, a significant increase from the 61 bills proposed in 2022. A significantly greater proportion of AI-related bills are enacted into law at the state level in the United States, compared to the federal level.
Number of state-level AI-related bills in the United States, 2016–23 (proposed vs. passed)
Source: AI Index, 2024 | Chart: 2024 AI Index report
[Line chart: state-level AI-related bills proposed vs. passed in the United States, 2016–2023; 150 proposed and 38 passed in 2023]
Figure 7.2.11
AI Mentions
Overview
Another barometer of legislative interest is the number of mentions of artificial intelligence in governmental and parliamentary proceedings. The AI Index conducted an analysis of the minutes or proceedings of legislative sessions in 80 countries that contain the keyword “artificial intelligence” from 2016 to 2023.6
Figure 7.2.12 reveals a significant increase in the mentions of AI in legislative proceedings across the globe, nearly doubling from 1,247 in 2022 to 2,175 in 2023. Since 2016, AI mentions in legislative discussions have risen almost tenfold. This data suggests that the emergence of AI systems such as ChatGPT in 2023 has notably captured policymakers’ attention.
[Line chart: mentions of AI in legislative proceedings worldwide, 2016–2023; 2,175 mentions in 2023]
Figure 7.2.12
6 The full list of countries analyzed can be found in the Appendix. The AI Index research team attempted to review the governmental and parliamentary proceedings of every country in the
world; however, publicly accessible governmental and parliamentary databases were not made available for all countries.
In 2023, the United Kingdom led in AI mentions within its legislative proceedings (405), followed by the United States (240) and Australia (227) (Figure 7.2.13). Out of 80 countries analyzed, 48 mentioned AI at least once. Moreover, AI discussions reached legislative platforms in at least one country from every continent in 2023, underscoring the truly global reach of AI policy discourse.
When legislative mentions are aggregated from 2016 to 2023, a somewhat similar trend emerges (Figure 7.2.14). The United Kingdom is first, with 1,490 mentions, followed by Spain (886) and the United States (868).
[World map: AI mentions in legislative proceedings by country, 2023; legend bins: 0, 1–55, 56–120, 121–250, 251–410, no available data]
Figure 7.2.13
[World map: aggregate AI mentions in legislative proceedings by country, 2016–2023; legend bins: 0, 1–220, 221–440, 441–660, 661–890, 891–1,500, no available data]
Figure 7.2.14
[Bar chart: mentions of AI in U.S. congressional committee reports by session, 107th (2001–02) through 118th (2023–); 48 mentions in the 118th session]
Figure 7.2.15
Figure 7.2.16 depicts AI mentions in the committee reports of the U.S. House of Representatives during the ongoing
118th congressional session. The Appropriations and Science, Space, and Technology committees feature the
highest number of AI mentions. Meanwhile, Figure 7.2.17 highlights AI mentions in Senate committee reports, with
Appropriations leading (9), followed by the Homeland Security and Governmental Affairs Committee (3).
Mentions of AI in committee reports of the US House of Representatives for the 118th congressional
session, 2023
Source: AI Index, 2024 | Chart: 2024 AI Index report
Appropriations 7
Rules 4
Agriculture 1
Armed Services 1
Financial Services 1
Foreign Affairs 1
Figure 7.2.16
Mentions of AI in committee reports of the US Senate for the 118th congressional session, 2023
Source: AI Index, 2024 | Chart: 2024 AI Index report
Appropriations 9
Intelligence (Select) 2
Armed Services 1
Figure 7.2.17
Figures 7.2.18 and 7.2.19 show the total number of mentions in committee reports from congressional sessions
occurring since 2001. The House and Senate Appropriations committees, which regulate expenditures of money
by the federal government, lead their respective lists.
Appropriations 25
Armed Services 11
Intelligence (Select) 7
Figure 7.2.19
This section offers an overview of national AI strategies, which are policy plans created by governments to guide
the development and deployment of AI within their country. Monitoring trends in these strategies is important for
assessing how countries prioritize the development and regulation of AI technologies. Sources include national or
regional government websites, the OECD AI Policy Observatory (oecd.ai), and news reports.7
[World map: national AI strategy status by country: released, in development, not released]
Figure 7.3.1
7 The AI Index research team made efforts to identify whether there was a national AI strategy that was released or in development for every nation in the world. It is possible that some
strategies were missed.
Figure 7.3.3
The advent of AI has garnered significant attention from regulatory agencies—federal bodies tasked with regulating
sectors of the economy and steering the enforcement of laws. This section examines AI regulations within the United
States and the European Union. Unlike legislation, which establishes legal frameworks within nations, regulations are
detailed directives crafted by executive authorities to enforce legislation. In the United States, prominent regulatory
agencies include the Environmental Protection Agency (EPA), Food and Drug Administration (FDA), and Federal
Communications Commission (FCC). Since the specifics of legislation often manifest through regulatory actions, understanding the AI regulatory landscape is essential to a deeper understanding of AI policymaking.
7.4 AI Regulation
U.S. Regulation
This section examines AI-related regulations enacted by American regulatory agencies between 2016 and 2023. It provides an analysis of the total number of regulations, as well as their topics, scope, regulatory intent, and originating agencies. To compile this data, the AI Index team performed a keyword search for “artificial intelligence” on the Federal Register, a comprehensive repository of government documents from nearly all branches of the American government, encompassing more than 436 agencies.8
Overview
The number of AI-related regulations has risen significantly, both in the past year and over the last five years (Figure 7.4.1). In 2023, there were 25 AI-related regulations, a stark increase from just one in 2016. Last year alone, the total number of AI-related regulations grew by 56.3%.
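The Federal Register does expose a public JSON search API, so a keyword search of the kind described above can be sketched as a URL-building helper. The endpoint and parameter names used here (`documents.json`, `conditions[term]`, publication-date bounds, `per_page`) reflect the Federal Register's documented search interface, but this is an assumed reconstruction for illustration, not the AI Index's actual pipeline.

```python
# Build a Federal Register search URL for documents mentioning an exact
# phrase within a given calendar year. No network access is performed here;
# the function only constructs the query string.
from urllib.parse import urlencode

BASE = "https://www.federalregister.gov/api/v1/documents.json"

def build_query(term: str, year: int, per_page: int = 100) -> str:
    """Return a search URL for documents containing `term` in `year`."""
    params = {
        "conditions[term]": f'"{term}"',  # quotes request an exact phrase
        "conditions[publication_date][gte]": f"{year}-01-01",
        "conditions[publication_date][lte]": f"{year}-12-31",
        "per_page": per_page,
    }
    return f"{BASE}?{urlencode(params)}"

url = build_query("artificial intelligence", 2023)
print(url)
```

Fetching each yearly URL and tallying the matching documents by agency would approximate the kind of counts reported in this section, though the Index additionally hand-codes each hit for relevance, approach, and subject matter.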
[Bar chart: number of AI-related regulations in the United States, 2016–2023; 25 in 2023]
Figure 7.4.1
8 A full description of the project’s methodology can be found in the Appendix.
[Stacked bar chart: U.S. AI-related regulations per year by relevance to AI (low, medium, high), 2016–2023]
Figure 7.4.2
9 A high relevance regulation focuses entirely on AI or AI-related issues. A medium relevance regulation includes meaningful mentions of AI but is not solely centered on it. A low relevance
regulation mentions AI in passing, without a significant focus on AI-related matters.
By Agency10
Which agencies are the primary sources of AI regulations? In 2023, the Executive Office of the President and the Commerce Department led with five AI-related regulations each, followed by the Health and Human Services Department and the Industry and Security Bureau, with each issuing four (Figure 7.4.3). Furthermore, the number of agencies issuing AI regulations increased from 17 in 2022 to 21 in 2023, indicating a growing need for clarity and concern regarding AI among a broader array of American regulatory bodies.
Figure 7.4.3
10 Regulations can originate from multiple agencies, so the annual totals in Figure 7.4.3 may exceed those in Figure 7.4.1.
[Grouped bar chart: U.S. AI-related regulations by approach (expansive vs. restrictive), 2016–2023]
Figure 7.4.4
11 Expansive regulations refer to actions by regulatory agencies or governments aimed at augmenting AI capacity, including investments in supercomputing infrastructure. Restrictive
regulations involve steps to curtail AI capabilities, such as imposing restrictions on the use of facial recognition algorithms. Restrictive AI regulations may also be intended to address
underlying policy concerns, such as AI’s potential impact on citizens’ civil liberties. According to this coding typology, a regulation can be classified as both expansive and restrictive, or it may
fit neither category. The AI Index assigned the labels “expansive” or “restrictive” only to regulations deemed to have medium to high relevance to AI. Therefore the regulation totals in Figure
7.4.4 are less than those reported earlier in the section.
By Subject Matter
In 2023, American AI regulations were categorized by primary subject matter. The most prevalent subject matter in AI-related regulation was foreign trade and international finance, with three instances. Three topics tied for second place, with two occurrences each: health; commerce; and science, technology, and communications (Figure 7.4.5).
Number of AI-related regulations in the United States by primary subject matter, 2016–23
Source: AI Index, 2024 | Chart: 2024 AI Index report
2016 2017 2018 2019 2020 2021 2022 2023
Commerce 0 n/a 0 0 0 1 0 2
Education 0 n/a 0 0 1 0 0 0
Health 0 n/a 0 0 3 4 1 2
Immigration 1 n/a 0 0 0 0 0 0
Figure 7.4.5
12 The AI Index team used Congress’ policy categorization typology. Only regulations that have medium and high AI relevance were coded for their primary subject matter.
[Bar chart: number of AI-related regulations in the European Union, 2017–2023]
Figure 7.4.6
13 The methodological approach refers to coding regulations based on relevance, originating agency, approach, and subject matter.
[Stacked bar chart: EU AI-related regulations per year by relevance to AI (low, medium, high), 2017–2023]
Figure 7.4.7
By Agency
The two most prominent originating bodies for European Union AI regulations in 2023 were the Council of the European Union (13) and the European Parliament (9) (Figure 7.4.8).14
Number of AI-related regulations in the European Union by institution and body, 2017–23
Source: AI Index, 2024 | Chart: 2024 AI Index report

Institution or body              2017  2018  2019  2020  2021  2022  2023
DG HOME (EC)                        0     0     0     0     0     0     1
DG JUST (EC)                        0     0     0     0     0     0     3
DG MOVE (EC)                        0     0     0     0     0     0     1
DG SANTE (EC)                       0     0     1     0     0     0     0
DG Trade (EC)                       0     0     0     1     2     0     0
EU-Egypt Association Council        0     0     0     0     0     1     0
EU-Moldova Association Council      0     0     0     0     0     1     0
Euratom                             0     0     0     2     0     0     0
EuroNest Parliamentary Assembly     1     0     1     0     1     0     0
European Commission                 1     1     0     2     4     1     0
European Parliament                 1     2     2     2    18     9     9
Eurostat (EC)                       0     0     0     1     0     1     1
Joint Research Centre (EC)          0     0     0     0     1     0     0
Secretariat-General (EC)            0     0     0     0     2     0     0
Figure 7.4.8
14 Institutions abbreviated with DG are Directorates-General. These are departments with specific areas of ministerial responsibility.
By Approach
In recent years, AI-related regulation in the European Union has tended to take a more expansive approach (Figure
7.4.9). In 2023, there were eight regulations with a restrictive focus compared to 12 with an expansive one.
Number of expansive vs. restrictive AI-related regulations in the European Union, 2017–23; in 2023, 12 were expansive and 8 restrictive.
Figure 7.4.9
By Subject Matter
In 2023, the most common subject matters for AI-related regulations in the European Union were science, technology, and communications (5), followed by government operations and politics (3) (Figure 7.4.10). Regulations concerning government operations and politics involve setting rules for how governments and associated governmental processes operate. One such regulation was the Commission Recommendation (EU) on inclusive and resilient electoral processes in the Union and enhancing the European nature and efficient conduct of the elections to the European Parliament. This regulation acknowledged that AI could be used to generate political misinformation and outlined steps the Commission has taken to ensure AI does not challenge the legitimacy of elections. Evidently, European Union legislators are considering how AI will impact their government’s work.
Number of AI-related regulations in the European Union by primary subject matter, 2017–23
Source: AI Index, 2024 | Chart: 2024 AI Index report

Primary subject matter  2017  2018  2019  2020  2021  2022  2023
Commerce                   0     0     0     1     2     2     1
Education                  0     0     1     0     0     0     0
Energy                     0     0     0     0     1     0     0
Health                     0     0     0     0     0     4     0
International affairs      0     0     0     0     0     0     1
Social welfare             0     0     0     0     0     0     1
Figure 7.4.10
This section examines public AI investment in the United States based on data from the U.S. government and Govini,
a company that uses AI and machine learning technologies to track U.S. public and commercial spending.
Every year in December, the National Science and Technology Council publishes a report on the public sector AI R&D budget across various departments and agencies that participate in the Networking and Information Technology Research and Development (NITRD) Program and the National Artificial Intelligence Initiative. These reports, however, do not include […]

According to the 2023 report, in fiscal year 2023, U.S. government agencies allocated a total of $1.8 billion to AI research and development spending (Figure 7.5.1). Funding for AI R&D has risen annually since FY 2018, more than tripling over that period. For FY 2024, a larger budget of $1.9 billion has been requested.15
Public sector AI R&D budget (in billions of U.S. dollars), FY18 (enacted) through FY24 (requested); data labels shown include 0.56, 1.11, 1.43, and 1.53.
Figure 7.5.1
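The “more than tripling” claim can be sanity-checked with a one-line calculation. This is an illustrative sketch, assuming the FY 2018 enacted figure of $0.56 billion shown in Figure 7.5.1 and the FY 2023 total of $1.8 billion cited in the text.

```python
# Growth multiple of the U.S. public AI R&D budget, FY 2018 to FY 2023.
# Figures (in billions of U.S. dollars) are taken from the report's text and chart.
fy2018_enacted = 0.56
fy2023_total = 1.80

growth_multiple = fy2023_total / fy2018_enacted
print(f"{growth_multiple:.1f}x")  # prints "3.2x", i.e., more than tripled
```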
Figure 7.5.2 details the breakdown of NITRD AI R&D budget requests by agency. For FY 2024, the National
Science Foundation (NSF) had the highest request at $531 million, followed by the Defense Advanced Research
Projects Agency (DARPA) at $322.1 million, and the National Institutes of Health (NIH) at $284.5 million.
15 Previous editions of the NITRD report have included spending figures for past years that differ slightly from those reported in the most recent edition. The AI Index reports the spending
amounts documented in the latest NITRD reports.
US governmental agency NITRD budgets for AI, FY 2021–24
Source: U.S. NITRD Program | Chart: 2024 AI Index report
Per-agency panels: DARPA, DHS, DOD, DOE, DOI, DOT, ED-IES, FDA, NARA, NASA, NIH, NIJ, NIOSH, NIST, NOAA, NSF, NTIA, TREAS, USDA, and VA.
Figure 7.5.2

Chapter 7: Policy and Governance
7.5 U.S. Public Investment in AI
US DoD budget request for AI-specific research, development, test, and evaluation (RDT&E), FY 2020–24
Source: U.S. Office of the Under Secretary of Defense (Comptroller), 2023 | Chart: 2024 AI Index report
Budget request (in billions of U.S. dollars) by fiscal year; values shown range from 0.84 to 1.80.
Figure 7.5.3
U.S. government spending (in billions of U.S. dollars); annual totals shown include 2.47, 2.59, 3.19, and 3.33.
Figure 7.5.4
16 In 2023, Govini made minor adjustments to their classification methodology. Consequently, the contract totals presented in Figure 7.5.4 may vary slightly from those reported in earlier
editions of the AI Index.
Figure 7.5.5 shows U.S. government spending by AI segment in FY 2022 and FY 2023. Spending significantly
increased for machine learning. Computer vision and natural language processing spending also rose, albeit less
prominently.
U.S. government spending by AI segment (in billions of U.S. dollars), FY 2022 vs. FY 2023

Segment                       FY 2022   FY 2023
Machine learning              0.88      1.51 (+72%)
Computer vision               0.89      1.04 (+17%)
Autonomy                      1.24      0.55 (-56%)
Natural language processing   0.18      0.23 (+28%)
Figure 7.5.5
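The percentage changes in Figure 7.5.5 follow directly from the two years of spending figures. A minimal sketch using the chart's values:

```python
# Year-over-year change by AI segment, from the FY 2022 and FY 2023
# spending figures (in billions of U.S. dollars) shown in Figure 7.5.5.
spending = {
    "Machine learning": (0.88, 1.51),
    "Computer vision": (0.89, 1.04),
    "Autonomy": (1.24, 0.55),
    "Natural language processing": (0.18, 0.23),
}

for segment, (fy2022, fy2023) in spending.items():
    pct_change = (fy2023 - fy2022) / fy2022 * 100
    print(f"{segment}: {pct_change:+.0f}%")
```

Running this reproduces the labels in the figure: +72%, +17%, -56%, and +28%.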
In FY 2023, the majority of federal AI contracts were prime contracts (50.6%), followed by grants (47.6%) (Figure
7.5.6). In the last year, the share of contracts has declined, while the share of grants has increased.
Total value of contracts, grants, and OTAs awarded by the US government for AI/ML and autonomy,
FY 2018–23
Source: Govini, 2023 | Chart: 2024 AI Index report
Total value awarded in FY 2023 (in billions of U.S. dollars): 1.68 (contracts), 1.58 (grants), 0.06 (OTAs).
Figure 7.5.6
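The contract and grant shares quoted above are each award type's value divided by the combined FY 2023 total. A quick illustrative check using the values labeled in Figure 7.5.6:

```python
# FY 2023 award shares, from the endpoint values in Figure 7.5.6
# (in billions of U.S. dollars).
awards = {"Contracts": 1.68, "Grants": 1.58, "OTAs": 0.06}

total = sum(awards.values())  # about 3.32
for kind, value in awards.items():
    print(f"{kind}: {value / total * 100:.1f}%")
```

This yields 50.6% for contracts and 47.6% for grants, matching the percentages in the text.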
U.S. government spending (in billions of U.S. dollars), 2018–23.
Figure 7.5.7
Total value of contracts, grants, and OTAs awarded by the US government for microelectronics,
FY 2018–23
Source: Govini, 2023 | Chart: 2024 AI Index report
Total value awarded (in billions of U.S. dollars); contracts reached 3.33 in the latest year.
Figure 7.5.8
CHAPTER 8: Diversity
Overview
The demographics of AI developers often differ from those of users. For instance, a
considerable number of prominent AI companies and the datasets utilized for model training
originate from Western nations, thereby reflecting Western perspectives. The lack of diversity
can perpetuate or even exacerbate societal inequalities and biases.
This chapter delves into diversity trends in AI. The chapter begins by drawing on data from
the Computing Research Association (CRA) to provide insights into the state of diversity in
American and Canadian computer science (CS) departments. A notable addition to this year’s
analysis is data sourced from Informatics Europe, which sheds light on diversity trends within
European CS education. Next, the chapter examines participation rates at the Women in
Machine Learning (WiML) workshop held annually at NeurIPS. Finally, the chapter analyzes
data from Code.org, offering insights into the current state of diversity in secondary CS
education across the United States.
The AI Index is dedicated to enhancing the coverage of data shared in this chapter.
Demographic data regarding AI trends, particularly in areas such as sexual orientation, remains
scarce. The AI Index urges other stakeholders in the AI domain to intensify their endeavors to
track diversity trends associated with AI and hopes to comprehensively cover such trends in
future reports.
Chapter Highlights
1. U.S. and Canadian bachelor’s, master’s, and PhD CS students continue to grow more
ethnically diverse. While white students continue to be the most represented ethnicity among new resident
graduates at all three levels, the representation from other ethnic groups, such as Asian, Hispanic, and Black or
African American students, continues to grow. For instance, since 2011, the proportion of Asian CS bachelor’s
degree graduates has increased by 19.8 percentage points, and the proportion of Hispanic CS bachelor’s degree
graduates has grown by 5.2 percentage points.
2. Substantial gender gaps persist in European informatics, CS, CE, and IT graduates at
all educational levels. Every surveyed European country reported more male than female graduates in
bachelor’s, master’s, and PhD programs for informatics, CS, CE, and IT. While the gender gaps have narrowed in
most countries over the last decade, the rate of this narrowing has been slow.
3. U.S. K–12 CS education is growing more diverse, reflecting changes in both gender and
ethnic representation. The proportion of AP CS exams taken by female students rose from 16.8% in 2007 to
30.5% in 2022. Similarly, the participation of Asian, Hispanic/Latino/Latina, and Black/African American students
in AP CS has consistently increased year over year.
This section examines trends in diversity within CS and AI postsecondary education across North America and Europe.
Gender of new CS bachelor’s graduates (% of total) in the United States and Canada, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 77.70% male, 22.20% female, 0.10% nonbinary/other.
Figure 8.1.1
Over the past decade, the number of CS bachelor’s graduates of all ethnicities has grown, notably 4.7 times
for Hispanics and 2.5 times for African Americans (Figure 8.1.2). As a proportion of ethnicities among all CS
bachelor’s graduates, Asians have risen the fastest, doubling in the last 10 years (Figure 8.1.3).
1 The charts in this section look only at the ethnicity of domestic or native CS students and faculty. Although the CRA reports data on the proportion of nonresident aliens at each educational
level (i.e., bachelor’s, master’s, PhD, and faculty), data on the ethnicity of nonresident aliens is not included.
2 Not all PhD-granting departments targeted in the survey provided responses. Of the 297 departments targeted, only 182 responded, resulting in an overall response rate of 61%. The AI Index
advises against making per capita comparisons between the CRA North American data and the data on European CS graduates detailed in the subsequent sections due to the European data
being collected from national statistical offices, which affords it broader coverage.
Ethnicity of new resident CS bachelor’s graduates in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 10,970 white and 8,795 Asian graduates.
Figure 8.1.2
Ethnicity of new resident CS bachelor’s graduates (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 44.58% white, 35.74% Asian.
Figure 8.1.3
CS Master’s Graduates
The proportion of female CS master’s graduates has seen minimal growth in the last decade, increasing
to 26.3% in 2022 from 24.6% in 2011. Additionally, in 2022, 0.08% of CS master’s graduates identified as
nonbinary/other (Figure 8.1.4).
Gender of new CS master’s graduates (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 73.65% male, 26.26% female, 0.08% nonbinary/other.
Figure 8.1.4
Among North American students, the most represented ethnicities are white (47.9%), Asian (35.8%), and
Hispanic (8.2%) (Figure 8.1.5 and Figure 8.1.6). Similar to CS bachelor’s graduates, the pool of CS master’s
graduates has become increasingly ethnically diverse over the last decade.
Ethnicity of new resident CS master’s graduates in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 value: 2,278 Asian graduates.
Figure 8.1.5
Ethnicity of new resident CS master’s graduates (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 47.87% white, 35.76% Asian.
Figure 8.1.6
CS PhD Graduates
In 2022, the percentage of female PhD graduates in CS slightly decreased to 22.1% (Figure 8.1.7), but the long-term trend is unchanged.
Gender of new CS PhD graduates (% of total) in the United States and Canada, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 77.78% male, 22.10% female, 0.12% nonbinary/other.
Figure 8.1.7
From 2011 to 2022, the diversity among CS PhD graduates significantly increased (Figure 8.1.8 and Figure
8.1.9). In 2022, 41.1% of CS PhD graduates were Asian, Black, Hispanic, multiracial, American Indian, or Native
Hawaiian, marking a considerable rise from 2011.
Ethnicity of new resident CS PhD graduates in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 327 white; 164 Asian; 28 Black or African American; 26 Hispanic (any race); 7 multiracial (not Hispanic); 2 American Indian or Alaska Native; 1 Native Hawaiian or Pacific Islander.
Figure 8.1.8
Ethnicity of new resident CS PhD graduates (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
Figure 8.1.9
Disability Status of CS, CE, and Information Students
For the second consecutive year, the CRA asked departments to report the number of students at each degree level who received disability accommodations over the preceding year. The reported numbers were relatively low: 4.1% of bachelor’s, 1.5% of master’s, and 1.1% of PhD students indicated a need for accommodations (Figure 8.1.10). Year over year, the proportion of students requesting disability accommodations has remained consistent.
CS, CE, and information students (% of total) with disability accommodations in the United States and Canada, 2021 vs. 2022
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report

Level       2021    2022
Bachelor's  4.10%   4.10%
Master's    1.00%   1.50%
PhD         0.80%   1.10%

Figure 8.1.10
CS, CE, and Information Faculty
Data regarding the ethnicity and gender of faculty in CS, CE, and information fields highlight diversity trends in academic AI and CS. As of 2022, a majority of faculty members in CS, CE, and information are male (75.6%), with women comprising 24.3% and nonbinary individuals accounting for 0.1% (Figure 8.1.11). Although the proportion of female faculty in these fields has risen since 2011, the increase has been small.
Gender of CS, CE, and information faculty (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 75.60% male, 24.27% female, 0.12% nonbinary/other.
Figure 8.1.11
While the majority of new faculty hires in CS, CE, and information at American and Canadian universities
remain male (71.7%), the proportion of women reached 28.0% in 2022 (Figure 8.1.12), well above the proportion
of new female PhDs.
Gender of new CS, CE, and information faculty hires (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 71.67% male, 28.00% female, 0.33% nonbinary/other.
Figure 8.1.12
As of 2022, the majority of resident faculty in CS, CE, and information were white (57.3%), with Asian faculty following at 30.1% (Figure 8.1.13 and Figure 8.1.14). The ethnic diversity gap is gradually closing: In 2011, the difference between white faculty and the next largest ethnic group was 46.1 percentage points, but by 2021, it had narrowed to 27.2 percentage points.
Ethnicity of resident CS, CE, and information faculty in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 2,100 Asian; 406 unknown; 207 Hispanic (any race); 176 Black or African American; 47 multiracial (not Hispanic); 29 Native Hawaiian or Pacific Islander; 17 American Indian or Alaska Native.
Figure 8.1.13
Ethnicity of resident CS, CE, and information faculty (% of total) in the United States and Canada, 2011–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
2022 values: 57.27% white; 5.82% unknown; 2.97% Hispanic (any race); 2.52% Black or African American; 0.67% multiracial (not Hispanic); 0.42% Native Hawaiian or Pacific Islander; 0.24% American Indian or Alaska Native.
Figure 8.1.14
Europe
Data on diversity trends among European CS graduates comes from Informatics Europe.3

Informatics, CS, CE, and IT Bachelor's Graduates
In the majority of surveyed European nations, there is a persistent gender disparity among bachelor's-level graduates in informatics, computer science, computer engineering, and information technology. Despite some narrowing since 2011, men continue to dominate. For example, France (14.8%), the United Kingdom (17.8%), and Germany (21.5%) show relatively low proportions of female graduates in these fields (Figure 8.1.15). Bulgaria stands out among the surveyed countries with the highest proportion of female graduates (35.2%).

In all surveyed European countries, informatics, CS, CE, and IT PhD graduates are predominantly male. However, in nations such as the United Kingdom, Germany, and Switzerland, the gender gap has narrowed over the last decade, with women constituting a growing share of PhD graduates (Figure 8.1.17).4 In contrast, countries like Finland and Spain have seen the gap slightly widen.
3 The year label refers to the year in which an academic year ends. For example, the figures visualizing new graduates for 2022 reflect the number of graduates reported for the 2021/2022
academic year. For the sake of visual simplicity, the Index opts to focus on the year in which students graduated.
4 In countries where the number of PhD graduates is relatively small, trends in gender proportions can be prone to sudden year-over-year changes. For example, in 2022 Bulgaria produced 24
total PhDs, Latvia 12, and Estonia 26.
Gender of new informatics, CS, CE, and IT bachelor’s graduates (% of total) in Europe, 2011–22
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Per-country panels; in the United Kingdom, 82.21% of graduates were male and 17.79% female.
Figure 8.1.15
Gender of new informatics, CS, CE, and IT master’s graduates (% of total) in Europe, 2011–22
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Per-country panels; in the United Kingdom, 68.67% of graduates were male and 31.33% female.
Figure 8.1.16
Gender of new informatics, CS, CE, and IT PhD graduates (% of total) in Europe, 2011–22
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
Per-country panels showing male and female shares.
Figure 8.1.17
8.2 AI Conferences
Women in Machine Learning (WiML) NeurIPS Workshop
Women in Machine Learning (WiML), founded in 2006, is an organization dedicated to supporting women in machine learning. This section of the AI Index presents data from the WiML annual technical workshop, hosted at NeurIPS.

Workshop Participants
Despite a decline in participation over the last few years, the 714 attendees recorded in the most recent year remained well above the attendance of 89 in 2010 (Figure 8.2.1). The recent drop in WiML workshop attendance […]

Attendance at NeurIPS Women in Machine Learning workshop, 2010–23
Source: Women in Machine Learning, 2023 | Chart: 2024 AI Index report
Most recent value: 714 attendees.
Figure 8.2.1
Percentage values shown for 2010–23; the most recent value is 4.36%.
Figure 8.2.2
5 Figure 8.2.1 accounts for total attendance, which in some conference years comprised both in-person and virtual attendance.
Continent of residence of participants at NeurIPS Women in Machine Learning workshop, 2022 vs. 2023
Source: Women in Machine Learning, 2023 | Chart: 2024 AI Index report

Continent           2022     2023
North America       41.50%   56.43%
Europe              34.20%   21.79%
Asia                17.10%   11.43%
Africa              3.40%    8.93%
Australia/Oceania   1.40%    1.07%
South America       1.60%    0.36%
Antarctica          0.20%    0.00%

Figure 8.2.3
The majority of participants at the 2023 WiML workshop were female-identifying (84.2%), another 10.0% were male-identifying, and 3.2% were nonbinary-identifying (Figure 8.2.4).
Gender breakdown of participants at NeurIPS Women in Machine Learning workshop, 2022 vs. 2023
Source: Women in Machine Learning, 2023 | Chart: 2024 AI Index report

Gender                               2022     2023
Female                               37.00%   84.23%
Male                                 25.80%   10.04%
Nonbinary/Genderqueer/Third gender   0.50%    3.23%
Prefer not to say                    36.30%   1.43%
Questioning                          n/a      0.72%

Figure 8.2.4
This section uses data from Code.org, a U.S. nonprofit dedicated to advancing CS education in K–12 schools across the
country, to paint a picture of how AI diversity trends are reflected at the K–12 level.
AP computer science exams taken (% of total) by gender, 2007–22; 2022 values include 68.87% male and 0.67% other.
Figure 8.3.1
6 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles was initially
offered in 2017.
On a percentage basis, the states with the highest number of female AP CS test-takers in 2022 were Mississippi (41%), Alabama (37%), and Washington, D.C. (37%) (Figure 8.3.2). California, Texas, and Washington, states known for significant CS and AI activity, also saw notable participation, with approximately 30% of AP CS exam-takers being female.

AP computer science exams taken by female students (% of total), 2022
Source: Code.org, 2023 | Chart: 2024 AI Index report

AK 26%, ME 22%
VT 23%, NH 26%, MA 31%
WA 31%, MT 33%, ND 18%, SD 20%, MN 24%, WI 22%, MI 29%, NY 36%, CT 29%, RI 23%
OR 26%, ID 27%, WY 23%, NE 25%, IA 18%, IL 33%, IN 24%, OH 27%, PA 28%, NJ 31%
CA 31%, NV 32%, UT 22%, CO 25%, KS 16%, MO 22%, KY 28%, WV 30%, DC 37%, MD 33%, DE 23%
AZ 25%, NM 25%, OK 26%, AR 30%, TN 30%, VA 29%, NC 30%
TX 31%, LA 34%, MS 41%, AL 37%, GA 29%, SC 33%
HI 30%, FL 31%

Figure 8.3.2
Number of AP computer science exams taken by ethnicity, 2007–22; 2022 values include 56,098 (Asian) and 35,528 (Hispanic/Latino/Latina).
Figure 8.3.3
AP computer science exams taken (% of total) by ethnicity, 2007–22; 2022 values: 38.23% white, 27.82% Asian, 17.62% Hispanic/Latino/Latina.
Figure 8.3.4
CHAPTER 9: Public Opinion
Overview
As AI becomes increasingly ubiquitous, it is important to understand how public
perceptions regarding the technology evolve. Understanding this public opinion is vital in
better anticipating AI’s societal impacts and how the integration of the technology may
differ across countries and demographic groups.
This chapter examines public opinion on AI through global, national, demographic, and
ethnic perspectives. It draws upon several data sources: longitudinal survey data from Ipsos
profiling global AI attitudes over time, survey data from the University of Toronto exploring
public perception of ChatGPT, and data from Pew examining American attitudes regarding
AI. The chapter concludes by analyzing mentions of significant AI models on Twitter, using
data from Quid.
Chapter Highlights
1. People across the globe are more cognizant of AI’s potential impact—and more nervous.
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect
their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness
toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests
that 52% of Americans report feeling more concerned than excited about AI, rising from 38% in 2022.
3. The public is pessimistic about AI’s economic impact. In an Ipsos survey, only 37% of
respondents feel AI will improve their job. Only 34% anticipate AI will boost the economy, and 32% believe it will
enhance the job market.
5. ChatGPT is widely known and widely used. An international survey from the University of Toronto
suggests that 63% of respondents are aware of ChatGPT. Of those aware, around half report using ChatGPT at
least once weekly.
9.1 Survey Data
Global opinions on products and services using AI (% of total), 2022 vs. 2023
Source: Ipsos, 2022–23 | Chart: 2024 AI Index report
1 See Appendix for more details about the survey methodology. The survey was conducted from May to June 2023.
Perceptions of AI’s benefits versus drawbacks vary considerably by country, according to the Ipsos survey. 78% of Indonesian, 74% of Thai, and 73% of Mexican respondents view AI products and services as more beneficial than harmful (Figure 9.1.2). In contrast, only 37% of Americans agree with this perspective. Among the 31 countries surveyed, the United States and France exhibited the most skepticism.

Attitudes toward AI are becoming more positive in countries that were previously critical. In 2022, several developed Western nations, including Germany, the Netherlands, Australia, Belgium, Canada, and the United States, were among the least positive about AI products and services. Since then, each of these countries has seen a rise in the proportion of respondents acknowledging the benefits of AI, with the Netherlands experiencing the most significant shift. By 2023, 43% of Dutch respondents viewed AI products and services positively, up from 33% the previous year.
‘Products and services using AI have more benefits than drawbacks,’ by country (% of total), 2022 vs. 2023
Source: Ipsos, 2022–23 | Chart: 2024 AI Index report
% of respondents agreeing in 2023: Indonesia 78%, Thailand 74%, Mexico 73%, Malaysia 69%, Peru 67%, Turkey 67%, South Korea 66%, India 65%, Colombia 65%, Singapore 64%, Brazil 64%, Romania 61%, South Africa 59%, Chile 59%, Argentina 57%, Italy 55%, Spain 55%, Japan 52%, Hungary 48%, Poland 47%, Great Britain 46%, New Zealand 44%, Netherlands 43%, Germany 42%, Ireland 40%, Australia 40%, Sweden 39%, Belgium 39%, Canada 38%, United States 37%, France 37%.
Figure 9.1.2
Figure 9.1.3 shows responses to Ipsos’ survey on AI products and services by country. Indonesian respondents are notably optimistic: 84% claim a solid understanding of AI, 79% believe AI will significantly change their lives in the next three to five years, and 75% express excitement about AI products and services. Conversely, Japanese respondents show the least understanding of AI (43%) and also report the lowest level of nervousness about AI (23%). Meanwhile, Thai respondents exhibit the highest trust in AI’s impartiality, believing it will not discriminate or show bias toward any group.
Per-country responses for: Australia, Belgium, Brazil, Canada, Chile, Colombia, France, Germany, Great Britain, Hungary, India, Indonesia, Ireland, Italy, Japan, Malaysia, Mexico, the Netherlands, New Zealand, Peru, Poland, Romania, Singapore, South Africa, South Korea, Spain, Sweden, Thailand, Turkey, and the United States.
Figure 9.1.3
A large majority of the countries surveyed by Ipsos in 2022 were surveyed again in 2023, enabling cross-year comparisons. Figure 9.1.4 highlights the year-over-year percentage point change in answers to particular AI-related questions. For every country surveyed in both 2022 and 2023, an increase was reported in the degree to which AI products make people nervous. The sharpest increases were reported in Italy (24 percentage points), France (19), Chile (18), and Australia (18).

Likewise, except for South Africa, all countries in the survey sample are now more inclined to believe that AI will significantly impact their lives in the next three to five years. The highest increase of 12 percentage points was reported in Japan, Great Britain, Germany, and Australia.
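These year-over-year figures are plain percentage-point differences between the two survey waves. A minimal sketch in Python; the 2022 and 2023 agreement shares below are illustrative placeholders (only the resulting deltas match those reported above):

```python
# Year-over-year percentage-point change, the quantity plotted in Figure 9.1.4.
# The underlying agreement shares here are illustrative, not the Ipsos values.
agree_2022 = {"Italy": 31, "France": 34, "Chile": 39, "Australia": 31}
agree_2023 = {"Italy": 55, "France": 53, "Chile": 57, "Australia": 49}

# Percentage-point change = 2023 share minus 2022 share, per country.
pp_change = {c: agree_2023[c] - agree_2022[c] for c in agree_2022}
print(pp_change)  # {'Italy': 24, 'France': 19, 'Chile': 18, 'Australia': 18}
```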
Percentage point change in opinions about AI by country (% agreeing with statement), 2022–23
Source: Ipsos, 2022–23 | Chart: 2024 AI Index report
Figure 9.1.4
AI and Jobs
This year’s Ipsos survey included more questions about how people perceive AI’s impact on their current jobs. Figure 9.1.5 illustrates the various global perspectives on the expected impact of AI on employment. 57% of respondents think AI is likely to change how they perform their current job within the next five years, and 36% fear AI may replace their job in the same time frame.

Figure 9.1.5
Opinions on whether AI will significantly impact an individual’s job vary significantly across demographic groups (Figure 9.1.6). Younger generations, such as Gen Z and millennials, are more inclined to agree that AI will change how they do their jobs compared to older generations like Gen X and baby boomers. Specifically, 66% of Gen Z compared to 46% of boomer respondents agree with the statement that AI will likely affect their current jobs. Additionally, individuals with higher incomes, more education, and decision-making roles are more likely to foresee AI impacting their current employment.
% of respondents agreeing that AI will change how they do their job, by gender, generation (Gen Z, millennial, Gen X, boomer), household income, education, and decision-maker status
Figure 9.1.6
AI and Livelihood
The Ipsos survey explored the impact respondents believe AI will have on various aspects of their lives, such as health and entertainment. On topics like time management and entertainment, the majority viewed AI positively (Figure 9.1.7). For instance, 54% of global respondents agree that AI will improve the efficiency of their tasks, and 51% believe AI will enhance entertainment options like TV, movies, music, and books. However, skepticism was more prominent in other areas. Only 39% feel AI will benefit their health, and 37% think it will improve their job. Only 34% anticipate AI will boost the economy, and just 32% believe it will enhance the job market.

Similar to questions about AI products and services, responses showed intracountry consistency, with Japanese, Swedes, and Americans generally pessimistic about AI’s potential to improve livelihoods, whereas Brazilians, Indonesians, and Mexicans were more optimistic.
% of respondents agreeing that AI will improve each area, by country. Global shares: my entertainment options (television/video content, movies, music, books) 51%; my health 39%; my job 37%; the economy in my country 34%; the job market 32%
Figure 9.1.7
Significant demographic differences also exist in perceptions of AI’s potential to enhance livelihoods, with younger generations generally expressing greater optimism. For instance, 59% of Gen Z respondents believe AI will improve entertainment options, versus only 40% of baby boomers. Additionally, individuals with higher incomes and education levels are more optimistic about AI’s positive impacts on entertainment, health, and the economy compared to their lower-income and less-educated counterparts. In general, members of Gen Z, those in higher income brackets, and those with more education are the most optimistic about AI’s potential to improve life, while those from the boomer generation, lower income brackets, and with less education are the least optimistic.
% of respondents agreeing that AI will improve each area, by gender, generation, household income, education, and employment status
Figure 9.1.8
% of respondents worried about each issue (yes/unsure/no), by country. Global shares: misuse/use for nefarious purposes 49%; impact of AI on jobs 49%; violation of citizens’ privacy 45%; dehumanization of services 41%; lack of transparency in decision-making 34%; impact of AI on education 33%; ethical implications 30%; accuracy of results and analysis 28%; uneven access to AI 26%; potential for bias and discrimination 24%; my own ability to use AI 22%
Figure 9.1.11
Since 2021, Pew Research Center has surveyed Americans on their attitudes toward AI. Figure 9.1.12 shows that over the last year, Americans have grown increasingly concerned about the use of AI in their daily lives. In 2021 and 2022, only 37% and 38% of Americans, respectively, reported feeling more concerned than excited about AI; by 2023, that share had risen to 52%.

% of respondents more excited than concerned, equally concerned and excited, or more concerned than excited about AI in daily life
Figure 9.1.12
Pew also surveyed Americans’ opinions on whether they believed AI helped or hindered in specific contexts
(Figure 9.1.13). Respondents reported that AI was more likely to be beneficial, particularly in assisting people to
find products or services online, with 49% expressing this view. However, 53% of respondents indicated that AI
was more likely to be detrimental than beneficial in safeguarding personal information privacy.
AI helps more than it hurts / not sure / hurts more than it helps, in specific contexts (Figure 9.1.13)
Pew further segmented the data by education level (Figure 9.1.14). Across various use categories, Americans with higher education levels are more likely to believe in AI’s potential to help rather than harm. For instance, individuals with college or higher-level degrees are more likely to report that AI can significantly aid doctors in delivering quality care to patients and assist people in discovering products and services online that interest them.
% saying AI helps or hurts, by education level (some college or less vs. college degree+), across use categories: companies making safe cars and trucks; doctors providing quality care to patients; people taking care of their health; people finding accurate information online; companies providing quality customer service
Figure 9.1.14
Stable Diffusion: 40, 37, 53, 45; Copilot: 56, 66, 73, 67; GPT-4: 42, 48, 51, 44; PaLM 2: n/a, 62, 66, 75
Figure 9.2.1
Figure 9.2.2 highlights the proportion of AI-related social media conversation that was dominated by the release of particular models.2 GPT-4 remained a dominant topic of consumer conversation throughout the year. Despite the release of numerous new models, by the fourth quarter of 2023, GPT-4 still captured 45% of social media attention. Other models that garnered significant attention included Grok, Stable Diffusion, and Gemini.
Copilot: 0%, 1%, 1%, 1%; PaLM 2: n/a, 5%, 4%, 2%
Figure 9.2.2
2 The figures in this section consider all AI-related social media conversation. The percentage associated with a model in a quarter in Figure 9.2.2 represents the share of all AI-related social
media conversation in that quarter that was concerned with that model.
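The share described in this footnote is a simple proportion of quarterly conversation volume. A minimal sketch, with illustrative post counts (the real analysis works from social media data, not these numbers):

```python
# Share of all AI-related social media conversation in a quarter attributable
# to each model (the quantity plotted in Figure 9.2.2).
# Post counts below are illustrative placeholders.
quarter_posts = {
    "GPT-4": 4500,
    "Grok": 1200,
    "Stable Diffusion": 900,
    "Gemini": 800,
    "other AI topics": 2600,
}

total = sum(quarter_posts.values())  # all AI-related conversation in the quarter
shares = {model: round(100 * n / total, 1) for model, n in quarter_posts.items()}
print(shares["GPT-4"])  # 45.0
```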
Appendix
1 https://eto.tech/tool-docs/cat/
2 https://cat.eto.tech/
3 Christian Schoeberl, Autumn Toney, and James Dunham, “Identifying AI Research” (Center for Security and Emerging Technology, July 2023), https://doi.org/10.51593/20220030.
4 James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv preprint, arXiv:2002.07143 (2020).
classifier; and Schoeberl, Toney, and Dunham describe the updated classifier used in this analysis.

CSET matched each publication in the analytic corpus with predictions from a field-of-study model derived from Microsoft Academic Graph (MAG)’s taxonomy, which yields hierarchical labels describing the published research field(s) of study and corresponding scores.5 CSET researchers identified the most common fields of study in our corpus of AI-relevant publications since 2010 and recorded publications in all other fields as “Other AI.” English-language AI-relevant publications were then tallied by their top-scoring field and publication year.

Updates: The methodology to assign MAG fields of study was updated from the methodology used in prior years. Toney and Dunham describe the field-of-study assignment pipeline used in this analysis; prior years used the original MAG implementation.

CSET also provided publication counts and year-by-year citations for AI-relevant work associated with each country. A publication is associated with a country if it has at least one author whose organizational affiliation(s) is located in that country. If there is no observed country, the publication receives an “Unknown/Missing” country label. Citation counts aren’t available for all publications; those without counts weren’t included in the citation analysis. Over 70% of English-language AI papers published between 2010 and 2022 have citation data available.

Additionally, publication counts by year and publication type (e.g., academic journal articles, conference papers) were provided where available. These publication types were disaggregated by affiliation country as described above.

CSET also provided publication affiliation sector(s) where, as in the country attribution analysis, sectors were associated with publications through authors’ affiliations. Not all affiliations were characterized in terms of sectors; CSET researchers relied primarily on ROR for this purpose, and not all organizations can be found in or linked to ROR.6 Where the affiliation sector is available, papers were counted toward these sectors, by year.

CSET counted cross-sector collaborations as distinct pairs of sectors across authors for each publication. Collaborations are only counted once: For example, if a publication has two authors with an academic affiliation and two with an industry affiliation, it is counted as a single academic-industry collaboration.

Patents From CSET’s AI and Robotics Patents Dataset
Source
CSET’s AI patents dataset was developed by CSET and 1790 Analytics and includes data from The Lens, 1790 Analytics, and EPO’s PATSTAT. Patents relevant to the development and application of AI and robotics were identified by their CPC/IPC codes and keywords.
Methodology
In this analysis, patents were grouped by year and by country, and then counted at the “patent family”
5 These scores are based on cosine similarities between field-of-study and paper embeddings. See Autumn Toney and James Dunham, “Multi-Label Classification of Scientific Research
Documents Across Domains and Languages,” Proceedings of the Third Workshop on Scholarly Document Processing (Association for Computational Linguistics, 2022): 105–14, https://
aclanthology.org/2022.sdp-1.12/.
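The scoring described in this footnote can be sketched as follows; the field and paper embeddings are toy vectors, not CSET’s actual embeddings:

```python
# Field-of-study assignment as described in the appendix: each paper is scored
# against each field by cosine similarity of their embeddings, then tallied
# under its top-scoring field. Vectors below are illustrative toys.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

field_embeddings = {
    "computer vision": [0.9, 0.1, 0.0],
    "natural language processing": [0.1, 0.9, 0.2],
}
paper_embedding = [0.2, 0.8, 0.3]  # hypothetical paper vector

scores = {field: cosine(paper_embedding, emb) for field, emb in field_embeddings.items()}
top_field = max(scores, key=scores.get)  # the field this paper is tallied under
print(top_field)
```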
level.7 CSET extracted year values from the first publication date within a family. Countries are assigned to patents based on the country or filing office where a patent is first filed (e.g., if a patent is filed with the USPTO on January 1, 2020, and then with the German Patenting Office on January 2, 2020, the patent is classified as a patent with U.S. inventors).8 Note that the same patent may have multiple countries (but not years) attributed to it if the inventors filed their patent in multiple countries on the same first filing date (e.g., if a patent is filed with the USPTO on January 1, 2020, and then with the German Patenting Office on January 1, 2020, the patent is classified as a patent with U.S. inventors and as a patent with German inventors). Note that patents filed with supranational organizations, such as patents filed under WIPO (the World Intellectual Property Organization), EP (the European Patent Organization), and EA (the Eurasian Patent Organization), also fall under the “Rest of World” category.

Ecosystems Graph Analysis
To track the distribution of AI foundation models by country, the AI Index team took the following steps:
1. A snapshot of the Ecosystems Graph was taken in early January 2024.
2. Authors of foundation models are attributed to countries based on their affiliation credited on the paper/technical documentation associated with the model. For international organizations, authors are attributed to the country where the organization is headquartered, unless a more specific location is indicated.
3. All of the landmark publications are aggregated within time periods (e.g., monthly or yearly) with the national contributions added up to determine what each country’s contribution to landmark AI research was during each time period.
4. The contributions of different countries are compared over time to identify any trends.

Epoch Notable Models Analysis
The AI forecasting research group Epoch maintains a dataset of landmark AI and ML models, along with accompanying information about their creators and publications, such as the list of their (co)authors, number of citations, type of AI task accomplished, and amount of compute used in training.

The nationalities of the authors of these papers have important implications for geopolitical AI forecasting. As various research institutions and technology companies start producing advanced ML models, the global distribution of future AI development may shift or concentrate in certain places, which in turn affects the geopolitical landscape because AI is expected to become a crucial component of economic and military power in the near future.

To track the distribution of AI research contributions on landmark publications by country, the Epoch dataset is coded according to the following methodology:
8 In CSET’s data analysis for the 2022 AI Index, we used the most recent publication date for a patent family. This method has the advantage of capturing updates within a patent
family (such as amendments). However, to remain consistent with CSET’s other data products, including the Country Activity Tracker (available at https://cat.eto.tech/), we opted to
use the first filing year instead in this data analysis.
9 The selected cutoff date was September 1, 2015, in accordance with Compute Trends Across Three Eras of Machine Learning (Epoch, 2022).
10 Historic prices were collected from archived snapshots of Amazon Web Services, Microsoft Azure, and Google Cloud Platform price catalogs viewed through the Internet Archive
Wayback Machine.
11 The chosen rental cost rate was the most recent published price for the hardware and cloud vendor used by the developer of the model, at a three-year commitment rental rate, after
subtracting the training duration and two months from the publication date. If this price was not available, the most analogous price was used: the same hardware and vendor at a different
date, otherwise the same hardware from a different cloud vendor. If a three-year commitment rental rate was unavailable, this was imputed from other rental rates based on the empirical
average discount for the given cloud vendor. If the exact hardware type was not available, e.g., “NVIDIA A100 SXM4 40GB,” then a generalization was used, e.g., “NVIDIA A100.”
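The first-filing attribution rule described above (and footnote 8’s choice of first filing year) can be sketched as follows, with illustrative filing records:

```python
# Patent-family country attribution: a family's countries come from every
# office where it was filed on its earliest filing date; later filings in
# other countries do not add attributions. Records below are illustrative.
from datetime import date

filings = [  # (filing office country, filing date) for one patent family
    ("US", date(2020, 1, 1)),
    ("DE", date(2020, 1, 1)),   # same first filing date -> also counted
    ("JP", date(2020, 3, 15)),  # later filing -> not counted
]

first_date = min(d for _, d in filings)
countries = sorted({c for c, d in filings if d == first_date})
year = first_date.year  # year value taken from the first filing
print(countries, year)  # ['DE', 'US'] 2020
```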
Environmental Impact
To assess the environmental impact of AI models, the
AI Index team surveyed technical reports of prominent
foundation models to determine whether the model
developers disclosed carbon emissions. The Index
also reviewed papers by researchers that estimated
the carbon footprint of various models. The technical
reports surveyed, as well as the papers estimating the
carbon impact of various models, are included in the
works cited for this chapter.
RLHF
To identify foundation models using RLHF, the AI Index
team reviewed the technical documentation of every
foundation model included in the Ecosystems Graph,
and searched for evidence that RLHF had been used in
the model’s development process. The year in which a
model is said to have used RLHF refers to the year the
model was released.
Works Cited
Agostinelli, A., Denk, T. I., Borsos, Z., Engel, J., Verzetti, M., Caillon, A., Huang, Q., Jansen, A., Roberts, A., Tagliasacchi, M.,
Sharifi, M., Zeghidour, N. & Frank, C. (2023). MusicLM: Generating Music From Text (arXiv:2301.11325). arXiv.
http://arxiv.org/abs/2301.11325.
Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., Fu, C., Gopalakrishnan, K., Hausman, K., Herzog,
A., Ho, D., Hsu, J., Ibarz, J., Ichter, B., Irpan, A., Jang, E., Ruano, R. J., Jeffrey, K., … Zeng, A. (2022). Do As I Can, Not As I Say:
Grounding Language in Robotic Affordances (arXiv:2204.01691). arXiv. https://doi.org/10.48550/arXiv.2204.01691.
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C.,
Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional
AI: Harmlessness From AI Feedback (arXiv:2212.08073). arXiv. https://doi.org/10.48550/arXiv.2212.08073.
Bairi, R., Sonwane, A., Kanade, A., C, V. D., Iyer, A., Parthasarathy, S., Rajamani, S., Ashok, B. & Shet, S. (2023). CodePlan:
Repository-Level Coding Using LLMs and Planning (arXiv:2309.12499). arXiv. https://doi.org/10.48550/arXiv.2309.12499.
Basu, S., Saberi, M., Bhardwaj, S., Chegini, A. M., Massiceti, D., Sanjabi, M., Hu, S. X. & Feizi, S. (2023). EditVal: Benchmarking
Diffusion Based Text-Guided Image Editing Methods (arXiv:2310.02426). arXiv. http://arxiv.org/abs/2310.02426.
Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Podstawski, M., Gianinazzi, L., Gajda, J., Lehmann, T., Niewiadomski,
H., Nyczyk, P. & Hoefler, T. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models
(arXiv:2308.09687). arXiv. http://arxiv.org/abs/2308.09687.
Bitton, Y., Bansal, H., Hessel, J., Shao, R., Zhu, W., Awadalla, A., Gardner, J., Taori, R. & Schmidt, L. (2023). VisIT-Bench:
A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use (arXiv:2308.06595). arXiv.
http://arxiv.org/abs/2308.06595.
Blattmann, A., Rombach, R., Ling, H., Dockhorn, T., Kim, S. W., Fidler, S. & Kreis, K. (2023). Align Your Latents: High-Resolution
Video Synthesis With Latent Diffusion Models (arXiv:2304.08818). arXiv. http://arxiv.org/abs/2304.08818.
Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Chen, X., Choromanski, K., Ding, T., Driess, D., Dubey, A., Finn, C.,
Florence, P., Fu, C., Arenas, M. G., Gopalakrishnan, K., Han, K., Hausman, K., Herzog, A., Hsu, J., Ichter, B., … Zitkovich, B.
(2023). RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. (arXiv:2307.15818). arXiv.
https://arxiv.org/abs/2307.15818.
Castaño, J., Martínez-Fernández, S., Franch, X. & Bogner, J. (2023). Exploring the Carbon Footprint of Hugging Face’s ML
Models: A Repository Mining Study. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM), 1–12. https://doi.org/10.1109/ESEM56168.2023.10304801.
Chen, L., Chen, Z., Zhang, Y., Liu, Y., Osman, A. I., Farghali, M., Hua, J., Al-Fatesh, A., Ihara, I., Rooney, D. W. & Yap, P.-S. (2023).
“Artificial Intelligence-Based Solutions for Climate Change: A Review.” Environmental Chemistry Letters 21, no. 5: 2525–57.
https://doi.org/10.1007/s10311-023-01617-y.
Chen, L., Zaharia, M. & Zou, J. (2023). How Is ChatGPT’s Behavior Changing Over Time? (arXiv:2307.09009). arXiv.
http://arxiv.org/abs/2307.09009.
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A.,
Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., … Zaremba, W. (2021). Evaluating Large
Language Models Trained on Code (arXiv:2107.03374; Version 2). arXiv. https://doi.org/10.48550/arXiv.2107.03374.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S. & Amodei, D. (2023). Deep Reinforcement Learning From Human
Preferences (arXiv:1706.03741). arXiv. https://doi.org/10.48550/arXiv.1706.03741.
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C. &
Schulman, J. (2021). Training Verifiers to Solve Math Word Problems (arXiv:2110.14168). arXiv. http://arxiv.org/abs/2110.14168.
Copet, J., Kreuk, F., Gat, I., Remez, T., Kant, D., Synnaeve, G., Adi, Y. & Défossez, A. (2024). Simple and Controllable Music
Generation (arXiv:2306.05284). arXiv. https://doi.org/10.48550/arXiv.2306.05284.
Dettmers, T., Pagnoni, A., Holtzman, A. & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs
(arXiv:2305.14314). arXiv. http://arxiv.org/abs/2305.14314.
Driess, D., Xia, F., Sajjadi, M. S. M., Lynch, C., Chowdhery, A., Ichter, B., Wahid, A., Tompson, J., Vuong, Q., Yu, T., Huang,
W., Chebotar, Y., Sermanet, P., Duckworth, D., Levine, S., Vanhoucke, V., Hausman, K., Toussaint, M., Greff, K., … Florence, P.
(2023). PaLM-E: An Embodied Multimodal Language Model (arXiv:2303.03378). arXiv. http://arxiv.org/abs/2303.03378.
Gandhi, K., Fränken, J.-P., Gerstenberg, T. & Goodman, N. D. (2023). Understanding Social Reasoning in Language Models
With Language Models (arXiv:2306.15448). arXiv. http://arxiv.org/abs/2306.15448.
Gemini Team: Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., Millican, K.,
Silver, D., Petrov, S., Johnson, M., Antonoglou, I., Schrittwieser, J., Glaese, A., Chen, J., Pitler, E., … Vinyals, O. (2023). Gemini:
A Family of Highly Capable Multimodal Models (arXiv:2312.11805). arXiv. http://arxiv.org/abs/2312.11805.
Girdhar, R., Singh, M., Brown, A., Duval, Q., Azadi, S., Rambhatla, S. S., Shah, A., Yin, X., Parikh, D. & Misra, I. (2023). Emu Video:
Factorizing Text-to-Video Generation by Explicit Image Conditioning (arXiv:2311.10709). arXiv. http://arxiv.org/abs/2311.10709.
Guha, N., Nyarko, J., Ho, D. E., Ré, C., Chilton, A., Narayana, A., Chohlas-Wood, A., Peters, A., Waldon, B., Rockmore, D. N.,
Zambrano, D., Talisman, D., Hoque, E., Surani, F., Fagan, F., Sarfaty, G., Dickinson, G. M., Porat, H., Hegland, J., … Li, Z. (2023).
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models (arXiv:2308.11462).
arXiv. http://arxiv.org/abs/2308.11462.
Haque, A., Tancik, M., Efros, A. A., Holynski, A. & Kanazawa, A. (2023). Instruct-NeRF2NeRF: Editing 3D Scenes With
Instructions (arXiv:2303.12789). arXiv. http://arxiv.org/abs/2303.12789.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. & Steinhardt, J. (2021). Measuring Massive Multitask
Language Understanding (arXiv:2009.03300). arXiv. http://arxiv.org/abs/2009.03300.
Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D. & Steinhardt, J. (2021). Measuring Mathematical
Problem Solving With the MATH Dataset (arXiv:2103.03874). arXiv. http://arxiv.org/abs/2103.03874.
Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M. & Leskovec, J. (2021). Open Graph Benchmark:
Datasets for Machine Learning on Graphs (arXiv:2005.00687). arXiv. https://doi.org/10.48550/arXiv.2005.00687.
Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X. & Zhou, D. (2024). Large Language Models Cannot
Self-Correct Reasoning Yet (arXiv:2310.01798). arXiv. http://arxiv.org/abs/2310.01798.
Huang, Q., Vora, J., Liang, P. & Leskovec, J. (2023). Benchmarking Large Language Models as AI Research Agents
(arXiv:2310.03302). arXiv. http://arxiv.org/abs/2310.03302.
Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O. & Narasimhan, K. (2023). SWE-bench: Can Language Models
Resolve Real-World GitHub Issues? (arXiv:2310.06770). arXiv. https://doi.org/10.48550/arXiv.2310.06770.
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H. & Szolovits, P. (2020). What Disease Does This Patient Have?
A Large-Scale Open Domain Question Answering Dataset From Medical Exams (arXiv:2009.13081). arXiv.
http://arxiv.org/abs/2009.13081.
Kıcıman, E., Ness, R., Sharma, A. & Tan, C. (2023). Causal Reasoning and Large Language Models: Opening a New Frontier
for Causality (arXiv:2305.00050). arXiv. http://arxiv.org/abs/2305.00050.
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P. &
Girshick, R. (2023). Segment Anything (arXiv:2304.02643). arXiv. http://arxiv.org/abs/2304.02643.
Kočiský, T., Schwarz, J., Blunsom, P., Dyer, C., Hermann, K. M., Melis, G. & Grefenstette, E. (2018). “The NarrativeQA
Reading Comprehension Challenge.” Transactions of the Association for Computational Linguistics 6: 317–28.
https://doi.org/10.1162/tacl_a_00023.
Krizhevsky, A. (2009). Learning Multiple Layers of Features From Tiny Images. https://www.semanticscholar.org/paper/
Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086.
Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K.,
Toutanova, K., Jones, L., Kelcey, M., Chang, M.-W., Dai, A. M., Uszkoreit, J., Le, Q. & Petrov, S. (2019). “Natural Questions:
A Benchmark for Question Answering Research.” Transactions of the Association for Computational Linguistics 7: 452–66.
https://doi.org/10.1162/tacl_a_00276.
Lee, H., Phatale, S., Mansoor, H., Mesnard, T., Ferret, J., Lu, K., Bishop, C., Hall, E., Carbune, V., Rastogi, A. & Prakash, S.
(2023). RLAIF: Scaling Reinforcement Learning From Human Feedback With AI Feedback (arXiv:2309.00267). arXiv.
http://arxiv.org/abs/2309.00267.
Lee, T., Yasunaga, M., Meng, C., Mai, Y., Park, J. S., Gupta, A., Zhang, Y., Narayanan, D., Teufel, H. B., Bellagente, M., Kang, M.,
Park, T., Leskovec, J., Zhu, J.-Y., Fei-Fei, L., Wu, J., Ermon, S. & Liang, P. (2023). Holistic Evaluation of Text-to-Image Models
(arXiv:2311.04287). arXiv. https://doi.org/10.48550/arXiv.2311.04287.
Li, J., Cheng, X., Zhao, W. X., Nie, J.-Y. & Wen, J.-R. (2023). HaluEval: A Large-Scale Hallucination Evaluation Benchmark
for Large Language Models (arXiv:2305.11747). arXiv. https://doi.org/10.48550/arXiv.2305.11747.
Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., Zhang, Y., Narayanan, D., Wu, Y., Kumar, A., Newman,
B., Yuan, B., Yan, B., Zhang, C., Cosgrove, C., Manning, C. D., Ré, C., Acosta-Navas, D., Hudson, D. A., … Koreeda, Y. (2023).
Holistic Evaluation of Language Models (arXiv:2211.09110). arXiv. https://doi.org/10.48550/arXiv.2211.09110.
Lin, S., Hilton, J. & Evans, O. (2022). TruthfulQA: Measuring How Models Mimic Human Falsehoods (arXiv:2109.07958). arXiv.
https://doi.org/10.48550/arXiv.2109.07958.
Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., Zhang, S., Deng, X., Zeng, A., Du, Z., Zhang,
C., Shen, S., Zhang, T., Su, Y., Sun, H., … Tang, J. (2023). AgentBench: Evaluating LLMs as Agents (arXiv:2308.03688). arXiv.
https://doi.org/10.48550/arXiv.2308.03688.
Luccioni, A. S., Jernite, Y. & Strubell, E. (2023). Power Hungry Processing: Watts Driving the Cost of AI Deployment?
(arXiv:2311.16863). arXiv. http://arxiv.org/abs/2311.16863.
Chapter 3: Responsible AI
Acknowledgments
The AI Index would like to acknowledge Amelia Hardy for her work contributing as a research assistant to visualizations and supplementary analysis for this chapter, and Andrew Shi for his work spearheading the analysis of responsible AI–related conference submissions. The AI Index also acknowledges that the Global State of Responsible AI analysis was done in collaboration with Accenture. The AI Index specifically wants to highlight the contributions of Arnab Chakraborty, Patrick Connolly, Jakub Wiatrak, Ray Eitel-Porter, Dikshita Venkatesh, and Shekhar Tewari to the data collection and analysis.

Conference Submissions Analysis

For the analysis of responsible AI–related conference submissions, the AI Index examined the number of responsible AI–related academic submissions at the following conferences: AAAI, AIES, FAccT, ICML, ICLR, and NeurIPS. Specifically, the team scraped the conference websites or repositories of conference submissions for papers containing relevant keywords indicating they could fall into a particular responsible AI category. The papers were then manually verified by a human team to confirm their categorization. It is possible that a single paper could belong to multiple responsible AI categories.

The keywords searched include:

Fairness and bias: algorithmic fairness, bias detection, bias mitigation, discrimination, equity in AI, ethical algorithm design, fair data practices, fair ML, fairness and bias, group fairness, individual fairness, justice, non-discrimination, representational fairness, unfair, unfairness.

Privacy and data governance: anonymity, confidentiality, data breach, data ethics, data governance, data integrity, data privacy, data protection, data transparency, differential privacy, inference privacy, machine unlearning, privacy by design, privacy-preserving, secure data storage, trustworthy data curation.

Security: adversarial attack, adversarial learning, AI incident, attacks, audits, cybersecurity, ethical hacking, forensic analysis, fraud detection, red teaming, safety, security, security ethics, threat detection, vulnerability assessment.

Transparency and explainability: algorithmic transparency, audit, auditing, causal reasoning, causality, explainability, explainable AI, explainable models, human-understandable decisions, interpretability, interpretable models, model explainability, outcome explanation, transparency, xAI.
Global State of Responsible AI

Below is the full list of measures respondents were …
Chapter 4: Economy
Acknowledgments

The AI Index would like to acknowledge James da Costa for his work collecting information on significant AI-related economic events, and Emma Williamson for her work collecting data from the Stack Overflow survey.

Lightcast

Lightcast delivers job market analytics that empower employers, workers, and educators to make data-driven decisions. The company’s artificial intelligence technology analyzes hundreds of millions of job postings and real-life career transitions to provide insight into labor market patterns. This real-time … removes duplicates, and extracts data from job postings text. This includes information on job title, employer, industry, and region, as well as required experience, education, and skills.

Job postings are useful for understanding trends in the labor market because they allow for a detailed, real-time look at the skills employers seek. To assess … on comparisons between JOLTS and Lightcast, the labor market demand captured by Lightcast data represents over 99% of the total labor demand. Jobs not posted online are usually in small businesses (the classic example being the “Help Wanted” sign in the restaurant window) and union hiring halls.

To support these analyses, Lightcast mined its dataset of millions of job postings collected since 2010. Lightcast collects postings from over 51,000 online job … specifically, for the purposes of this report, all skills below were considered AI skills. A job posting was considered an AI job if it mentioned any of these skills:
Artificial Intelligence: AI/ML Inference, AIOps (Artificial Intelligence for IT Operations), Applications of Artificial Intelligence, Artificial General Intelligence, Artificial Intelligence, Artificial Intelligence Development, Artificial Intelligence Markup Language (AIML), Artificial Intelligence Systems, Azure Cognitive Services, Baidu, Cognitive Automation, Cognitive Computing, Computational Intelligence, Cortana, Ethical AI, Expert Systems, Explainable AI (XAI), IPSoft Amelia, Intelligent Control, Intelligent Systems, Interactive Kiosk, Knowledge Engineering, Knowledge-Based Configuration, Knowledge-Based Systems, Multi-agent Systems, Open Neural Network Exchange (ONNX), OpenAI Gym, Operationalizing AI, Reasoning Systems, Watson Conversation, Watson Studio, Weka

Autonomous Driving: Advanced Driver Assistance Systems, Autonomous Cruise Control Systems, Autonomous System, Autonomous Vehicles, Guidance Navigation and Control Systems, Light Detection and Ranging (LiDAR), OpenCV, Path Analysis, Path Finding, Remote Sensing, Unmanned Aerial Systems (UAS)

Generative Artificial Intelligence: ChatGPT, Generative Adversarial Networks, Generative Artificial Intelligence, Large Language Modeling, Prompt Engineering, Variational Autoencoders

Natural Language Processing (NLP): AI Copywriting, ANTLR, Amazon Textract, Apache OpenNLP, BERT (NLP Model), Chatbot, Computational Linguistics, Conversational AI, Dialog Systems, Fuzzy Logic, Handwriting Recognition, Hugging Face (NLP Framework), Hugging Face Transformers, Intelligent Agent, Intelligent Virtual Assistant, Kaldi, Language Model, Latent Dirichlet Allocation, Lexalytics, Machine Translation, Microsoft LUIS, Natural Language Generation, Natural Language Processing, Natural Language Programming, Natural Language Toolkits, Natural Language Understanding, Optical Character Recognition (OCR), Screen Reader, Semantic Analysis, Semantic Parsing, Semantic Search, Sentiment Analysis, Seq2Seq, Speech Recognition, Speech Recognition Software, Speech Synthesis, Statistical Language Acquisition, Text Mining, Text-to-Speech, Tokenization, Voice Assistant Technology, Voice Interaction, Voice User Interface, Word Embedding, Word2Vec Models, fastText

Neural Networks: Apache MXNet, Artificial Neural Networks, Autoencoders, Caffe2, Chainer (Deep Learning Framework), Convolutional Neural Networks, Cudnn, Deep Learning, Deep Learning Methods, Deeplearning4j, Evolutionary Acquisition of Neural Topologies, Fast.ai, Keras (Neural Network Library), Long Short-Term Memory (LSTM), OpenVINO, PaddlePaddle, Recurrent Neural Network (RNN), TensorFlow

Machine Learning: AdaBoost (Adaptive Boosting), Adversarial Machine Learning, Apache MADlib, Apache Mahout, Apache SINGA, Apache Spark, Association Rule Learning, Automated Machine Learning, Autonomic Computing, AWS SageMaker, Azure Machine Learning, Boosting, CHi-Squared Automatic Interaction Detection (CHAID), Classification and Regression Tree (CART), Cluster Analysis, Collaborative Filtering, Confusion Matrix, Cyber-Physical Systems, Dask (Software), Data Classification, Dbscan, Decision Models, Decision Tree Learning, Dimensionality Reduction, Dlib (C++ Library), Ensemble Methods, Feature Engineering, Feature Extraction, Feature Learning, Feature Selection, Gaussian Process, Genetic Algorithm, Google AutoML, Gradient Boosting, H2O.ai, Hidden Markov Model, Hyperparameter Optimization, Inference Engine, K-Means Clustering, Kernel …
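The tagging rule above (a posting counts as an AI job if it mentions any skill from these lists) can be sketched as follows. The skill set here is a small hypothetical sample, not Lightcast's full taxonomy, and the matching is simple substring search rather than Lightcast's actual extraction pipeline:

```python
# Hypothetical sample of the AI skill list; the real list spans all the
# categories enumerated above (AI, autonomous driving, generative AI,
# NLP, neural networks, machine learning, ...).
AI_SKILLS = {"machine learning", "natural language processing",
             "tensorflow", "speech recognition", "computer vision"}

def is_ai_job(posting_text: str) -> bool:
    """A posting is tagged as an AI job if it mentions at least one AI skill."""
    text = posting_text.lower()
    return any(skill in text for skill in AI_SKILLS)
```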
TF-IDF is a statistical measure that evaluates how representative a word (in this case, a skill) is to a selected entity. This is done by multiplying two metrics:
1. The term frequency of a skill in an entity (“TF”).
2. The logarithmic inverse entity frequency of the skill across a set of entities (“IDF”). This indicates how common or rare a word is in the entire entity set. The closer IDF is to 0, the more common a word is.

So, if the skill is very common across LinkedIn entities, and appears in many job or member descriptions, the IDF will approach 0. If, on the other hand, the skill is unique to specific entities, the IDF will approach 1.

Details are available at LinkedIn’s Skills Genome and LinkedIn–World Bank Methodology note.

AI Skills Penetration

The aim of this indicator is to measure the intensity of AI skills in an entity (in a particular country, industry, gender, etc.) through the following methodology:
• Compute frequencies for all self-added skills by LinkedIn members in a given entity (occupation, industry, etc.) in 2015–2023.
• Re-weight skill frequencies using a TF-IDF model to get the top 50 most representative skills in that entity. These 50 skills compose the “skill genome” of that entity.
• Compute the share of skills that belong to the AI skill group out of the top skills in the selected entity.

Interpretation: The AI skill penetration rate signals the prevalence of AI skills across occupations, or the intensity with which LinkedIn members utilize AI skills in their jobs. For example, the top 50 skills for the occupation of engineer are calculated based on the weighted frequency with which they appear in LinkedIn members’ profiles. If four of the skills that engineers possess belong to the AI skill group, this measure indicates that the penetration of AI skills is estimated to be 8% among engineers (e.g., 4/50).

Jobs or Occupations

LinkedIn member titles are standardized and grouped into approximately 15,000 occupations. These are not sector- or country-specific. These occupations are further standardized into approximately 3,600 occupation representatives. Occupation representatives group occupations with a common role and specialty, regardless of seniority.

AI Jobs or Occupations

An “AI” job is an occupation representative that requires AI skills to perform the job. Skills penetrations are used as a signal for whether AI skills are prevalent in an occupation representative, in any sector where the occupation representative may exist. Examples of such occupations include (but are not limited to): Machine Learning Engineer, Artificial Intelligence Specialist, Data Scientist, and Computer Vision Engineer.

AI Talent

A LinkedIn member is considered AI talent if they have explicitly added AI skills to their profile and/or they are occupied in an AI occupation representative. The counts of AI talent are used to calculate talent concentration metrics. For example, to calculate the country-level AI talent concentration, we use the counts of AI talent at the country level vis-à-vis the counts of LinkedIn members in the respective countries. Note that concentration metrics may be influenced by LinkedIn coverage in these countries and should be utilized with caution.
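As a concrete sketch of the skill genome methodology described above, the TF-IDF re-weighting and the penetration share could be computed as follows. The data shapes and function names are hypothetical illustrations; LinkedIn's production pipeline differs:

```python
import math
from collections import Counter

def skill_genome(skills_by_entity: dict[str, list[str]], entity: str,
                 top_k: int = 50) -> list[str]:
    """Top-k skills for one entity, ranked by TF-IDF: the skill's
    frequency within the entity times the log inverse frequency of the
    skill across all entities (IDF near 0 for ubiquitous skills)."""
    tf = Counter(skills_by_entity[entity])
    total = sum(tf.values())
    n_entities = len(skills_by_entity)

    def idf(skill: str) -> float:
        df = sum(1 for skills in skills_by_entity.values() if skill in skills)
        return math.log(n_entities / df)

    score = {skill: (count / total) * idf(skill) for skill, count in tf.items()}
    return sorted(score, key=score.get, reverse=True)[:top_k]

def ai_skill_penetration(genome: list[str], ai_skills: set[str]) -> float:
    """Share of the genome that belongs to the AI skill group,
    e.g. 4 AI skills among a top-50 genome gives 0.08."""
    return sum(skill in ai_skills for skill in genome) / len(genome)
```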
Interpretation: A relative penetration rate of 2 means Relative AI Talent Hiring Rate YoY Ratio
that the average penetration of AI skills in that country • L
inkedIn Hiring Rate or Overall Hiring
is two times the global average across the same set Rate is a measure of hires normalized by
of occupations. LinkedIn membership. It is computed as the
percentage of LinkedIn members who added
Global Comparison: By Industry
a new employer in the same period the job
The relative AI skills penetration by country for industry
began, divided by the total number of LinkedIn
provides an in-depth sectoral decomposition of AI skill
members in the corresponding location.
penetration across industries and sample countries.
• A
I Hiring Rate is computed following the
Interpretation: A country’s relative AI skill penetration
overall hiring rate methodology, but only
rate of 2 in the education sector means that the average
considering members classified as AI talent.
penetration of AI skills in that country is two times the
global average across the same set of occupations in • R
elative AI Talent Hiring Rate YoY Ratio is
that sector. the year-over-year change in AI Hiring Rate
relative to Overall Hiring Rate in the same
Global Comparison: By Gender
country. For each month, we first calculate AI
The “Relative AI Skills Penetration by Gender”
Hiring rate in the country, then divide AI Hiring
metric provides a cross-country comparison of AI
Rate by Overall Hiring Rate in that country,
skill penetrations within each gender, comparing
then calculate YoY change of this ratio, and then and destination countries, are normalized based on
take the 12-month moving average using the last LinkedIn membership in country A at the end of each
12 months. year and multiplied by 10,000. Hence, this metric
indicates relative talent migration from all countries to
Interpretation: In 2023 in India, the ratio of AI talent
and from country A.
hiring relative to overall hiring has grown 16.8% year
over year. AI Skills List Update Compared to Last Year
1. LinkedIn introduced “AI Literacy” skills.
AI Talent Migration
a. The following skills were added to the list and
Data on migration comes from the World Bank Group–
categorized as “AI Literacy” skills: ChatGPT,
LinkedIn “Digital Data for Development” partnership
DALL-E, GPT-3, GPT-4, Generative Art, Github
(please see Zhu et al. [2018]).
Copilot, Google Bard, Midjourney, Prompt
LinkedIn migration rates are derived from the self- Engineering, and Stable Diffusion.
identified locations of LinkedIn member profiles. For
2. LinkedIn updated the former AI skills list and
example, when a LinkedIn member updates his or
categorized them as “AI Engineering” skills:
her location from Paris to London, this is counted as
a. The following skills were excluded from
a migration. Migration data is available from 2019
the list: Alexa, Common Lisp, Data
onward. LinkedIn data provides insights into countries
Structures, Gaussian 03, Graph Theory, IBM
on the AI Talent gained or lost due to migration trends.
Watson, Information Retrieval, Jena, Julia
AI Talent migration is considered for all members
(Programming Language), Linked Data, Lisp,
with AI Skills/holding AI jobs at time t for country A
Pandas (Software), Parallel Algorithms, Perl
as the country of interest and country B as the source
Automation, Resource Description Framework,
of inflows and destination for outflows. Thus, net AI
Smalltalk, and dSPACE.
Talent migration between country A and country B—for
country A—is calculated as follows: b. The following skills were added to the
list: Apache Spark ML, Applied Machine
Learning, Audio Synthesis, Autoencoders,
Automated Clustering, Automated Feature
Engineering, Automated Machine Learning
Net flows are defined as total arrivals minus departures
(AutoML), Autoregressive Models, Chatbot
within the given time period. LinkedIn membership
Development, Chatbots, Concept Drift
varies considerably between countries, which makes
Adaptation, Conditional Generation,
interpreting absolute movements of members from
Conditional Image Generation, Decision
one country to another difficult. To compare migration
Trees, Deep Convolutional Generative Adversarial Networks (DCGAN), Deep Neural Networks (DNN), Generative AI, Generative Adversarial Imitation Learning, Generative Adversarial Networks (GANs), Generative Design Optimization, Generative Flow Models, Generative Modeling, Generative Neural Networks, Generative Optimization, Generative Pre-training, Generative Query Networks (GQNs), Generative Replay Memory, Generative Synthesis, Google Cloud AutoML, Graph Embeddings, Graph Networks, Hyperparameter Optimization, Hyperparameter Tuning, Image Generation, Image Inpainting, Image Synthesis, Image-to-Image Translation, Large Language Models (LLM), MLOps, Machine Learning Algorithms, Machine Translation, Meta-learning, Model Compression, Model Interpretation, Model Training, Music Generation, Neural Network Architecture Design, Predictive Modeling, Probabilistic Generative Models, Probabilistic Programming, Random Forest, Recurrent Neural Networks (RNN), Responsible AI, Style Transfer, StyleGAN, Synthetic Data Generation, Text Generation, Text-to-Image Generation, Time Series Forecasting, Transformer Models, Variational Autoencoders (VAEs), Video Generation, and k-means clustering.
To compare migration flows between countries fairly, migration flows are normalized for the country of interest. For example, if country A is the country of interest, all absolute net flows into and out of country A, regardless of origin
PostgreSQL database delivery, and so on.
Quid applies best-in-class AI and NLP to reveal hidden patterns in large datasets, enabling users to make data-driven decisions accurately, quickly, and efficiently.
Search, Data Sources, and Scope
Over 8 million global public and private company profiles from multiple data sources are indexed to search across company descriptions, while filtering and including metadata ranging from investment information to firmographic information, such as founded year, HQ location, and more. Company information is updated on a weekly basis.
The Quid algorithm reads a large amount of text data from each document to make links between different documents based on their similar language. This process is repeated at an immense scale, which produces a network with different clusters identifying distinct topics or focus areas. Trends are identified based on keywords, phrases, people, companies, and institutions that Quid identifies, and the other metadata that is put into the software.
Data
Companies
that are not captured in Capital IQ. This not only yields comprehensive and accurate data on all organizations, but it also captures early-stage startups and funding events data. Company information is updated on a weekly basis.
AI” OR “generative artificial intelligence”) for global companies that have received over $1.5M for the last 10 years (January 1, 2013, to December 31, 2023).
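Quid's pipeline is proprietary, but the mechanism this section describes, reading text from each document, linking documents that use similar language, and letting clusters emerge from the link network, can be illustrated with a minimal stdlib-only sketch. The TF-IDF weighting, the 0.2 similarity threshold, and the connected-component grouping below are illustrative assumptions, not Quid's actual implementation.

```python
import math
import re
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Weight each document's terms by term frequency x inverse document frequency."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(docs, threshold=0.2):
    """Link documents whose language is similar enough, then return the
    connected components of that link network as clusters (union-find)."""
    vecs = tfidf_vectors(docs)
    n = len(docs)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

Running `cluster` on a handful of company descriptions groups the AI-themed ones apart from the unrelated ones, which is the small-scale analogue of the topic clusters the text describes.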
Earnings Calls
Quid leverages earnings call transcript data embedded from Seeking Alpha. For this report, Quid has analyzed mentions of AI-related keywords across all earnings call transcripts from Fortune 500 companies from January 2018 through December 2023. New earnings call transcript data is updated in Quid on the 1st and 15th of every month.
Search Parameters
A Boolean query is used to search for focus areas, topics, and keywords within the archived company database, within their business descriptions and websites. The search results can be filtered by HQ region, investment amount, operating status, organization type (private/public), and founding year. Quid then visualizes these companies by semantic similarity. If there are more than 7,000 companies in the search result, Quid selects the 7,000 most relevant companies for visualization based on the language algorithm.
Target Event Definitions
• Private investments: A private placement is a private sale of newly issued securities (equity or debt) by a company to a selected investor or a selected group of investors. The stakes that buyers take in private placements are often minority stakes (under 50%), although it is possible to take control of a company through a private placement as well, in which case the private placement would be a majority stake investment.
• Minority investment: These refer to minority stake acquisitions in Quid, which take place when the buyer acquires less than 50% of the existing ownership stake in entities, asset products, and business divisions.
• M&A: This refers to a buyer acquiring more than 50% of the existing ownership stake in entities, asset products, and business divisions.
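The search described above, a Boolean OR over keywords plus metadata filters and a 7,000-company relevance cap, can be sketched as follows. The field names, the record layout, and the `relevance` score (standing in for Quid's non-public language-algorithm ranking) are all assumptions for illustration.

```python
import re

def matches_query(description, any_of):
    """Case-insensitive OR over whole-word keyword phrases."""
    text = description.lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        for term in any_of
    )

def search_companies(companies, any_of, min_funding_usd, year_range, cap=7000):
    """Filter company records by keyword match, funding floor, and founding-year
    window, then keep the `cap` highest-scoring hits."""
    lo, hi = year_range
    hits = [
        c for c in companies
        if matches_query(c["description"], any_of)
        and c["funding_usd"] >= min_funding_usd
        and lo <= c["founded"] <= hi
    ]
    hits.sort(key=lambda c: c["relevance"], reverse=True)
    return hits[:cap]
```

The word-boundary match matters: a plain substring test for "AI" would wrongly match words like "retail," which is why each term is wrapped in `\b` anchors.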
Stack Overflow
Data on the use of AI by developers was sourced from
the 2023 Developer Survey. The survey was conducted
from May 8, 2023, to May 19, 2023, and incorporates
the insights of 89,184 software developers from 185
countries around the world.
Works Cited
Brynjolfsson, E., Li, D. & Raymond, L. R. (2023). Generative AI at Work (Working Paper 31161). National Bureau of Economic
Research. https://doi.org/10.3386/w31161.
Cambon, A., Hecht, B., Edelman, B., Ngwe, D., Jaffe, S., Heger, A., Peng, S., Hofman, J., Farach, A., Bermejo-Cano, M.,
Knudsen, E., Sanghavi, H., Spatharioti, S., Rothschild, D., Goldstein, D. G., Kalliamvakou, E., Cihon, P., Demirer, M., Schwarz, M.
& Teevan, J. (2023). Early LLM-Based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity.
https://www.microsoft.com/en-us/research/uploads/prod/2023/12/AI-and-Productivity-Report-First-Edition.pdf.
Choi, J. H., Monahan, A. & Schwarcz, D. (2023). “Lawyering in the Age of Artificial Intelligence.” Minnesota Law Review
(forthcoming). https://doi.org/10.2139/ssrn.4626276.
Chui, M., Hazan, E., Roberts, R., Singla, A., Smaje, K., Sukharevsky, A., Yee, L. & Zemmel, R. (2023). The Economic Potential
of Generative AI: The Next Productivity Frontier. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-
digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier.
Chui, M., Yee, L., Hall, B., Singla, A. & Sukharevsky, A. (2023). The State of AI in 2023: Generative AI’s Breakout Year.
McKinsey & Company. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-generative-ais-breakout-year.
Dell’Acqua, F. (2022). “Falling Asleep at the Wheel: Human/AI Collaboration in a Field Experiment on HR Recruiters.”
Laboratory for Innovation Science, Harvard Business School, working paper. https://static1.squarespace.com/
static/604b23e38c22a96e9c78879e/t/62d5d9448d061f7327e8a7e7/1658181956291/Falling+Asleep+at+the+Wheel+-
+Fabrizio+DellAcqua.pdf.
Dell’Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F. & Lakhani,
K. R. (2023). “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge
Worker Productivity and Quality.” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-013.
https://doi.org/10.2139/ssrn.4573321.
Hatzius, J., Briggs, J., Kodnani, D. & Pierdomenico, G. (2023). The Potentially Large Effects of Artificial Intelligence on Economic
Growth. Goldman Sachs. https://www.gspublishing.com/content/research/en/reports/2023/03/27/d64e052b-0f6e-45d7-
967b-d7be35fabd16.html.
Benchmarks
1. MedQA: Data on MedQA was taken from the
MedQA Papers With Code leaderboard in January
2024. To learn more about MedQA, please read the
original paper.
FDA-Approved AI-Medical
Devices
Data on FDA-approved AI-enabled medical devices is sourced from the FDA webpage that tracks artificial intelligence and machine learning (AI/ML)–enabled medical devices.
Works Cited
Cao, K., Xia, Y., Yao, J., Han, X., Lambert, L., Zhang, T., Tang, W., Jin, G., Jiang, H., Fang, X., Nogues, I., Li, X., Guo, W., Wang, Y.,
Fang, W., Qiu, M., Hou, Y., Kovarnik, T., Vocka, M., Lu, J. (2023). “Large-Scale Pancreatic Cancer Detection via Non-contrast
CT and Deep Learning.” Nature Medicine 29, no. 12: 3033–3043. https://doi.org/10.1038/s41591-023-02640-w.
Chen, Z., Cano, A. H., Romanou, A., Bonnet, A., Matoba, K., Salvi, F., Pagliardini, M., Fan, S., Köpf, A., Mohtashami, A.,
Sallinen, A., Sakhaeirad, A., Swamy, V., Krawczuk, I., Bayazit, D., Marmet, A., Montariol, S., Hartley, M.-A., Jaggi, M. & Bosselut,
A. (2023). MEDITRON-70B: Scaling Medical Pretraining for Large Language Models (arXiv:2311.16079). arXiv.
http://arxiv.org/abs/2311.16079.
Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T.,
Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P. & Avsec, Ž. (2023). “Accurate Proteome-Wide Missense
Variant Effect Prediction With AlphaMissense.” Science 381. https://doi.org/10.1126/science.adg7492.
Cid, Y. D., Macpherson, M., Gervais-Andre, L., Zhu, Y., Franco, G., Santeramo, R., Lim, C., Selby, I., Muthuswamy, K., Amlani,
A., Hopewell, H., Indrajeet, D., Liakata, M., Hutchinson, C. E., Goh, V. & Montana, G. (2024). “Development and Validation
of Open-Source Deep Neural Networks for Comprehensive Chest X-Ray Reading: A Retrospective, Multicentre Study.”
The Lancet Digital Health 6, no. 1: e44–e57. https://doi.org/10.1016/S2589-7500(23)00218-2.
Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E. P., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg,
E., Nayak, A., Patel, B. S., Chiang, C.-C., Callahan, A., Huo, Z., Gatidis, S., Adams, S. J., Fayanju, O., Shah, S. J., Savage, T.,
… Shah, N. H. (2023). MedAlign: A Clinician-Generated Dataset for Instruction Following With Electronic Medical Records
(arXiv:2308.14089). arXiv. http://arxiv.org/abs/2308.14089.
Ha, T., Lee, D., Kwon, Y., Park, M. S., Lee, S., Jang, J., Choi, B., Jeon, H., Kim, J., Choi, H., Seo, H.-T., Choi, W., Hong, W., Park,
Y. J., Jang, J., Cho, J., Kim, B., Kwon, H., Kim, G., … Choi, Y.-S. (2023). “AI-Driven Robotic Chemist for Autonomous Synthesis
of Organic Molecules.” Science Advances 9, no. 44. https://doi.org/10.1126/sciadv.adj0461.
Iglesias, J. E., Billot, B., Balbastre, Y., Magdamo, C., Arnold, S. E., Das, S., Edlow, B. L., Alexander, D. C., Golland, P. & Fischl, B.
(2023). “SynthSR: A Public AI Tool to Turn Heterogeneous Clinical Brain Scans into High-Resolution T1-Weighted Images for 3D
Morphometry.” Science Advances 9, no. 5. https://doi.org/10.1126/sciadv.add3607.
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H. & Szolovits, P. (2020). What Disease Does This Patient Have?
A Large-Scale Open Domain Question Answering Dataset From Medical Exams (arXiv:2009.13081; Version 1). arXiv.
http://arxiv.org/abs/2009.13081.
Kavungal, D., Magalhães, P., Kumar, S. T., Kolla, R., Lashuel, H. A. & Altug, H. (2023). “Artificial Intelligence–Coupled Plasmonic
Infrared Sensor for Detection of Structural Protein Biomarkers in Neurodegenerative Diseases.” Science Advances 9, no. 28.
https://doi.org/10.1126/sciadv.adg9644.
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu,
W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S. & Battaglia, P. (2023). “Learning Skillful
Medium-Range Global Weather Forecasting.” Science 382. https://doi.org/10.1126/science.adi2336.
Liao, W.-W., Asri, M., Ebler, J., Doerr, D., Haukness, M., Hickey, G., Lu, S., Lucas, J. K., Monlong, J., Abel, H. J., Buonaiuto, S.,
Chang, X. H., Cheng, H., Chu, J., Colonna, V., Eizenga, J. M., Feng, X., Fischer, C., Fulton, R. S., … Paten, B. (2023). “A Draft
Human Pangenome Reference.” Nature 617: 312–24. https://doi.org/10.1038/s41586-023-05896-x.
Mankowitz, D. J., Michi, A., Zhernov, A., Gelmi, M., Selvi, M., Paduraru, C., Leurent, E., Iqbal, S., Lespiau, J.-B., Ahern, A., Köppe,
T., Millikin, K., Gaffney, S., Elster, S., Broshear, J., Gamble, C., Milan, K., Tung, R., Hwang, M., … Silver, D. (2023). “Faster Sorting
Algorithms Discovered Using Deep Reinforcement Learning.” Nature 618: 257–63. https://doi.org/10.1038/s41586-023-06004-9.
Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G. & Cubuk, E. D. (2023). “Scaling Deep Learning for Materials Discovery.” Nature 624: 80–85. https://doi.org/10.1038/s41586-023-06735-9.
Nearing, G., Cohen, D., Dube, V., Gauch, M., Gilon, O., Harrigan, S., Hassidim, A., Klotz, D., Kratzert, F., Metzger, A., Nevo, S.,
Pappenberger, F., Prudhomme, C., Shalev, G., Shenzis, S., Tekalign, T., Weitzner, D. & Matias, Y. (2023). AI Increases Global
Access to Reliable Flood Forecasts (arXiv:2307.16104). arXiv. http://arxiv.org/abs/2307.16104.
Nori, H., Lee, Y. T., Zhang, S., Carignan, D., Edgar, R., Fusi, N., King, N., Larson, J., Li, Y., Liu, W., Luo, R., McKinney, S. M.,
Ness, R. O., Poon, H., Qin, T., Usuyama, N., White, C. & Horvitz, E. (2023a). Can Generalist Foundation Models Outcompete
Special-Purpose Tuning? Case Study in Medicine (arXiv:2311.16452; Version 1). arXiv. http://arxiv.org/abs/2311.16452.
Schopf, C. M., Ramwala, O. A., Lowry, K. P., Hofvind, S., Marinovich, M. L., Houssami, N., Elmore, J. G., Dontchos, B. N., Lee, J.
M. & Lee, C. I. (2024). “Artificial Intelligence-Driven Mammography-Based Future Breast Cancer Risk Prediction: A Systematic
Review.” Journal of the American College of Radiology 21, no. 2: 319–28. https://doi.org/10.1016/j.jacr.2023.10.018.
Shen, T., Munkberg, J., Hasselgren, J., Yin, K., Wang, Z., Chen, W., Gojcic, Z., Fidler, S., Sharp, N. & Gao, J. (2023).
“Flexible Isosurface Extraction for Gradient-Based Mesh Optimization.” ACM Transactions on Graphics 42, no. 4: 1–16.
https://doi.org/10.1145/3592430.
Thadani, N. N., Gurev, S., Notin, P., Youssef, N., Rollins, N. J., Ritter, D., Sander, C., Gal, Y. & Marks, D. S. (2023).
“Learning From Prepandemic Data to Forecast Viral Escape.” Nature 622: 818–25. https://doi.org/10.1038/s41586-023-06617-0.
Chapter 6: Education
Code.org
State-Level Data
The following link includes a full description of the methodology used by Code.org to collect its data. The staff at Code.org also maintains a database of the state of American K–12 education and, in this policy primer, provides a greater amount of detail on the state of American K–12 education in each state.
AP Computer Science Data
The AP Computer Science data is provided to Code.org as per an agreement the College Board maintains with Code.org. The AP Computer Science data comes from the College Board’s national and state summary reports.
Access to Computer Science Education
Data on access to computer science education was drawn from Code.org’s State of Computer Science Education 2023 report.
Computing Research Association (CRA Taulbee Survey)
Computing Research Association (CRA) members are 200-plus North American organizations active in computing research: academic departments of computer science and computer engineering; laboratories and centers in industry, government, and academia; and affiliated professional societies (AAAI, ACM, CACS/AIC, IEEE Computer Society, SIAM, USENIX). CRA’s mission is to enhance innovation by joining with industry, government, and academia to strengthen research and advance education in computing. Learn more about CRA here.
The CRA Taulbee Survey gathers survey data during the fall of each academic year by reaching out to over 200 PhD-granting departments. Details about the Taulbee Survey can be found here. Taulbee doesn’t directly survey the students. The department identifies each new PhD’s area of specialization as well as their type of employment. Data is collected from September to January of each academic year for PhDs awarded in the previous academic year.
The AI Index used data from the following iterations of the CRA survey:
CRA, 2022
CRA, 2021
CRA, 2020
CRA, 2019
CRA, 2018
CRA, 2017
CRA, 2016
CRA, 2015
CRA, 2014
CRA, 2013
CRA, 2012
CRA, 2011
Informatics Europe
The data is collected by Informatics Europe and published on the Informatics Europe Higher Education Data Portal, which is updated with the most recent data at the end of the year (typically in December). In the interest of reliability, Informatics Europe collects the data from countries where a solid and reasonably complete picture could be drawn from official sources such as national statistical offices, educational agencies, or ministries. The full list of sources can be found in the Data Portal section “Data Sources.”
The Data Portal follows the definitions and concepts provided by these national agencies and reflects the national situation in the countries considered. Those aspects that are not exposed by the consulted agencies are not part of the dataset. A full list of definitions and concepts used can be found in the footnotes shown at the bottom of the Statistics section.
Since each national data repository has its own structure and quite often provides all supporting information in the national language, Informatics Europe consults with its members—academics, active and knowledgeable in the informatics field from respective countries—who help to interpret the statistics available and who understand the specificities of these countries’ higher education systems. One of the main challenges in integrating the statistical
The data covers the degrees delivered by both traditional Research Universities (RU) and Universities of Applied Science (UAS) for the countries where these institutions also offer bachelor’s and master’s studies in informatics. The full list of institutions covered can be found in the Data Portal section “Institutions & Academic Units.”
1. Classic scenario: When the study name closely matches one subdiscipline name.
a. “Chemistry” -> subdiscipline Chemistry.
2. Classic interdisciplinary scenario: When the study name closely matches two or three subdiscipline names.
a. “International Fashion Management and Marketing” -> subdisciplines Fashion Management + Marketing.
3. Specializations scenario: When not all the subdisciplines are mentioned in the study name, but they are listed as specific specializations, concentrations, or tracks.
a. “Business Administration with specializations in Finance and International Business” -> subdisciplines Business Administration + Finance + International Business
4. Mixed scenario: When the study name does not closely match any specific subdiscipline but can be represented by combining two or three subdisciplines.
a. “Financial Economics” -> subdisciplines Finance + Economics
5. Last resort scenario: When the study name does not closely match any specific subdiscipline and/or combination, it is instead approximated as closely as possible.
a. “UK, EU, and US Copyright Law” -> subdisciplines Patent & Intellectual Property Law + International Law + European Law
Scenario | Example study name | Example assigned subdisciplines
Not all the subdisciplines are mentioned in the study name, but they are listed as specific specializations, concentrations, or tracks. | MBA Business Administration with specializations in Finance and International Business | Business Administration + Finance + International Business
Study name does not closely match any specific subdiscipline but can be represented by combining two or three subdisciplines. | MSc Financial Economics | Finance + Economics
Study name does not closely match any specific subdiscipline and/or combination and is instead approximated as closely as possible. | LLM UK, EU, and US Copyright Law | Patent and Intellectual Property Law + International Law + European Law
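For the first three scenarios, the mapping is mechanical enough to sketch in code: collect every known subdiscipline whose name appears in the study name. Scenarios 4 and 5 require human judgment (e.g., "Financial Economics" never literally contains "Finance"), which the sketch represents with an explicit `fallback`. This is an illustrative sketch, not Informatics Europe's actual tooling; the function name and subdiscipline list are assumptions.

```python
def assign_subdisciplines(study_name, subdisciplines, fallback=None):
    """Return every known subdiscipline whose name appears (case-insensitively)
    in the study name (scenarios 1-3). If none match, return the human-chosen
    approximation passed as `fallback` (scenarios 4-5)."""
    name = study_name.lower()
    found = [s for s in subdisciplines if s.lower() in name]
    return found or (fallback or [])
```

A production version would need stemming or fuzzy matching to handle the mixed scenario automatically; plain substring matching deliberately falls through to the fallback there.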
12 The National People’s Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only searched mentions of “artificial
intelligence” in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the premier.
The legislation was then coded by a team of two human coders for: (1) relevance to AI, (2) regulatory approach, and (3) subject matter. The relevance to AI categories were low, medium, and high. The regulatory approach categories were expansive or restrictive. For the subject matter categories, the Index employed the Congress policy typology. In cases where there were disagreements on the coding schemas, a third coder was brought in to settle the differences.
Federal Budget for Nondefense AI R&D
Data on the federal U.S. budget for nondefense AI R&D was taken from previous editions of the AI Index (namely the 2021 and 2022 versions) and from the following National Science and Technology Council reports:
Supplement to the President’s FY 2024 Budget
Supplement to the President’s FY 2023 Budget
Supplement to the President’s FY 2022 Budget
Govini
Govini is a defense technology company. Ark, Govini’s flagship software, is a suite of AI-enabled applications, powered by integrated government and commercial data, that accelerate the Defense Acquisition Process. With Ark, the acquisition community eliminates slow, manual processes and gains the ability to rapidly imagine, produce, and field critical warfighting capabilities. Analysts and decision-makers are equipped to solve challenges across the entire spectrum of Defense Acquisition, including Supply Chain, Science and Technology, Production, Sustainment, and Modernization.
Govini curated USG AI spend data from its annual Scorecard Taxonomy by applying supervised machine learning (ML) and natural language processing (NLP) techniques to parse, analyze, and categorize large volumes of federal contracts data, including prime contracts, grants, and other transaction authority (OTA) awards. Govini’s most recent Scorecard focused on Critical Technologies, of which AI/ML Technologies and Microelectronics were segments. The AI/ML segment consisted of five subsegments: Data Integration, Computer Vision, Machine Learning, Autonomy, and Natural Language Processing. Microelectronics is divided into two subsegments: Memory and Processing, and Semiconductors. By initially generating search terms and then subsequently excluding specific terms that yield erroneous results, Govini delivers a comprehensive yet discriminant taxonomy of subsegments that are mutually exclusive. Repeated keyword searches and filters allow a consensus, data-driven taxonomy to come into focus. Govini SMEs conduct a final review of the taxonomic structure to complement this iterative, data-driven process.
The use of artificial intelligence (AI) and supervised ML models enables analysis of the large volumes of irregular data contained in federal contracts—data that often is inaccessible through regular government reporting processes or human-intensive analytical approaches.
Moreover, beyond simply making an expansive body of data sources usable, Govini’s Ark platform and National Security Knowledge Graph establish high-fidelity standards in categorized and fused data to produce a comprehensive and accurate depiction of federal spending, and the supporting vendor ecosystem, over time.
National AI Strategies
The AI Index did a web search to identify national strategies on AI. Below is a list of countries that were identified as having a national AI strategy, including a link to said strategy. For certain countries, noted with an asterisk (*), the actual strategy was not found, and a news article confirming the launch of the strategy was linked instead.
Countries with AI Strategies in Place
Algeria,* Argentina, Azerbaijan,* Australia, Austria, Bahrain, Bangladesh, Benin,* Botswana,* Brazil, Belgium,* Bulgaria, Canada, Chile, China, Colombia, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic,* Egypt, Arab Republic, Ethiopia, Estonia, Finland, France, Germany, Ghana, Greece, Hong Kong, Hungary, India, Indonesia, Iran,* Iraq,* Ireland, Israel,* Italy, Japan, Jordan,* Kenya, Korea Republic, Latvia, Lithuania, Luxembourg, Malta, Malaysia, Mauritius, Mexico, The Netherlands, North Korea, Norway, Peru, Philippines, Poland, Portugal, Qatar, Romania, Russia, Rwanda, Saudi Arabia, Serbia, Sierra Leone, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Thailand, Tunisia,* Turkey, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Vietnam
Countries with AI Strategies in Development
Andorra,* Antigua and Barbuda,* Barbados,* Armenia,* Belarus,* Costa Rica,* Cuba,* Iceland, Jamaica,* Kenya, Morocco, New Zealand,* Nigeria,* Pakistan,* Senegal,* Uzbekistan
US Department of Defense Budget Requests
Data on the DoD nonclassified AI-related budget requests was taken from previous editions of the AI Index (namely the 2021 and 2022 versions) and from the following reports:
Defense Budget Overview United States Department of Defense Fiscal Year 2024 Budget Request
Defense Budget Overview United States Department of Defense Fiscal Year 2023 Budget Request
Defense Budget Overview United States Department of Defense Fiscal Year 2022 Budget Request
US AI Regulation
US Committee Mentions
In order to research trends in the United States’ committee mentions of AI, the following search was conducted:
Website: Congress.gov
Keyword: artificial intelligence
Filters: Committee Reports
The regulation was then coded by a team of two human coders for: (1) relevance to AI, (2) regulatory approach, and (3) subject matter. The relevance to AI categories were low, medium, and high. The regulatory approach categories were expansive or restrictive. For the subject matter categories, the Index employed the Congress policy typology. In cases where there were disagreements on the coding schemas, a third coder was brought in to settle differences.
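The two-coder scheme with a third-coder tiebreaker, used here and in the legislation analysis above, can be sketched as follows. The function names, item IDs, and labels are illustrative, not the Index's actual tooling; `tiebreak` stands in for the human third coder.

```python
def agreement_rate(coder_a, coder_b):
    """Share of items on which the two coders already agree."""
    return sum(coder_a[k] == coder_b[k] for k in coder_a) / len(coder_a)

def adjudicate(coder_a, coder_b, tiebreak):
    """Resolve two coders' labels item by item; on disagreement, defer to
    `tiebreak(item)`, which models the third human coder."""
    final = {}
    for item in coder_a:
        a, b = coder_a[item], coder_b[item]
        final[item] = a if a == b else tiebreak(item)
    return final
```

Tracking the raw agreement rate alongside the adjudicated labels is useful in practice, since a low rate signals that the coding schema itself needs clarification rather than more tiebreaking.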
Chapter 8: Diversity
Code.org
To learn more about the diversity data from Code.org,
please read the methodological note on Code.org’s
data included in the Chapter 6 subsection of
the Appendix.
Computing Research
Association
(CRA Taulbee Survey)
To learn more about the diversity data from the CRA,
please read the methodological note on the CRA’s data
included in the Chapter 6 subsection of the Appendix.
Informatics Europe
To learn more about the diversity data from Informatics
Europe, please read the methodological note on
Informatics Europe’s data included in the Chapter 6
subsection of the Appendix.
Ipsos
For brevity, the 2024 AI Index does not republish the
methodology used by the Ipsos survey that features
in the report. More details about the Ipsos survey’s
methodology can be found in the actual survey.
Pew Research
For brevity, the 2024 AI Index does not republish the
methodology used by the Pew surveys that feature in
the report. Data was taken from the 2023 Pew Research
Center survey.